2025 Activity report, Project-Team MARIANNE
RNSR: 202524661B
- Research center: Inria Centre at Université Côte d'Azur
- In partnership with: Université Côte d'Azur, CNRS
- Team name: Models and data for computational argumentatIon in natural language
- In collaboration with: Laboratoire informatique, signaux systèmes de Sophia Antipolis (I3S)
Creation of the Project-Team: 2025 February 01
Keywords
Computer Science and Digital Science
- A9.2.1. Supervised learning
- A9.2.8. Deep learning
- A9.4. Natural language processing
- A9.6. Decision support
- A9.8. Reasoning
- A9.11. Generative AI
- A9.14. Evaluation of AI models
- A9.15. Symbolic AI
- A9.16. Societal impact of AI
- A9.17. Cybersecurity and AI
Other Research Topics and Application Domains
- B9.6.8. Linguistics
- B9.6.10. Digital humanities
- B9.9. Ethics
1 Team members, visitors, external collaborators
Research Scientists
- Serena Villata [Team leader, CNRS, Senior Researcher, from Feb 2025, HDR]
- Victor David [INRIA, ISFP, from Feb 2025]
- Federica Granese [INRIA, Starting Research Position, from Feb 2025]
Faculty Members
- Elena Cabrio [UNIV COTE D'AZUR, Professor, from Feb 2025]
- Anaïs Ollagnier [UNIV COTE D'AZUR, Associate Professor, from Feb 2025]
Post-Doctoral Fellows
- Sofiane Elguendouze [CNRS, Post-Doctoral Fellow, from Feb 2025]
- Yingxue Fu [I3S, Post-Doctoral Fellow, from Feb 2025]
PhD Students
- Greta Damo [UNIV COTE D'AZUR, from Feb 2025]
- Deborah Dore [UNIV COTE D'AZUR, from Feb 2025]
- Thi Thao Ha [I3S, from Nov 2025]
- Cyprien Michel-Deletie [ENS DE LYON, from Feb 2025]
- Nicolas Ocampo [UNIV COTE D'AZUR, from Feb 2025 until Apr 2025]
- Ekaterina Sviridova [UNIV COTE D'AZUR, from Feb 2025]
- Xiaoou Wang [CNRS, from Feb 2025 until Nov 2025]
Technical Staff
- Mariana Eugenia Chaves Espinoza [UNIV COTE D'AZUR, Engineer, from Feb 2025 until Aug 2025]
- Theo Alkibiades Collias [UNIV COTE D'AZUR, from Feb 2025]
Interns and Apprentices
- Hajar Bakarou [UNIV COTE D'AZUR, Intern, from Apr 2025 until Sep 2025]
- Loup Doinel [INRIA, Intern, from Mar 2025 until Sep 2025]
- Mohamed Sinane El Messoussi [UNIV COTE D'AZUR, Intern, from Apr 2025 until Sep 2025]
- Erwan Hain [INRIA, Apprentice, from Feb 2025]
- Nino Pireaud [UNIV COTE D'AZUR, Intern, from Apr 2025 until Oct 2025]
- Diego Zuniga Blassio [UNIV COTE D'AZUR, Intern, from Sep 2025]
Administrative Assistant
- Delphine Robache [INRIA]
2 Overall objectives
2.1 Context: the need for automating natural language argument analysis
The field of computational argumentation is emerging as an important aspect of Artificial Intelligence (AI) research. This stems from the recognition that, if we are to develop robust intelligent systems, they must be able to handle incomplete and inconsistent information in a way that emulates how humans tackle such a complex task. Humans handle incomplete and inconsistent information by using argumentation either internally, by evaluating arguments and counterarguments, or externally, for instance by entering into a discussion or a debate where arguments are exchanged.
Argumentation is an effective approach for solving various theoretical and practical problems, such as explaining and justifying decision-making outcomes. The idea of “argumentation” as the process of creating arguments for and against competing claims has been a subject of interest to philosophers and lawyers. In Computer Science, argumentation offers a novel framework shedding light on classical forms of reasoning, such as logical deduction, induction and abduction, interactive explanations, discussion and negotiation in computer-supported cooperative work, and learning.
Argumentation is the process by which arguments are constructed and handled: this means that arguments are compared, evaluated in some respect and judged to establish whether any of them is warranted. In computational models of argument, each argument is a set of assumptions that, together with a conclusion, is obtained by a reasoning process. In monological argumentation, a single agent reasons by constructing arguments to support and attack a conclusion to reach a decision, while, in dialogical argumentation, a group of arguers interacts to construct arguments supporting or attacking a particular claim and finally deliberate. An argumentation framework is essentially a directed graph in which the arguments are represented as nodes and the attack relation is represented by the arrows. This basic definition can be extended with different kinds of relations, e.g., support. A major step concerns the assessment of a set of arguments and their conclusions to establish their justification status, and therefore compute their acceptability degree. The assessment of the justification status of the arguments allows the agent to decide what to believe and what to do. Argumentation semantics provide formal criteria to determine which sets of arguments (i.e., extensions) can be regarded as collectively acceptable. Both qualitative and quantitative approaches have been proposed in the literature to assess the acceptance of an argument.
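As a minimal illustration of how a semantics turns such a directed graph into a set of collectively acceptable arguments, the sketch below computes the grounded extension of an abstract argumentation framework containing attacks only. The grounded semantics is just one standard choice among many; the function name and the toy arguments a, b and c are illustrative assumptions and do not come from the team's software.

```python
from typing import Iterable, Set, Tuple


def grounded_extension(arguments: Iterable[str],
                       attacks: Set[Tuple[str, str]]) -> Set[str]:
    """Return the grounded extension of an attack-only framework.

    `attacks` holds pairs (x, y) meaning "x attacks y". The result is the
    least fixed point of the characteristic function, reached by iterating
    from the empty set.
    """
    args = set(arguments)
    # For each argument, collect the arguments that attack it.
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}

    extension: Set[str] = set()
    while True:
        # An argument is defended if every one of its attackers is itself
        # attacked by some member of the current extension.
        defended = {
            a for a in args
            if all(any((d, b) in attacks for d in extension)
                   for b in attackers[a])
        }
        if defended == extension:
            return extension
        extension = defended


# Toy chain (hypothetical): a attacks b, b attacks c.
# a is unattacked, so it is accepted; a defeats b, which reinstates c.
print(sorted(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")})))
# ['a', 'c']
```

When two arguments attack each other and nothing else intervenes, neither is defended from the empty set, so the grounded extension is empty. This shows how cautious this particular semantics is compared with alternatives that would accept one of the two.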
Argument(ation) mining (AM) is the research field in artificial argumentation aiming at automatically processing natural language arguments and reasoning upon them. It aims at extracting natural language arguments and their relations from text, with the final goal of providing machine-processable structured data for computational models of argument.
MARIANNE intends to carry out research at the interface of Natural Language Processing (NLP) and Argumentation, for modeling, mining, generating and reasoning upon argument structures from complex, uncertain, genre-specific, and multilingual textual data. One major goal of MARIANNE is to validate that the models produced scale, by testing the main proposals through experimental implementations, in collaboration with researchers from other disciplines such as Linguistics, Philosophy, Sociology and Law, to pursue high-impact research in AI.
2.2 Scientific objectives: mining, assessing and generating arguments
The MARIANNE project-team proposes and investigates NLP methods and algorithms for natural language argumentation to address the following three axes:
Axis A - Argument mining:
The first research axis is the development of models and algorithms designed for mining natural language arguments from text. The Argument Mining (AM) scientific community has so far focused on the two main tasks constituting the AM pipeline, i.e., the detection of argumentative components (namely, evidence and claims), and the prediction of the relations of support and attack holding among them. These tasks are a necessary first step, but the resulting argumentation frameworks (i.e., the graph structures whose nodes are the arguments and whose links are the relations) are still quite simple with respect to the needs raised in the team's application scenarios. Our goal is to enhance the extraction of machine-processable natural language argument structures to allow reasoning over complex real-world natural arguments, with a focus on the English, French, Italian, Spanish and German languages.
Axis B - Natural language argument quality assessment:
The second research axis is focused on the definition of computational methods to automatically assess the quality of natural language arguments. Despite a few existing approaches, the issue of automatically assessing the quality of an argumentation remains largely unexplored. On the one hand, it consists in assessing the quality of the mined arguments to decide, for instance, whether a certain argument should be selected for synthesizing a debate, or whether the overall debate is of good quality. On the other hand, it consists in ensuring that newly generated arguments satisfy the defined quality criteria, e.g., a counter-argument against a piece of fake news needs to be concise and free of repetitions. The quality of the arguments is also characterized by formal properties of the argumentation graph, e.g., the argument strength, argument preferences, and argument acceptability.
Axis C - Generation of natural language arguments:
In addition to the definition of more effective methods to mine (fine-grained) argumentative structures from text, and to automatically assess their quality, the third research axis of the team consists in the definition of new (generative and non-generative) methods to generate natural language arguments, with an initial focus on English and French. This process is incremental: it starts with the generation of single argumentative components and moves towards the generation of arguments in the context of interactive dialogues with users. These dialogues are then employed in different use cases with different goals, e.g., explanation and counter-argumentation. These arguments are first generated starting from the mined arguments, and they rely on abductive reasoning schemas based on the set of critical questions and reasoned responses necessary to reach the user's understanding.
Ethics.
Like all AI algorithms, the technological solutions developed in MARIANNE may potentially be misused. This potential danger mainly concerns the argument generation part, e.g., the generation of hateful and fallacious arguments. This is why a significant part of the research activity of MARIANNE is dedicated to the definition of computational solutions to “ensure” that the argumentation generated by our algorithms in human-machine conversations will not contain harmful or unfair content. This means going beyond the mere inclusion of language guardrails, as in current Large Language Model (LLM) solutions, as the ethical foundation of an argumentation must be “controlled” at the discourse level too. The team members are already working on this topic, as witnessed by the recent publication at EMNLP 2024. In addition, the awareness of the ethical concerns around AI is high, thanks also to the participation of team members (Serena Villata) in national bodies such as the National Pilot Committee of Digital Ethics (CNPEN).
3 Research program
The MARIANNE project-team aims at developing natural language processing (NLP) methods and algorithms for argumentation in natural language. The research program of the team is organized around the following three axes:
Axis A - Argument mining:
The first research axis will be the development of models and algorithms designed for mining natural language arguments from text.
Axis B - Natural language argument quality assessment:
The second research axis will be focused on the definition of computational methods to automatically assess the quality of natural language arguments.
Axis C - Generation of natural language arguments:
The third research axis of the team will consist in the definition of new (generative and non-generative) methods to generate natural language arguments.
Axis A: Argument mining
Long term objectives
Current models have been shown to learn linguistic patterns or cues rather than a real semantic understanding of the arguments. For instance, a model may predict a relation correctly, yet be unable to explain the underlying reasoning of why the two linked components are related, since the required warrants are not explicitly mentioned in the text. Current classification algorithms are therefore effective only up to a certain point, namely when the relation can be inferred from explicit mentions in the text. The team will therefore investigate methods to inject both the common sense knowledge and the expert domain knowledge that humans exploit to carry out this kind of reasoning, in the form of knowledge bases or ontologies, to boost the performance of current neural methods applied to Argument Mining (AM) tasks. This long-term objective takes different forms depending on the application scenario: for the healthcare and legal scenarios it consists in precise domain knowledge pointed out by domain experts, whilst in the disinformation, hate speech and politics scenarios, it consists in a subtle mixture of precise domain knowledge (e.g., if we talk about vaccines or a new tax law) and common sense knowledge. This step is mandatory to boost the performance of argument mining models.
Thus far, we concentrated on the task of predicting intra-argument relations, i.e., those relations holding between the evidence and claims of a single argument. However, a necessary further step consists in moving from intra-argument relation prediction to inter-argument relation prediction. Establishing an argumentative link between two (or more) different arguments is mandatory to obtain a full argumentation framework (e.g., a graph representation of all the evidence and claims extracted from the clinical essays about Covid-19, where arguments are linked through argumentative relations) able to capture possible inconsistencies (e.g., self-attacking arguments, rebuttal on points supporting the candidate's viewpoint in political debates) and fallacious arguments (e.g., ad hominem attacks), that are at the basis of the research challenge on argument quality discussed below. We plan to tackle this issue through neural models like Graph Neural Networks (GNNs), which represent a natural choice given the graph-based structure of the argumentation, with a particular focus on the use of GNNs as a model of neural-symbolic computing, ensuring explainability.
Whilst a high-level classification of the argument components using the classes evidence and claim and of the relations with attack and support is required to identify the main argumentative elements present in the text, as soon as we move to precise application scenarios this general classification becomes insufficient and needs to be refined with more precise classes. Both for argumentative components and relations, the finer-grained classes strongly depend on the application domain (e.g., in the political domain we may have critical discussion and strategic maneuvering as argument classes, and rebut and undercut as relation classes). For argumentative relations, the idea is to start considering the two classical inter-argument relations in formal computational models of argument, namely rebut (i.e., the claim of an argument attacks the claim of another argument) and undercut (i.e., the claim of an argument attacks the premises of another argument).
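The distinction between the two inter-argument relations can be made concrete with a small sketch. It assumes, purely for illustration, that each argument is already available as a set of premises plus a claim, and that a hypothetical `conflicts` set lists pairs of propositions that contradict each other. In a real pipeline both would have to be predicted from text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class Argument:
    premises: FrozenSet[str]
    claim: str


def attack_type(attacker: Argument, target: Argument,
                conflicts: Set[Tuple[str, str]]) -> Optional[str]:
    """Label the attack from `attacker` on `target`, if there is one.

    "rebut": the attacker's claim conflicts with the target's claim.
    "undercut": the attacker's claim conflicts with a target premise.
    Returns None when no conflict is recorded. If both hold, "rebut" is
    reported first (an arbitrary choice made for this sketch).
    """
    def in_conflict(p: str, q: str) -> bool:
        # Conflict is treated as symmetric.
        return (p, q) in conflicts or (q, p) in conflicts

    if in_conflict(attacker.claim, target.claim):
        return "rebut"
    if any(in_conflict(attacker.claim, p) for p in target.premises):
        return "undercut"
    return None


# Hypothetical toy arguments.
pro = Argument(frozenset({"the trial was well designed"}), "the drug is effective")
con_claim = Argument(frozenset({"a later study found no effect"}), "the drug is not effective")
con_premise = Argument(frozenset({"the trial had no control group"}), "the trial was poorly designed")

conflicts = {
    ("the drug is effective", "the drug is not effective"),
    ("the trial was well designed", "the trial was poorly designed"),
}

print(attack_type(con_claim, pro, conflicts))    # rebut
print(attack_type(con_premise, pro, conflicts))  # undercut
```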
Overall, in this first axis, the team will extend and empower neural AM models and algorithms to detect even more precisely both argument components and relations in the application scenarios of the team (i.e., medicine, politics, legal cases, hate speech, disinformation), with a focus on the English, French, Italian, Spanish and German languages. All the above-mentioned objectives require (i) the definition of detailed annotation guidelines for high-quality linguistic resources, (ii) the annotation of inter-argument relations and finer-grained classes of components/relations, and (iii) the definition of neural AM algorithms empowered with knowledge bases and domain ontologies. The team will also release a number of annotated linguistic resources for these tasks to contribute to the advancement of the research community.
Medium-term projects
In order to reach the long term objectives, we work on the following projects:
- Mining argument misalignments and conflicts: One mid-term objective of MARIANNE will be the identification and classification of misalignments and overt conflicts that can hinder argumentative discussion and block democratic deliberation. We will design technology-enhanced dialogue spaces based on dispute mediation, which can also be applied in the context of social media discussions to detect and counter the use of abusive language. This innovative method, which scales up mediation techniques for large-scale dialogues, foresees two steps: (i) Conflict analysis: analysis of the main sources and the pattern of escalation of conflicting episodes in the use cases, based on argument mining methods (e.g., frequently debated issues, conflicting arguments) and argument-based qualitative analysis of discourses (which reveals the dynamics of conflict escalation). The argument conflict analysis will uncover the real issues and the recurring types of conflicting arguments that are based on different starting points, but also the issues on which the discussion can proceed and the points of commonality. To achieve this goal, our objective will be to leverage the structural information present in the debates to enhance the performance of relation prediction between argument components. We will tackle this task by integrating Knowledge Graph Embedding models with Transformer-based models built on pre-trained Large Language Models.
- Mining implicit arguments: In argumentation, it is crucial to comprehend the main arguments being put forward, the underlying reasoning, and how these arguments interrelate within a specific context. In this mid-term objective, we aim to develop systems capable of automatically extracting arguments and discerning their relationships. However, human argumentation is more intricate than mere argument extraction and lexical or semantic analysis. Argumentative units may not always be explicitly stated in discourse; instead, they are often connected through implicit inferences. These inferences depend on various factors, including general background knowledge, domain-specific expertise (such as medical or legal expertise), subjective reasoning, or other patterns, which complicate argument comprehension and comprehensive analysis across domains. Therefore, our objectives are to investigate the role of external knowledge and other extra-linguistic features in elucidating implicit reasoning chains in argumentation, and to utilize these implicit elements to enhance existing methods for extracting arguments and predicting their relationships.
- Mining arguments over time: The way argumentation is carried out in public discourse, political debates and scientific communication changes over time. Another mid-term goal of the team will be to automatically explore the dynamics of inter- and intra-argument structures over time. On the one hand, the task will be to investigate through semi-supervised and unsupervised learning methods how the argumentation has evolved over time. For instance, for political debates, we plan to start with the USElecDeb60To20 dataset we built, containing all the US presidential debates from 1960 to 2020. The objective is to study the temporal evolution trends of the argumentation, to see how the structure of the arguments evolved (e.g., number and degree of granularity of the premises, presence of major claims, employment of rhetorical elements, choice of news events, change points). On the other hand, the task will consist in the investigation of the dynamics of the argumentation in terms of attacks and supports among the candidates' arguments (i.e., graph-level analysis of the argumentation). The final goal will be to assess if and how the dynamics of the argumentation impacted the outcome of the decision-making process. For instance, this would allow us to learn from past argumentation dynamics to predict the results of future elections in a country.
Axis B: Natural language argument quality assessment
Long term objectives
To automatically assess the quality of argument structures, we will first consider the following three main high-level dimensions to characterize argument quality: (i) Cogency: an argument should be seen as cogent if it has individually acceptable premises that are relevant to the argument's conclusion and that are sufficient to draw the conclusion; (ii) Rhetorical effectiveness: an argumentation should be seen as effective if it persuades the audience of the author's stance on the discussed issue. Besides the logical grounding of the actual arguments (logos), it also takes into consideration the credibility of the argument source (ethos) and the emotional force of the argumentation (pathos); (iii) Reasonableness: an argumentation should be seen as reasonable if it contributes to the resolution of the given issue in a sufficient way that is acceptable to everyone from the expected target audience. These three dimensions of argument quality will then be specified in further sub-dimensions. For example, a premise of an argument should be seen as relevant if it contributes to the acceptance or rejection of the argument's conclusion (i.e., if it is worth considering as a reason, evidence, or similar regarding the conclusion). Likewise, an argumentation is successful in creating credibility if it conveys arguments in a way that makes the author trustworthy (e.g., by indicating the honesty of the author or by revealing the author's expertise regarding the discussed issue), and it should be seen as not successful if the opposite holds.
In addition to these three main argument quality dimensions, we will consider also two other elements, namely conceptual notions, and influence factors. Conceptual notions are the notions of maximal quality (based on arguing goals such as agreement and deliberation or on preferences between different arguments), and the notions of minimal quality, which represent what makes an argument valuable or appropriate to be stated as well as how to avoid fallacies. Influence factors are factors that influence the perception of quality beyond the content, structure, and style of the argument itself like argument-related factors (such as the argument's length, its structure in terms of relations between units, and revisions applied to it), and context-related factors (such as the domain of the discussion, the audience addressed, and the debaters involved).
These three dimensions take into account both the content of the argument itself (cogency and rhetorical effectiveness) and the argumentation structure (reasonableness, considering both counter-arguments and their rebuttals), whilst maximal/minimal quality and the influence factors concern more the context of the arguments to be assessed. To address the automatic quality assessment of the arguments, we will need to jointly consider the content of the textual arguments (i.e., by fine-tuning generalist or domain-specific Large Language Models) and the graph structure of the whole argumentation (i.e., through the generation of graph embeddings or the usage of Graph Neural Network models). The resulting node-level features will be used to create graph-level statistics that are useful for the reasonableness dimension in particular.
Finally, the same criteria proposed for assessing the quality of the mined arguments will be used to assess, as a long-term objective, the quality of the generated arguments. In addition, argument generation will also be evaluated in terms of diversity, which is of paramount importance, since verbatim repetition of arguments can become detrimental in explanation and counter-argumentation scenarios. Both lexical diversity and semantic diversity can be considered at this stage: lexical diversity focuses on the surface variability of the generated arguments and can be captured by word overlap metrics, whereas semantic diversity focuses on meaning and is harder to capture, as in the case of generated arguments with similar meaning but different wording. The argument generation model will also be evaluated with standard metrics, i.e., BLEU and BERTScore (computing a similarity score, using contextual embeddings, for each token in the candidate sentence with each token in the reference sentence), concerning the lexical and semantic generation performance, respectively.
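To make the notion of lexical diversity more tangible, here is a minimal sketch of one possible word overlap measure: one minus the mean pairwise Jaccard overlap between the token sets of the generated arguments. The choice of Jaccard overlap, the whitespace tokenization and the function names are assumptions made for this illustration only, not the metric adopted by the team.

```python
from itertools import combinations
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens (a deliberately naive tokenizer)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of word types that two token sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def lexical_diversity(arguments: List[str]) -> float:
    """1 minus the mean pairwise Jaccard overlap of the arguments.

    0.0 means every pair uses exactly the same word types, 1.0 means no
    pair shares any word. Fewer than two arguments gives 0.0 by convention.
    """
    pairs = list(combinations(arguments, 2))
    if not pairs:
        return 0.0
    mean_overlap = sum(jaccard(tokens(x), tokens(y)) for x, y in pairs) / len(pairs)
    return 1.0 - mean_overlap


print(lexical_diversity(["hate harms everyone", "hate harms everyone"]))  # 0.0
print(lexical_diversity(["hate harms everyone", "respect builds trust"]))  # 1.0
```

Such a measure would rate two paraphrases with different wording as highly diverse, which is precisely why semantic diversity has to be assessed separately, for instance with embedding-based similarity.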
Medium-term projects
In order to reach the long term objectives, we work on the following projects:
- Counter-narrative Quality Assessment: This first mid-term objective aims to perform a quality assessment of Counter Narratives (CN). In particular, it focuses on the evaluation of the effectiveness of counter narratives against hate speech. Furthermore, it delves into fairness evaluation, proposing the development of metrics to assess representation and inclusivity, along with employing bias detection techniques to mitigate biases in counter narratives. The research will be conducted on publicly available CN datasets, either created by experts from Non-Governmental Organisations (NGOs) or collected directly from user-generated messages on social media. To assess effectiveness, various dimensions have to be considered. These include the importance of emotional appeal, the analysis of argumentative structures, and the identification of argumentative language. Additionally, we will consider the need for soundness, target specificity, and complexity analysis in counter narratives. To assess fairness, we will define metrics for evaluating counter narratives against hate speech, focusing on the avoidance of harmful stereotypes or biases and on appropriate language. The automatic quality assessment of CN will be addressed through a novel neural model including word embeddings, sentiment and emotion features, and pre-trained LLMs. Furthermore, this task addresses the differences in style between expert-generated and user-generated counter narratives, emphasizing the need for research to understand their impact on effectiveness and fairness.
- Fallacious and Fake Argument Assessment: Fallacies are arguments that employ faulty reasoning. Given their persuasive and seemingly valid nature, fallacious arguments are often used in political debates. Employing these misleading arguments in politics can have detrimental consequences for society, since they can lead the public and policymakers to inaccurate conclusions and invalid inferences. Automatically detecting and classifying fallacious arguments therefore represents a crucial challenge to limit the spread of misleading or manipulative claims, and to promote a more informed and healthier political discourse. In this mid-term objective, we aim to go beyond the six categories of fallacious arguments we have considered thus far (i.e., Ad Hominem, Appeal to Authority, Appeal to Emotion, False Cause, Slippery Slope, and Slogans), in order to address the two-fold task of fallacious argument detection and classification by defining neural network architectures based on Transformer models, combining text, argumentative features, and engineered features. More precisely, we will tackle challenging fallacy categories like causal ones (e.g., Common Cause, False Cause, Reversing Causation, Post-Hoc), where reasoning and knowledge-based features are jointly required to identify the fallacy. Finally, we also target the usage of AM methods to provide an argumentative representation of fact-checkers' justifications to improve the precision and explainability of fake news classification systems.
Axis C: Generation of Natural Language Arguments
Long term objectives
The two long-term objectives of this axis concern the generation of natural language arguments.
More precisely, we plan to start with extractive argument generation, following the example of extractive summarization approaches. The idea is to select the main arguments identified in the text during the mining phase (Axis A), and to generate these arguments verbatim, producing a kind of argument-based summary of the original text. To select the main arguments to generate, we will exploit the quality criteria defined in Axis B. We will work incrementally, starting from the generation of claims and moving to the generation of whole argumentative structures composed of evidence and claims linked by attack and support relations. Secondly, we plan to move to abstractive argument generation. The argument generation models we target will employ a text planning module, which conducts content selection over the mined arguments and specifies the suitable language style at sentence level in order to select the talking points to cover for each sentence to be generated, combined with a content generation module, which produces the final fluent argument on a certain topic. For this task, we will fine-tune Large Language Models for general application scenarios and for our use cases (i.e., counter-argument generation to fight online hate speech and disinformation, argument-based natural language explanations). The generated arguments need to meet two criteria, namely quality (e.g., variability of the arguments, no repetitiveness) and quantity. The evaluation of the generated arguments will be addressed by employing standard metrics for natural language generation, i.e., ROUGE (a recall-oriented metric), BLEU (based on n-gram precision) and METEOR (measuring unigram precision and recall by considering synonyms, paraphrases, and stemming).
The second long-term objective is to propose and test a more concrete argument-based dialogue, based on the results of the previous step. First, the argument-based dialogue neural model selects the mined arguments (i.e., components and relations) supporting or attacking a given (stance about a) topic and generates new arguments using the mined ones. Then, a task-oriented dialogue model provides the generated arguments about a certain topic and allows the user to ask clarification questions. It is in charge of generating appropriate queries to the argumentation model, and of generating an answer. The argumentation can be evaluated along three axes: 1) validity of the inferences used; 2) quality of the generated interactions; 3) epistemic quality of conclusions in a given domain. Measurable factors such as the time spent by each user interacting with the system or the number of repeated questions are taken as indicators of the quality of this model. A quantitative analysis of the dialogues will also be conducted, based on different configurations of the dialogue simulator.
This research axis can be questioned with respect to the current hype around generative AI in the scientific community. Across the three scientific axes we propose, we will make use of Large Language Models, which we will fine-tune for our precise tasks, and we will employ generative AI models like GPT, Gemini, Claude or Mistral for zero-shot or few-shot evaluations. As this issue is more an engineering (i.e., prompt engineering) contribution than a scientific one, we do not develop this part further in this research program, even though we are obliged to address it to compare with state-of-the-art systems.
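The difference between a precision-oriented and a recall-oriented overlap metric can be illustrated with the simplified sketch below. It computes the clipped n-gram precision that BLEU builds on and the n-gram recall behind ROUGE-N, for a single reference. It is not a full implementation of either metric (no brevity penalty, no averaging over n-gram orders, no stemming or synonym matching as in METEOR), and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_overlap(candidate: List[str], reference: List[str], n: int) -> int:
    """Number of candidate n-grams also found in the reference, each one
    counted at most as often as it occurs in the reference."""
    # Counter intersection keeps the minimum count of each n-gram.
    return sum((ngrams(candidate, n) & ngrams(reference, n)).values())


def ngram_precision(candidate: List[str], reference: List[str], n: int = 1) -> float:
    """BLEU-style clipped precision: overlap / candidate n-gram count."""
    total = sum(ngrams(candidate, n).values())
    return clipped_overlap(candidate, reference, n) / total if total else 0.0


def ngram_recall(candidate: List[str], reference: List[str], n: int = 1) -> float:
    """ROUGE-N-style recall: overlap / reference n-gram count."""
    total = sum(ngrams(reference, n).values())
    return clipped_overlap(candidate, reference, n) / total if total else 0.0


reference = "vaccines are safe and effective".split()
candidate = "vaccines are safe".split()

print(ngram_precision(candidate, reference))  # 1.0 (every generated word is in the reference)
print(ngram_recall(candidate, reference))     # 0.6 (3 of the 5 reference words are covered)
```

A short but accurate generated argument thus scores high on precision and lower on recall, which is one reason the two families of metrics are usually reported together.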
Medium-term projects
In order to reach the long-term objectives, we work on the following projects:
- Argument-based XAI: Explainable Artificial Intelligence (XAI) aims to explain the decisions made by an intelligent system. This topic has recently attracted a lot of interest in the AI research community due to the need to explain machine decisions, in particular in sensitive application domains like medicine. AI models must be able to provide interpretable and robust explanations for their decisions, as well as to learn the best way to explain their predictions to humans. This issue has already been investigated by different communities in the field, and several approaches have been proposed to get a better understanding of AI models from a mathematical point of view. Current results emphasize the need to generate explanations that are compatible with human reasoning methods, such as argumentation, and show that natural language explanations are one of the most challenging ways to materialize them. Our objective is to connect argumentation to XAI through the interactive generation of arguments to help humans better understand machine predictions. Arguments need to be not only rational, but “manifestly” rational, so that arguers can see for themselves the rationale behind the inferential steps taken. We will propose novel argument generation methods to generate natural language explanations targeting different domains (i.e., medicine, fake news, hate speech, law, policy making). The feature we target for these natural language explanations is two-fold: on the one hand, they must unveil the reasons behind a decision through manifestly rational arguments; on the other hand, they need to take human feedback into account to improve the initially generated explanation (i.e., adding more details, broadening the discussion, providing references to institutional sources).
The generated argument-based explanations will benefit the user in terms of justification (exposing the reasoning behind a decision, thus helping the user decide how much credence to give it), user involvement (allowing the user to add her knowledge and inference skills to the overall decision process), and system acceptance (in that the system’s functionality is fully transparent and its suggestions are adequately justified).
- Counter-Argument Dialogue Generation: Existing resources highlight the proficiency of large language models in generating counter-arguments, for instance, against hate speech or fake news. However, they primarily focus on simple two-turn exchanges, whereas real-life interactions often involve multiple, complex dialogues. Furthermore, among those approaches, it remains unclear which arguments are crucial for steering conversations toward positive shifts. To address these issues, our research aims to delve deeper into the structure and dynamics of counter-argumentation in extended dialogues from human and machine-generated resources. Our objectives are four-fold: (i) explore the use and impact of argumentative structures (premise, conclusion, major claims, and attack/support relations) and rhetorical strategies (logos, pathos, and ethos) in longer conversations; (ii) examine how the generated argument structures may evolve across multiple turns; (iii) assess how well existing counter-argument taxonomies align with effective argument generation methods; and (iv) develop novel methods for generating counter-arguments that guarantee these conversation shifts, through reinforcement learning with human feedback (RLHF).
4 Application domains
The team mainly focuses on the application domains in healthcare, politics, law, hate speech and disinformation, given that argumentation-based decision making is becoming increasingly prominent in such contexts, and the team members already have experience in these application scenarios.
In the healthcare domain, clinicians need frameworks supporting them in extracting relevant information from textual documents (in particular from clinical trials), that is subsequently presented in a structured way. Given the aptness of Argument Mining methods to automatically detect in text those argumentative structures that are at the basis of evidence-based reasoning applications, AM represents a valuable contribution in the healthcare domain.
Political debates and political programs offer a rare opportunity for citizens to compare the candidates' positions on the most controversial topics of the campaign. Given their innate argumentative features, they represent another natural playground for Argument Mining methods. The ability to identify argumentative components and predict their relations in this kind of text opens the door to cutting-edge tasks like fallacy detection and counter-argumentation generation.
Given its intrinsic argumentative features, legal text represents another natural but challenging use case scenario. The idea is to define novel methods to automatically identify arguments in such texts, connect them by semantic relations and enrich them by matching them with the concepts present in knowledge graphs, to aid jurists in comparative analysis, as in case law.
Two application scenarios are linked to defense, and more precisely, to information warfare, i.e., disinformation and hate speech. First, content moderation is not enough to limit the diffusion of misleading or fake information, as successfully recognizing online disinformation depends not only on understanding whether factual statements are true, but also on interpreting and critically assessing the reasoning and arguments provided in support of conclusions.
Second, there are already AI-based tools for the automatic detection of violent and harmful speech, but most of them do not integrate cultural and contextual dimensions and do not go beyond the identification of explicit hate speech, which are the aspects we focus on in the team. Using natural language argumentation to build counter-arguments would also be beneficial.
5 Highlights of the year
- Serena Villata was awarded an ERC Consolidator Grant 2025 for the PANDORA project. The project will start in June 2026.
- Serena Villata was nominated as a member of the French Council for Artificial Intelligence and Digital Technology in July 2025.
- Victor David coordinated the new Inria Associate Team with UCL, in collaboration with Prof. Tony Hunter.
- Elena Cabrio was nominated as a member of the AI Advisory Council of the company ENGIE Energy.
- Serena Villata was nominated as a member of the French Advisory Committee on Digital Ethics in September 2025.
- Serena Villata was nominated as Scientific Director of the Interdisciplinary Institute of Artificial Intelligence (3IA Côte d'Azur) in November 2025.
5.1 Awards
- Greta Damo, Benjamin Ocampo and Serena Villata were awarded the "Prix d'excellence" of Université Côte d'Azur in December 2025.
6 Latest software developments, platforms, open data
In MARIANNE, we identified the following privileged application fields, based on existing research activities of team members and on the local eco-system: Defense, Digital Humanities, Medicine.
The main objective of the software implementations, i.e., proofs of concept, developed in the team is to experimentally validate the results obtained and ease the transfer of the developed methodologies to industry. Most of these proofs of concept are released as Python packages or Web platforms.
MARIANNE team members conceived and designed these tools, identified the requirements and provided the specification design, as well as the formal definition of the algorithms involved. The main proofs of concept developed in the team, which we aim to pursue, are the following.
6.1 Latest software developments
6.1.1 HERACLES
-
Name:
Online Topic Change Point Detection
-
Keywords:
Incremental clustering, Continual Learning
-
Functional Description:
This script performs topic modeling on chunks of documents using BERTopic and detects change points in topic distributions over time using Online Change Point Detection (OCPD). It processes document chunks iteratively, updating the topic model with each new chunk, and identifies significant changes in topics over time.
-
Contact:
Serena Villata
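To make the change-point step described above more concrete, here is a minimal Python sketch that flags a change whenever the topic distribution of a chunk moves far away from that of the previous chunk. It is an assumption-laden stand-in, not the OCPD algorithm used by the script: the Jensen-Shannon divergence, the fixed threshold, and the names `js_divergence` and `detect_change_points` are illustrative choices, and the topic distributions are toy values rather than BERTopic output.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def detect_change_points(distributions, threshold=0.2):
    """Indices of chunks whose topic distribution differs strongly from the previous one."""
    return [
        t
        for t in range(1, len(distributions))
        if js_divergence(distributions[t - 1], distributions[t]) > threshold
    ]


# Toy topic distributions for four consecutive document chunks (assumed values).
chunks = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.10, 0.20, 0.70],  # the dominant topic switches here
    [0.12, 0.20, 0.68],
]
print(detect_change_points(chunks))  # [2]
```

A real online detector would typically maintain a running statistic rather than compare only adjacent chunks; the pairwise comparison is kept here for brevity.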
6.1.2 StreamETM
-
Keywords:
Machine learning, Artificial intelligence, Natural language processing
-
Scientific Description:
StreamETM is an application designed for dynamic topic modeling using the Embedded Topic Model (ETM). It processes streaming text data, merges topic models over time, and detects change points in topic distributions.
Features:
- Dynamic Topic Modeling: Continuously update topic models with new data chunks.
- Topic Merging: Merge new topic models with existing ones to maintain a coherent topic structure.
- Change Point Detection: Detect significant changes in topic distributions over time using Online Change Point Detection (OCPD).
- Preprocessing: Preprocess text data, including lemmatization, stopword removal, and frequency-based filtering.
-
Functional Description:
StreamETM is an application designed for online topic modeling using the Embedded Topic Model (ETM). It processes streaming text data, merges topic models over time, and detects change points in topic distributions.
-
News of the Year:
The software was developed as part of the paper "Merging Embedded Topics with Optimal Transport for Online Topic Modeling on Data Streams" (https://arxiv.org/pdf/2504.07711).
- Publication:
-
Contact:
Federica Granese
-
Participant:
4 anonymous participants
6.1.3 ACTA
-
Name:
A Tool for Argumentative Clinical Trial Analysis
-
Keywords:
Artificial intelligence, Natural language processing, Argument mining
-
Functional Description:
Argumentative analysis of textual documents of various kinds (e.g., persuasive essays, online discussion blogs, scientific articles) makes it possible to detect the main argumentative components (i.e., premises and claims) present in the text and to predict whether these components are connected to each other by argumentative relations (e.g., support and attack), leading to the identification of (possibly complex) argumentative structures. Given the importance of argument-based decision making in medicine, ACTA is a tool for automating the argumentative analysis of clinical trials. The tool is designed to support doctors and clinicians in identifying the document(s) of interest about a certain disease, and in analyzing the main argumentative content and PICO elements.
-
Release Contributions:
In 2024, ACTA was integrated into a suite called ANTIDOTE, which collects the software results from the ANTIDOTE project (all partners). In 2025, ACTA was optimised to improve the mining and analysis of natural language arguments from clinical texts.
- URL:
-
Contact:
Serena Villata
6.1.4 PEACE
-
Name:
Providing Explanations and Analysis for Combating Hate Expressions
-
Keywords:
Hate Speech Detection, Generating Explanations, Implicit Hate Speech, Subtle Hate Speech
-
Functional Description:
PEACE is a web tool conceived to support content moderators in exploring and evaluating implicit and subtle hate speech on social media. It comprises three main functionalities: i) the exploratory analysis of hate speech messages characteristics (exploration), ii) the prediction of hatefulness (detection), and iii) the explanation of system predictions (explanation).
These functionalities provide not only a binary classification of whether a message is hateful (including explicit, implicit, and subtle messages), but also a detailed explanation in natural language that clarifies why a message is considered hateful, together with an exploratory analysis of the message characteristics.
- URL:
-
Contact:
Elena Cabrio
6.1.5 DispuTool 2.0
-
Keywords:
Argument mining, NLP, Web API, Relation Extraction, Component Detection
-
Functional Description:
DISPUTool 2.0 is an automated tool which relies on Argument Mining methods to analyze the political debates from the US presidential campaigns, extract argument components (i.e., premise and claim) and relations (i.e., support and attack), and highlight fallacious arguments. DISPUTool 2.0 also supports the automatic analysis of a debate excerpt provided by the user, identifying and classifying the arguments contained in the text. A REST API is provided to access the tool's functionalities.
-
Release Contributions:
The 2025 version of DispuTool makes it possible to "repair" a fallacious argument by producing a non-fallacious version of it.
- URL:
-
Contact:
Serena Villata
6.1.6 MARIANNE-SAFE
-
Name:
Structured Argumentation for Fact-checking with Explanations
-
Keywords:
Artificial intelligence, Disinformation detection
-
Functional Description:
- Generate argument-structured summaries based on a fact-checking article or a list of evidence.
- Retrieve evidence and generate argument-structured summaries when the fact-checking article is not available.
- Assign a truthfulness label to a news claim which is enriched with a summary.
-
Release Contributions:
First version published in 2025.
- URL:
-
Contact:
Serena Villata
6.2 Open data
MARIANNE team members also released curated datasets (i.e., manually annotated linguistic resources), which are freely available for the scientific advancement of the research community:
- USElecDeb60To20 (annotation guidelines and data): an annotated dataset collected from the website of the Commission on Presidential Debates, including all the transcripts of the presidential debates from Kennedy and Nixon in 1960 until Trump and Biden in 2020. Dataset statistics: 29521 argument components (16087 claims and 13434 premises), 25012 relations (3723 attacks and 21289 supports), and 2745 fallacious arguments. USElecDeb60To20: https://github.com/pierpaologoffredo/ElecDeb60to20
- AbstRCT: an annotated dataset of randomized controlled trials retrieved from the MEDLINE database via PubMed search. Dataset statistics: 4073 argument components (2808 evidence, 1265 claims), and 2601 argument relations (2259 supports, 342 attacks). Topics: neoplasm, glaucoma, hepatitis, diabetes, hypertension. AbstRCT: https://gitlab.com/tomaye/abstrct
- DART: a dataset consisting of tweets annotated with argumentation, over the topics Brexit and Grexit. Restricted access as dataset developed in the context of an industrial collaboration.
- NoDE: a benchmark of natural language argumentation from heterogeneous online content. Dataset statistics: Debatepedia/ProCon dataset (260 pairs divided into 140 supports and 120 attacks), Twelve Angry Men dataset (80 pairs divided into 25 supports and 55 attacks), and Wikipedia dataset (452 pairs divided into 215 supports and 237 attacks). NoDE: http://www-sop.inria.fr/NoDE/
7 New results
In 2025, MARIANNE team members achieved the following results.
7.1 Argument Mining
In this section, we present the main new results of the team on the first research axis.
7.1.1 FALCON: A multi-label graph-based dataset for fallacy classification in the COVID-19 infodemic
Participants: Mariana Chaves, Elena Cabrio, Serena Villata.
Keywords: Argument mining, fallacy detection, disinformation
Fallacies are arguments that seem valid but contain logical flaws. During the COVID-19 pandemic, they played a role in spreading misinformation, causing confusion and eroding public trust in health measures. Therefore, there is a critical need for automated tools to identify fallacies in media, which can help mitigate harmful narratives in future health crises. We present two key contributions to address this task. First, we introduce FALCON 15, a multi-label, graph-based dataset containing COVID-19-related tweets. This dataset includes expert annotations for six fallacy types—loaded language, appeal to fear, appeal to ridicule, hasty generalization, ad hominem, and false dilemma—and allows for the detection of multiple fallacies in a single tweet. The dataset's graph structure enables analysis of the relationships between fallacies and their progression in conversations. Second, we evaluate the performance of language models on this dataset and propose a dual-transformer architecture that integrates engineered features. Beyond model ranking, we conduct statistical analyses to assess the impact of individual features on model performance.
7.1.2 AM4DSP: Argumentation Mining in Structured Decentralized Discussion Platforms for Deliberative Democracy
Participants: Sofiane Elguendouze, Erwan Hain, Elena Cabrio, Serena Villata.
Keywords: Argument mining, e-Democracy, Deliberation Platforms
Collaborations: Lucas Anastasiou (The Open University (UK)), Anna de Liddo (The Open University (UK))
Argument(ation) mining (AM) is the automated process of identifying and extracting argumentative structures in natural language. This field has seen rapid advancements, offering powerful tools to analyze and interpret complex and large discourse in diverse domains (political debates, medical reports, etc.). In this paper 20, we introduce an AM-boosted version of BCause, a large-scale deliberation platform. Figure 1 shows the overall architecture of the platform. The system enables the extraction and analysis of arguments from online discussions in the context of deliberative democracy, which aims to enhance the understanding and accessibility of structured argumentation in large-scale deliberation processes.
Figure 1: The AM4DSP system performs argumentation analysis on structured discussions in BCause, a large-scale deliberation platform.
7.1.3 RooseBERT: A New Deal For Political Language Modeling
Participants: Deborah Dore, Elena Cabrio, Serena Villata.
Keywords: Argument mining, language modeling, political debates, sentiment analysis, stance detection
The increasing number of political debates and politics-related discussions calls for the definition of novel computational methods to automatically analyze such content, with the final goal of shedding light on political deliberation for citizens. However, the specificity of political language and the argumentative form of these debates (employing hidden communication strategies and leveraging implicit arguments) make this task very challenging, even for current general-purpose pre-trained Language Models. To address this issue, we introduce a novel pre-trained Language Model for political discourse called RooseBERT. Pre-training a language model on a specialized domain presents different technical and linguistic challenges, requiring extensive computational resources and large-scale data. RooseBERT has been trained on large political debate and speech corpora (8K debates, each composed of several sub-debates on different topics) in English. To evaluate its performance, we fine-tuned it on four downstream tasks related to political debate analysis, i.e., stance detection, sentiment analysis, argument component detection and classification, and argument relation prediction and classification. Our results demonstrate significant improvements over general-purpose Language Models on these four tasks, highlighting how domain-specific pre-training enhances performance in political debate analysis. We release RooseBERT 34 for the research community.
7.1.4 Leveraging Graph Structural Knowledge to Improve Argument Relation Prediction in Political Debates
Participants: Deborah Dore, Serena Villata.
Keywords: Argument mining, political debates, knowledge graphs
Collaborations: Stefano Faralli (Sapienza University of Rome)
Argument Mining (AM) aims at detecting argumentation structures (i.e., premises and claims linked by attack and support relations) in text. A natural application domain is political debates, where uncovering the hidden dynamics of a politician's argumentation strategies can help the public to identify fallacious and propagandist arguments. Although a few approaches have been proposed in the literature to apply AM to political debates, this application scenario remains challenging, in particular for the task of predicting the relation holding between two argument components. Most AM relation prediction approaches only consider the textual content of the argument components to identify and classify the argumentative relation holding between them (i.e., support, attack), and they mostly ignore the structural knowledge that arises from the overall argumentation graph. In this paper 19, we propose to address the relation prediction task in AM by combining the structural knowledge provided by a Knowledge Graph Embedding Model with the contextual knowledge provided by a fine-tuned Language Model (see Figure 2). Our experimental setting is grounded on a standard AM benchmark of televised political debates of the US presidential campaigns from 1960 to 2020. Our extensive experiments demonstrate that integrating these two distinct forms of knowledge (i.e., the textual content of the argument components and the structural knowledge of the argumentation graph) leads to novel pathways that outperform existing approaches in the literature on this benchmark and enhance the accuracy of the predictions.
Figure 2: Architecture integrating Knowledge Graphs and Language Models for Argument Component Relation Prediction and Classification.
7.1.5 Merging Embedded Topics with Optimal Transport for Online Topic Modeling on Data Streams
Participants: Federica Granese, Serena Villata.
Keywords: Online Topic Modeling, Optimal Transport, NLP
Collaborations: Benjamin Navet (Université Côte d'Azur), Charles Bouveyron (Université Côte d'Azur)
Online topic models are unsupervised algorithms to identify latent topics in data streams that continuously evolve over time. Although these methods naturally align with real-world scenarios, they have received considerably less attention from the community compared to their offline counterparts. In this work, we introduce a novel approach to online topic modeling named StreamETM. This approach builds on the Embedded Topic Model (ETM) to handle data streams by merging models learned on consecutive partial document batches using unbalanced optimal transport. Additionally, an online change point detection algorithm is employed to identify shifts in topics over time, allowing for the detection of significant changes in the dynamics of text streams 24.
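The merging idea can be illustrated with a small, self-contained sketch of entropic unbalanced optimal transport between the topic embeddings of two consecutive batches. This is not the StreamETM implementation: the cosine cost, the uniform topic masses, the regularization values `eps` and `rho`, and all function names are assumptions made for illustration. The scaling updates follow the usual generalized Sinkhorn iterations for marginals relaxed by a Kullback-Leibler penalty.

```python
import math


def cosine_distance(x, y):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


def unbalanced_sinkhorn(a, b, cost, eps=0.1, rho=0.1, n_iter=200):
    """Entropic unbalanced OT: marginals a and b are only softly enforced.

    eps is the entropic regularization strength, rho the weight of the
    KL penalty on each marginal (smaller rho lets more mass go unmatched).
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    power = rho / (rho + eps)  # exponent that softens the marginal constraints
    for _ in range(n_iter):
        for i in range(n):
            kv = sum(kernel[i][j] * v[j] for j in range(m))
            u[i] = (a[i] / kv) ** power
        for j in range(m):
            ktu = sum(kernel[i][j] * u[i] for i in range(n))
            v[j] = (b[j] / ktu) ** power
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy 2-d topic embeddings: two topics from the previous batch, three from the new one.
old_topics = [[1.0, 0.0], [0.0, 1.0]]
new_topics = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
cost = [[cosine_distance(o, n) for n in new_topics] for o in old_topics]
plan = unbalanced_sinkhorn([0.5, 0.5], [1 / 3, 1 / 3, 1 / 3], cost)

# Mass received by each new topic: a low value suggests a topic with no counterpart.
received = [sum(row[j] for row in plan) for j in range(len(new_topics))]
print(received)
```

In this toy setting the first two new topics receive most of the transported mass from their nearby old topics, while the third one receives almost none. A merging rule could then treat such a topic as newly emerged, which is the kind of behaviour the unbalanced formulation permits and a balanced one would not.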
7.1.6 Stick-Breaking Embedded Topic Model with Continuous Optimal Transport for Online Analysis of Document Streams
Participants: Federica Granese, Serena Villata.
Keywords: Online Topic Modeling, Optimal Transport, NLP
Collaborations: Charles Bouveyron (Université Côte d'Azur)
In this work, we propose the Stick-Breaking Stream Embedded Topic Model (SB-SETM), an improved version of the StreamETM model presented in 24, for the online topic modeling setting. SB-SETM (i) leverages a truncated stick-breaking construction for the topic–per-document distribution, enabling the model to automatically infer from the data the appropriate number of active topics at each timestep; and (ii) introduces a merging strategy for topic embeddings based on a continuous formulation of optimal transport adapted to the high dimensionality of the latent topic space. Numerical experiments show that SB-SETM outperforms baselines on simulated scenarios. We extensively test it on a real-world corpus of news articles covering the Russian–Ukrainian war from 2022 to 2023 30.
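For readers unfamiliar with the construction, the following sketch shows a generic truncated stick-breaking scheme: each fraction breaks off a share of the remaining stick, and the final component absorbs what is left so that the weights sum to one. How SB-SETM parameterizes and learns the fractions is not detailed here, so the Beta(1, alpha) draws, the activity threshold and the function names are assumptions used only for illustration.

```python
import random


def stick_breaking_weights(fractions):
    """Turn K-1 break fractions in [0, 1] into K weights that sum to 1."""
    weights = []
    remaining = 1.0
    for v in fractions:
        weights.append(v * remaining)  # piece broken off the remaining stick
        remaining *= 1.0 - v           # what is left for later components
    weights.append(remaining)          # truncation: last weight takes the rest
    return weights


def active_topics(weights, threshold=0.05):
    """Indices of topics whose weight reaches the (assumed) activity threshold."""
    return [k for k, w in enumerate(weights) if w >= threshold]


# Deterministic example: halving the stick three times.
print(stick_breaking_weights([0.5, 0.5, 0.5]))  # [0.5, 0.25, 0.125, 0.125]

# Random example with Beta(1, alpha) fractions, a common prior choice.
rng = random.Random(0)
alpha = 2.0
sampled = stick_breaking_weights([rng.betavariate(1.0, alpha) for _ in range(9)])
print(len(active_topics(sampled)), "active topics out of", len(sampled))
```

Because later sticks can only take what earlier ones left over, most of the mass tends to concentrate on a few components, which is what lets the number of active topics be read off the weights rather than fixed in advance.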
7.1.7 A Topicality-Driven QUD Model for Discourse Processing
Participants: Yingxue Fu, Anaïs Ollagnier.
Keywords: QUD, discourse modeling, discourse parsing, Rhetorical Structure Theory, Penn Discourse Treebank
Collaborations: Mark-Jan Nederhof (University of St Andrews)
Question Under Discussion (QUD) is a discourse framework that has attracted growing interest in NLP in recent years. Among existing QUD models, the QUD tree approach focuses on reconstructing QUDs and their hierarchical relationships, using a single tree to represent discourse structure. Prior implementation shows moderate inter-annotator agreement, highlighting the challenging nature of this task. In this paper, we propose a new QUD model for annotating hierarchical discourse structure 21. Our annotation achieves high inter-annotator agreement: 81.45% for short files and 79.53% for long files of Wall Street Journal articles. We show preliminary results on using GPT-4 for automatic annotation, which suggests that one of the best-performing LLMs still struggles with capturing hierarchical discourse structure. Moreover, we compare the annotations with RST annotations. Lastly, we present an approach for integrating hierarchical and local discourse relation annotations with the proposed model.
7.2 Argumentation quality assessment and reasoning
In this section, we present the main new results of the team on the second research axis.
7.2.1 Fast Computing of Dung Semantics in Acyclic Probabilistic Argumentation Frameworks
Participants: Victor David.
Keywords: Probabilistic Argumentation, Algorithm, Complexity
Collaborations: Stefano Bistarelli (University of Perugia), Pierre Monnin (Inria), Francesco Santini (University of Perugia), Carlo Taticchi (University of Perugia)
This paper 14 presents fast and exact methods for computing the probability of an argument’s acceptance using Dung’s semantics in the Constellation paradigm of Abstract Argumentation. For (directed) Singly-Connected Graphs (SCGs), the problem can now be solved in linearithmic time, rather than in time exponential in the number of attacks as reported in the literature. Moreover, in the more general case of Directed Acyclic Graphs (DAGs), we provide an algorithm whose time complexity is linearithmic in the product of the out-degrees of dependent arguments, i.e., arguments reaching the argument considered for acceptance through multiple paths in the graph. We show theoretically that this complexity is lower than the lower bound of the (exact) Constellation method, a result that is also supported by empirical evidence. Our approach to DAGs is also compared with the (approximate) Monte-Carlo method, which is stopped once our approach has obtained the exact results. Within this time constraint, Monte-Carlo still produces significant errors, underlining the speed of our approach.
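To give an intuition of why acyclic, singly-connected structures are tractable, the sketch below computes an acceptance probability by a simple recursion over attackers. It is an illustrative simplification, not the algorithm of the paper: it assumes that the arguments reaching the target form a tree, that arguments and attacks are present independently with the given probabilities, and that an argument is accepted when it is present and none of its present attackers is accepted (in acyclic graphs Dung's classical semantics coincide). All names and numbers are hypothetical.

```python
def acceptance_probability(target, arg_prob, attackers):
    """Probability that `target` is present and accepted.

    arg_prob:  dict mapping each argument to its probability of being present.
    attackers: dict mapping an argument to a list of (attacker, attack_prob).
    Assumes the attackers' sub-graphs are disjoint (tree-shaped ancestry),
    so that the events for different attackers are independent.
    """
    p_not_defeated = 1.0
    for attacker, attack_prob in attackers.get(target, []):
        # The target is defeated through this attacker only if the attack
        # is present and the attacker itself is present and accepted.
        p_defeat = attack_prob * acceptance_probability(attacker, arg_prob, attackers)
        p_not_defeated *= 1.0 - p_defeat
    return arg_prob[target] * p_not_defeated


# Chain c -> b -> a: c attacks b, b attacks a.
arg_prob = {"a": 1.0, "b": 1.0, "c": 0.5}
attackers = {"a": [("b", 1.0)], "b": [("c", 1.0)]}

# a is accepted exactly when c is present (c defeats b, which reinstates a).
print(acceptance_probability("a", arg_prob, attackers))  # 0.5
```

Each argument is visited once per path to the target, so the cost stays proportional to the size of the tree. With multiple paths between arguments, as in general DAGs, the independence assumption breaks down, which is where more careful algorithms are required.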
7.2.2 A Logic-based Framework for Decoding Enthymemes in Argument Maps involving Implicitness in Premises and Claims
Participants: Victor David.
Keywords: Enthymeme, Logical Argumentation, Argument mining
Collaborations: Anthony Hunter (UCL)
Argument mining is a natural language processing technology aimed at identifying the explicit premises and claims of arguments in text, and the support and attack relationships between them. To better understand and automatically analyze the argument maps that are output from argument mining, it would be desirable to instantiate the arguments in the argument map with logical arguments. However, most real-world arguments are enthymemes (i.e. some of the premises and/or claims are implicit), which need to be decoded (i.e. the implicit aspects need to be identified). A key challenge is to decode enthymemes so as to respect the support and attack relationships in the argument map. To address this, we present a novel framework 18, based on default logic, for representing arguments including enthymemes. We show how decoding an enthymeme means identifying the default rules that are implicit in the premises and claims. We then show how choosing a decoding of the enthymemes in an argument map can be formalized as an optimization problem, and that a solution can be obtained using MaxSAT solvers.
7.2.3 An Axiomatic Study of a Modular Evaluation of Enthymeme Decoding in Weighted Structured Argumentation
Participants: Victor David.
Keywords: Enthymeme, Logical Argumentation, Axiomatization
Collaborations: Jonathan Ben-Naim (CNRS), Anthony Hunter (UCL)
An argument can be seen as a pair of premises and a claim they support. Human arguments are often approximate, with some premises left implicit, leading to an implicit inference of the claim, i.e., forming enthymemes. To better understand and use them, we must decode these approximate enthymemes, typically by identifying missing premises to make the inference explicit, and, as we propose, by also removing irrelevant content to improve argument quality in specific contexts. Often, multiple decodings of an enthymeme are possible. However, no formal method has yet been proposed for identifying higher-quality decodings. To pave the way, in 13 we introduce six types of criteria for evaluating aspects of decodings. Then, we introduce the concept of a criterion measure, designed to evaluate decodings based on a specific criterion. In parallel, we define desirable properties for criterion measures, referred to as axioms, and we systematically evaluate our criterion measures with respect to them. Finally, we introduce the notion of a quality measure, which combines specific criterion measures to give an overall evaluation of the quality of decodings.
7.2.4 Similarity Measures for First-Order Logical Arguments
Participants: Victor David.
Keywords: Similarity Measure, First Order Logic, Argumentation
Collaborations: Jérôme Delobelle (Université Paris Cité), Jean-Guy Mailly (Université de Toulouse)
Similarity in formal argumentation has recently gained attention due to its significance in problems such as argument aggregation in semantics and enthymeme decoding. While prior work has focused on propositional logic arguments, we extend these approaches to First-Order Logic (FOL) arguments, enabling reasoning based on the similarity of arguments in more complex and realistic contexts. We present a comprehensive framework 17 for FOL argument similarity, including: (1) an extended axiomatic foundation for similarity measures; (2) a parametric model decomposed into four levels to efficiently evaluate structured knowledge; (3) a Tversky-based family of measures to instantiate these concepts; (4) a set of constraints ensuring well-behaved models that satisfy the axioms; and (5) the first introduction and analysis of non-symmetric similarity measures in formal argumentation.
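As background for the Tversky-based family mentioned above, the classical Tversky index on plain sets can be sketched as follows. This is only the textbook set-based measure, not the four-level FOL model of the paper; representing arguments as sets of formula strings and the names below are assumptions made for illustration.

```python
def tversky(a, b, alpha=1.0, beta=1.0):
    """Tversky index: |A & B| / (|A & B| + alpha*|A - B| + beta*|B - A|)."""
    a, b = set(a), set(b)
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    # Two empty sets are treated as identical by convention.
    return common / denominator if denominator else 1.0


# Hypothetical premise sets of two arguments, written as formula strings.
arg_1 = {"p(x)", "q(x)", "r(x)"}
arg_2 = {"p(x)", "q(x)"}

print(tversky(arg_1, arg_2))                       # alpha = beta = 1: Jaccard index
print(tversky(arg_1, arg_2, alpha=1.0, beta=0.0))  # penalizes only what arg_1 adds
print(tversky(arg_2, arg_1, alpha=1.0, beta=0.0))  # 1.0: arg_2 is contained in arg_1
```

Setting alpha and beta to different values makes the measure depend on the order of its arguments, which is the simplest way to see how a non-symmetric similarity can arise.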
7.2.5 DataLens: Enhancing Dataset Discovery via Network Topologies
Participants: Anaïs Ollagnier.
Keywords: Dataset search, Network visualization, Faceted search
Collaborations: Aline Menin (Université Côte d'Azur)
The rapid growth of publicly available textual resources, such as lexicons and domain-specific corpora, presents challenges in efficiently identifying relevant resources. While repositories are emerging, they often lack advanced search and exploration features. Most search methods rely on keyword queries and metadata filtering, which require prior knowledge and fail to reveal connections between resources. To address this, we present DataLens, a web-based platform that combines faceted search with advanced visualization techniques to enhance resource discovery. DataLens offers network-based visualizations, where the network structure can be adapted to suit the specific analysis task. It also supports a chained views approach, enabling users to explore data from multiple perspectives. A formative user study involving six data practitioners revealed that users highly value visualization tools—especially network-based exploration—and offered insights to help refine our approach to better support dataset search.
7.2.6 Mining Implicit Arguments for Reasoning: A Survey
Participants: Ekaterina Sviridova, Elena Cabrio, Serena Villata.
Keywords: Argument mining, Implicitness
Argumentation is the process of creating arguments for and against competing claims. Computational argumentation involves different ways of analyzing and reasoning upon arguments and their relations. More precisely, Argument Mining is the research field aiming at automatically identifying and classifying argument structures in text. The research field is mainly focused on the extraction of explicit argument structures (i.e., claims and premises connected by support and attack relations). However, an even more challenging task consists in extracting implicit argument structures in text (e.g., enthymemes). These structures are particularly valuable to then address argument reasoning, e.g., on incomplete and uncertain information, to finally compute the set of acceptable arguments, i.e., argument justification and skepticism. In this paper 11, we present and compare current approaches and available datasets for the novel task of Implicit Argument Mining. Future work perspectives are discussed to pave the way to further studies in this direction.
7.2.7 Before the Outrage: Challenges and Advances in Predicting Online Antisocial Behavior
Participants: Anaïs Ollagnier.
Keywords: Antisocial behavior prediction, Systematic literature review, Abusive behavior
Antisocial behavior (ASB) on social media—including hate speech, harassment, and trolling—poses escalating challenges for platform safety and societal well-being. While prior research has largely focused on detecting harmful content after its appearance, predictive approaches seek to anticipate harmful behaviors—such as hate speech propagation, conversation derailment, or user recidivism—before they fully unfold. Despite growing scholarly attention, the field of ASB prediction remains structurally uncoordinated and conceptually diffuse, lacking a unified taxonomy or integrative synthesis. This paper 36 addresses this gap through a systematic review of 49 machine learning studies on ASB prediction published between 2010 and 2025. It introduces a coherent framework that organizes existing work into five core task types: early harm detection, harm emergence prediction, harm propagation prediction, behavioral risk prediction, and proactive moderation support. The analysis compares tasks by temporal framing, prediction granularity, and operational goals, while tracing methodological trends from classical machine learning to pre-trained language models. By consolidating fragmented research and clarifying conceptual boundaries, this review establishes a structured foundation for future work. It identifies key challenges—including dataset scarcity, limited benchmarks, and temporal drift—and calls for ASB-specific evaluation protocols, multimodal and pragmatically informed representations, and explainable human-in-the-loop systems to advance predictive moderation.
7.2.8 Addressing Antisocial Behavior in Multi-Party Dialogs Through Multimodal Representation Learning
Participants: Hajar Bakarou, Mohamed Sinane El Messoussi, Anaïs Ollagnier.
Keywords: Social networks, antisocial behavior, multi-party dialogs, multimodal representation learning
Antisocial behavior (ASB) on social media, including hate speech, harassment, and cyberbullying, poses growing risks to platform safety and societal well-being. Prior research has focused largely on networks such as X and Reddit, while multi-party conversational settings remain underexplored due to limited data. To address this gap, we use CyberAgressionAdo-Large 32, a French open-access dataset simulating ASB in multi-party conversations, and evaluate three tasks: abuse detection, bullying behavior analysis, and bullying peer-group identification. We benchmark six text-based and eight graph-based representation-learning methods, analyzing lexical cues, interactional dynamics, and their multimodal fusion. Results show that multimodal models outperform unimodal baselines. The late fusion model mBERT + WD-SGCN achieves the best overall results, with top performance on abuse detection (0.718) and competitive scores on peer-group identification (0.286) and bullying analysis (0.606). Error analysis highlights its effectiveness in handling nuanced ASB phenomena such as implicit aggression, role transitions, and context-dependent hostility.
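Late fusion combines the outputs of separately trained text and graph models rather than their internal features. The sketch below shows one common variant, a weighted average of class probabilities. The weight, the probability values, and the function name are hypothetical, and the fusion rule actually used with mBERT + WD-SGCN may differ.

```python
def late_fusion(text_probs, graph_probs, text_weight=0.5):
    """Weighted average of per-class probabilities from two models."""
    if len(text_probs) != len(graph_probs):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    fused = [
        text_weight * t + (1.0 - text_weight) * g
        for t, g in zip(text_probs, graph_probs)
    ]
    # The predicted class is the index with the highest fused probability
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Hypothetical scores for the classes [non-abusive, abusive]
text_probs = [0.60, 0.40]   # assumed output of a text encoder
graph_probs = [0.20, 0.80]  # assumed output of a graph-based model
fused, predicted = late_fusion(text_probs, graph_probs)
print(predicted)  # prints 1
```

Here the text model alone would predict "non-abusive", while the interaction graph shifts the fused decision to "abusive", which is the kind of complementary signal the benchmark is designed to measure.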
7.2.9 Effectiveness of Counter-Speech against Abusive Content: A Multidimensional Annotation and Classification Study
Participants: Greta Damo, Elena Cabrio, Serena Villata.
Keywords: Hate speech, counter speech, social media
The research presents a novel computational framework for evaluating the effectiveness of counter-speech (CS) in mitigating online hate speech 16. Grounded in linguistics, communication, and argumentation theories, the framework defines six core dimensions – Clarity, Evidence, Emotional Appeal, Rebuttal, Audience Adaptation, and Fairness – used to annotate 4,214 counter-speech instances from two benchmark datasets. A human-annotated resource was created and released to the community, enabling further research in this area. The study also proposes two classification strategies, multi-task and dependency-based models, which outperform standard baselines, achieving average F1 scores of 0.94 and 0.96, respectively, across expert- and user-written CS. Results highlight the interdependence of effectiveness dimensions and provide a structured approach for assessing counter-speech, with potential societal impact in reducing online hostility and supporting safer digital discourse.
7.2.10 `Detectors Lead, LLMs Follow`: Integrating LLMs and traditional models on implicit hate speech detection to generate faithful and plausible explanations
Participants: Nicolás Benjamín Ocampo, Greta Damo, Elena Cabrio, Serena Villata.
Keywords: Hate speech, explainability, Large Language Models
The journal paper addresses the challenge of detecting and explaining implicit hate speech (HS) on social media 9. It proposes a novel framework that enhances Large Language Models (LLMs) for both classification and natural language explanation of hateful content, particularly targeting nuanced and implicit instances often missed by traditional methods. The approach combines binary classification (HS vs. Non-HS) with generated explanations, and further investigates the impact of incorporating information from BERT-based models on detection performance. A comprehensive evaluation on two widely used datasets, HateCheck and the Implicit Hate Corpus, demonstrates that LLMs coupled with explanations outperform classical detectors, improving both accuracy and interpretability. The work provides insights into how predictive and explanatory systems can support automated moderation and enhance understanding of implicit hateful messages.
7.2.11 DISPUTool 3.0: Fallacy Detection and Repairing in Argumentative Political Debates
Participants: Deborah Dore, Elena Cabrio, Serena Villata.
Keywords: Argument mining, political debates, fallacy detection, fallacy repairing
Collaborations: Pierpaolo Goffredo (ALTEN)
This paper introduces and evaluates a novel web-based application designed to identify and repair fallacious arguments in political debates. DISPUTool 3.0 22 offers a comprehensive tool for argumentation analysis of political debates, integrating state-of-the-art natural language processing techniques to mine and classify argument components and relations (see Figure 3). DISPUTool 3.0 builds on the ElecDeb60to20 dataset, covering US presidential debates from 1960 to 2020. In this paper, we introduce a novel task, integrated as a new module in DISPUTool: the automatic detection and classification of fallacious arguments, and the automatic repair of such misleading arguments. The goal is to offer users a tool that not only identifies fallacies in political debates, but also shows what the argument looks like once the veil of fallacy is lifted. The module is extensively evaluated with both automated metrics and human assessments. With the inclusion of this module, DISPUTool 3.0 further supports users' critical thinking in the face of the growing spread of this harmful kind of content in political debates and beyond.
Figure 3: DispuTOOL 3.0's Architecture. For more details, see Subsection 7.2.11.
7.2.12 Repairing Fallacious Argumentation in Political Debates
Participants: Deborah Dore, Elena Cabrio, Serena Villata.
Keywords: Argument mining, political debates, fallacy detection, fallacy repairing
Collaborations: Pierpaolo Goffredo (ALTEN)
Fallacious arguments are defined as “invalid” arguments (e.g., the conclusion does not follow from the premises) or wrong moves in argumentative discourse. This kind of argumentation is therefore misleading or deceptive, in particular when employed in political debates. As the spread of this harmful content severely impacts society and the decision-making of both citizens and policymakers, it is vital to prevent fallacious and propagandist arguments from circulating. To address this challenging task, several approaches to identifying fallacious argumentation in text have been presented in the literature. However, merely identifying this content is insufficient to ensure that the audience realizes the impact of the fallacious argument on its deliberation process and to support the development of critical thinking skills. To achieve this goal, it is necessary to unveil why a particular argument is fallacious and to demonstrate how it could be repaired into a valid, non-fallacious argument. In this paper 23, we address this key challenge by proposing a new task called repairing fallacious argumentation. The goal of this task is to modify statements that contain fallacious arguments into versions that are clearer, fairer, and free from any technique that could negatively persuade listeners. We carry out this task on political debates, where the need for this kind of solution is urgent. Our contribution in addressing this task is manifold: i) a novel dataset, FallacyFix, comprising repaired examples across various fallacy categories (Appeal to Fear, Appeal to Pity, Appeal to Popular Opinion, Flag Waving, and Loaded Language) based on the ElecDeb60to20-fallacy dataset; ii) modular prompt techniques for generating non-fallacious arguments, both dependent and independent of the specific fallacy label being addressed.
Through an extensive evaluation, we assess these techniques using the most widely used Large Language Models (in Zero-Shot, Few-Shot, and Fine-Tuning settings) and a standard baseline model (BART); iii) a rigorous evaluation framework to assess the accuracy of the generated non-fallacious argument repairing the fallacy in the original argument, with respect to the manually annotated benchmark of non-fallacious arguments we built from the ElecDeb60to20 dataset; iv) a human evaluation of the generated non-fallacious arguments to assess the acceptability of these arguments across three dimensions, i.e., Relevance, Suitability, and Cogency. Future research will focus on integrating domain-specific knowledge to address complex fallacy categories, further analyzing language models’ behavior in countering fallacies, and exploring real-time fallacy repair methodologies. These efforts aim to enhance our ability to address fallacies dynamically in various argumentation contexts, potentially improving the quality of public discourse and decision-making.
7.2.13 Contextualizing Toxicity: An Annotation Framework for Unveiling Pragmatics in Conversations of Online Discussion Forums
Participants: Yingxue Fu, Anaïs Ollagnier.
Keywords: Pragmatics, toxicity, annotation, Reddit conversations
The role of context has attracted increasing attention in research on toxicity detection. Interpreting toxic language remains a complex and multifaceted challenge, shaped by numerous linguistic, contextual, and social factors. However, current approaches often define “context” narrowly, focusing primarily on surface lexical cues such as hate lexicons, profanity markers, or sentiment polarity. These features, while useful, are insufficient to capture the interactional dynamics, user behaviors, and intentionality that shape such phenomena. To address this gap, this paper introduces a novel and systematic annotation framework 35, grounded in Speech Act Theory, aimed at deciphering the illocutionary and perlocutionary dimensions of conversation, which are unexplored in existing studies. We apply this framework to a new dataset of complete Reddit conversation threads, sampled to include discussions that turn toxic (124 conversations, 1990 messages). We evaluate the performance of GPT models (GPT-3, GPT-4, and GPT-5) on this challenging annotation task, providing insights into how large language models capture pragmatic and contextual dimensions of online toxicity.
7.3 Natural Language Argument Generation
In this section, we present the main new results of the team on the third research axis.
7.3.1 Beating Harmful Stereotypes Through Facts: RAG-based Counter-speech Generation
Participants: Greta Damo, Elena Cabrio, Serena Villata.
Keywords: Hate speech, counter speech, Retrieval-Augmented Generation
The study introduces a novel knowledge-grounded framework for automatic counter-speech (CS) generation 33. The framework leverages advanced Retrieval-Augmented Generation (RAG) pipelines to produce trustworthy and factual counter-speech for eight target groups commonly affected by hate speech, including women, people of color, persons with disabilities, migrants, Muslims, Jews, LGBT persons, and others. A knowledge base of 32,792 texts from the United Nations Digital Library, EUR-Lex, and the EU Agency for Fundamental Rights supports the generation process. Evaluation on the MultiTarget-CONAN dataset, using both automated and LLM-based metrics, and human assessments, demonstrates that the framework outperforms standard large language model baselines and competitive approaches. The resulting system and knowledge base provide a reusable resource for research on effective and reliable counter-speech generation.
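The retrieve-then-generate idea behind a RAG pipeline can be sketched as follows. This is a minimal illustration under strong assumptions, not the framework described above: retrieval uses plain bag-of-words cosine similarity, the three knowledge-base passages are invented for the example, and the generation step is left out, so the sketch stops at the prompt that would be passed to a language model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k passages most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(query_vec, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(message, evidence):
    """Assemble a prompt that grounds the reply in the retrieved evidence."""
    lines = ["Message to counter: " + message, "Evidence:"]
    lines += ["- " + passage for passage in evidence]
    lines.append("Write a factual, respectful reply grounded only in the evidence above.")
    return "\n".join(lines)


# Hypothetical knowledge base and input message
knowledge_base = [
    "Migrants contribute to the economy through work and taxes.",
    "Accessibility standards protect persons with disabilities.",
    "Gender equality improves outcomes in education.",
]
message = "Migrants take our jobs and hurt the economy."
evidence = retrieve(message, knowledge_base, k=1)
print(build_prompt(message, evidence))
```

A production system would typically replace the bag-of-words scorer with dense embeddings and send the prompt to an LLM. The point here is only that the generated reply is constrained by retrieved passages instead of relying on the model's parametric knowledge.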
7.3.2 Overview of the Critical Questions Generation Shared Task
Participants: Ekaterina Sviridova, Elena Cabrio, Serena Villata.
Collaborations: Blanca Calvo Figueras (University of the Basque Country UPV/EHU), Jaione Bengoetxea (University of the Basque Country UPV/EHU), Maite Heredia (University of the Basque Country UPV/EHU), Rodrigo Agerri (University of the Basque Country UPV/EHU)
The proliferation of AI technologies has reinforced the importance of developing critical thinking skills. One promising direction involves leveraging Large Language Models (LLMs) to support the generation of critical questions: inquiries aimed at identifying weaknesses or fallacies in argumentative texts. The Critical Questions Generation (CQs-Gen) shared task 29 offers the first benchmark for this task. Thirteen participating teams explored diverse strategies for generating such questions, with the highest-performing system achieving 67.6% accuracy, highlighting the challenge and complexity of the task. Notably, three of the four top-ranked submissions used argumentation scheme annotations to enhance performance. While most systems were based on open-weight LLMs, the two leading teams employed proprietary models.
7.3.3 Argument generation for fact-checking
Participants: Xiaoou Wang, Elena Cabrio, Serena Villata.
Keywords: Argument generation, RAG, fact-checking, disinformation detection.
The widespread availability of the internet and social media platforms has fundamentally transformed how people consume and share information. This transformation has brought with it a significant challenge: online disinformation, i.e., the intentional spread of false or misleading information. The speed and scale at which disinformation circulates present two major hurdles: (1) the development of automated fact-checking algorithms, and (2) the generation of explanations to accompany these systems. Explanations are crucial for several reasons: they help convince readers of the interpretation of evidence; they enable a feedback loop to correct judgment errors; and they reduce the risk of the “backfire effect,” where opaque predictions from black-box models actually strengthen belief in false claims. Currently, explanations are typically derived either from human-written fact-checking articles or from lists of evidence extracted from textual corpora. Meanwhile, the field of Argument Mining offers promising tools to enhance fact-checking. In the context of the ANR ATTENTION project, we enriched a subset of the LIAR-PLUS dataset with argumentative annotations, creating LIARArg 12, the first fake news classification dataset enhanced with argumentative layers. It includes 2,832 news titles with justifications, comprising 3,956 claims, 7,130 premises, and 8,205 argument relations. Moreover, we introduced a novel summarization method 27 that generates argument-structured explanations, significantly boosting the performance of top fact-checking algorithms across multiple datasets, including those with human-written fact-checks (LIAR-PLUS, FNC-1, Check-Covid) and those requiring automatic evidence retrieval (ExClaim). This retrieval-summarization framework, called SAFE 28, is also made available via a user-facing web interface.
Our work bridges Argument Mining and automated fact-checking to create systems that are not only more effective, but also more transparent and persuasive. We showed that argumentative structures in evidence can improve both classification and explanation.
7.4 Other contributions
In this section, we summarize the new results obtained by team members on topics that are not related to the three main axes of MARIANNE.
7.4.1 On Estimating the Strength of Differentially Private Mechanisms in a Black-Box Setting
Participants: Federica Granese.
Keywords: Differential Privacy, Histogram-based Sampling, Impossibility of Provable Guarantees
Collaborations: Daniele Gorla (Sapienza University of Rome), Louis Jalouzot (CEA), Catuscia Palamidessi (Inria), Pablo Piantanida (CNRS)
We analyze to what extent end users can infer information about the level of protection of their data when the data obfuscation mechanism is a priori unknown to them (the so-called “black-box” scenario). In particular, we explore four notions of differential privacy. On the one hand, we prove that, without any assumptions on the underlying distributions, it is not possible to have an algorithm that can infer the level of data protection with provable guarantees. On the other hand, we demonstrate that, under reasonable assumptions (namely Lipschitzness of the involved densities on a closed interval), such guarantees exist for the local versions and can be achieved by a simple histogram-based estimator. We validate our results experimentally and note that, for two particularly well-behaved distributions (namely Laplace and Gaussian noise), our method performs better than expected, in the sense that in practice the number of samples needed to achieve the desired confidence is smaller than the theoretical bound, and the estimate of ε is more precise than predicted 10.
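The idea of a histogram-based estimate in the black-box setting can be illustrated with a small sketch. It is not the estimator analyzed in the paper: it assumes a local mechanism that can be queried on two chosen inputs, bins the outputs on a closed interval, and reports the largest absolute log-ratio of bin counts as an estimate of the privacy parameter (commonly written ε). The Laplace mechanism, the interval, the number of bins, and the minimum bin count are all hypothetical choices made for the example.

```python
import math
import random


def histogram_epsilon(samples_a, samples_b, low, high, bins=10, min_count=50):
    """Estimate max |log(p/q)| over histogram bins on [low, high)."""
    if len(samples_a) != len(samples_b):
        raise ValueError("use the same number of samples for both inputs")
    width = (high - low) / bins

    def counts(samples):
        c = [0] * bins
        for y in samples:
            if low <= y < high:
                # min() guards against a rounding edge case at the upper bound
                c[min(int((y - low) / width), bins - 1)] += 1
        return c

    estimate = 0.0
    for a, b in zip(counts(samples_a), counts(samples_b)):
        # Skip sparsely populated bins, where the ratio is unreliable
        if a >= min_count and b >= min_count:
            estimate = max(estimate, abs(math.log(a / b)))
    return estimate


def laplace_mechanism(x, scale, rng):
    """Add Laplace(0, scale) noise to x."""
    # The difference of two i.i.d. exponentials follows a Laplace distribution
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return x + noise


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 100_000
out0 = [laplace_mechanism(0.0, 1.0, rng) for _ in range(n)]
out1 = [laplace_mechanism(1.0, 1.0, rng) for _ in range(n)]
eps_hat = histogram_epsilon(out0, out1, low=-2.0, high=3.0)
print(round(eps_hat, 2))  # expected to be close to the true value 1.0
```

For inputs 0 and 1 and noise scale 1, the true value is 1, so the printed estimate gives a quick sanity check. Taking a maximum over noisy bins biases the estimate slightly upward, which is why sparsely populated bins are skipped.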
8 Partnerships and cooperations
In this section, we summarize the main cooperation activities carried out within MARIANNE in 2025.
8.1 International initiatives
Participants: Victor David, Elena Cabrio, Serena Villata.
8.1.1 Associate Teams in the framework of an Inria International Lab or in the framework of an Inria International Program
Victor David coordinated the activity of the Inria–UCL associated team EXPLAINER, which focuses on argumentation, natural language processing, logic, and neuro-symbolic approaches involving large language models. The team addresses the challenge of implicit arguments (enthymemes), which are pervasive in human communication but poorly handled by current argument mining and learning-based methods. By combining NLP with logical representations and automated reasoning, the goal is to enable principled analysis of argumentative structures, including consistency, validity, and relations between arguments. A central objective is the decoding and evaluation of enthymemes. Other team members working within the EXPLAINER team are Elena Cabrio and Serena Villata.
In 2025, the collaboration has already yielded strong outcomes, including two papers accepted at top-tier A* conferences. Through the EXPLAINER team, funding was secured for a two-year postdoctoral position starting in January 2026, dedicated to enthymeme decoding using LLMs and argumentation schemes. In parallel, additional projects enabled the recruitment of a six-month intern followed by a six-month research engineer position, with a view toward a PhD on the persuasiveness of enthymemes. Further support was obtained through a one-year postdoctoral position funded by the IDEX AMI-Idées call of Université Côte d’Azur, focused on evaluating enthymeme decoding quality.
8.1.2 Participation in other International Programs
Serena Villata co-led with Jaime Sichman (USP, Brazil) the UNBIAS Team (argUmentation and Norm Based Intelligent AgentS) within the IRC CNRS-USP (UNBIAS website). The scientific goal of the UNBIAS team is to incorporate AI argumentation and normative agent techniques to provide better interactions in socio-technical systems.
8.2 International research visitors
Participants: Federica Granese.
8.2.1 Visits of international scientists
The collaboration with UCL within the EXPLAINER associated team was also strengthened by a one-month visit from Anthony Hunter in June-July 2025 (funded by the I3S lab), and by the submission of a JCJC ANR proposal directly linked to the EXPLAINER project.
8.2.2 Visits to international teams
Federica Granese participated in the first edition of the IVADO–Inria Initiative: Franco–Quebec Exchange Semester on Artificial Intelligence, a joint program designed to strengthen collaboration between AI research communities in France and Québec. The program provided financial support for a five-week research stay (mid-November to mid-December 2025) at the CNRS International Laboratory on Learning Systems (ILLS) and Mila – Quebec AI Institute (Montréal, Québec, Canada). The research project addressed emerging safety risks in multi-agent AI systems, with a particular emphasis on modeling manipulative interactions between AI agents. The project involved Federica Granese and Serena Villata (MARIANNE, Inria-I3S) and Pablo Piantanida (ILLS, Mila).
8.3 European initiatives
Participants: Elena Cabrio, Sofiane Elguendouze, Erwan Hain, Serena Villata.
8.3.1 Horizon Europe
The team participated in one European project in 2025.
- ORBIS project (2022-2025). ORBIS addresses the disconnects between ambitious ideas and collective actions at a large socio-technical scale. It responds to the profound lack of dialogue between citizens and policy-making institutions by providing a theoretically sound and highly pragmatic socio-technical solution to enable the transition to a more inclusive, transparent and trustful Deliberative Democracy in Europe. The project shapes and supports new democratic models that are developed through deliberative democracy processes; it follows a socio-constructive approach in which deliberative democracy is not a theory which prescribes new democratic practices and models, but rather the process through which we can collectively imagine and realize them. ORBIS provides new ways to understand and facilitate the emergence of new participatory democracy models, together with the mechanisms to scale them up and consolidate them at institutional level. It delivers: (i) a sound methodology for deliberative participation and co-creation at scale; (ii) novel AI-enhanced tools for deliberative participation across diverse settings; (iii) a novel socio-technical approach that augments the articulation between deliberative processes and representative institutions in liberal democracies; (iv) new evidence-based democratic models that emerge from the application of citizen deliberation processes; (v) demonstrated measurable impact of such innovations in real-world settings. The project builds on cutting-edge AI tools and technologies to develop a sustainable digital solution for e-deliberation. In this project, the MARIANNE team contributes by developing argumentation-based solutions to analyze discussions in a transparent manner and enhance decision making. The MARIANNE members involved in the project are: Sofiane Elguendouze, Erwan Hain, Elena Cabrio and Serena Villata. Link to the project webpage
8.4 National initiatives
Participants: Elena Cabrio, Mariana Chaves, Sofiane Elguendouze, Serena Villata, Xiaoou Wang.
The team participated in two ANR projects in 2025, coordinating one of them.
- ANR PRCE, Call 2021, "Artificial Intelligence" – Generating Counter Arguments to Fight Disinformation on the Web (ATTENTION) – (January 2022 - June 2026), total budget 522k€ – principal investigator: Serena Villata. The national research industrial project PRCE ATTENTION started in January 2022. The project includes the following partners, in addition to the CNRS - Laboratoire I3S (coordinator): CNRS - Centre Maurice Halbwachs, EURECOM, Université Paris 1 Sorbonne, the French start-up Buster.Ai. Objectives of the project: online disinformation is not a new phenomenon, but it has taken on an unprecedented scale, particularly during the acute health crisis. Social media limit the virality of disinformation mainly through content moderation. However, identifying disinformation and reporting its status is not enough to counter it. The ATTENTION project addresses this issue by designing intelligent ways to identify disinformation online and generate counter-arguments to fight the spread of such information online. The idea is to avoid the undesired effects of content moderation, e.g., overblocking, and to directly intervene in the discussion by engaging with people spreading incorrect information, through textual arguments that counter the fake content and prevent it from spreading. A multidisciplinary perspective is adopted to ensure that the resulting AI solutions take into account the ethical and sociological challenges of online disinformation. The PhD thesis of Xiaoou Wang was funded by this project. Xiaoou defended his PhD thesis in December 2025. The MARIANNE members involved in the project are: Xiaoou Wang, Elena Cabrio and Serena Villata. Link to the project webpage
- ANR ASTRID, Call 2022, "Artificial Intelligence" – Controversy and influence in the Ukraine war: a study of argumentation and counter-argumentation through Artificial Intelligence (CIGAIA) – (December 2022 - October 2026). Serena Villata is the co-principal investigator of the national research project ANR CIGAIA, which started in December 2022. The project includes the following partners, in addition to the CNRS - Laboratoire I3S: CREA Centre de Recherche de l'Ecole de l'Air (coordinator). The proposed research is organized into two axes. To understand the impact of controversies on armed conflict, the first axis studies the controversies in English and French taking place in the conflict between Russia and Ukraine. The second axis focuses on designing and implementing an Artificial Intelligence algorithm for the automatic analysis of argumentation in these controversies. This work is based on the joint creation of an annotated dataset from an initial mapping, allowing automatic argumentation analysis to identify and classify controversies. Building on the practical field of the conflict between Russia and Ukraine, this project addresses the creation of a decision support tool capable of mapping the actors through the identification of controversies and their characterization by means of argumentative and counter-argumentative strategies. The MARIANNE members involved in the project are: Elena Cabrio and Serena Villata. Link to the project webpage
9 Dissemination
In this section, we summarize the main dissemination and popularization activities carried out by MARIANNE's members in 2025.
9.1 Promoting scientific activities
9.1.1 Scientific events: organization
General chair, scientific chair
- Elena Cabrio was a member of the Scientific Committee of the Soph.IA Summit 2025.
- Serena Villata was co-chair (together with Mauro Vallati) of the 14th Conference on Prestigious Applications of Intelligent Systems (PAIS-2025), which was held in close association with the 28th European Conference on Artificial Intelligence (ECAI-2025) in Bologna on October 25-30, 2025.
Member of the organizing committees
- Victor David was a member of the organizing committee of the Workshop on Argumentation and Online Debates (ANR Project AGGREEY).
- Serena Villata co-organized the "Hybrid Argumentation and Responsible AI" seminar at Leibnitz Center (The Netherlands) on March 31 - April 4, 2025. This workshop proposed Hybrid Argumentation as an approach to responsible interaction between humans and AI systems. Link to the workshop webpage
Member of scientific boards and steering committees
- Elena Cabrio is an elected member of the Board of the scholarly association for Natural Language Processing (ATALA).
- Serena Villata is vice-president of the Steering Committee of COMMA (Computational Models of Arguments).
- Serena Villata is a member of the Steering Committee of ECA (European Conference on Argumentation).
- Serena Villata is a member of the IAAIL (International Association for Artificial Intelligence and Law) Executive Committee.
9.1.2 Scientific events: selection
Chair of conference program committees
- Serena Villata was co-chair (together with Mauro Vallati) of the program committee of the 14th Conference on Prestigious Applications of Intelligent Systems (PAIS-2025).
Member of the conference program committees
- Elena Cabrio and Serena Villata have been members of the following program committees as:
- Senior action editors, ACL Rolling Review (ARR)
- Senior Program Chairs of: IJCAI (International Joint Conference on Artificial Intelligence), AAAI (Conference of the Association for the Advancement of Artificial Intelligence), ECAI (European Conference on Artificial Intelligence)
- Elena Cabrio has also been Senior Area Chair for COLING 2025, track "Sentiment Analysis, Opinion and Argument Mining".
- Victor David was Senior Area Chair for KR-2025.
- Federica Granese was a member of the program committee of Prestigious Applications of Intelligent Systems (PAIS 2025), held in association with the 28th European Conference on Artificial Intelligence (ECAI-2025), Bologna, Italy, October 25-30, 2025.
- Serena Villata has also been Program Committee member for KR-2025 and SAC-2025.
Reviewer
- Victor David was a reviewer for AAAI-2025 and ECAI-2025.
- Federica Granese was a reviewer for:
- ACM TheWebConf 2026 (Industry Track). Dubai, United Arab Emirates, April 13-17, 2026.
- International Conference on Artificial Intelligence and Statistics (AISTATS 2026). Tangier, Morocco, May 2nd – May 5th, 2026.
- International Conference on Learning Representations (ICLR 2026). Rio de Janeiro, Brazil, April 23-27, 2026.
- ACM Knowledge Discovery and Data Mining, Applied Data Science (ADS) Track Papers (KDD 2026, I/II Cycle). Jeju, Korea, August 9-13, 2026.
9.1.3 Journal
Member of the editorial boards
- Elena Cabrio is part of the editorial boards of:
- Italian Journal of Computational Linguistics (IJCoL ISSN 2499-4553)
- French journal Traitement Automatique des Langues (TAL).
- Serena Villata is part of the editorial boards of:
- Artificial Intelligence and Law
- Argument and Computation
- Journal of Web Semantics
Reviewer - reviewing activities
- Elena Cabrio was:
- Appointed Reviewer of the Computational Linguistics journal (two-year term, ending in June 2026).
- Appointed Reviewer of the Transactions of the Association for Computational Linguistics journal (two-year term, ending in June 2026).
- Reviewer for the AI and Law journal, Argument and Computation journal, Journal on Multimodal User Interfaces
- Victor David was a reviewer for:
- Journal Argumentation & Computation
- EAAI: Engineering Applications of Artificial Intelligence
- Federica Granese was a reviewer for:
- IEEE Transactions on Dependable and Secure Computing.
- IEEE Transactions on Information Forensics and Security.
- Anaïs Ollagnier was a reviewer for:
- Journal of Social Computing
- Information Processing & Management
- AI & Law Journal
9.1.4 Invited talks
Federica Granese delivered the following invited talks:
- When Pragmatic AI Agents Misalign: Safety Risks in Multi-Agent AI Systems – A Focus on Deceptive Behavior. Inria–IVADO Closing Event, December 2–3, 2025, Montréal, Québec, Canada. Franco–Canadian Dialogue on AI. Round table on AI and Safety, November 28, 2025, Mila – Quebec AI Institute, Montréal, Québec, Canada; English.
- When Pragmatic AI Agents Misalign: Safety Risks in Multi-Agent AI Systems – A Focus on Deceptive Behavior. Invited seminar, International Laboratory on Learning Systems (ILLS), November 27, 2025, Montréal, Québec, Canada.
Serena Villata delivered the following invited talks:
- Generative AI for fighting online abusive content: results and open challenges, Séminaire IA Génératives: Promesses et Défis, CNRS, March 12th, 2025.
- Argumentation for medicine: extracting arguments and generating explanations for a better human understanding, Inria Meetup NLP Santé, April 8th, 2025.
- Intelligence Artificielle : questions techniques et d’éthique, Journée d’étude sur « l’impact de l’IA sur la pratique de l’enseignement universitaire », Académie Royale de Belgique, Bruxelles, September 29th, 2025.
- L’IA contre les fake news et le cyberharcèlement, Allianz Conference, Paris, October 16th, 2025.
- Generative AI: technical and ethical challenges in game applications, First Workshop on Artificial Intelligence for Human-Game Interaction, co-located with the 28th European Conference on Artificial Intelligence (ECAI ’25), Bologna, October 25th-30th, 2025.
9.1.5 Leadership within the scientific community
- Elena Cabrio is a chairholder of the 3IA Institute, project title: “Advanced Natural Language Understanding” (Chair renewed in 2025).
- Anais Ollagnier is a member of the Academy 1 Board: Networks, Information, Digital Society (term 2025–2026).
- Anais Ollagnier is a contributor to the European EMAI4EU project, as part of the EIT Digital Master School, a European master's-level training program in deep-tech technologies, combining academic excellence, entrepreneurship, and innovation, and involving a network of university partners across Europe.
- Serena Villata is a chairholder of the 3IA Institute, project title: “Artificial Argumentation for Humans”.
- Since June 2025, Serena Villata has been a nominated member of the new Conseil de l'intelligence artificielle et du numérique (CIANum). The Conseil is an independent body reporting to the Minister responsible for artificial intelligence and digital technology. Its mission is to study all issues relating to the development of digital technology and artificial intelligence, as well as their impact on society, the economy, and local areas.
- Until June 2025, Serena Villata was a member of the scientific committee of the Direction Générale des Finances Publiques. The committee's goal was to discuss with the Direction Générale des Finances Publiques its strategy and open issues concerning digital transformation and data science.
9.1.6 Scientific expertise
Elena Cabrio has been:
- Reviewer for the ERC Advanced Grant 2024 Call.
- Reviewer for Research Foundation Flanders (FWO).
- Reviewer for the scientific project call from COMUE Université de Toulouse.
- Expert tutor for the AI Projects of the Fondation de recherche pour l'aeronautique et l'espace and IRT ANTOINE SAINT EXUPERY.
- Member of the advisory committee of the MultiPOD project (Multilingual and Multicultural Spaces for Political Deliberation), funded by the EU.
- Member of the AI Advisory Council of the company ENGIE Energy.
- Member of the Scientific Committee of ISTEX.fr (CNRS), preparing the platform's application as a national research infrastructure.
Anais Ollagnier has been a member of the selection committee for:
- the scientific projects AAP 2025 – DIM AI4IDF
- the ATALA thesis prize, 2025
Serena Villata has been a member of:
- the recruitment committee for the Inria Paris CRCN-ISFP positions, 2025 (7 applications to review, participation in the jury meetings, oral interviews of the pre-selected candidates, deliberation).
- the HCERES committee for the LTCI Laboratory, 2025 (3 teams to evaluate, participation in the committee meetings, oral interviews of the teams, report writing).
9.1.7 Research administration
- Elena Cabrio is an appointed member of the Academic Board of Université Côte d’Azur.
- Victor David managed the Inria–UCL associated team.
- Anais Ollagnier obtained and managed research and training funding, specifically through the Call for Expressions of Interest in "Digital Transformation" for the Introduction to Deep Learning course, and through a six-month internship grant (AAP MIDI 02) for the DataLens project, a web-based platform that integrates faceted search with advanced information visualization techniques to facilitate the search and exploration of machine learning (ML) datasets.
- Serena Villata was Deputy Scientific Director of the Interdisciplinary Institute of Artificial Intelligence (3IA Côte d'Azur) until October 2025. The 3IA Côte d'Azur Institute was one of the four Interdisciplinary Institutes of Artificial Intelligence labeled in April 2019 by the Ministry of Higher Education, Research and Innovation, and it is now one of the nine French National Centers of Excellence in AI labeled "IA Cluster" (since May 2024). Since November 2025, Serena Villata has been Scientific Director of the Institute.
- Federica Granese has been the organizer and coordinator of the MARIANNE Journal Club (Exploring LLMs from the ground up, covering seminal papers, key advancements, and their limitations).
9.2 Teaching - Supervision - Juries - Educational and pedagogical outreach
The team is also highly involved in PhD supervision, academic teaching activities and participation in research juries. This section details the involvement of each team member in such activities.
9.2.1 Teaching
- Elena Cabrio delivered the following teaching hours: Natural Language Processing, Master 1 Computer Science. 30 hours (eq. TD); Natural Language Processing, Master 2 MIAGE - IA2: Applied Artificial Intelligence. 8 hours; Introduction to Computational Linguistics, Master 1 EUR CREATES, Parcours Linguistique, traitements informatiques du texte et processus cognitifs. 30 hours; Introduction to Python, Master 1 EUR CREATES, Parcours Linguistique, traitements informatiques du texte et processus cognitifs. 15 hours; Natural Language Processing, DUT IA et santé. 2 hours; Natural Language Processing, Lic. 3 Sciences et Technologies: Intelligence Artificielle. 11 hours; Introduction to AI. Licence 2. IUT. 22 hours; Introduction to DataBases. Licence 1. IUT. 90 hours; Web interfaces. Licence 1. IUT. 23.5 hours.
- Serena Villata delivered the following teaching hours: Master II Droit de la Création et du Numérique - Sorbonne University: Approche de l'Elaboration et du Fonctionnement des Logiciels, 15 hours (CM), 30 students; Master 2 MIAGE IA - University Côte d'Azur: I.A. et Langage : Traitement automatique du langage naturel, 28 hours (CM+TP), 30 students; Master 2 Algorithmic law and data governance - University Côte d'Azur: Introduction to AI, 12 hours (CM), 20 students; Master NeuroMod - University Côte d'Azur: Text Analysis, Deep Learning and Statistics, 15 hours (CM), 10 students.
- Anais Ollagnier delivered the following teaching hours: EUR DS4H, Enseignement mutualisé (S2 et S1), M1 Droit des affaires, UE Artificial Intelligence: Introduction to Deep Learning (KMUIAIUI) (CM: 12.5 h; TD: 16 h) x2; EFELIA, M1 Anthropologie des arts vivants (S1), UE Artificial Intelligence and Societal Transformation (BMUIST1), CM: 12 h; EUR DS4H – Enseignement mutualisé, M1 Informatique – parcours informatique (S2), UE Individual Project S2 (EN) (KMUMP22U), TD: 2 h; Portail Sciences et Technologies – Enseignement mutualisé, PO1 Sciences, ingénierie, technologie, environnement (S2), UE INFO : Système 1 (SPUF201), TP: 60 h; EIT Digital Master School / EMAI4EU, design and delivery of a course in Natural Language Processing. Course: Introduction to Natural Language Processing (online teaching, international audience, master's level).
- Victor David delivered the following teaching hours: University Tutor for Internships and Apprenticeships, Université Côte d’Azur & Polytech Nice Sophia; Fundamentals of AI, Practical Sessions (18 hours), DUT2, IUT Nice Côte d’Azur (x 3 groups).
- Sofiane Elguendouze delivered the following teaching hours: Advanced NLP course - M2 Computer Science (27h): NLP basics, deep neural networks & backpropagation, recurrent neural networks, sequence to sequence models and attention, transformers, LLMs and prompt engineering; Introduction to AI - M1 applied foreign languages (15h): definition, demystification, introduction to machine learning, introduction to rule-based/statistical and neural NLP, introduction to language modeling; Applied AI - M2 applied foreign languages (15h): reminder about AI and machine learning, reminder about NLP, introduction to language models and prompting; Introduction to AI - M2 humanities and social sciences (15h): introduction to AI, introduction to machine learning, introduction to deep learning, introduction to NLP, language modeling and language models, ethical and societal limits and challenges of AI methods; Introduction to AI - M2 archaeology (12h): introduction to deep learning and computer vision, applied computer vision techniques to archaeological learning problems.
- Greta Damo: In 2025, as part of her teaching duties, Greta helped evaluate the group project presentations for the course I.A. et langage in the M2 Intelligence Artificielle Appliquée program (Semester 2 of the 2024/25 academic year), which focused on natural language processing topics and techniques, dedicating 5 hours to this activity. Additionally, she taught both theoretical and practical sessions for the Data Analysis and Visualization course at CentraleDigitalLab, an AI postgraduate program at Centrale Méditerranée, during the first semester of the 2025/26 academic year, totaling 22 hours.
- Ekaterina Sviridova: Ekaterina delivered 12 hours of TD (Travaux dirigés) for Master 2 students in Traitement automatique des textes (EUR CREATES), in the course HMEDAC3 – ECUE Annotation de corpus. The course introduced the role of annotated data in NLP, presented different types of annotation tasks and tools, and included hands-on annotation sessions. It also covered preprocessing and post-processing of annotated data for language models, followed by a practical project in which students implemented a simple pipeline from raw data annotation to model training to address a specific application task.
- Deborah Dore delivered the following teaching hours: Data Analysis and Visualization (22H), Centrale Méditerranée. Theory and laboratory sessions for students from different backgrounds. The course focused on how data visualization can provide new insights and support data-driven analysis; Traitement du Langage Naturel (24H), Université Côte d’Azur, MIAGE. Laboratory courses focused on exploring the field of natural language processing, from text preprocessing to the creation of AI models; Traitement du Langage Naturel (14H), Université Côte d’Azur, L3 Informatique. Laboratory courses focused on exploring the field of natural language processing, from text preprocessing to the creation of AI models.
9.2.2 Supervision
- Elena Cabrio and Anais Ollagnier co-supervised the doctoral thesis of Thi Thao Ha, “Toward Culturally and Contextually Grounded Language Models”.
- Elena Cabrio and Serena Villata co-supervised the PhD theses of Greta Damo (counter-narratives against hate speech), Deborah Dore (political argumentation evolving in time), Ekaterina Sviridova (implicitness in argumentation), Cyprien Michel-Deletie (bridging natural language argumentation with formal reasoning), Benjamin Ocampo (implicit and subtle hate speech detection), and Xiaoou Wang (argumentation-based fact checking).
- Victor David supervised the two Master’s-level internships (5 and 6 months) of Loup Doinel and Nino Pireaud. Victor also supervised the activity of the research engineer Theo Alkibiades Collias in the context of the AGGREY ANR project.
- Anais Ollagnier supervised the scientific activity of post-doctoral researcher Yingxue Fu on “Improving argument mining through contextual information synthesis”.
- Anais Ollagnier also supervised the two Master's-level internships of Hajar Bakarou and Mohamed Sinane El Messoussi.
- Serena Villata supervised Sofiane Elguendouze as a post-doc on the CIGAIA and then the ORBIS project, working on argument mining for online deliberation and dispute resolution, and Mariana Chavez as an engineer on the AI4MEDIA and then the CIGAIA project, working on fallacy identification in online social media.
9.2.3 Juries
Elena Cabrio took part in the following juries:
- President of the jury for a tenure track position at Pompeu Fabra University, Spain.
- Jury member for a tenure track position at Fondazione Bruno Kessler, Italy.
- Reviewer of the PhD thesis of Arij Riabi, Sorbonne Université.
- Reviewer of the PhD thesis of Amir Reza Jafari, Institut-Mines Télécom, Télécom SudParis.
- Reviewer of the PhD thesis of Andrea Zaninello, University of Bozen-Bolzano, Italy.
- Examiner of the PhD thesis of Virgile Rennard, LIX, École Polytechnique.
Serena Villata took part in the following juries:
- Marta Marchiori Manerba, University of Pisa. Title of the thesis: "Fairness Auditing, Explanation and Debiasing in Linguistic Data and Language Models". Role: reviewer. PhD defense: 2025.
- Mustapha Bounoua, EURECOM. Title of the thesis: "Harnessing Multimodality: Diffusion based Generative Modeling and Information Estimation". Role: reviewer. PhD defense: 2025.
- Youri Peskine, EURECOM. Title of the thesis: "Using Knowledge Graph To Detect And Explain Misinformation Spread On The Web". Role: reviewer. PhD defense: 2025.
9.3 Popularization
Because the research carried out in MARIANNE is highly interdisciplinary and the targeted application scenarios tackle societal challenges, the team members have been invited to take part in various popularization events and productions. This section details the team's contributions in this area.
9.3.1 Productions (articles, videos, podcasts, serious games, ...)
Serena Villata was interviewed by the newspaper Epsiloon for the article "La fin d’Internet" by Pierre-Yves Bocquet, March 26, 2025.
9.3.2 Participation in Live events
- Elena Cabrio took part in the following live events:
- Panelist in the roundtable “The metamorphosis of democracy” at the World AI Cannes Festival (WAICF), 2025.
- Festival des Sciences de Nice, 2025.
- Federica Granese took part in the live event "Attractivité de la France dans le domaine de la recherche en IA" (3-minute interview).
- Serena Villata participated in the following popularization events:
- Round table at the Conférence Intelligence Artificielle de l'Association OBJECTIF SCIENCE: "L’humanité à l’heure de l’IA : Entre Maîtrise & Dépendance (social, démographie, santé, éthique, démocratie, liberté, informations et communications...)", Isle sur la Sorgue, April 4th, 2025.
- Round table at the Printemps des technologies 2025, March 2025, Centre de Congres Ville Saint Raphael, "Faut-il tout accepter de la techno ?".
10 Scientific production
10.1 Major publications
- [1] An Axiomatic Study of a Modular Evaluation of Enthymeme Decoding in Weighted Structured Argumentation. In: Proceedings of the 22nd International Conference on Principles of Knowledge Representation and Reasoning, Melbourne, Australia, 2025.
- [2] Fast Computing of Dung Semantics in Acyclic Probabilistic Argumentation Frameworks. In: The 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025), Philadelphia, United States, February 2025.
- [3] A Logic-based Framework for Decoding Enthymemes in Argument Maps Involving Implicitness in Premises and Claims. In: IJCAI 2025 - Thirty-Fourth International Joint Conference on Artificial Intelligence, Montreal, Canada, International Joint Conferences on Artificial Intelligence Organization, August 2025, pp. 4445-4453.
- [4] AM4DSP: Argumentation Mining in Structured Decentralized Discussion Platforms for Deliberative Democracy. In: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Suzhou, China, Association for Computational Linguistics, November 2025, pp. 796-805.
- [5] A Topicality-Driven QUD Model for Discourse Processing. In: Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2025), Avignon, France, August 2025, pp. 214–230.
- [6] DISPUTool 3.0: Fallacy Detection and Repairing in Argumentative Political Debates. In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), Vienna, Austria, Association for Computational Linguistics, July 2025, pp. 472-480.
- [7] Merging Embedded Topics with Optimal Transport for Online Topic Modeling on Data Streams. In: ECML PKDD 2025 - Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, vol. 16019, Porto, Portugal, Springer Nature Switzerland, September 2025, pp. 290-307.
- [8] SAFE: Structured Argumentation for Fact-checking with Explanations. In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-25), Montreal, Canada, International Joint Conferences on Artificial Intelligence, August 2025, pp. 11114-11118.
10.2 Publications of the year
International journals
International peer-reviewed conferences
Conferences without proceedings
Doctoral dissertations and habilitation theses
Reports & preprints