

Section: Partnerships and Cooperations

National Initiatives

ANR

HEREDIA

Participant : Jean-Sébastien Sereni [contact person] .

HEREDIA (http://www.liafa.univ-paris-diderot.fr/~sereni/Heredia/) is an ANR JCJC (“Jeunes Chercheurs”) project focusing on hereditary properties of graphs, which provide a general perspective for studying graph properties. Several important general theorems are known, and the approach offers an elegant way of unifying notions and proof techniques. Hereditary classes of graphs also play a central role in graph theory: besides their theoretical appeal, they are particularly relevant from an algorithmic point of view. Together with Jean-Sébastien Sereni, the HEREDIA project involves Pierre Charbit (LIAFA, Paris), Louis Esperet (G-SCOP, Grenoble) and Nicolas Trotignon (LIP, Lyon).

Hybride

Participants : Adrien Coulet, Luis-Felipe Melo, Amedeo Napoli, Matthieu Osmuk, Chedy Raïssi, My Thao Tang, Mohsen Sayed, Yannick Toussaint [contact person] .

The Hybride research project (http://hybride.loria.fr/) aims at combining Natural Language Processing (NLP) and Knowledge Discovery in Databases (KDD) for text mining. A key idea is to design an interactive and convergent process in which NLP methods guide text mining and KDD methods guide the analysis of textual documents. NLP methods are mainly based on text analysis and on the extraction of general and temporal information. KDD methods are based on pattern mining (e.g. of patterns and sequences), formal concept analysis and graph mining. In this way, NLP methods applied to texts extract “textual information” that KDD methods can use as constraints for focusing the mining of textual data. Conversely, KDD methods extract patterns and sequences that are used to guide information extraction from texts and text analysis. The experimental and validation parts of the Hybride project rely on an application to the documentation of rare diseases in the context of Orphanet.

The partners of the Hybride consortium are the GREYC Caen laboratory (pattern mining, NLP, text mining), the MoDyCo Paris laboratory (NLP, linguistics), the INSERM Paris laboratory (Orphanet, ontology design), and the Orpailleur team at Inria Nancy Grand Est (FCA, knowledge representation, pattern mining, text mining).

ISTEX

Participants : Luis-Felipe Melo, Amedeo Napoli, Yannick Toussaint [contact person] .

ISTEX is a so-called “Initiative d'excellence” managed by CNRS and DIST (“Direction de l'Information Scientifique et Technique”). ISTEX aims at giving the research and teaching community online access to scientific publications in all domains. ISTEX is therefore concerned with the large-scale acquisition of documentation such as journals, proceedings, corpora and databases. ISTEX-R is a research project within ISTEX in which the Orpailleur team is involved together with two other partners, namely the ATILF laboratory and the INIST Institute (both in Nancy). ISTEX-R aims at developing new tools for querying full-text documentation, analyzing content and extracting information. A platform is currently under development to provide robust NLP tools for text processing, as well as methods for text mining and domain conceptualization.

Termith

Participants : Luis-Felipe Melo, Yannick Toussaint [contact person] .

Termith (http://www.atilf.fr/ressources/termith/) is an ANR project which involves the following laboratories: ATILF, LIDILEM, LINA, INIST, Inria Saclay and Inria Nancy Grand Est. It aims at indexing documents belonging to different domains of the Humanities. To this end, the project focuses on extracting candidate terms (information extraction) and on disambiguation.

In the Orpailleur team, we are mainly concerned with information extraction using Formal Concept Analysis techniques, as well as pattern and sequence mining. The objective is to define “contexts introducing terms”, i.e. to find textual environments that allow a system to decide whether a textual element is actually a candidate term, and to identify its corresponding environment.
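To illustrate how Formal Concept Analysis groups textual elements by the environments they share, here is a minimal sketch. The toy context (candidate strings and environment features such as `preceded_by_determiner`) and the function names are purely hypothetical and are not taken from the Termith project; the enumeration is a naive one that is only suitable for very small contexts.

```python
from itertools import combinations

# Hypothetical toy context: candidate strings (objects) mapped to the textual
# environments in which they occur (attributes). Illustrative data only.
CONTEXT = {
    "corpus": {"preceded_by_determiner", "followed_by_of"},
    "lexical unit": {"preceded_by_determiner", "in_definition"},
    "term extraction": {"in_definition", "followed_by_of"},
}


def extent(attrs, context):
    """Return the objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs, context):
    """Return the attributes shared by every object in objs."""
    if not objs:
        # By convention, the empty set of objects shares all attributes.
        return frozenset(set().union(*context.values()))
    return frozenset(set.intersection(*(set(context[o]) for o in objs)))


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) by closing every
    attribute subset. Exponential in the number of attributes."""
    all_attrs = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            objs = extent(frozenset(subset), context)
            concepts.add((objs, intent(objs, context)))
    return concepts


if __name__ == "__main__":
    for objs, attrs in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(objs), "<->", sorted(attrs))
```

Each concept pairs a maximal set of candidate strings with the maximal set of environments they have in common, which is one way of reading a “context introducing terms”.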

FUI PoQemon

Participants : Matthieu Osmuk, Chedy Raïssi [contact person], Mickaël Zehren.

The PoQemon project aims at developing new pattern mining methods and tools for supporting privacy-preserving knowledge discovery for monitoring purposes on mobile phone networks. The main idea is to develop sound approaches that handle the trade-off between privacy of data and power of analysis. Early approaches to this problem were based on value perturbation, which damages data integrity. More recently, value generalization has been proposed as an alternative; still, approaches based on it have assumed either that all items are equally sensitive, or that some items are sensitive and can be known to an adversary only by association, while others are non-sensitive and can be known directly. In reality, there is a distinction between sensitive and non-sensitive items, but an adversary may possess information on any of them. Most critically, no earlier method aims at a clear inference-proof privacy guarantee. In this project, we integrated the ρ-uncertainty privacy concept, which inherently safeguards against sensitive associations without constraining the nature of an adversary's knowledge and without falsifying data. The project integrates the ρ-uncertainty pattern mining approach with novel data visualization techniques.
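The following sketch shows what checking a ρ-uncertainty guarantee could look like on a tiny transaction dataset. It assumes the usual reading of the concept, namely that for every non-empty itemset χ contained in some transaction and every sensitive item α not in χ, the confidence of the association χ → α must stay strictly below ρ. The function names, the restriction to non-empty antecedents and the toy data are our assumptions for illustration; this brute-force check is not the PoQemon implementation and does not perform any anonymization.

```python
from itertools import combinations


def confidence(transactions, antecedent, item):
    """Confidence of the rule antecedent -> item over the transactions."""
    covering = [t for t in transactions if antecedent <= t]
    if not covering:
        return 0.0
    return sum(1 for t in covering if item in t) / len(covering)


def is_rho_uncertain(transactions, sensitive, rho):
    """Brute-force check (exponential, toy datasets only): every rule
    chi -> alpha, with chi a non-empty subset of some transaction and alpha a
    sensitive item outside chi, must have confidence strictly below rho."""
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for chi in combinations(items, size):
                antecedent = frozenset(chi)
                # The adversary may know any items, sensitive or not.
                for alpha in sensitive - antecedent:
                    if confidence(transactions, antecedent, alpha) >= rho:
                        return False
    return True


# Hypothetical toy data: locations observed per user, "clinic" being sensitive.
TRANSACTIONS = [
    frozenset({"cell_A", "cell_B", "clinic"}),
    frozenset({"cell_A", "cell_B"}),
    frozenset({"cell_A", "cell_C"}),
    frozenset({"cell_B", "cell_C"}),
]
SENSITIVE = frozenset({"clinic"})

if __name__ == "__main__":
    print(is_rho_uncertain(TRANSACTIONS, SENSITIVE, rho=0.6))
    print(is_rho_uncertain(TRANSACTIONS, SENSITIVE, rho=0.5))
```

In this toy dataset the strongest sensitive association is {cell_A, cell_B} → clinic, which holds in one of the two transactions containing both cells, so the dataset passes the check for ρ = 0.6 but not for ρ = 0.5.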

The PoQemon research project involves the following partners: Altran, DataPublica, GenyMobile, HEC, IP-Label, Next Interactive Media, Orange and Université Paris-Est Créteil, along with Inria Nancy Grand Est.

PEPS

PEPS Approppre

Participants : Mehwish Alam, Quentin Brabant, Aleksey Buzmakov, Victor Codocedo, Miguel Couceiro [contact person], Adrien Coulet, Esther Galbrun, Amedeo Napoli, Chedy Raïssi, Yannick Toussaint.

The PEPS Approppre research project (see http://www.cnrs.fr/ins2i/spip.php?article1183) aims at setting up a framework for characterizing the mining of preferences in massive data. Such a unified framework for the mining of qualitative preferences does not yet exist; it can be related to recent studies in decision theory (aggregation models and consensus), machine learning and data mining. Particular focus will be placed on the Sugeno integral as an aggregation model, which can be applied to a symbolic representation of preferences for two main operations: dimensionality reduction (feature selection) and prediction.
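As an illustration of the aggregation model mentioned above, here is a minimal sketch of the discrete Sugeno integral. With scores sorted in ascending order, x_(1) ≤ ... ≤ x_(n), and a capacity μ (a monotone set function with μ(∅) = 0 and μ(N) = 1), the integral is the maximum over i of min(x_(i), μ({(i), ..., (n)})). The criteria names, the scores and the capacity values below are hypothetical and chosen only for illustration.

```python
def sugeno_integral(scores, capacity):
    """Discrete Sugeno integral of scores with respect to a capacity.

    scores:   dict mapping each criterion to a value in [0, 1]
    capacity: dict mapping frozensets of criteria to a weight in [0, 1],
              assumed monotone with capacity[all criteria] == 1.0
    """
    # Criteria ordered by ascending score.
    order = sorted(scores, key=scores.get)
    best = 0.0
    for i, criterion in enumerate(order):
        # Coalition of criteria whose score is at least scores[criterion].
        coalition = frozenset(order[i:])
        best = max(best, min(scores[criterion], capacity[coalition]))
    return best


# Hypothetical example: three criteria describing an alternative.
SCORES = {"price": 0.3, "comfort": 0.8, "safety": 0.6}
CAPACITY = {
    frozenset(): 0.0,
    frozenset({"price"}): 0.4,
    frozenset({"comfort"}): 0.2,
    frozenset({"safety"}): 0.5,
    frozenset({"price", "comfort"}): 0.6,
    frozenset({"price", "safety"}): 0.8,
    frozenset({"comfort", "safety"}): 0.7,
    frozenset({"price", "comfort", "safety"}): 1.0,
}

if __name__ == "__main__":
    print(sugeno_integral(SCORES, CAPACITY))  # 0.6
```

Since only `min` and `max` are involved, the same computation carries over to any finite ordered scale of qualitative values, which is what makes the model attractive for symbolic preferences.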

PEPS Confocal

Participants : Adrien Coulet, Amedeo Napoli, Chedy Raïssi, Malika Smaïl-Tabbone.

The Confocal project (see http://www.cnrs.fr/ins2i/spip.php?article1183) addresses the design of new methods in bioinformatics for analyzing and classifying heterogeneous omics data with respect to biological domain knowledge. We plan to adapt FCA and pattern structures for discovering patterns and associations in gene data with the help of domain ontologies. One important objective of the project is to check whether such a line of research could be reused on so-called discrete models in molecular biology.

PEPS Prefute

Participants : Mehwish Alam, Quentin Brabant, Aleksey Buzmakov, Victor Codocedo, Adrien Coulet, Miguel Couceiro [contact person], Esther Galbrun, Amedeo Napoli, Chedy Raïssi, Mohsen Sayed, Malika Smaïl-Tabbone, My Thao Tang, Yannick Toussaint.

The PEPS Prefute project is mainly interested in interaction and iteration in the knowledge discovery (KD) process. The KD process is usually organized around three main steps: (i) selection and preparation of the data, (ii) data mining, and (iii) interpretation of (selected) resulting patterns. This process, which is actually a loop, is led by an analyst who is most often an expert of the data domain, which reflects the fact that the KD process requires interaction and iteration. However, until recently the most important progress was made on the second step of the KD process, i.e. data mining, and especially from the algorithmic point of view. This gave birth to a variety of efficient and fast algorithms. This second step lies between the two other steps, whose importance is now becoming very clear as the analyst faces very large amounts of data and even larger amounts of resulting patterns. KDDK is one possible way of tackling such a problem, as its principle is to push domain knowledge into the KD process in order to improve it.

Accordingly, the PEPS Prefute project is interested in the study of interactions between the analyst and the KD process, i.e. pushing constraints, preferences and domain knowledge, for guiding and improving the KD process. One possible way is to discover an original and generic pattern which can serve as a reference, and then to search the pattern space with respect to this original pattern, which is linked to some preferences of the analyst. In this way, the space of interesting patterns is much more concise and much smaller. Moreover, the PEPS Prefute project also contributes to consolidating the place of the analyst in the KD process. In particular, this means that more studies have to be carried out on the possible interactions with the analyst and on the importance of preferences and domain knowledge in this interaction. In addition, visualization tools associated with KD systems have to be improved so that they can cope with the large amounts of data and patterns involved (see https://www.greyc.fr/fr/node/2207).