Section: New Results
Algorithmic foundations
Keywords: computational geometry, Computational topology, Voronoi diagrams,
The Structural Bioinformatics Library: modeling in biomolecular science and beyond
Participants : Frédéric Cazals, Tom Dreyfus.
Software in structural bioinformatics has mainly been application driven. To favor practitioners seeking off-the-shelf applications, but also developers seeking advanced building blocks to develop novel applications, we undertook the design of the Structural Bioinformatics Library (SBL, http://sbl. inria.fr) [20], a generic C++/python cross-platform software library targeting complex problems in structural bioinformatics. Its tenet is based on a modular design offering a rich and versatile framework allowing the development of novel applications requiring well specified complex operations, without compromising robustness and performances.
The SBL involves four software components (1-4 thereafter). For end-users, the SBL provides ready to use, state-of-the-art (1) applications to handle molecular models defined by unions of balls, to deal with molecular flexibility, to model macro-molecular assemblies. These tools can also be combined to tackle integrated analysis problems. For developers, the SBL provides a broad C++ toolbox with modular design, involving (2) core algorithms, (3) biophysical models, and (4) modules, the latter being especially suited to develop novel applications. The SBL comes with a thorough documentation consisting of user and reference manuals, and a bugzilla platform to handle community feedback.
The SBL is available from http://sbl.inria.fr.
Optimal transportation problems with connectivity constraints
Participants : Frédéric Cazals, Dorian Mazauric.
The earth mover distance (EMD) or the Mallows distance are example optimal transportation (OT) problems reducing to linear programs. In this work [21], we study a generalization of these problems when the supply and demand nodes are the vertices of two graphs called the supply and the demand graphs. The novel problems embed connectivity constraints in the transport plans computed, using a Lipschitz-like condition involving distances between certain subgraphs of the supply graph and certain subgraphs of the demand graph. More precisely, we make three contributions.
First, we formally introduce two optimal transportation problems
generalizing EMD, namely Minimum-cost under flow, transport size, and
connectivity constraints problem (problem EMD-FCC) and Maximum-flow
under cost, transport size, and connectivity constraints problem
(problem EMD-CCC). We prove that problems EMD-CCC and EMD-FCC are
NP-complete, and that EMD-FCC is hard to approximate within any given
constant. Second, we develop a greedy heuristic algorithm returning
admissible solutions, of time complexity
Clustering stability revealed by matchings between clusters of clusters
Participants : Frédéric Cazals, Dorian Mazauric, Romain Tetley.
Clustering is a fundamental problem in data science, yet, the variety of clustering methods and their sensitivity to parameters make clustering hard. To analyze the stability of a given clustering algorithm while varying its parameters, and to compare clusters yielded by different algorithms, several comparison schemes based on matchings, information theory and various indices (Rand, Jaccard) have been developed. In this work [22], we go beyond these by providing a novel class of methods computing meta-clusters within each clustering– a meta-cluster is a group of clusters, together with a matching between these. Altogether, these pieces of information help assessing the coherence between two clusterings.
More specifically, let the intersection graph of two clusterings be the edge-weighted bipartite graph in which the nodes represent the clusters, the edges represent the non empty intersection between two clusters, and the weight of an edge is the number of common items. We introduce the so-called (k,D) and D-family-matching problems on intersection graphs, with k the number of meta-clusters and D the upper-bound on the diameter of the graph induced by the clusters of any meta-cluster. First we prove hardness and inapproximability results. Second, we design exact polynomial time dynamic programming algorithms for some classes of graphs (in particular trees). Then, we prove efficient (exact, approximation, and heuristic) algorithms, based on spanning trees, for general graphs. Practically, we present extensive experiments in two directions. First, we illustrate the ability of our algorithms to identify relevant meta-clusters between a given clustering and an edited version of it. Second, we show how our methods can be used to identify notorious instabilities of the k-means algorithm.
Experimental evaluation of a branch and bound algorithm for computing pathwidth
Participant : Dorian Mazauric.
In collaboration with David Coudert and Nicolas Nisse (COATI project-team, Université Côte D'Azur, Inria, I3S / CNRS).
Path-decompositions of graphs are an important ingredient of dynamic
programming algorithms for solving efficiently many NP-hard problems.
Therefore, computing the pathwidth and associated path-decomposition of graphs
has both a theoretical and practical interest.
In [16], we design a Branch and Bound algorithm that computes the exact
pathwidth of graphs and a corresponding path-decomposition. Our main
contribution consists of several non-trivial techniques to reduce the size of
the input graph (pre-processing) and to cut the exploration space during the
search phase of the algorithm.
We evaluate experimentally our algorithm by comparing it to existing
algorithms of the literature. It appears from the simulations that our
algorithm offers a significant gain with respect to previous work. In
particular, it is able to compute the exact pathwidth of any graph with less
than 60 nodes in a reasonable running-time (
Extracting the core structural connectivity network: guaranteeing network connectedness through a graph-theoretical approach
Participant : Dorian Mazauric.
In collaboration with Demian Wassermann, Guillermo Gallardo-Diez and Rachid Deriche (ATHENA project-team, Université Côte d'Azur, Inria).
In this work [19], we present a graph-theoretical algorithm to extract the connected core structural connectivity network of a subject population. Extracting this core common network across subjects is a main problem in current neuroscience. Such network facilitates cognitive and clinical analyses by reducing the number of connections that need to be explored. Furthermore, insights into the human brain structure can be gained by comparing core networks of different populations. We show that our novel algorithm has theoretical and practical advantages. First, contrary to the current approach our algorithm guarantees that the extracted core subnetwork is connected agreeing with current evidence that the core structural network is tightly connected. Second, our algorithm shows enhanced performance when used as feature selection approach for connectivity analysis on populations.
On the complexity of the representation of simplicial complexes by trees
Participant : Dorian Mazauric.
In collaboration with Jean-Daniel Boissonnat (DataShape team, Université Côte d'Azur, Inria).
In this paper [14], we investigate the problem of the representation of simplicial complexes by trees. We introduce and analyze local and global tree representations. We prove that the global tree representation is more efficient in terms of time complexity for searching a given simplex and we show that the local tree representation is more efficient in terms of size of the structure. The simplicial complexes are modeled by hypergraphs. We then prove that the associated combinatorial optimization problems are very difficult to solve and to approximate even if the set of maximal simplices induces a planar graph of maximum degree at most three or a bounded degree hypergraph. However, we prove polynomial time algorithms that compute constant factor approximations and optimal solutions for some classes of instances.
Well balanced designs for data placement
Participant : Dorian Mazauric.
In collaboration with Jean-Claude Bermond (COATI project-team, Université Côte D'Azur, Inria, I3S / CNRS), Alain Jean-Marie (MAESTRO project-team, Université Côte D'Azur, Inria) and Joseph Yu (Department of Mathematics, UFV, Abbotsford, BC, Canada).
The problem we consider in [13] is motivated by
data placement, in particular data replication in distributed storage
and retrieval systems. We are given a set