Section: Scientific Foundations
Modeling Interfaces and Contacts
Problems addressed. The Protein Data Bank, http://www.rcsb.org/pdb , contains structural data that have been resolved experimentally. Most PDB entries feature isolated proteins (for structures resolved by crystallography, the PDB contains the asymmetric unit of the crystal; determining the biological unit from the asymmetric unit is a problem in itself), the remaining ones being protein-protein or protein-drug complexes. These structures show what Nature does, up to the bias imposed by the experimental conditions inherent to structure elucidation, and are of special interest for investigating non-covalent contacts in biological complexes. More precisely, given two proteins defining a complex, interface atoms are defined as the atoms of one protein interacting with atoms of the second one. Understanding the structure of interfaces is central to understanding biological complexes and thus the function of biological molecules [48]. Yet, in spite of almost three decades of investigations, the basic principles guiding the formation of interfaces and accounting for their stability are unknown [51]. Current investigations follow two routes. From the experimental perspective [33], directed mutagenesis enables one to quantify the energetic importance of residues, important residues being termed hot residues. Such studies recently revealed the modular architecture of interfaces [45]. From the modeling perspective, the main issue consists of predicting the hot residues from sequence and/or structural information [40].
The description of interfaces is also of special interest to improve scoring functions. By scoring function, two things are meant: either a function which assigns to a complex a quantity homogeneous to a free energy change (the Gibbs free energy of a system is defined by $G = H - TS$, with $H = U + PV$; $G$ is minimum at equilibrium, and differences in $G$ drive chemical reactions), or a function stating that a complex is more stable than another one, in which case the value returned is a score and not an energy. Borrowing from statistical mechanics [25], the usual way to design scoring functions is to mimic the so-called potentials of mean force. To put it briefly, one inverts Boltzmann's law: denoting $p_i(r)$ the probability of two atoms, defining type $i$, to be located at distance $r$, the (free) energy assigned to the pair is computed as $E_i(r) = -kT \log p_i(r)$. Estimating from the PDB one function $p_i(r)$ for each type of pair of atoms, the energy of a complex is computed as the sum of the energies of the pairs located within a distance threshold [49], [36]. To compare the energy thus obtained to a reference state, one may compute $E = \sum_i p_i \log (p_i / q_i)$, with $p_i$ the observed frequencies and $q_i$ the frequencies stemming from an a priori model [41]. In doing so, the energy defined is nothing but the Kullback-Leibler divergence between the distributions $p_i$ and $q_i$.
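As an illustration of the inverse Boltzmann construction and of the comparison to a reference state, the following minimal sketch bins a list of pair distances into a normalized histogram, converts each bin to an energy as minus kT times the logarithm of its probability, and computes the Kullback-Leibler divergence between observed and reference frequencies. The function names, the binning scheme, and the default value of kT (about 0.593 kcal/mol near 298 K) are assumptions made for this example; they are not taken from the cited works.

```python
import math

# Assumed default: kT in kcal/mol near 298 K (illustrative choice of units).
KT = 0.593


def distance_histogram(distances, bin_width, r_max):
    """Normalized histogram of pair distances over [0, r_max)."""
    n_bins = int(math.ceil(r_max / bin_width))
    counts = [0] * n_bins
    for d in distances:
        if 0.0 <= d < r_max:
            # Clamp to guard against floating-point rounding at the upper edge.
            counts[min(int(d / bin_width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no distance falls within [0, r_max)")
    return [c / total for c in counts]


def inverse_boltzmann(p, kT=KT):
    """Energy per bin, -kT * log(p); unobserved bins get +infinity."""
    return [-kT * math.log(x) if x > 0 else math.inf for x in p]


def kl_divergence(p, q):
    """Sum of p_i * log(p_i / q_i); bins with p_i = 0 contribute zero."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf  # observed event impossible under the reference
            total += pi * math.log(pi / qi)
    return total


if __name__ == "__main__":
    observed = distance_histogram([0.5, 1.5, 1.6, 2.5], bin_width=1.0, r_max=3.0)
    uniform = [1.0 / len(observed)] * len(observed)  # assumed a priori model
    print(inverse_boltzmann(observed))
    print(kl_divergence(observed, uniform))
```

In practice one histogram would be estimated per type of atom pair, and the uniform reference used here would be replaced by whichever a priori model is chosen.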
Methodological developments. Describing interfaces poses problems in two settings: static and dynamic.
In the static setting, one seeks the minimalist geometric model providing a relevant bio-physical signal. A first step in doing so consists of identifying interface atoms, so as to relate the geometry and the bio-chemistry at the interface level [10]. To elaborate at the atomic level, one seeks a structural alphabet encoding the spatial structure of proteins. At the side-chain and backbone level, an example of such an alphabet is that of [26]. At the atomic level, and in spite of recent observations on the local structure of the neighborhood of a given atom [50], no such alphabet is known. Specific important local conformations are known, though. One of them is the so-called dehydron structure, which is an under-desolvated hydrogen bond, a property that can be directly inferred from the spatial configuration of the carbons surrounding a hydrogen bond [32].
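The first step named above, identifying interface atoms, can be sketched with a plain distance cutoff: an atom of one partner is flagged when it lies within a threshold of some atom of the other partner. This is a common simple proxy and not necessarily the construction used in [10]; the function name, the coordinate representation, and the 5.0 angstrom default are assumptions made for this example.

```python
import math


def interface_atoms(chain_a, chain_b, cutoff=5.0):
    """Indices of atoms of each partner lying within `cutoff` of the other.

    chain_a, chain_b: lists of (x, y, z) coordinates in angstroms.
    cutoff: assumed distance threshold (5.0 is an illustrative value).
    """
    in_a, in_b = set(), set()
    # Brute-force scan of all pairs; quadratic cost, fine for a sketch.
    for i, a in enumerate(chain_a):
        for j, b in enumerate(chain_b):
            if math.dist(a, b) <= cutoff:
                in_a.add(i)
                in_b.add(j)
    return sorted(in_a), sorted(in_b)


if __name__ == "__main__":
    partner_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    partner_b = [(3.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
    print(interface_atoms(partner_a, partner_b))
```

For real structures a spatial index (grid or k-d tree) would replace the all-pairs scan, and the result depends on the chosen cutoff.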
A structural alphabet at the atomic level may be seen as an alphabet featuring, for an atom of a given type, all the conformations this atom may engage in, depending on its neighbors. One way to tackle this problem consists of extending the notions of molecular surfaces used so far, so as to encode multi-body relations between an atom and its neighbors [8]. To derive such alphabets, two obvious strategies are the following. On the one hand, one may use an encoding of neighborhoods based on geometric constructions such as Voronoi diagrams (affine or curved) or arrangements of balls. On the other hand, one may resort to clustering strategies in higher-dimensional spaces, as the $n$ neighbors of a given atom are represented by $3n - 6$ degrees of freedom, the neighborhood being invariant upon rigid motions.
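To make the invariance under rigid motions concrete, the sketch below encodes a neighborhood by the sorted list of distances between all pairs of points among the atom and its neighbors, and checks that the encoding is unchanged by a rotation followed by a translation. This descriptor is an assumption chosen for illustration, not the encoding of the cited works: it is redundant, it does not distinguish mirror images, and it is not guaranteed to separate all distinct neighborhoods. The helper `rigid_motion` is likewise hypothetical.

```python
import math
from itertools import combinations


def neighborhood_descriptor(center, neighbors):
    """Sorted distances between all pairs among the atom and its neighbors."""
    points = [center] + list(neighbors)
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def rigid_motion(point, angle, shift):
    """Rotate `point` about the z axis by `angle` radians, then translate."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


if __name__ == "__main__":
    center = (0.0, 0.0, 0.0)
    neighbors = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
    before = neighborhood_descriptor(center, neighbors)
    moved = [rigid_motion(p, 0.7, (1.0, -2.0, 0.5)) for p in [center] + neighbors]
    after = neighborhood_descriptor(moved[0], moved[1:])
    # The two descriptors agree up to floating-point error.
    print(max(abs(a - b) for a, b in zip(before, after)))
```

Vectors of this kind, once brought to a common length, could serve as input to a clustering procedure.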
In the dynamic setting, one wishes to understand whether selected (hot) residues exhibit specific dynamic properties, so as to serve as anchors in a binding process [44]. More generally, any significant observation made in the static setting deserves investigation in the dynamic setting, so as to assess its stability. Such questions are also related to the problem of correlated motions, which we discuss next.