MISTIS - 2016 - Annual activity report

Section: Research Program

Mixture models

Participants: Alexis Arnaud, Jean-Baptiste Durand, Florence Forbes, Aina Frau Pascual, Alessandro Chiancone, Stephane Girard, Julyan Arbel, Gildas Mazo, Jean-Michel Becu.

Keywords: mixture of distributions, EM algorithm, missing data, conditional independence, statistical pattern recognition, clustering, unsupervised and partially supervised learning.

In a first approach, we consider parametric statistical models, where the parameter $θ$, possibly multi-dimensional, is usually unknown and must be estimated. We consider cases where the data naturally divide into observed data $y = \{y_{1}, \ldots, y_{n}\}$ and unobserved or missing data $z = \{z_{1}, \ldots, z_{n}\}$. The missing data $z_{i}$ represent, for instance, membership in one of a set of $K$ alternative categories. The distribution of an observed $y_{i}$ can then be written as a finite mixture of distributions,

$$f(y_{i}; θ) = \sum_{k=1}^{K} P(z_{i} = k; θ) \, f(y_{i} \mid z_{i} = k; θ). \tag{1}$$
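To make equation (1) concrete, the following minimal sketch evaluates a finite mixture density at a point. It assumes univariate Gaussian components, which the text does not specify; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) at y (assumed component family)."""
    u = (y - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, weights, means, stds):
    """Evaluate equation (1): sum over k of P(z = k) * f(y | z = k)."""
    return sum(w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, means, stds))


# Hypothetical parameters theta = (weights, means, stds) for K = 2 categories.
weights = [0.3, 0.7]   # P(z_i = k; theta), must sum to 1
means = [-2.0, 1.0]
stds = [1.0, 0.5]

density_at_zero = mixture_pdf(0.0, weights, means, stds)
```

Because the weights sum to one and each component is a density, the mixture is itself a density; any other component family could be substituted for `gaussian_pdf` without changing `mixture_pdf`.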

These models are interesting in that they may reveal hidden variables that account for most of the observed variability and conditionally on which the observed variables are independent. Their estimation is often difficult because of the missing data. The Expectation-Maximization (EM) algorithm is a general and now standard approach to maximizing the likelihood in missing data problems. It provides parameter estimates as well as values for the missing data.
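As an illustration of how EM alternates between inferring the missing data and re-estimating the parameters, here is a minimal sketch for a univariate Gaussian mixture. The Gaussian component family, the function names, the variance floor, the synthetic data, and the initial values are all assumptions made for this example; it is not the team's implementation.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Return posterior probabilities P(z_i = k | y_i; theta) and the log-likelihood."""
    resp, log_lik = [], 0.0
    for y in data:
        joint = [w * normal_pdf(y, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)  # mixture density of y, as in equation (1)
        log_lik += math.log(total)
        resp.append([j / total for j in joint])
    return resp, log_lik


def em_gaussian_mixture(data, weights, means, variances, n_iter=50):
    """Run n_iter EM iterations; inputs are copied, not modified in place."""
    weights, means, variances = list(weights), list(means), list(variances)
    n, log_liks = len(data), []
    for _ in range(n_iter):
        resp, log_lik = e_step(data, weights, means, variances)
        log_liks.append(log_lik)
        # M-step: weighted maximum-likelihood updates for each component.
        for k in range(len(weights)):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * y for r, y in zip(resp, data)) / nk
            var_k = sum(r[k] * (y - means[k]) ** 2 for r, y in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # assumed floor to avoid degenerate components
    # Final E-step so the returned posteriors match the returned parameters.
    resp, log_lik = e_step(data, weights, means, variances)
    log_liks.append(log_lik)
    return weights, means, variances, resp, log_liks


# Synthetic data from two hypothetical groups, for illustration only.
random.seed(0)
data = [random.gauss(-2.0, 1.0) for _ in range(200)]
data += [random.gauss(3.0, 0.5) for _ in range(300)]

est_weights, est_means, est_vars, resp, log_liks = em_gaussian_mixture(
    data, weights=[0.5, 0.5], means=[-1.0, 1.0], variances=[1.0, 1.0]
)
```

The returned `resp` holds, for each observation, the estimated probabilities of the missing labels, which is the sense in which EM also provides values for the missing data. The log-likelihood trace `log_liks` should not decrease from one iteration to the next, which is a convenient sanity check.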

Mixture models correspond to independent $z_{i}$'s. They have been increasingly used in statistical pattern recognition, where they enable a formal (model-based) approach to (unsupervised) clustering.
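One common way to turn a fitted mixture into a clustering, shown here only as an illustration, is to apply Bayes' rule to the terms of equation (1) and assign each observation to its most probable category. The maximum a posteriori rule and the notation $\hat{z}_{i}$ are assumptions of this sketch, not something stated in the text.

```latex
\begin{align*}
P(z_{i} = k \mid y_{i}; \theta)
  &= \frac{P(z_{i} = k; \theta)\, f(y_{i} \mid z_{i} = k; \theta)}
          {\sum_{l=1}^{K} P(z_{i} = l; \theta)\, f(y_{i} \mid z_{i} = l; \theta)}, \\
\hat{z}_{i}
  &= \arg\max_{k \in \{1, \ldots, K\}} P(z_{i} = k \mid y_{i}; \theta).
\end{align*}
```

The denominator is the mixture density $f(y_{i}; θ)$ of equation (1). Because the $z_{i}$'s are independent, each observation can be classified separately from the others.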
