Section:
Research Program
Multivariate decompositions
Multivariate decompositions provide a way to model complex
data such as brain activation images: for instance, one might be
interested in extracting an atlas of brain regions from a given
dataset, such as regions exhibiting similar activity during a
protocol, across multiple protocols, or even in the absence of any
protocol (during resting state).
These data can often be factorized
into spatio-temporal components, which can thus be estimated through
regularized Principal Components Analysis (PCA) algorithms;
these share some steps with regularized regression.
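Let $\mathbf{X}$ be a neuroimaging dataset written as an $(n_{subjects}, n_{voxels})$ matrix, after proper centering; the model reads
$$\mathbf{X} = \mathbf{A}\mathbf{D} + \mathbf{\epsilon},$$
where $\mathbf{D}$ represents a set of $n_{comp}$ spatial maps, hence a matrix
of shape $(n_{comp}, n_{voxels})$, and $\mathbf{A}$ the associated subject-wise
loadings.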
While traditional PCA and independent components analysis are limited
to reconstructing components $\mathbf{D}$ within the space spanned by
the rows of the data matrix $\mathbf{X}$, it seems desirable to add some constraints
on the rows of $\mathbf{D}$, which represent spatial maps, such as
sparsity and/or smoothness, as these make the interpretation of the
maps clearer in the context of neuroimaging.
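This yields the following estimation problem:
$$\min_{\mathbf{D}, \mathbf{A}} \|\mathbf{X} - \mathbf{A}\mathbf{D}\|^2 + \Psi(\mathbf{D}) \quad \text{s.t.} \quad \|\mathbf{A}_i\| = 1 \;\; \forall i \in \{1, \dots, n_{comp}\},$$
where $(\mathbf{A}_i)_{i \in \{1, \dots, n_{comp}\}}$ represents the columns of
the loadings matrix $\mathbf{A}$. The penalty $\Psi$ on the spatial maps $\mathbf{D}$ can be chosen as in Eq. (2) in order to
enforce smoothness and/or sparsity constraints.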
The problem is not jointly convex in all the variables, but each
penalization given in Eq. (2) yields a problem that is convex in the maps $\mathbf{D}$
for fixed loadings $\mathbf{A}$, and conversely.
This readily suggests an alternating optimization scheme, where
$\mathbf{D}$ and $\mathbf{A}$ are estimated in turn, until convergence
to a local optimum of the criterion.
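The alternating scheme described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation used in the program: the penalty is taken to be an l1 norm on the spatial maps (a stand-in for the choices of Eq. (2), which is not reproduced here), the data-fit term carries a factor 1/2, the maps are updated by proximal gradient (ISTA) steps, and the unit-norm constraint on the loadings is handled by a simple rescaling heuristic rather than an exact constrained solve. The names sparse_decomposition, soft_threshold, lam, n_outer and n_inner are hypothetical.

```python
import numpy as np


def soft_threshold(Z, t):
    """Proximal operator of t * ||.||_1 (entry-wise shrinkage)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)


def sparse_decomposition(X, n_comp, lam=0.1, n_outer=30, n_inner=50, seed=0):
    """Alternate between maps D (l1-penalized) and loadings A.

    Heuristically targets 0.5 * ||X - A D||^2 + lam * ||D||_1 with
    unit-norm columns of A. X is assumed centered, of shape
    (n_subjects, n_voxels).
    """
    rng = np.random.default_rng(seed)
    n_subjects, n_voxels = X.shape
    # Random loadings with unit-norm columns; maps start at zero.
    A = rng.standard_normal((n_subjects, n_comp))
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    D = np.zeros((n_comp, n_voxels))
    for outer in range(n_outer):
        # D-step: ISTA on the convex problem in D for fixed A.
        lipschitz = np.linalg.norm(A.T @ A, 2)  # spectral norm
        for inner in range(n_inner):
            grad = A.T @ (A @ D - X)
            D = soft_threshold(D - grad / lipschitz, lam / lipschitz)
        # A-step: least squares in A for fixed D (solves D.T @ A.T ~ X.T).
        A = np.linalg.lstsq(D.T, X.T, rcond=None)[0].T
        # Assumed heuristic: renormalize the columns of A and rescale the
        # rows of D so that the product A @ D is left unchanged.
        norms = np.maximum(np.linalg.norm(A, axis=0), 1e-12)
        A /= norms
        D *= norms[:, None]
    return A, D
```

Because of the rescaling step, the penalized criterion is not guaranteed to decrease at every outer iteration; a projected-gradient update of the loadings would be a more principled alternative.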
As in PCA, the extracted
components can be ranked according to the amount of fitted variance.
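One possible convention for this ranking is sketched below, assuming a loadings matrix A with one column per component and a maps matrix D with one row per component. Each component is scored by the energy of its rank-one term, that is, the squared Frobenius norm of the outer product of its loading and its map. The function name rank_components is hypothetical.

```python
import numpy as np


def rank_components(A, D):
    """Sort components by the energy of their rank-one term.

    The squared Frobenius norm of outer(A[:, i], D[i]) equals
    (||A[:, i]|| * ||D[i]||) ** 2.
    """
    energy = (np.linalg.norm(A, axis=0) * np.linalg.norm(D, axis=1)) ** 2
    order = np.argsort(energy)[::-1]  # largest energy first
    return A[:, order], D[order], energy[order]
```

Unlike in classical PCA, sparse or smooth components are generally not orthogonal, so these per-component energies need not add up to the total fitted variance.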
Importantly, estimated PCA models can also be interpreted as
probabilistic models of the data, assuming a high-dimensional Gaussian
distribution (probabilistic PCA).
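For reference, the standard probabilistic PCA generative model can be written as follows. The symbols used here (an observation x of dimension p, a latent vector z of dimension k, a p-by-k matrix W, a mean mu and a noise variance sigma^2) are notation introduced for this illustration and are not taken from the text above.

```latex
\[
\begin{aligned}
\mathbf{z} &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}_k), \\
\mathbf{x} \mid \mathbf{z} &\sim \mathcal{N}(\mathbf{W}\mathbf{z} + \boldsymbol{\mu},\; \sigma^2 \mathbf{I}_p), \\
\mathbf{x} &\sim \mathcal{N}(\boldsymbol{\mu},\; \mathbf{W}\mathbf{W}^\top + \sigma^2 \mathbf{I}_p).
\end{aligned}
\]
```

The last line is the marginal distribution of the data, a Gaussian whose covariance is low-rank plus isotropic noise; classical PCA is recovered in the limit where sigma^2 tends to zero.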
Ultimately, the main limitation of these algorithms is their
memory cost: holding datasets with a large dimension and a
large number of samples (as in recent neuroimaging cohorts) in memory leads to
inefficient computation. To address this issue, online methods are
particularly attractive.
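To illustrate why online processing reduces the memory footprint, here is a minimal sketch of a streaming factorization that sees the data one mini-batch of samples at a time and keeps only small accumulated statistics between batches. It is a simplified, unpenalized variant written for illustration: it assumes the batches are already centered, uses a small ridge term for numerical stability, and omits the sparsity or smoothness penalty on the maps. The names online_factorization, batches and ridge are hypothetical.

```python
import numpy as np


def online_factorization(batches, n_comp, n_voxels, ridge=1e-3, seed=0):
    """Estimate maps D of shape (n_comp, n_voxels) from a stream of batches.

    Each batch has shape (batch_size, n_voxels). Only two small matrices,
    C (n_comp x n_comp) and B (n_comp x n_voxels), persist across batches.
    """
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((n_comp, n_voxels))
    C = np.zeros((n_comp, n_comp))    # accumulates A_b.T @ A_b
    B = np.zeros((n_comp, n_voxels))  # accumulates A_b.T @ X_b
    eye = np.eye(n_comp)
    for X_b in batches:
        # Loadings of the current batch for fixed D (ridge least squares).
        G = D @ D.T + ridge * eye
        A_b = np.linalg.solve(G, D @ X_b.T).T
        # Update the sufficient statistics, then refit D from them.
        C += A_b.T @ A_b
        B += A_b.T @ X_b
        D = np.linalg.solve(C + ridge * eye, B)
    return D
```

The batches argument can be any iterable, for instance a generator that reads one block of images from disk at a time, so that the full data matrix never has to be held in memory.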