CHAPTER 13. LINEAR FACTOR MODELS
phrase. In this book, a generative model either represents
p
(
x
) or can draw samples
from it. Many variants of ICA only know how to transform between
x
and
h
but
do not have any way of representing
p
(
h
), and thus do not impose a distribution
over
p
(
x
). For example, many ICA variants aim to increase the sample kurtosis of
h
=
W
−1
x
, because high kurtosis indicates that
p
(
h
) is non-Gaussian, but this is
accomplished without explicitly representing
p
(
h
). This is because ICA is more
often used as an analysis tool for separating signals, rather than for generating
data or estimating its density.
Just as PCA can be generalized to the nonlinear autoencoders described in
chapter 14, ICA can be generalized to a nonlinear generative model, in which
we use a nonlinear function
f
to generate the observed data. See Hyvärinen and
Pajunen (1999) for the initial work on nonlinear ICA and its successful use with
ensemble learning by Roberts and Everson (2001) and Lappalainen et al. (2000).
Another nonlinear extension of ICA is the approach of
nonlinear independent
components estimation
, or NICE (Dinh et al., 2014), which stacks a series of
invertible transformations (encoder stages) with the property that the determinant
of the Jacobian of each transformation can be computed eﬃciently. This makes
it possible to compute the likelihood exactly, and like ICA, NICE attempts to
transform the data into a space where it has a factorized marginal distribution, but
it is more likely to succeed thanks to the nonlinear encoder. Because the encoder
is associated with a decoder that is its perfect inverse, generating samples from
the model is straightforward (by ﬁrst sampling from
p
(
h
) and then applying the
decoder).
Another generalization of ICA is to learn groups of features, with statistical
dependence allowed within a group but discouraged between groups (Hyvärinen
and Hoyer, 1999; Hyvärinen et al., 2001b). When the groups of related units are
chosen to be nonoverlapping, this is called
independent subspace analysis
. It is
also possible to assign spatial coordinates to each hidden unit and form overlapping
groups of spatially neighboring units. This encourages nearby units to learn similar
features. When applied to natural images, this
topographic ICA
approach learns
Gabor ﬁlters, such that neighboring features have similar orientation, location or
frequency. Many diﬀerent phase oﬀsets of similar Gabor functions occur within
each region, so that pooling over small regions yields translation invariance.
13.3 Slow Feature Analysis
Slow feature analysis
(SFA) is a linear factor model that uses information from
time signals to learn invariant features (Wiskott and Sejnowski, 2002).
489