Indrajit Bhattacharya
IBM Research India
Title: Deep Nonparametric Admixture Models
Abstract: Deep generative models provide a hierarchical probabilistic
representation of data points: for example, documents as mixtures of layered
entities, with topics, which are distributions over words, at the first
layer, and authors, which are distributions over topics, at the next
layer. Such a representation is called an admixture model (or a mixed
membership model) since multiple topics and multiple authors are used to
represent a single document. We consider deep variants of such models which
are allowed to grow to an arbitrary (but finite) number of levels. Since
over-fitting is always a concern for deep models, we investigate
nonparametric models, where the number of parameters at each layer is not
fixed a priori, but is allowed to grow with the data size. Dirichlet Processes
(DPs) are the basic building blocks of nonparametric modeling. While
Dirichlet Process mixture models enable infinite mixture modeling, infinite
admixture models result from Hierarchical Dirichlet Processes (HDPs), where
a draw from one DP becomes the base distribution for a second DP. In this
work, we investigate how HDPs can be coupled together to create deep
nonparametric admixture models. We show that this can be done by nesting
Hierarchical Dirichlet Processes, where each layer has its own HDP, and
each HDP has as its base distribution the HDP of the previous layer. We
show that such nested HDPs arise naturally as the infinite limit of
existing multi-layer parametric models such as the Author-Topic Model.
Inference is naturally a concern for such deep nonparametric models. We
study extensions of two different Gibbs sampling algorithms that exist for
the HDP: the direct sampling scheme, which directly samples the entities
at each layer for each word, and the Chinese Restaurant Franchise (CRF)
scheme, which samples entities via pointers (called tables). While both
schemes are known to mix well for the HDP, we show that the complexity of
the CRF scheme grows exponentially with the number of layers, making it
impractical beyond a single layer. Using experiments on real text corpora,
we show that the direct sampling scheme, whose complexity grows linearly
with the number of layers, is a more practical alternative.
(Joint work with Lavanya Tekumalla (IISc) and Priyanka Agrawal (IBM Research))
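For orientation, the two-level HDP construction that the abstract builds on (Teh, Jordan, Beal and Blei, 2006, listed below) can be sketched as in the first display; the second display is only an illustrative reading of the abstract's description of nesting, with hypothetical symbols, not the speaker's exact formulation.

\begin{align*}
  % Standard HDP: a draw G_0 from one DP is the base distribution
  % of the group-level (e.g., per-document) DPs.
  G_0 \mid \gamma, H \;&\sim\; \mathrm{DP}(\gamma, H), \\
  G_j \mid \alpha_0, G_0 \;&\sim\; \mathrm{DP}(\alpha_0, G_0)
      \quad \text{for each group } j, \\
  \theta_{ji} \mid G_j \;&\sim\; G_j, \qquad
  x_{ji} \mid \theta_{ji} \;\sim\; F(\theta_{ji}).
\end{align*}

\begin{align*}
  % One possible reading of "each HDP has as its base distribution the HDP
  % of the previous layer": a group-level draw from the layer-(l-1) HDP acts
  % as the base measure of the layer-l HDP (superscripts are hypothetical).
  G_0^{(\ell)} \mid \gamma_\ell, G_j^{(\ell-1)}
      \;&\sim\; \mathrm{DP}\bigl(\gamma_\ell,\, G_j^{(\ell-1)}\bigr), \\
  G_j^{(\ell)} \mid \alpha_\ell, G_0^{(\ell)}
      \;&\sim\; \mathrm{DP}\bigl(\alpha_\ell,\, G_0^{(\ell)}\bigr).
\end{align*}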
Recommended Prior Reading:
- Dirichlet Process, Yee Whye Teh, Encyclopedia of Machine Learning, 2010, [link to pdf]
- Hierarchical Dirichlet Processes, Teh, Jordan, Beal, Blei, JASA, 2006, [link to pdf]