2nd Indian Workshop on Machine Learning
July 1-3, 2016, IIT Kanpur

Talk Abstracts and Details

This page presents the details of the talks to be given at the workshop. Click on a speaker below for details of that talk, or simply scroll down to browse all the talks.

Important Note: Some speakers have suggested prior reading material for their talks. If you are attending the workshop, please go through the suggested material beforehand to gain the maximum benefit from the talks.

Rahul Agrawal, Microsoft R&D India
Indrajit Bhattacharya, IBM Research India
Soma Biswas, Indian Institute of Science
Amit Chandak, Amazon
Anirban Dasgupta, IIT Gandhinagar
Ambedkar Dukkipati, Indian Institute of Science
Aditya Gopalan, Indian Institute of Science
Prateek Jain, Microsoft Research India
Shivaram Kalyanakrishnan, IIT Bombay
Animesh Mukherjee, IIT Kharagpur
Vaibhav Rajan, Xerox Research Centre India
Ganesh Ramakrishnan, IIT Bombay
Vikas Raykar, IBM Research India
Shirish Shevade, Indian Institute of Science
Parag Singla, IIT Delhi
Partha Talukdar, Indian Institute of Science

Details of the talks

Rahul Agrawal
Microsoft R&D India

Title: Deep Learning + NLP: Introduction and insights from Advertising data

Abstract: In recent times, Deep Learning has made significant advances in modeling natural language understanding. In this talk, we will introduce natural language modeling using deep learning. We will start with a discussion of various word embeddings, language models, and applications of deep learning to conventional NLP tasks such as NER. We will also look at experimental insights from modeling real-world data such as search queries, web pages, ad descriptions, and product catalogs.
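
As a minimal illustration of the word-embedding models the talk begins with, here is a skip-gram training sketch in Python using the gensim library (gensim 4.x API; the toy corpus and all parameter values are placeholders, not from the talk):

    # A minimal word2vec (skip-gram) sketch using gensim 4.x.
    # The toy corpus is a placeholder; in practice one would train on large
    # collections such as search queries, web pages, or ad descriptions.
    from gensim.models import Word2Vec

    corpus = [
        ["cheap", "running", "shoes", "online"],
        ["buy", "running", "shoes", "free", "shipping"],
        ["best", "trail", "running", "shoes"],
    ]

    model = Word2Vec(
        sentences=corpus,
        vector_size=50,  # dimensionality of the word embeddings
        window=2,        # context window size
        min_count=1,     # keep every word in this toy corpus
        sg=1,            # 1 = skip-gram, 0 = CBOW
    )

    # Words that occur in similar contexts get nearby vectors.
    print(model.wv.most_similar("shoes", topn=3))
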
Indrajit Bhattacharya
IBM Research India

Title: Deep Nonparametric Admixture Models

Abstract: Deep generative models provide a hierarchical probabilistic representation of data points, for example documents as mixtures of layered entities, with topics, which are distributions over words, at the first layer, and then authors, which are distributions over topics, at the next layer. Such a representation is called an admixture model (or a mixed membership model) since multiple topics and multiple authors are used to represent a single document. We consider deep variants of such models which are allowed to grow to an arbitrary (but finite) number of levels. Since over-fitting is always a concern for deep models, we investigate nonparametric models, where the number of parameters at each layer is not fixed a priori, but is allowed to grow with the data size. Dirichlet Processes (DPs) are the basic building blocks of nonparametric modeling. While Dirichlet Process mixture models enable infinite mixture modeling, infinite admixture models result from Hierarchical Dirichlet Processes (HDPs), where a draw from one DP becomes the base distribution for a second DP. In this work, we investigate how HDPs can be coupled together to create deep nonparametric admixture models. We show that this can be done by nesting Hierarchical Dirichlet Processes, where each layer has its own HDP, and each HDP has as its base distribution the HDP of the previous layer. We show that such nested HDPs arise naturally as the infinite limit of existing multi-layer parametric models such as the Author-Topic Model.

Inference is naturally a concern for such deep nonparametric models. We study extensions of two different Gibbs sampling algorithms that exist for the HDP: the direct sampling scheme, which directly samples the entities at each layer for each word, and the Chinese Restaurant Franchise (CRF) scheme, which samples entities via pointers (called tables). While both schemes are known to mix well for the HDP, we show that the complexity of the CRF scheme grows exponentially with the number of layers, and it is not practical to use beyond a single layer. We show using experiments on real text corpora that the direct sampling scheme, whose complexity grows linearly with the number of layers, is a more practical alternative.
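
For reference, the basic HDP construction that these deep models stack (standard notation, following Teh et al., 2006, in the reading list below): a draw from one DP serves as the base distribution of another,

    G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H)
    G_j \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0), \quad j = 1, \dots, J
    \theta_{ji} \mid G_j \sim G_j, \qquad x_{ji} \mid \theta_{ji} \sim F(\theta_{ji})

Nesting repeats this coupling layer by layer, with each layer's HDP serving as the base distribution for the HDP of the next layer.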

(Joint work with Lavanya Tekumalla (IISc) and Priyanka Agrawal (IBM Research))

Recommended Prior Reading:
  1. Dirichlet Process, Yee Whye Teh, Encyclopedia of Machine Learning, 2010, [link to pdf]
  2. Hierarchical Dirichlet Processes, Teh, Jordan, Beal, Blei, JASA, 2006, [link to pdf]
Soma Biswas
Indian Institute of Science

Title: Machine Learning for Face Recognition

Abstract: Face recognition is a very important area of research in computer vision, with many applications in surveillance. Unconstrained pose, illumination, expression, low resolution, etc. make this task extremely challenging. In this talk, we will discuss two successful approaches for this task, namely distance metric learning and dictionary learning. We will discuss some of the classical approaches in these two areas and also the work that we are currently doing in our lab.

Bio: Soma Biswas is an assistant professor in the Electrical Engineering Department, Indian Institute of Science, Bangalore, India. She received the MTech degree from the Indian Institute of Technology Kanpur and the PhD degree in Electrical and Computer Engineering from the University of Maryland, College Park, in 2004 and 2009, respectively. Her research interests include image processing, computer vision, and pattern recognition.
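
For background on the metric-learning half of the talk, the standard Mahalanobis-style formulation (generic background, not specific to the speaker's methods): one learns a positive semidefinite matrix M such that faces of the same person are close and faces of different people are far apart under

    d_M(x, y) = \sqrt{(x - y)^\top M (x - y)}, \qquad M \succeq 0.
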
Amit Chandak
Amazon

Title: Size Recommendations for Apparels & Shoes

Abstract: Online shopping is becoming the preferred mode of purchasing goods since it offers wide selection, competitive prices and easy returns. Apparel and shoes are important product categories with significant revenue impact, yet many customers avoid purchasing them online due to fit issues. Sizing variations across brands and locales confuse customers trying to pick the right size for a desired product. When not sure about their correct size, customers tend to buy multiple sizes of a product and return the rest. This is a recurring pain point for Amazon customers that causes high return rates and a bad customer experience. We address this pain point by predicting the right size of a product for a given user. We estimate the true sizes of customers and products by minimizing an ordinal loss function defined on their purchase history, also leveraging a wide range of signals such as the collective user purchase/return history, sizing information in the seller catalog, and specified user and product attributes (e.g., feet size, waist length, product material). Our proposed algorithm is scalable and can be easily parallelized. We compare our results with an existing in-house algorithm which generates recommendations by clustering products and customers based on co-purchases.
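
To make the latent-size idea concrete, here is a heavily simplified sketch in Python; the loss form, variable names, and data are illustrative assumptions, not the algorithm described in the talk. Each customer and product gets a scalar "true size", and an ordinal loss pushes their difference to match the observed fit outcome:

    # Hypothetical latent-size sketch (the loss form is an assumption for
    # illustration, not Amazon's production model). Each transaction is
    # (customer i, product j, outcome):
    #   -1 : returned, too small   (customer size exceeds product size)
    #    0 : kept, good fit
    #   +1 : returned, too large   (product size exceeds customer size)
    import numpy as np

    transactions = [(0, 0, -1), (0, 1, 0), (1, 1, +1), (1, 0, 0)]  # toy data
    n_customers, n_products = 2, 2

    t = np.zeros(n_customers)   # latent "true sizes" of customers
    s = np.zeros(n_products)    # latent "true sizes" of products
    margin, lr = 1.0, 0.1

    for _ in range(200):
        for i, j, y in transactions:
            d = t[i] - s[j]
            # Ordinal hinge-style penalties on the signed size difference d.
            if y == -1:
                grad = -1.0 if d < margin else 0.0    # want d >= margin
            elif y == +1:
                grad = 1.0 if d > -margin else 0.0    # want d <= -margin
            else:
                grad = d                              # want d close to 0
            t[i] -= lr * grad
            s[j] += lr * grad

    print(t, s)   # learned sizes separate exactly as the fit outcomes demand
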
Anirban Dasgupta
Indian Institute of Technology Gandhinagar

Title: Estimating Network Parameters using Sampling

Abstract: Large networks are ubiquitous, and yet, estimating the key parameters of large networks is not an easy problem due to various constraints: either because of their size, or because they are not available in their entirety and can only be accessed through some restricted interface (e.g., a rate-limited API). In this talk, we outline sampling-based algorithms to estimate a couple of key network properties, e.g., the average degree and the size of a sub-population. We will also discuss strategies for obtaining bounds on the number of queries made to the API, and how different sampling strategies fare on this metric.
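
To make the setting concrete, here is a minimal sketch of estimating the average degree by uniform node sampling (a naive baseline; the talk's algorithms are more query-efficient and handle restricted access):

    # Naive average-degree estimation by uniform node sampling (an
    # illustrative baseline; assumes we can sample nodes uniformly at random
    # and query their degrees, e.g. through an API).
    import random
    import networkx as nx

    G = nx.barabasi_albert_graph(100000, 5, seed=0)  # stand-in for a large network

    sample = random.sample(list(G.nodes()), 1000)    # 1000 degree queries
    estimate = sum(G.degree(n) for n in sample) / len(sample)

    true_avg = 2 * G.number_of_edges() / G.number_of_nodes()
    print(f"estimate: {estimate:.2f}, true average degree: {true_avg:.2f}")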

Joint work(s) with Ravi Kumar, Tamas Sarlos, Flavio Chierichetti and Silvio Lattanzi.

Recommended Prior Reading:
  1. The talk should be accessible to anyone with a basic graduate level knowledge in probability and graph theory.
Ambedkar Dukkipati
Indian Institute of Science

Title: Spectral methods for clustering and graph partitioning problems

Abstract: Spectral methods for clustering and graph partitioning problems have been very successful for two reasons: (i) they yield good practical solutions and are easy to implement, and (ii) they come with theoretical guarantees. In this talk I will introduce these methods by giving details of the underlying spectral graph theory results. I will also talk about the theoretical guarantees that one can achieve using stochastic block models. Towards the end of the talk I will provide a brief overview of our recent results on hypergraph partitioning problems and their applications to computer vision and machine learning.
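
As a concrete reference point, a minimal Python sketch of the standard spectral clustering recipe (Laplacian eigenvectors followed by k-means; generic background, not the speaker's specific algorithms):

    # Minimal spectral clustering sketch: Laplacian eigenvectors + k-means
    # (the standard recipe; assumes a small undirected graph given by its
    # adjacency matrix).
    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_clustering(A, k):
        D = np.diag(A.sum(axis=1))       # degree matrix
        L = D - A                        # unnormalized graph Laplacian
        _, vecs = np.linalg.eigh(L)      # eigenvectors, ascending eigenvalues
        U = vecs[:, :k]                  # embed nodes via k smallest eigenvectors
        return KMeans(n_clusters=k, n_init=10).fit_predict(U)

    # Two triangles joined by a single edge: the two clusters are recovered.
    A = np.zeros((6, 6))
    for i, j in [(0,1), (0,2), (1,2), (3,4), (3,5), (4,5), (2,3)]:
        A[i, j] = A[j, i] = 1.0
    print(spectral_clustering(A, 2))     # e.g., [0 0 0 1 1 1]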

Recommended Prior Reading:
  1. Only basic linear algebra is assumed.
Aditya Gopalan
Indian Institute of Science

Title: Online Learning with Structure

Abstract: Online learning or sequential decision-making studies how an agent can learn to perform a task (optimize a metric of interest) through repeated interaction and observation (trial and error). An increasing number of modern-day automated systems are tasked with learning to make sequential decisions, e.g., Internet recommendation engines, dynamic pricing algorithms, automated trading systems, etc., by observing a variety of feedback signals, e.g., mouse clicks, demand responses, ratings, etc. A widely employed model for decision-making under uncertainty is the Multi-Armed Bandit, nearly eight decades old. Here, a decision maker repeatedly faces a choice of playing one of several “arms” or actions, each with an a priori unknown payoff, and must learn to play arms for maximum net reward. Bandit problems with large action spaces are challenging to solve in general, but the presence of additional structure often makes them significantly easier to tackle. Starting from the basic multi-armed bandit, we will explore several variations of online learning problems with rich, complex structures, and highlight efficient algorithmic paradigms to solve them. We will also attempt to cover recent results from our investigations into the robustness structure of linearly parameterized bandits and an online low-rank matrix completion bandit problem.
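
For concreteness, a minimal Python sketch of the classical UCB1 index policy for the basic multi-armed bandit (a standard algorithm; the arm means below are toy values):

    # UCB1 for Bernoulli bandits (the classic index policy; toy arm means).
    import math
    import random

    means = [0.3, 0.5, 0.7]            # true arm means, unknown to the learner
    counts = [0] * len(means)          # number of plays per arm
    sums = [0.0] * len(means)          # cumulative reward per arm

    for t in range(1, 10001):
        if t <= len(means):
            arm = t - 1                # play each arm once to initialize
        else:
            # Pick the arm maximizing empirical mean + exploration bonus.
            arm = max(range(len(means)),
                      key=lambda a: sums[a] / counts[a]
                                    + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if random.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    print(counts)                      # the 0.7 arm ends up played most often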

Prateek Jain
Microsoft Research India

Title: TBD

Abstract: TBD

Shivaram Kalyanakrishnan
Indian Institute of Technology Bombay

Title: Deep Reinforcement Learning

Abstract: If an agent can sense, think, and act in an unknown environment, how must it act? Reinforcement Learning (RL) is a general paradigm under which an agent, by trial and error, can discover actions that result in long-term gain. Integrating ideas from various disciplines, RL has matured as a field of study over the last few decades. Its successes range from training programs that decrease the waiting time for elevators in a building, to maximising profits from stock-trading, to numerous tasks in robotic control and decision-making. In spite of these successes, RL is yet to be counted upon as a practically-viable technology. To a large extent, the gap between the promise of RL and its practical effectiveness has arisen from the lack of suitable representations (and representation-discovery mechanisms) in domains of interest.

Interestingly, it is precisely this gap that the emerging paradigm of deep learning is beginning to fill. Deep learning is a data-driven technique that is capable of learning complex non-linear input-output relationships, which are represented using neural networks with a large number of layers (hence "deep"). The combination of RL with deep learning has registered remarkable successes in recent months. Notably, it has resulted in AI-based agents that exceed human-level performance on a suite of ATARI console games, and also on the more challenging game of Go.

I will begin this talk with an overview of RL and its key algorithmic elements. Specifically, I will outline the role of representations in determining the success of RL. I will follow with a brief survey of deep learning, and proceed to describe the architecture of the agents that have recently succeeded at ATARI and Go. I will end the talk with thoughts about this new beginning for "deep reinforcement learning".
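
As a minimal reference for the algorithmic elements mentioned above, a Python sketch of tabular Q-learning; deep RL replaces the table below with a neural network over raw observations:

    # Tabular Q-learning on a toy 5-state chain (illustrative sketch).
    import random

    n_states = 5                      # state 4 is terminal, with reward 1
    actions = [0, 1]                  # 0 = move left, 1 = move right
    Q = [[0.0, 0.0] for _ in range(n_states)]
    alpha, gamma, eps = 0.1, 0.9, 0.1

    for episode in range(500):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection: explore with probability eps.
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
            target = r if s2 == n_states - 1 else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2

    print([round(max(q), 2) for q in Q])   # values grow toward the goal state
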
Animesh Mukherjee
Indian Institute of Technology Kharagpur

Title: In-depth analysis of large-scale citation networks

Abstract: In this talk I shall present an overview of our four-year-long research initiative on citation networks. Following a very brief introduction to temporal networks, I shall talk about our investigations in the context of the computer science domain, further sub-divided into its constituent fields (e.g., AI, ALGO, NETW, ML, IR etc.). Out of a number of things, I shall mainly focus on four issues: (a) the impact of ancient papers; (b) the evolutionary landscape of the CS fields; (c) long-time characteristics of scientific citation profiles and accurate prediction of future citation counts; and (d) the formation of co-authorship circles (very briefly).
Vaibhav Rajan
Xerox Research Centre India

Title: Dependency Modeling with Copulas

Abstract: I'll begin with a brief introduction to copulas and their use in modeling dependencies in multivariate distributions. I'll then describe parameter estimation methods for Gaussian and Gaussian mixture copulas for continuous- and ordinal-valued data. I'll conclude with some of our recent results on dependency-seeking clustering using Gaussian mixture copulas.
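
For reference, the standard definitions underlying the talk: Sklar's theorem factors any joint CDF F with marginals F_1, ..., F_d through a copula C, and the Gaussian copula with correlation matrix R is

    F(x_1, \dots, x_d) = C(F_1(x_1), \dots, F_d(x_d)),
    C_R(u_1, \dots, u_d) = \Phi_R\big(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d)\big),

where \Phi_R is the CDF of a N(0, R) distribution and \Phi^{-1} is the standard normal quantile function; Gaussian mixture copulas are defined analogously from the CDF of a Gaussian mixture.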

Recommended Prior Reading:
  1. Probability Distributions, Clustering, Gaussian Mixture Models
  2. [Optional]: Expectation Maximization, MCMC
Ganesh Ramakrishnan
Indian Institute of Technology Bombay

Title: Optimal Subset Selection over DAGs with applications in Machine Learning

Abstract: In this talk we present some subset selection problems in Data mining and Machine Learning over Directed Acyclic Graphs (DAGs) and the corresponding optimization formulations and algorithms. We begin with the problem of supervised subset selection over lattices of features and present the framework of Hierarchical Kernel Learning (HKL) and illustrate its utility in the domain of feature/rule learning. HKL involves Multiple Kernel Learning over a set of given base kernels assumed to be embedded on a directed acyclic graph. We present a two-fold generalization of HKL: the first is employing a generic l1/lp block-norm regularizer (p in [1, 2]) that alleviates a key limitation of the HKL formulation. The second is a generalization to the case of multi-class, multi-label and more generally, multi-task applications. We present a highly specialized partial dual of the proposed generalized HKL formulation and an efficient mirror descent based active set algorithm for solving it. We present further generalizations of this approach in settings involving (a) feature spaces ranging from conjunctions of propositions to features in first order logic and (b) sequentially structured output spaces.
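
For orientation, the sparsity-inducing regularizer at the heart of HKL (following Bach's hierarchical kernel learning formulation; the notation here is our own, for illustration): each DAG node v is penalized through the weights on its descendant set D(v), so a variable can enter the model only together with its ancestors,

    \Omega(w) = \sum_{v \in V} d_v \, \| w_{D(v)} \|_p, \qquad p \in [1, 2],

where w_{D(v)} is the sub-vector of weights indexed by the descendants of v and d_v > 0 are node weights; p = 2 recovers the original HKL penalty, and the generic l1/lp version is the first generalization mentioned above.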

Next, we look at some subset selection problems over DAGs in data mining settings. More specifically, we study the problem of summarizing DAG-structured hierarchies of nodes over a given set of instances. Example applications include

(a) automatically generating Wikipedia disambiguation pages for a set of articles; in this setting, the nodes are Wikipedia categories and the instances are specific Wikipedia entities;
(b) generating candidate multi-labels for preparing machine learning datasets (e.g., for text classification, functional genomics, and image classification); in this setting, the nodes are labels and the instances are documents;
(c) building compact lexicons of patterns for cross-domain machine translation; in this setting, the nodes are patterns and the instances are sentences in the corpus.

Existing approaches to this problem either focus on clustering the set of instances using the node hierarchy as features or approximate the node subset selection problem as a problem of selecting good individual nodes. We directly pose the problem as one of submodular optimization on a hierarchy of nodes by designing feature functions on the instances and node subsets. Desirable properties/features of the chosen subset of nodes include instance coverage, specificity, node diversity, and node homogeneity, each of which, we show, is naturally modeled by a submodular function. Other information, provided for example by unsupervised approaches such as LDA and its variants, can also be utilized by defining a submodular function that expresses coherence between the chosen nodes and this information. We use a large-margin framework to learn convex mixtures over the set of submodular components. We present empirical evaluations of our method on (a) the problem of automatically generating Wikipedia disambiguation pages using human-generated clusterings as ground truth and (b) the problem of determining the right subset of patterns for curating (bilingual) dictionaries that would improve performance in tasks such as statistical machine translation.
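
A minimal Python sketch of the greedy algorithm for monotone submodular maximization under a cardinality constraint, the standard workhorse for objectives such as the coverage term above (illustrative; the actual objective is a learned mixture as described):

    # Greedy maximization of a monotone submodular function under a
    # cardinality constraint (the classic (1 - 1/e)-approximation). Here f is
    # plain instance coverage; node names and coverage sets are hypothetical.
    covers = {
        "n1": {1, 2, 3}, "n2": {3, 4}, "n3": {4, 5, 6}, "n4": {1, 6},
    }

    def coverage(selected):
        return len(set().union(*(covers[n] for n in selected))) if selected else 0

    def greedy(k):
        selected = []
        for _ in range(k):
            # Add the node with the largest marginal gain in coverage.
            best = max((n for n in covers if n not in selected),
                       key=lambda n: coverage(selected + [n]))
            selected.append(best)
        return selected

    print(greedy(2))   # ['n1', 'n3']: together they cover all six instances
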
Vikas Raykar
IBM Research India

Title: Evolving predictive models

Abstract: A conventional textbook prescription for building good predictive models is to split the data into three parts: a training set (for model fitting), a validation set (for model selection), and a test set (for final model assessment). However, in practice, searching for the best predictive model is often an iterative and continuous process. Predictive models can potentially evolve over time as developers improve their performance, either by acquiring new data or by improving the existing model. In this talk we will discuss some of the challenges in such scenarios and propose a few solutions to safeguard against the bias due to the repeated use of the existing validation or test set.
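
A minimal sketch of the Thresholdout mechanism from the reusable-holdout paper cited below (simplified, and the parameter values are illustrative): the analyst sees the holdout estimate only when it disagrees with the training estimate by more than a noisy threshold.

    # Simplified Thresholdout (after Dwork et al., Science 2015; parameter
    # values are illustrative). Queries are answered from the training set
    # unless they disagree with the holdout by more than a noisy threshold.
    import random

    def thresholdout(train_est, holdout_est, threshold=0.04, sigma=0.01):
        # The noisy comparison limits how much information each query can
        # leak about the holdout set.
        if abs(train_est - holdout_est) > threshold + random.gauss(0, sigma):
            return holdout_est + random.gauss(0, sigma)  # noisy holdout answer
        return train_est                                 # holdout untouched

    # E.g., the accuracy of a candidate model measured on the two splits:
    print(thresholdout(train_est=0.81, holdout_est=0.76))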

(Joint work with Amrita Saha, IBM Research)

References:
  1. Data Split Strategies for Evolving Predictive Models. Vikas C. Raykar and Amrita Saha, ECML-PKDD 2015
  2. The reusable holdout: Preserving validity in adaptive data analysis. Cynthia Dwork et al., Science 2015
Shirish Shevade
Indian Institute of Science

Title: Optimization Algorithms for Large-Scale Machine Learning

Abstract: Optimization techniques have played an important role in machine learning, especially after the inception of support vector machines (SVMs). The classical convex quadratic programming problem of the SVM and its variants continues to attract both the optimization and machine learning communities. In this talk, we review some efficient optimization algorithms designed to solve such quadratic programming problems. These algorithms are simple, easy to implement and do not require any sophisticated mathematical libraries. We will also discuss the issues that arise when the problems are large and present some solution approaches for such problems.
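
For reference, the classical SVM dual quadratic program that these algorithms solve (standard formulation):

    \max_{\alpha} \;\; \sum_{i=1}^{n} \alpha_i
        - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
    \text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \;\; i = 1, \dots, n,

where K is the kernel, y_i in {-1, +1} are the labels, and C is the regularization parameter. Decomposition methods such as SMO optimize a small subset of the alpha_i at a time, which is what keeps these implementations simple.
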
Parag Singla
Indian Institute of Technology Delhi

Title: Exploiting Contextual Symmetries in Statistical (Relational) Models

Abstract: Several domains of interest, including social network analysis, biology, vision, NLP and IE, need to represent the underlying relational structure as well as model uncertainty. Statistical relational models such as Markov logic achieve this by combining the power of relational representations (e.g. first-order logic) with statistical models (e.g. Markov networks). To perform reasoning in these models efficiently, lifted inference exploits the underlying symmetry by grouping together objects (or states) which behave similarly to each other and hence have the same associated probabilities. In this talk, starting with background on Markov logic, we will look at the novel idea of lifting using contextual symmetries, i.e., symmetries available only under specific assignments to a subset of variables. We will formally define what contextual symmetries are and then proceed to describe their use in lifting a general-purpose MCMC sampler for graphical models. We will present our evaluation on two representative domains, followed by directions for future work. The work presented here has been accepted for publication at IJCAI 2016.
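
As background on Markov logic (the standard formulation from the literature, not specific to this talk): a Markov logic network attaches a real-valued weight to each first-order formula, e.g.

    1.5 \;:\; \forall x, y \;\; \mathrm{Smokes}(x) \land \mathrm{Friends}(x, y) \Rightarrow \mathrm{Smokes}(y)

and defines a log-linear distribution over possible worlds x,

    P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),

where n_i(x) counts the true groundings of formula i in world x and Z normalizes. In this language, a contextual symmetry is a symmetry among groundings that holds only under a specific assignment to some subset of the variables, which is exactly what the lifted MCMC sampler described in the talk exploits.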