VISA Machine Learning Seminar

Fall 2017
Location: SAL 101



Date Speaker Title
19 Sep 2017
Julian McAuley (UCSD)
Time: 3:30-4:50pm Abstract: Predictive models of human behavior–and in particular recommender systems–learn patterns from large volumes of historical activity data, in order to make personalized predictions that adapt to the needs, nuances, and preferences of individuals. Models may take incredibly complex data as *input*, ranging from text and images to social networks and sequence data. However, the *outputs* they are trained to predict–clicks, purchases, transactions, etc.–are typically simple, numerical quantities, in order for the problem to be cast in terms of traditional supervised learning frameworks. In this talk, we discuss possible extensions to such personalized, predictive models of human behavior so that they are capable of predicting complex structured *outputs*. For example, rather than training a model to predict what content a user might interact with, we could predict how they would react to unseen content, in the form of text they might write. Or, rather than predicting whether a user would purchase an existing product, we could predict the characteristics or attributes of the types of products that *should* be created. BIO: Julian McAuley has been an Assistant Professor in the Computer Science Department at the University of California, San Diego since 2014. Previously he was a postdoctoral scholar at Stanford University after receiving his PhD from the Australian National University in 2011. His research is concerned with developing predictive models of human behavior using large volumes of online activity data.
19 Oct 2017
Phillip Isola (Berkeley)
Time: 3:30-4:50pm Abstract: Over the past decade, learning-based methods have driven rapid progress in computer vision. However, most such methods still require a human "teacher" in the loop. Humans provide labeled examples of target behavior, and also define the objective that the learner tries to satisfy. The way learning plays out in nature is rather different: ecological scenarios involve huge quantities of unlabeled data and only a few supervised lessons provided by a teacher (e.g., a parent). I will present two directions toward computer vision algorithms that learn more like ecological agents. The first involves learning from unlabeled data. I will show how objects and semantics can emerge as a natural consequence of predicting raw data, rather than labels. The second is an approach to data prediction where we not only learn to make predictions, but also learn the objective function that scores the predictions. In effect, the algorithm learns not just how to solve a problem, but also what exactly needs to be solved in order to generate realistic outputs. Finally, I will talk about my ongoing efforts toward sensorimotor systems that not only learn from provided data but also act to sample more data on their own. BIO: Phillip Isola is currently a Fellow at OpenAI, and he will be starting as an Assistant Professor in EECS at MIT in 2018. He received his Ph.D. in the Brain & Cognitive Sciences department at MIT, and spent two years as a postdoc in the EECS department at UC Berkeley. He studies visual intelligence from the perspective of both minds and machines.
09 Nov 2017
Jimmy Ba (UToronto)
Time: 3:30-4:50pm Abstract: Optimization lies at the core of any deep learning system. In this talk, I will first discuss the recent advances in optimization algorithms to train deep learning models. Then I will present a novel family of 2nd-order optimization algorithms that leverage distributed computing to significantly shorten the training time of neural networks with tens of millions of parameters. The talk will conclude by showing how our algorithms can be successfully applied to domains such as reinforcement learning and generative adversarial networks. BIO: Jimmy is finishing his PhD with Geoff Hinton in the Machine Learning group at the University of Toronto. Jimmy will be a Computational Fellow at MIT before returning as full-time faculty to the CS department at UofT, as well as joining the Vector Institute. Jimmy completed his BAc, MSc at UofT working with Brendan Frey and Ruslan Salakhutdinov. He has previously spent time at Google DeepMind and Microsoft Research, and is a recipient of the Facebook Graduate Fellowship for 2016 in machine learning. His primary research interests are in the areas of artificial intelligence, neural networks, and numerical optimization.
28 Nov 2017
Carl Vondrick (Columbia)
Time: 3:30-4:50pm Abstract: Machine learning is revolutionizing our world: computers can recognize images, translate language, and even play games competitively with humans. However, there is a missing piece that is necessary for computers to take actions in the real world. My research studies Predictive Vision with the goal of anticipating the future events that may happen. To tackle this challenge, I present predictive vision algorithms that learn directly from large amounts of raw, unlabeled data. Capitalizing on millions of natural videos, my work develops methods for machines to learn to anticipate the visual future, forecast human actions, and recognize ambient sounds. BIO: Carl Vondrick is a research scientist at Google and he will be an assistant professor at Columbia University in fall 2018. He received his PhD from the Massachusetts Institute of Technology in 2017. His research was awarded the Google PhD Fellowship, the NSF Graduate Fellowship, and is featured in popular press, such as NPR, CNN, the Associated Press, and the Late Show with Stephen Colbert.
30 Jan 2018
David Sontag (MIT)
Time: 4:00-5:20pm Abstract: A key capability of artificial intelligence will be the ability to reason about abstract concepts and draw inferences. Where data is limited, probabilistic inference in graphical models provides a powerful framework for performing such reasoning, and can even be used as modules within deep architectures. But, when is probabilistic inference computationally tractable? I will present recent theoretical results that substantially broaden the class of provably tractable models by exploiting model stability (Lang, Sontag, Vijayaraghavan, AISTATS '18), structure in model parameters (Weller, Rowland, Sontag, AISTATS '16), and reinterpreting inference as ground truth recovery (Globerson, Roughgarden, Sontag, Yildirim, ICML '15). BIO: David Sontag joined MIT in January 2017 as Assistant Professor in the Department of Electrical Engineering and Computer Science (EECS) and Hermann L. F. von Helmholtz Career Development Professor in the Institute for Medical Engineering and Science (IMES). He is also a principal investigator in the Computer Science and Artificial Intelligence Laboratory (CSAIL). Sontag's research focuses on machine learning and artificial intelligence; at IMES, he leads a research group that aims to use machine learning to transform health care. Previously, he was an assistant professor in computer science and data science at New York University's Courant Institute of Mathematical Sciences and a postdoctoral researcher at Microsoft Research New England. Dr. Sontag received the Sprowls award for outstanding doctoral thesis in Computer Science at MIT in 2010, best paper awards at the conferences Empirical Methods in Natural Language Processing (EMNLP), Uncertainty in Artificial Intelligence (UAI), and Neural Information Processing Systems (NIPS), faculty awards from Google, Facebook, and Adobe, and an NSF CAREER Award. Dr. Sontag received a B.A. from the University of California, Berkeley.
08 Feb 2018
Luke Zettlemoyer (UW)
Previous Seminars

Spring 2017

Date Speaker Title
02 Feb 2017
Finale Doshi-Velez (Harvard)
Time: 4:00-5:00pm Abstract: Nonnegative matrix factorization (NMF) is a popular dimension reduction technique that produces an interpretable decomposition of the data into parts. However, this decomposition is often not identifiable, even beyond simple cases of permutation and scaling. Non-identifiability is an important concern in practical data exploration settings, in which the basis of the NMF factorization may be interpreted as having some kind of meaning: it may be important to know that other non-negative characterizations of the data were also possible. While other studies have provided criteria under which NMF is unique, in this talk I'll discuss when and how an NMF might *not* be unique. Then I'll discuss some novel algorithms for characterizing the posterior in Bayesian NMF. BIO: Finale Doshi-Velez is an Assistant Professor in Computer Science at Harvard University. Prior to that, she was an NSF CiTraCS postdoctoral fellow at Harvard Medical School and a Marshall Scholar at the University of Cambridge. She completed her PhD at MIT. Her interests lie in the intersection of healthcare and machine learning.
17 Mar 2017
Location: RTH 526

Anshumali Shrivastava (Rice)
Time: 10:30-11:30am Abstract: Large scale machine learning and data mining applications are constantly dealing with datasets at TB scale and the anticipation is that soon it will reach PB level. At this scale, simple data mining operations such as search, learning, and clustering become challenging. In this talk, we will start with a basic introduction to probabilistic hashing (or fingerprinting) and the classical LSH algorithm. Then I will present some of my recent adventures with probabilistic hashing in making large-scale machine learning practical. I will show how the idea of probabilistic hashing can be used to significantly reduce the computations in classical machine learning algorithms such as deep learning (using our recent success with asymmetric hashing for inner products). I will highlight the computational bottleneck, i.e. the hashing time, and will show an efficient variant of minwise hashing. In the end, if time permits, I will demonstrate the use of probabilistic hashing for obtaining practical privacy-preserving algorithms. BIO: Anshumali Shrivastava is an assistant professor in the computer science department at Rice University. His broad research interests include large scale machine learning, randomized algorithms for big data systems and graph mining. He is a recipient of a 2017 NSF CAREER Award. His research on hashing inner products won the Best Paper Award at NIPS 2014, while his work on representing graphs won the Best Paper Award at IEEE/ACM ASONAM 2014. He obtained his PhD in computer science from Cornell University in 2015.
27 Apr 2017
Matus Telgarsky (UIUC)
Time: 4:00-5:00pm Abstract: This talk will present a series of mathematical vignettes on the representation power of neural networks. Amongst old results, the classical universal approximation theorem will be presented, along with Kolmogorov's superposition theorem. Recent results will include depth hierarchies (for any choice of depth, there exist functions which can only be approximated by slightly less deep networks when they have exponential size), connections to polynomials (namely, rational functions and neural networks well-approximate each other), and the power of recurrent networks. Open problems will be sprinkled throughout. BIO: Matus Telgarsky is an assistant professor at UIUC. He received his PhD in 2013 at UCSD under Sanjoy Dasgupta. He works in machine learning theory; his current interests are non-convex optimization and neural network representation.

Fall 2016

Date Speaker Title
27 Sep 2016
Le Song (Gatech)
Time: 4:00-5:00pm Abstract: Structured data, such as sequences, trees, graphs and hypergraphs, are prevalent in a number of interdisciplinary areas such as network analysis, knowledge engineering, computational biology, drug design and materials science. The availability of a large amount of such structured data has posed great challenges for the machine learning community. How to represent such data to capture their similarities or differences? How to learn predictive models from a large amount of such data, and efficiently? How to learn to generate structured data de novo given certain desired properties? A common approach to tackle these challenges is to first design a similarity measure, called the kernel function, between two data points, based on either statistics of the substructures or probabilistic generative models; and then a machine learning algorithm will optimize a predictive model based on such similarity measure. However, this elegant two-stage approach has difficulty scaling up, and discriminative information is also not exploited during the design of the similarity measure. In this talk, I will present Structure2Vec, an effective and scalable approach for representing structured data based on the idea of embedding latent variable models into a feature space, and learning such feature space using discriminative information. Interestingly, Structure2Vec extracts features by performing a sequence of nested nonlinear operations in a way similar to graphical model inference procedures, such as mean field and belief propagation. In applications involving genome and protein sequences, drug molecules and energy materials, Structure2Vec consistently produces state-of-the-art predictive performance. Furthermore, in the materials property prediction problem involving 2.3 million data points, Structure2Vec is able to produce a more accurate model while being 10,000 times smaller. In the end, I will also discuss potential improvements over current work, possible extensions to network analysis and computer vision, and thoughts on the structured data design problem. BIO: Le Song is an assistant professor in the Department of Computational Science and Engineering, College of Computing, Georgia Institute of Technology. He received his Ph.D. in Machine Learning from University of Sydney and NICTA in 2008, and then conducted his post-doctoral research in the Department of Machine Learning, Carnegie Mellon University, between 2008 and 2011. Before he joined Georgia Institute of Technology, he was a research scientist at Google. His principal research direction is machine learning, especially kernel methods and probabilistic graphical models for large scale and complex problems, arising from artificial intelligence, network analysis, computational biology and other interdisciplinary domains. He is the recipient of the AISTATS'16 Best Student Paper Award, IPDPS'15 Best Paper Award, NSF CAREER Award'14, NIPS'13 Outstanding Paper Award, and ICML'10 Best Paper Award. He has also served as an area chair or senior program committee member for many leading machine learning and AI conferences such as ICML, NIPS, AISTATS and AAAI, and as an action editor for JMLR.
06 Oct 2016
Rong Ge (Duke)
Time: 4:00-5:00pm Abstract: Recently, several non-convex problems such as tensor decomposition, phase retrieval and matrix completion have been shown to have no spurious local minima, which allows them to be solved by very simple local search algorithms. However, more complicated non-convex problems such as tensor PCA do have local optima that are not global, and previous results rely on techniques inspired by the Sum-of-Squares hierarchy. In this work we show that the commonly applied homotopy method, which tries to solve the optimization problem by considering different levels of "smoothing", can be applied to tensor PCA and achieve guarantees similar to those of the best known Sum-of-Squares algorithms. This is one of the first settings where local search algorithms are guaranteed to avoid spurious local optima even in high dimensions. This is based on joint work with Yuan Deng (Duke University). BIO: Rong Ge is an assistant professor in the computer science department at Duke. He received his Ph.D. from Princeton University and was a post-doc at Microsoft Research New England before joining Duke. Rong Ge is broadly interested in theoretical computer science and machine learning. His research focuses on designing algorithms with provable guarantees for machine learning problems, with applications to topic models, sparse coding and computational biology.
27 Oct 2016
Sewoong Oh (UIUC)
Time: 4:00-5:00pm Abstract: Adaptive schemes, where tasks are assigned based on the data collected thus far, are widely used in practical crowdsourcing systems to efficiently allocate the budget. However, existing theoretical analyses of crowdsourcing systems suggest that the gain of adaptive task assignments is minimal. To bridge this gap, we propose a new model for representing practical crowdsourcing systems, which strictly generalizes the popular Dawid-Skene model, and characterize the fundamental trade-off between budget and accuracy. We introduce a novel adaptive scheme that matches this fundamental limit. We introduce new techniques for the spectral analysis of non-backtracking operators, using density evolution techniques from coding theory. BIO: Sewoong Oh is an Assistant Professor of Industrial and Enterprise Systems Engineering at UIUC. He received his PhD from the department of Electrical Engineering at Stanford University. Following his PhD, he worked as a postdoctoral researcher at the Laboratory for Information and Decision Systems (LIDS) at MIT. He was co-awarded the Kenneth C. Sevcik outstanding student paper award at SIGMETRICS 2010, the best paper award at SIGMETRICS 2015, and an NSF CAREER award in 2016.
08 Nov 2016
Location: RTH 526

Robert Nowak (UW–Madison)
Time: 11:00am-12:00pm Abstract: Modeling human perception has many applications in cognitive, social, and educational science, as well as in advertising and commerce. This talk discusses theory and methods for learning rankings and embeddings representing perceptions from datasets of human judgments, such as ratings or comparisons. I will briefly describe an ongoing large-scale experiment with the New Yorker magazine that deals with ranking cartoon captions using our system. Then I will discuss our recent work on ordinal embedding, also known as non-metric multidimensional scaling, which is the problem of representing items (e.g., images) as points in a low-dimensional Euclidean space given constraints of the form "item i is closer to item j than item k." In other words, the goal is to find a geometric representation of data that is faithful to comparative similarity judgments. This classic problem is often used to gauge and visualize perceptual similarities. A variety of algorithms exist for learning metric embeddings from comparison data, but the accuracy and performance of these methods were poorly understood. I will present a new theoretical framework that quantifies the accuracy of learned embeddings and indicates how many comparisons suffice as a function of the number of items and the dimension of the embedding. Furthermore, the theory points to new algorithms that outperform previously proposed methods. I will also describe a few applications of ordinal embedding. This is joint work with Lalit Jain and Kevin Jamieson. BIO: Rob is the McFarland-Bascom Professor in Engineering at the University of Wisconsin-Madison, where his research focuses on signal processing, machine learning, optimization, and statistics. The BeerMapper and NEXT systems are recent applications of his research. Rob is a professor in Electrical and Computer Engineering, as well as being affiliated with the departments of Computer Sciences, Statistics, and Biomedical Engineering at the University of Wisconsin. He is also a Fellow of the IEEE and the Wisconsin Institute for Discovery, a member of the Wisconsin Optimization Research Consortium and Machine Learning @ Wisconsin, and organizer of the SILO seminar series. Rob is also an Adjunct Professor at the Toyota Technological Institute at Chicago.
14 Nov 2016
Location: SGM 123

Hal Daumé III (UMD)
Time: 12:00-1:00pm Abstract: Machine learning-based natural language processing systems are amazingly effective, when plentiful labeled training data exists for the task/domain of interest. Unfortunately, for broad coverage (both in task and domain) language understanding, we're unlikely to ever have sufficient labeled data, and systems must find some other way to learn. I'll describe a novel algorithm for learning from interactions, and several problems of interest, most notably machine simultaneous interpretation (translation while someone is still speaking). This is all joint work with some amazing (former) students He He, Alvin Grissom II, John Morgan, Mohit Iyyer, Sudha Rao and Leonardo Claudino, as well as colleagues Jordan Boyd-Graber, Kai-Wei Chang, John Langford, Akshay Krishnamurthy, Alekh Agarwal, Stéphane Ross, Alina Beygelzimer and Paul Mineiro. BIO: Hal Daumé III is an associate professor in Computer Science at the University of Maryland, College Park. He holds joint appointments in UMIACS and Linguistics. He was previously an assistant professor in the School of Computing at the University of Utah. His primary research interest is in developing new learning algorithms for prototypical problems that arise in the context of language processing and artificial intelligence. This includes topics like structured prediction, domain adaptation and unsupervised learning; as well as multilingual modeling and affect analysis. He associates himself most with conferences like ACL, ICML, NIPS and EMNLP. He earned his PhD at the University of Southern California with a thesis on structured prediction for language (his advisor was Daniel Marcu). He spent the summer of 2003 working with Eric Brill in the machine learning and applied statistics group at Microsoft Research. Prior to that, he studied math (mostly logic) at Carnegie Mellon University.
17 Nov 2016
Arindam Banerjee (UMN)
Time: 4:00-5:00pm Abstract: Many machine learning problems, especially scientific problems in areas such as ecology, climate science, and brain sciences, operate in the so-called 'low samples, high dimensions' regime. Such problems typically have numerous possible predictors or features, but the number of training examples is small, often much smaller than the number of features. In this talk, we will discuss recent advances in general formulations and estimators for such problems. These formulations generalize prior work such as the Lasso and the Dantzig selector. We will discuss the geometry underlying such formulations, and how the geometry helps in establishing finite sample properties of the estimators. We will also discuss applications of such results in structure learning in probabilistic graphical models, along with real world applications in ecology and climate science. This is joint work with Soumyadeep Chatterjee, Sheng Chen, Farideh Fazayeli, Andre Goncalves, Jens Kattge, Igor Melnyk, Peter Reich, Franziska Schrodt, Hanhuai Shan, and Vidyashankar Sivakumar. BIO: Arindam Banerjee is an Associate Professor at the Department of Computer Science & Engineering and a Resident Fellow at the Institute on the Environment at the University of Minnesota, Twin Cities. His research interests are in statistical machine learning and data mining, and applications in complex real-world problems including climate science, ecology, recommendation systems, text analysis, brain sciences, finance, and aviation safety. He has won several awards, including the Adobe Research Award (2016), the IBM Faculty Award (2013), the NSF CAREER award (2010), and six Best Paper awards in top-tier conferences.
29 Nov 2016
Richard Samworth (U. Cambridge)
Time: 4:00-5:00pm Abstract: Changepoints are a very common feature of Big Data that arrive in the form of a data stream. We study high-dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the coordinates. The challenge is to borrow strength across the coordinates in order to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called 'inspect' for estimation of the changepoints: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimisation problem derived from the CUSUM transformation of the time series. We then apply an existing univariate changepoint detection algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated changepoints and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data generating mechanisms. BIO: Richard Samworth is Professor of Statistics in the Statistical Laboratory at the University of Cambridge, and currently holds a GBP 1.2M Engineering and Physical Sciences Research Council Early Career Fellowship. He received his PhD in Statistics, also from the University of Cambridge, in 2004. Richard's main research interests are in nonparametric and high-dimensional statistical inference. Particular research topics include shape-constrained density and other nonparametric function estimation problems, nonparametric classification, clustering and regression, Independent Component Analysis, bagging and high-dimensional variable selection problems. Richard was awarded the Royal Statistical Society (RSS) Research prize (2008), the RSS Guy Medal in Bronze (2012) and a Philip Leverhulme prize (2014). He has been elected a Fellow of the Institute of Mathematical Statistics (2014) and the American Statistical Association (2015).

Spring 2016

Date Speaker Title
19 Jan 2016
Jure Leskovec (Stanford)
Time: 4:00-5:00 pm Abstract: In many real-life settings human judges are making decisions and choosing among many alternatives in order to label or classify items: a medical doctor diagnosing a patient, a criminal court judge making a decision, a crowd-worker labeling an image, and a student answering a multiple-choice question. Gaining insights into human decision making is important for determining the quality of individual decisions as well as identifying mistakes and biases. In this talk we discuss the question of developing machine learning methodology for estimating the quality of individual judges and obtaining diagnostic insights into how various judges decide on different kinds of items. We develop a series of increasingly powerful hierarchical Bayesian models which infer latent groups of judges and items with the goal of obtaining insights into the underlying decision process. We apply our framework to a wide range of real-world domains, and demonstrate that our approach can accurately predict judges' decisions, diagnose types of mistakes judges tend to make, and infer true labels of items. BIO: Jure Leskovec is an assistant professor of Computer Science at Stanford University and chief scientist at Pinterest. His research focuses on mining large social and information networks, their evolution, and the diffusion of information and influence over them. Computation over massive data is at the heart of his research and has applications in computer science, social sciences, economics, marketing, and healthcare. This research has won several awards including a Lagrange Prize, Microsoft Research Faculty Fellowship, Alfred P. Sloan Fellowship, and numerous best paper awards. Leskovec received his bachelor's degree in computer science from University of Ljubljana, Slovenia, and his PhD in machine learning from Carnegie Mellon University, and received postdoctoral training at Cornell University. You can follow him on Twitter @jure
26 Jan 2016
Christopher Ré (Stanford)
Time: 4:00-5:00 pm Abstract: Many pressing questions in science are macroscopic, as they require scientists to integrate information from numerous data sources, often expressed in natural languages or in graphics; these forms of media are fraught with imprecision and ambiguity and so are difficult for machines to understand. Here I describe DeepDive, which is a new type of system designed to cope with these problems. It combines extraction, integration and prediction into one system. For some paleobiology and materials science tasks, DeepDive-based systems have surpassed human volunteers in data quantity and quality (recall and precision). DeepDive is also used by scientists in areas including genomics and drug repurposing, by a number of companies involved in various forms of search, and by law enforcement in the fight against human trafficking. DeepDive does not allow users to write algorithms; instead, it asks them to write only features. A key technical challenge is scaling up the resulting inference and learning engine, and I will describe our line of work in computing without using traditional synchronization methods including Hogwild! and DimmWitted. DeepDive is open source on GitHub and available from DeepDive.Stanford.Edu. BIO: Christopher (Chris) Ré is an assistant professor in the Department of Computer Science at Stanford University and a Robert N. Noyce Family Faculty Scholar. His work's goal is to enable users and developers to build applications that more deeply understand and exploit data. Chris received his PhD from the University of Washington in Seattle under the supervision of Dan Suciu. For his PhD work in probabilistic data management, Chris received the SIGMOD 2010 Jim Gray Dissertation Award. He then spent four wonderful years on the faculty of the University of Wisconsin, Madison, before moving to Stanford in 2013. He helped discover the first join algorithm with worst-case optimal running time, which won the best paper at PODS 2012. He also helped develop a framework for feature engineering that won the best paper at SIGMOD 2014. In addition, work from his group has been incorporated into scientific efforts including the IceCube neutrino detector and PaleoDeepDive, and into Cloudera's Impala and products from Oracle, Pivotal, and Microsoft's Adam. He received an NSF CAREER Award in 2011, an Alfred P. Sloan Fellowship in 2013, a Moore Data Driven Investigator Award in 2014, the VLDB Early Career Award in 2015, and the MacArthur Foundation Fellowship in 2015.
02 May 2016
Location: SAL 213

James Foulds (UCSD)
Time: 1:30-2:30 pm Abstract: Topic models have become increasingly prominent text-analytic machine learning tools for research in the social sciences and the humanities. In particular, custom topic models can be developed to answer specific research questions. The design of these models requires a nontrivial amount of effort and expertise, motivating general-purpose topic modeling frameworks. In this talk I will introduce latent topic networks, a flexible class of richly structured topic models designed to facilitate applied research. Custom models can straightforwardly be developed in this framework with an intuitive first-order logical probabilistic programming language. Latent topic networks admit scalable training via a parallelizable EM algorithm which leverages ADMM in the M-step. I demonstrate the broad applicability of the models with case studies on modeling influence in citation networks, and U.S. Presidential State of the Union addresses. This talk is based on joint work with Lise Getoor and Shachi Kumar from the University of California, Santa Cruz, published at ICML 2015.
06 May 2016
Location: RTH 526

John Lafferty (University of Chicago)
Time: 11:00 am -12:00 pm Abstract: Imagine that I estimate a statistical model from data, and then want to share my model with you. But we are communicating over a resource constrained channel. By sending lots of bits, I can communicate my model accurately, with little loss in statistical risk. Sending a small number of bits will incur some excess risk. What can we say about the tradeoff between statistical risk and the communication constraints? This is a type of rate distortion and constrained minimax problem, for which we provide a sharp analysis in certain nonparametric settings. We also consider the problem of estimating a high dimensional convex function, and develop a screening procedure to identify irrelevant variables. The approach adopts a two-stage quadratic programming algorithm that estimates a sum of one-dimensional convex functions, beating the curse of dimensionality that holds under smoothness constraints. Joint work with Yuancheng Zhu and Min Xu.

Fall 2015

View highlights of fall 2015

Date Speaker Title
15 Oct 2015
Xifeng Yan (UCSB)
[Highlight Video] 
Time: 4:00 PM - 5:00 PM Abstract: In this talk, I will first give an overview of the graph data mining and data management studies conducted in my lab and then introduce two projects related to analyzing and searching collaborative and information networks. Collaborative networks are composed of experts who cooperate with each other to complete specific tasks, such as resolving problems reported by customers. We attempt to deduce the cognitive process of task routing and model the decision making of experts. We formalize multiple routing patterns by taking into account both rational and random analysis of tasks, and present a generative model to combine them. In the second part of my talk, I will show the challenge of querying complex graphs such as knowledge graphs and introduce a novel framework enabling schemaless graph querying (SLQ), where a user need not describe queries precisely as required by SQL. I will also briefly describe our recent progress in benchmarking graph queries. Bio: Xifeng Yan is an associate professor at the University of California, Santa Barbara. He holds the Venkatesh Narayanamurti Chair of Computer Science. He has been working on modeling, managing, and mining graphs in information networks, computer systems, social media and bioinformatics. He has received the NSF CAREER Award, the IBM Invention Achievement Award, the ACM-SIGMOD Dissertation Runner-Up Award, and the IEEE ICDM 10-year Highest Impact Paper Award. He received his Ph.D. from the University of Illinois at Urbana-Champaign in 2006 and was a research staff member at the IBM T. J. Watson Research Center between 2006 and 2008.
22 Oct 2015
Lihong Li (MSR)
[Highlight Video] 
Time: 4:00 PM - 5:00 PM Abstract: We consider contextual bandit problems, where in each round the learner takes one of K actions in response to the observed context, and observes the reward only for that chosen action. In the first part of the talk, we focus on the standard setting, where the challenge is to efficiently balance exploration/exploitation to maximize total rewards (equivalently, minimize total regret) in T rounds, a problem commonly encountered in important interactive applications such as advertising and recommendation. Our algorithm assumes access to an oracle for solving a form of classification problem and achieves the statistically optimal regret guarantee with a small number of oracle calls across T rounds. The resulting algorithm is the most practical one amongst contextual-bandit algorithms that work for general policy classes. In the second part of the talk, we show how the above general algorithmic idea can be adapted to contextual bandits with global convex constraints and concave objective functions, a setting that is substantially harder and is important in many applications. Joint work with Alekh Agarwal, Shipra Agrawal, Nikhil R. Devanur, Daniel Hsu, Satyen Kale, John Langford, and Robert E. Schapire. Bio: Lihong Li is a Researcher in the Machine Learning Department at Microsoft Research-Redmond. Prior to joining Microsoft, he was a Research Scientist in the Machine Learning Group at Yahoo! Research in Silicon Valley. He obtained a PhD degree from Rutgers University in Computer Science. His main research interests are machine learning with interaction, including reinforcement learning, multi-armed bandits, online learning, and their applications, especially those on the Internet such as recommender systems, search, and advertising. He has served as area chair or senior program committee member at ICML, NIPS, and IJCAI.
27 Oct 2015
Yisong Yue (Caltech)
Time: 4:00 PM - 5:00 PM Abstract: In many animation projects, the animation artist typically spends significant time animating the face. This process involves many labor-intensive tasks that offer relatively little potential for creative expression. One particularly tedious task is speech animation: animating the face to match spoken audio. Indeed, the often prohibitive cost of speech animation has limited the types of animations that are feasible, including localization to different languages. In this talk, I will show how to view speech animation through the lens of data-driven sequence prediction. In contrast to previous sequence prediction settings, visual speech animation is an instance of contextual spatiotemporal sequence prediction, where the output is continuous and high-dimensional (e.g., a configuration of the lower face), and also depends on an input context (e.g., audio or phonetic input). I will present a decision tree framework for learning to generate context-dependent spatiotemporal sequences given training data. This approach enjoys several attractive properties, including ease of training, fast performance at test time, and the ability to robustly tolerate corrupted training data using a novel latent variable approach. I will showcase this approach in a case study on speech animation, where our approach outperforms several competitive baselines in both quantitative and qualitative evaluations, and also demonstrates strong robustness to corrupted training data. This is joint work with Taehwan Kim, Sarah Taylor, Barry-John Theobald and Iain Matthews.
29 Oct 2015
Nina Balcan (CMU)
Time: 4:00 PM - 5:00 PM Abstract: Submodular functions are discrete functions that model laws of diminishing returns and enjoy numerous applications in many areas, including algorithmic game theory, machine learning, and social networks. For example, submodular functions are commonly used to model valuation functions for bidders in auctions, and the influence of various subsets of agents in social networks. Traditionally it is assumed that these functions are known to the decision maker; however, for large-scale systems, it is often the case that they must be learned from observations. In this talk, I will discuss a recent line of work on studying the learnability of submodular functions. I will describe general upper and lower bounds on the learnability of such functions that yield novel structural results about them of interest to many areas. I will also discuss even better guarantees that can be achieved for important classes that exhibit additional structure. These classes include probabilistic coverage functions that can be used to model the influence function in classic models of information diffusion in networks and functions with bounded complexity used in modeling bidder valuation functions in auctions. I will also discuss an application of our algorithms to learning the influence functions in social networks, which outperforms existing approaches empirically on both synthetic and real-world data. Bio: Maria-Florina Balcan is an Associate Professor in the School of Computer Science at Carnegie Mellon University. Her main research interests are machine learning, computational aspects in economics and game theory, and algorithms. Her honors include the CMU SCS Distinguished Dissertation Award, an NSF CAREER Award, a Microsoft Faculty Research Fellowship, a Sloan Research Fellowship, and several paper awards. She was a Program Committee Co-chair for COLT 2014, and is currently a board member of the International Machine Learning Society and a Program Committee Co-chair for ICML 2016.
03 Nov 2015
Geoffrey Zweig (MSR)
Time: 4:00 PM - 5:00 PM Abstract: The problem of generating text conditioned on some sort of side information arises in many areas including dialog systems, machine translation, speech recognition, and image captioning. In this talk, we present a highly effective method for generating text conditioned on a set of words that should be mentioned. We apply this to the problem of image captioning by linking the generation module to a convolutional neural network that predicts a set of words that are descriptive of an image. The system placed first in the 2015 MS COCO competition on the Turing Test measure, and tied for first place overall. Bio: Geoffrey Zweig is a Principal Researcher, and Manager of the Speech and Dialog Group at Microsoft Research. His work centers on developing improved algorithms for speech and language processing. Recent work has focused on applications of side-conditioned recurrent neural network language models, such as image captioning and grapheme-to-phoneme conversion. Prior to Microsoft, Dr. Zweig managed the Advanced Large Vocabulary Continuous Speech Recognition Group at IBM Research, with a focus on the DARPA EARS and GALE programs. In the course of his career, Dr. Zweig has written several speech recognition trainers and decoders, as well as toolkits for doing speech recognition with segmental conditional random fields, and for maximum entropy language modeling. Dr. Zweig received his PhD from the University of California at Berkeley. He is the author of over 80 papers and numerous patents, is an Associate Editor of Computer Speech and Language, and is a Fellow of the IEEE.
19 Nov 2015
Heng-Tze Cheng (Google Research)
Time: 4:00 PM - 5:00 PM Abstract: Sibyl is one of the most widely used machine learning and prediction systems at Google, actively used in production in nearly every product area. Designed for the largest datasets at Google, Sibyl scales up to hundreds of billions of training examples and billions of features. Sibyl is used for various prediction tasks ranging from classification, regression, and ranking to recommendations. Beyond core learning algorithms and scalable distributed systems, Sibyl contains a suite of data processing, monitoring, analysis, and serving tools, making it a robust and easy-to-use production system. Bio: Heng-Tze Cheng is currently a senior software engineer on the Sibyl large-scale machine learning team at Google Research. He has developed new search, ranking, and recommendation systems that are widely used across Google products. Heng-Tze received his Ph.D. from Carnegie Mellon University in 2013 and his B.S. from National Taiwan University in 2008. His research interests include machine learning, user behavior modeling, and human activity recognition, with over 20 publications and 3 U.S. patents in the related fields.
03 Dec 2015
Kyunghyun Cho (NYU)
Time: 4:00 PM - 5:00 PM Abstract: Neural machine translation is a recently proposed framework for machine translation, which is purely based on neural networks. Neural machine translation radically departs from the existing, widely used, often phrase-based statistical machine translation by viewing the task of machine translation as a supervised, structured output prediction problem and solving it with recurrent neural networks. In this talk, I will describe in detail what neural machine translation is and discuss recent advances which have made it possible for neural machine translation systems to be competitive with the conventional statistical approach. I will conclude the talk by presenting my view on the future of machine translation and the big question of "is natural language special?" Bio: Kyunghyun Cho is an assistant professor of Computer Science and Data Science at New York University (NYU). Previously, he was a postdoctoral researcher at the University of Montreal under the supervision of Prof. Yoshua Bengio after obtaining a doctorate degree at Aalto University (Finland) in early 2014. Kyunghyun's main research interests include neural networks, generative models and their applications, especially to language understanding.