Adaptive Deep Learning for Vision and Language

February 24, 2016
11am - noon
Halligan 102
Speaker: Kate Saenko, University of Massachusetts, Lowell
Host: Roni Khardon


Advances in Machine Learning, and in Deep Learning in particular, have recently propelled the state of the art in both Computer Vision (CV) and Natural Language Processing (NLP), spurring strong academic and industry investment in these emerging areas of AI. A key aspect of this success is representation learning, i.e., the ability to learn useful feature representations from large amounts of labeled data.

In this talk, I will address two serious limitations of representation learning for CV and NLP. First, these two fields have evolved separately, focusing on either image or language representations alone. Our work on joint learning for vision and language creates representations that directly connect visual concepts to natural language semantics. Our research was among the first to propose deep neural nets for automatic captioning of images and videos, and spatial memory nets for answering questions about visual scenes.

A second key problem in applying supervised ML, including deep methods, to real-world environments is dataset bias: deviations from the training distribution at test time can lead to catastrophic failure. I will give an overview of our efforts to endow learning models with the ability to transfer knowledge between domains and adapt to real-world environments, concluding with recently proposed methods for simple and effective adaptation of deep neural networks without requiring millions of training examples.

Bio: Kate Saenko is an Assistant Professor of Computer Science at the University of Massachusetts Lowell, where she leads the Computer Vision and Learning Group. She received her PhD from MIT, and did postdoctoral work at UC Berkeley and Harvard. Her research spans the areas of computer vision, machine learning, and human-robot interfaces. Dr. Saenko's current research interests include domain adaptation of machine learning models and joint modeling of language and vision. She is the recipient of research grant awards from the National Science Foundation, DARPA, and other government and industry agencies.