Deep Sequence Models: Context Representation, Regularization, and Application to Language.
Recurrent Neural Networks (RNNs) are the most successful models for
sequential data. They have achieved state-of-the-art results in many
tasks including language modeling, image and text generation, speech
recognition, and machine translation. Despite these successes,
RNNs still face challenges: they fail to capture long-term
dependencies (don't believe the myth that they do!), and they easily overfit.
The ability to capture long-term dependencies in sequential data
depends on the way context is represented. Theoretically, RNNs capture
all the dependencies in the sequence via the use of recurrence and
parameter sharing. In practice, however, RNNs face optimization issues.
Assumptions made to counter these optimization challenges hinder the
capability of RNNs to capture long-term dependencies. On the other
hand, the overfitting problem of RNNs stems from the strong dependence
of the hidden units on each other.
I will talk about my research on context representation and
regularization for RNNs. First, I will make the case that in the
context of language, topic models are very effective at representing
context and can be used jointly with RNNs to facilitate learning and
capture long-term dependencies. Second, I will discuss NOISIN, our
newly proposed method for regularizing RNNs. NOISIN relies on the
concept of unbiased noise injection in the hidden units of RNNs to
reduce co-adaptation. It significantly improves the generalization
capabilities of existing RNN-based models. For example, it improves
RNNs with dropout by as much as 12.2% on the Penn Treebank and 9.4% on
the Wikitext-2 dataset.
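To make the idea of unbiased noise injection concrete, here is a minimal toy sketch: hidden units are multiplied by random noise whose mean is 1, so the injected noise does not bias the hidden state in expectation. This is an illustration of the general concept only, not the actual NOISIN implementation; the tanh RNN cell, the Gamma noise distribution, and the `scale` parameter are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(x, h, W_xh, W_hh, b):
    """One vanilla tanh-RNN step (toy cell for illustration)."""
    return np.tanh(x @ W_xh + h @ W_hh + b)

def inject_unbiased_noise(h, scale=0.1, rng=rng):
    """Multiply hidden units by Gamma noise with mean 1.

    Because E[eps] = 1, the injection is unbiased: E[h * eps] = h.
    Gamma(k, 1/k) has mean k * (1/k) = 1 and variance 1/k = scale.
    """
    k = 1.0 / scale
    eps = rng.gamma(shape=k, scale=1.0 / k, size=h.shape)
    return h * eps

# Noise would be applied during training only, e.g.:
#   h = rnn_step(x_t, h, W_xh, W_hh, b)
#   h = inject_unbiased_noise(h)   # skip this line at test time
```

Multiplying by mean-one noise (rather than adding zero-mean noise to the pre-activation, another valid choice) keeps the hidden state unbiased while discouraging hidden units from co-adapting, which is the intuition the abstract describes.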
Adji Bousso Dieng is a PhD student at Columbia University, where she works with David Blei and John Paisley. Her work at Columbia combines probabilistic graphical modeling and deep learning to design better sequence models. She develops these models within the framework of variational inference, which enables efficient and scalable learning. Her hope is that her research can be applied to many real-world applications, particularly natural language understanding. Prior to joining Columbia, she worked as a Junior Professional Associate at the World Bank. She did her undergraduate training in France, where she attended Lycee Henri IV and Telecom ParisTech---part of France's Grandes Ecoles system. She holds a Diplome d'Ingenieur from Telecom ParisTech and spent the third year of Telecom ParisTech's curriculum at Cornell University, where she earned a Master's in Statistics. Find out more at her homepage: http://stat.columbia.edu/~diengadji/