Capturing and evaluating higher-order relations in word embeddings using tensor factorization

April 28, 2017
Halligan 102
Speaker: Eric Bailey
Host: Shuchin Aeron


In Natural Language Processing, the most popular word embeddings are obtained by low-rank factorization of a word co-occurrence-based matrix. We generalize this approach by studying word embeddings given by low-rank factorization of co-occurrence-based higher-order arrays, or tensors. We present four novel word embeddings based on tensor factorization and show that they outperform popular state-of-the-art baselines on a number of recent benchmarks, encoding useful properties in a new way. To create one of these embeddings, we formulate a novel joint symmetric tensor factorization problem related to the idea of coupled tensor factorization. We also modify a recent embedding evaluation technique known as Outlier Detection to measure the degree to which an embedding captures Nth-order information, showing that tensor embeddings (naturally) outperform popular pairwise embeddings at this task. We suggest applications of tensor-factorization-based word embeddings, and all source code and pre-trained vectors are publicly available online.
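As a rough illustration of the core idea (this is a minimal sketch, not the speaker's actual method: the corpus, window size, rank, and plain gradient-descent solver below are all toy assumptions), one can count word triples in a sliding window to form a symmetric third-order co-occurrence tensor, then recover embeddings as the shared factor of a symmetric rank-R CP decomposition:

```python
import numpy as np
from itertools import combinations

# Toy corpus; a real embedding would use a large text collection.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Symmetric third-order co-occurrence tensor: count word triples
# that fall inside a small sliding window, over all 6 orderings.
window = 3
T = np.zeros((V, V, V))
for start in range(len(corpus) - window + 1):
    for a, b, c in combinations(corpus[start:start + window], 3):
        for i, j, k in ((a, b, c), (a, c, b), (b, a, c),
                        (b, c, a), (c, a, b), (c, b, a)):
            T[idx[i], idx[j], idx[k]] += 1.0

# Rank-R symmetric CP factorization by gradient descent:
# approximate T[i,j,k] ~ sum_r W[i,r] * W[j,r] * W[k,r].
R, lr = 5, 1e-3
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((V, R))
losses = []
for _ in range(2000):
    approx = np.einsum('ir,jr,kr->ijk', W, W, W)
    E = approx - T
    losses.append(0.5 * np.sum(E ** 2))
    # Gradient of 0.5*||E||^2 w.r.t. W; by symmetry of T and approx,
    # the three mode-wise terms are equal, hence the factor of 3.
    W -= lr * 3.0 * np.einsum('ijk,jr,kr->ir', E, W, W)

# Each row of W is the embedding of one vocabulary word.
embeddings = {w: W[idx[w]] for w in vocab}
print(V, W.shape)
```

The talk's embeddings are built on the same principle (low-rank factorization of a co-occurrence tensor), though with more refined counting statistics and factorization objectives than this sketch.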