Word and Graph Embeddings for Machine Learning

November 5, 2021
10:30am ET
Zoom only
Speaker: Steve Skiena, Stony Brook University
Host: ECE Graduate Seminar Series

Abstract

Distributed word embeddings (e.g., word2vec) provide a powerful way to reduce large text corpora to concise features (vectors) readily applicable to a variety of problems in NLP and data science. I will introduce word embeddings and apply them in a variety of new and interesting directions, including:

(1) Multilingual NLP -- The Polyglot project (www.polyglot-NLP.com) employs deep learning and other techniques to build a basic NLP pipeline (including entity recognition, POS tagging, and sentiment analysis) for over 100 different languages. We train our systems on each language's Wikipedia edition, which provides unified data resources in the absence of explicitly annotated data but poses substantial challenges in interpretation and evaluation.

(2) Detecting Historical Shifts in Word Meaning -- Words like "gay" and "mouse" have substantially shifted their meanings over time in response to societal and technological changes. We use word embeddings trained over texts drawn from different time periods to detect changes in word meanings. This is part of our efforts in historical trends analysis.

(3) Feature Extraction from Graphs -- We present DeepWalk, our approach for learning latent representations of vertices in a network, which has become extremely popular. DeepWalk uses local information obtained from truncated random walks to learn embeddings, treating each walk as the equivalent of a sentence in a language. It is suitable for a broad class of applications such as network classification and anomaly detection. We also introduce new graph embedding techniques based on random projections, which produce DeepWalk-quality embeddings thousands of times faster than previous algorithms.
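
As a rough illustration of the "walks as sentences" idea in (3), the sketch below generates truncated random walks over a small toy graph and joins each walk into a sentence-like string. It is not the speaker's implementation: the function name truncated_random_walks, the parameters walks_per_vertex and walk_length, and the toy graph are all assumptions made for this example.

```python
import random


def truncated_random_walks(adjacency, walks_per_vertex=2, walk_length=5, seed=0):
    """Return a list of walks; each walk is a list of vertex labels.

    Hypothetical helper for illustration only: `adjacency` maps each vertex
    to a list of its neighbors, and every walk is cut off at `walk_length`.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_vertex):
        vertices = list(adjacency)
        rng.shuffle(vertices)  # start from the vertices in a fresh random order
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy undirected graph (assumed for this example), stored as adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

walks = truncated_random_walks(graph)
# Each walk plays the role of a sentence whose "words" are vertex labels.
sentences = [" ".join(walk) for walk in walks]
```

Under these assumptions, the resulting sentences could be passed to any word2vec-style trainer, so that vertices appearing in similar walk contexts receive similar vectors.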

Bio:

Skiena received his B.S. in Computer Science from the University of Virginia and his Ph.D. in Computer Science from the University of Illinois in 1988. He is a Fellow of the American Association for the Advancement of Science (AAAS), a former Fulbright scholar, and recipient of the University of Virginia Engineering Distinguished Alumni Award (WahooWa!), the ONR Young Investigator Award, and the IEEE Computer Science and Engineering Teaching Award. More info is available at http://www.cs.stonybrook.edu/~skiena/.

Join meeting via Zoom: https://tufts.zoom.us/s/98846620413

Password: 353214

Meeting ID: 988 4662 0413

Dial by location: 1 646 558 8656 US (New York)

Meeting ID: 988 4662 0413

Passcode: 353214

Disregard password and passcode at end of the email; they do not apply to this meeting.