Unsupervisedly Learning Sentence Representations

This project aims to explore better ways to learn phrase and sentence representations. In machine vision, practitioners have largely settled on an underlying neural network architecture to "read" images and frequently pre-train this "reading" layer; the same is not true for NLP, where we only really use pre-trained word representations.

Particular questions this research is going to address include:

  • Can we apply word representation methods to phrases and sentences?
  • Can we do so in a computationally efficient manner?
  • Will linguistic patterns arise? How do we observe them?

Skipthought vectors

As a starting point, this repository contains a TensorFlow 1.0 implementation of the Skipthought paper (Kiros et al., 2015).

The structure of the model is as follows:

  • An encoder is used to find a vector representation of a sentence;
  • Then two decoders are used to predict the preceding and the following sentences;
  • Training this model also yields word representations.
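The encoder/two-decoder wiring can be sketched in a few lines of numpy. This is an untrained, illustrative skeleton only: the plain tanh RNN cells, all dimensions, and parameter names (`W_enc`, `W_prev`, `W_next`, and so on) are assumptions for illustration, not taken from this repository, which uses TensorFlow and GRU-style cells in the spirit of the original paper.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HID = 20, 8, 16  # toy sizes, chosen arbitrarily

# Shared word embeddings and a plain tanh-RNN encoder.
E = rng.normal(0, 0.1, (VOCAB, EMB))
W_enc = rng.normal(0, 0.1, (EMB + HID, HID))

def encode(sentence):
    """Run the RNN over word ids; the final state is the sentence vector."""
    h = np.zeros(HID)
    for w in sentence:
        h = np.tanh(np.concatenate([E[w], h]) @ W_enc)
    return h

# Two decoders: one predicts the preceding sentence, one the following,
# each conditioned on the encoder's sentence vector at every step.
W_prev = rng.normal(0, 0.1, (EMB + HID + HID, HID))
W_next = rng.normal(0, 0.1, (EMB + HID + HID, HID))
W_out = rng.normal(0, 0.1, (HID, VOCAB))

def decode_logits(sent_vec, target, W_dec):
    """Teacher-forced decoding: predict each target word from the
    previous word, the decoder state, and the sentence vector."""
    h = np.zeros(HID)
    logits = []
    prev_word = 0  # hypothetical <bos> token id
    for w in target:
        h = np.tanh(np.concatenate([E[prev_word], h, sent_vec]) @ W_dec)
        logits.append(h @ W_out)
        prev_word = w
    return np.array(logits)

s_prev, s_mid, s_next = [1, 2, 3], [4, 5, 6, 7], [8, 9]
v = encode(s_mid)                        # sentence vector, shape (16,)
lp = decode_logits(v, s_prev, W_prev)    # per-word logits, shape (3, 20)
ln = decode_logits(v, s_next, W_next)    # per-word logits, shape (2, 20)
```

Training would minimise the summed cross-entropy of both decoders' predictions, which is exactly where the computational cost discussed below comes from.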

Figure taken from Kiros et al.

The Skipthought vector model finds good sentence representations: similar sentences end up close to each other in vector space. However, by relying on two decoders, the method is computationally expensive. If the goal is only to find sentence embeddings, other methods such as sequential autoencoders might be more efficient alternatives. This project is going to explore these alternatives.
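"Close in vector space" is usually measured with cosine similarity. The vectors below are made-up placeholders standing in for learned sentence embeddings; the point is only the comparison, not the numbers.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors: 1 = same direction, -1 = opposite."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings: a and b play the role of two similar sentences,
# c the role of an unrelated one.
a = np.array([0.9, 0.1, 0.4])
b = np.array([0.8, 0.2, 0.5])
c = np.array([-0.7, 0.6, -0.2])

print(cosine(a, b) > cosine(a, c))  # True: similar sentences score higher
```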

Work in progress

The current implementation of Skipthought is able to overfit a small corpus like 'gingerbread.txt' and to produce grammatically correct sentences on a larger corpus like 'sherlock.txt'. Next steps are to train the model on larger corpora and to implement potentially more efficient models such as a sequential autoencoder.
