This repository implements a Span-Based AutoEncoder for text data.
The model is structured as follows. For each sentence, we:
1. tokenize and embed the source words
2. compute contextualized word embeddings by feeding the embedded sentence to a bidirectional LSTM
3. extract all possible spans of width up to N from the encoder outputs (the embeddings from step 2)
4. prune the spans using a feed-forward network scorer that trains end-to-end with the rest of the model; only the top K spans are kept after this step (see the sketch after this list)
5. reconstruct the sentence with a decoder LSTM that attends to the spans remaining after step 4 (also sketched below)
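For concreteness, here is a minimal PyTorch sketch of steps 3 and 4 (span enumeration and pruning). Everything in it is an illustrative assumption: the names, dimensions, and endpoint-concatenation span representation are not this repository's actual API.

```python
# Illustrative sketch only; the real model lives in span_ae and is built
# on AllenNLP abstractions.
import torch


def enumerate_spans(seq_len: int, max_width: int):
    """All (start, end) index pairs, inclusive, of width up to max_width."""
    return [
        (start, end)
        for start in range(seq_len)
        for end in range(start, min(start + max_width, seq_len))
    ]


class SpanPruner(torch.nn.Module):
    """Scores each span with a feed-forward net and keeps the top K.
    The scorer is an ordinary module, so it trains end-to-end."""

    def __init__(self, span_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.scorer = torch.nn.Sequential(
            torch.nn.Linear(span_dim, hidden_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_dim, 1),
        )

    def forward(self, span_embeddings: torch.Tensor, top_k: int):
        scores = self.scorer(span_embeddings).squeeze(-1)  # (num_spans,)
        top_scores, top_indices = scores.topk(min(top_k, scores.numel()))
        return span_embeddings[top_indices], top_scores, top_indices


# Toy usage: 10 encoder states of size 32, spans of width <= 3,
# span embedding = concatenation of the two endpoint states, keep the best 5.
encoder_states = torch.randn(10, 32)  # stand-in for the BiLSTM outputs
spans = enumerate_spans(seq_len=10, max_width=3)
span_embs = torch.stack(
    [torch.cat([encoder_states[s], encoder_states[e]]) for s, e in spans]
)
kept_embs, kept_scores, kept_idx = SpanPruner(span_dim=64)(span_embs, top_k=5)
```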
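In the same hypothetical spirit, here is one step of the decoder from step 5: an LSTM cell whose input combines the previous word embedding with an attention summary of the kept spans. Plain dot-product attention is used here for brevity; the repository's decoder may score spans differently.

```python
import torch


class AttentiveDecoderStep(torch.nn.Module):
    """One decoder step: an LSTM cell fed with the previous word embedding
    concatenated with an attention-weighted summary of the kept spans."""

    def __init__(self, word_dim: int, span_dim: int, hidden_dim: int):
        super().__init__()
        self.cell = torch.nn.LSTMCell(word_dim + span_dim, hidden_dim)
        self.query = torch.nn.Linear(hidden_dim, span_dim)

    def forward(self, prev_word_emb, h, c, span_embs):
        # prev_word_emb: (word_dim,); h, c: (hidden_dim,); span_embs: (K, span_dim)
        weights = torch.softmax(span_embs @ self.query(h), dim=0)  # (K,)
        context = weights @ span_embs                              # (span_dim,)
        inp = torch.cat([prev_word_emb, context]).unsqueeze(0)     # batch of one
        h, c = self.cell(inp, (h.unsqueeze(0), c.unsqueeze(0)))
        return h.squeeze(0), c.squeeze(0)


# Toy usage with the 5 kept span embeddings from the sketch above; a full
# decoder would loop over time steps and project h to vocabulary logits.
step = AttentiveDecoderStep(word_dim=16, span_dim=64, hidden_dim=32)
h, c = torch.zeros(32), torch.zeros(32)
h, c = step(torch.randn(16), h, c, torch.randn(5, 64))
```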
The code is written using the AllenNLP library. Follow the installation procedure from the AllenNLP website (the library is available via pip, e.g. `pip install allennlp`).
To run this code, simply execute:

```
python -m allennlp.run train configs/experiment_gpu.json --serialization-dir models/baseline --include-package span_ae
```
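The `--include-package span_ae` flag tells AllenNLP to import the `span_ae` package before reading the config, so the model class registered there becomes visible to the config-driven trainer. Roughly, the registration looks like the sketch below; the registered name and class name are guesses for illustration, so check the package source for the real ones.

```python
from allennlp.models.model import Model


@Model.register("span_ae")  # hypothetical name; see span_ae's source
class SpanAutoEncoder(Model):
    # The encoder, span scorer, and decoder would be constructed here from
    # the parameters given in configs/experiment_gpu.json.
    pass
```

With such a registration in place, a `"model": {"type": "span_ae"}` entry in the JSON config resolves to this class.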