Word2Sense

This is the code base for our paper "Word2Sense: Sparse Interpretable Word Embeddings", accepted at the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019).

We have made minor changes to the WarpLDA codebase, specifically to Bigraph.cpp, Bigraph.hpp and warplda.cpp in the src folder, so that WarpLDA accepts files in tsvd format.

Training on a dataset

Create a file containing one sentence per line. Example.sh contains the entire pipeline: obtaining Word2Sense embeddings from the corpus, computing performance scores on various similarity tasks, and obtaining WordCtxt2Sense embeddings for the WSI and SCWS tasks.
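As an illustration of the expected input layout, here is a minimal sketch that turns raw text into a file with one sentence per line. It is not part of this repository: the function names `split_sentences` and `write_corpus` are hypothetical, and the regex-based splitter is a naive stand-in for whatever sentence segmentation suits your corpus. The scripts in this repository may apply their own preprocessing afterwards.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption, not the repository's method):
    break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Collapse internal whitespace so every sentence stays on a single line.
    return [" ".join(p.split()) for p in parts if p.strip()]


def write_corpus(text, path):
    """Write one sentence per line to `path`; return the sentence count."""
    sentences = split_sentences(text)
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            f.write(sentence + "\n")
    return len(sentences)
```

For example, `write_corpus(open("raw.txt", encoding="utf-8").read(), "corpus.txt")` would produce a file that can be used as the corpus; both file names here are placeholders.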

Pretrained Vectors

The link (https://drive.google.com/file/d/1kqxQm129RVfanlnEsJnyYjygsFhA3wH3/view?usp=sharing) points to a zip file that contains a text file of pretrained Word2Sense vectors. Each line of the text file contains a word and its 2250-dimensional sparse representation.
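The following is a minimal sketch of how such a file might be read once unzipped. The exact on-disk layout is not described here, so the code assumes each line is the word followed by whitespace-separated numeric values, one per dimension. Adjust the parsing if the file differs. The helper names (`parse_vector_line`, `load_vectors`, `top_dimensions`) are hypothetical and not part of this repository.

```python
def parse_vector_line(line):
    """Parse 'word v0 v1 ...' into (word, {dimension: value}).

    Assumed layout: word first, then one numeric value per dimension.
    Only non-zero entries are kept, since the vectors are sparse.
    """
    fields = line.split()
    if len(fields) < 2:
        return None  # blank or malformed line
    word = fields[0]
    values = [float(v) for v in fields[1:]]
    sparse = {i: v for i, v in enumerate(values) if v != 0.0}
    return word, sparse


def load_vectors(path):
    """Load every parsable line of `path` into a {word: sparse_vector} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parsed = parse_vector_line(line)
            if parsed is not None:
                word, sparse = parsed
                vectors[word] = sparse
    return vectors


def top_dimensions(sparse, k=5):
    """Return the k (dimension, value) pairs with the largest values."""
    return sorted(sparse.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Calling `top_dimensions(vectors["bank"])` on the loaded dictionary would then list the dimensions on which a word such as "bank" has the largest weights, which is a convenient starting point for inspecting its senses.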

Reproducibility of code

A few questions have been raised about the reproducibility of the code on the wackypedia dataset. It is not possible to upload all the output files, since they are very large. Instead, we provide a link (https://drive.google.com/file/d/1X3XFK9zSsWs0nSomukc_Plnw55pfOZjQ/view?usp=sharing) to a tar-compressed version of the dataset that we used, along with a new shell script (wacky.sh) to reproduce the numbers on wackypedia.

