Eigenthemes

Source code for "Low-rank Subspaces for Unsupervised Entity Linking"

Detailed instructions to run the code

Clone this repository using git clone https://github.com/blind-anonymous/eigenthemes.git
Download Anaconda (64-bit Python 3.7 version)
- The Anaconda installer shows the following prompt: 'Do you wish the installer to initialize Anaconda3 by running conda init? [yes|no]'. Answering 'yes' is recommended, because the installer then updates your 'bashrc'/'bash_profile' automatically with all the required environment variables.
- If you answered 'yes' in the previous step, run source <path-to-your bashrc or bash_profile> to apply these environment variables in your currently active terminal.
Set up the virtual environment named el, which installs all the required dependencies: conda env create -f el.yml
Activate the installed environment: conda activate el
Download the resources (data and embeddings), available via Google Drive (no sign-in required)
1. Unzip the data.zip file into the empty data directory provided with the code repository
2. Unzip the deepwalk_wikidata.pickle.zip file into the empty embeddings directory provided with the code repository
Download the resources for Le and Titov (pretrained models), available via Google Drive (no sign-in required)
1. Unzip the models.zip file into the empty models directory provided with the code repository
  Important Note: To train the models from scratch, first remove any saved models using rm -rf models/*. Then retrain using bash train_taumilnd.sh, which trains five different models on the train set.
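Before running the scripts, it can help to confirm that the archives were unpacked where the steps above expect them. The following is a minimal sketch and not part of the repository: the helper name check_resources is hypothetical, and the only things taken from the instructions are the three directory names (data, embeddings, models), assumed to sit directly under the repository root.

```python
from pathlib import Path

# Directory names taken from the setup steps; everything else is illustrative.
EXPECTED_DIRS = ("data", "embeddings", "models")


def check_resources(repo_root="."):
    """Return a dict mapping each expected directory to a status string.

    Status is "missing" if the directory does not exist, "empty" if it holds
    no visible entries, and "ok" otherwise.
    """
    root = Path(repo_root)
    status = {}
    for name in EXPECTED_DIRS:
        directory = root / name
        if not directory.is_dir():
            status[name] = "missing"
            continue
        # Skip hidden placeholders (for example .gitkeep) that only exist
        # so that an otherwise empty directory can be tracked by git.
        visible = [p for p in directory.iterdir() if not p.name.startswith(".")]
        status[name] = "ok" if visible else "empty"
    return status


if __name__ == "__main__":
    for name, state in check_resources().items():
        print(f"{name}: {state}")
```

An "empty" models directory is expected if you removed the saved models in order to retrain from scratch.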
Reproducing results presented in Table-2
- NameMatch Baseline: Run python namematch.py. This script will produce the results for the name-matching baseline as described in the paper for each of the four datasets considered in this study.
- $\tau$ MIL-ND by Le and Titov: Run bash evaluate_taumilnd.sh. This script will produce the results for the state-of-the-art $\tau$ MIL-ND for each of the four datasets considered in this study. It also prints the mean and standard deviation of precision@1 and MRR over five independent runs of $\tau$ MIL-ND to the terminal.
- Eigen (Proposed Technique): Run python unsupervised_el.py. This script will produce the results for Eigen for each of the four datasets considered in this study. The description of Eigenthemes (Eigen) can be found in the paper.
- The overall micro Precision@1 and MRR are in the 12th and 13th columns of the results files, respectively. The remaining columns are identified by the descriptive header present in each output file.
  Important Note: The results are written to the results directory provided with the code repository. Precomputed results for all of the above techniques on all the datasets are already included in that directory. The results filenames are self-explanatory.
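To pull the overall micro Precision@1 and MRR out of a results file programmatically, something like the following sketch may be useful. It is not part of the repository. The column positions (12th and 13th) come from the note above. The helper name read_overall_metrics is hypothetical, and the file layout (one header row, tab-separated fields) is an assumption, so pass a different delimiter if the files use one.

```python
import csv

# 1-based column positions stated in the instructions above.
P_AT_1_COL = 12
MRR_COL = 13


def read_overall_metrics(path, delimiter="\t"):
    """Return (names, rows) for the overall Precision@1 and MRR columns.

    Assumes a delimited text file whose first row is the header; the tab
    delimiter is a guess, not something the repository documents here.
    Values are returned as unparsed strings.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, None)
        if header is None or len(header) < MRR_COL:
            found = 0 if header is None else len(header)
            raise ValueError(f"expected at least {MRR_COL} columns, found {found}")
        names = (header[P_AT_1_COL - 1], header[MRR_COL - 1])
        # Keep only rows long enough to contain both columns.
        rows = [
            (row[P_AT_1_COL - 1], row[MRR_COL - 1])
            for row in reader
            if len(row) >= MRR_COL
        ]
    return names, rows
```

Call it with the path to any file in the results directory; names holds the two header labels and rows holds one (Precision@1, MRR) pair per data row.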

