HRERE

Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction

Paper Published in NAACL 2019: HRERE

Prerequisites

tensorflow >= r1.2
hyperopt
gensim
sklearn

Dataset

To download the dataset used:

cd ./data
python prepare_data.py

Preprocessing

Construct the knowledge graph:

python create_kg.py

Preprocess the data:

python preprocess.py -p -g

Complex Embeddings

Copy the directory ./fb3m into the data folder of tensorflow-efe, then run the following commands from the tensorflow-efe directory to obtain the complex embeddings:

python preprocess.py --data fb3m
python train.py --model best_Complex_tanh_fb3m --data fb3m --save
python get_embeddings.py --embed complex --model best_Complex_tanh_fb3m --output <repo_path>/fb3m

Then copy e2id.txt and r2id.txt from tensorflow-efe/data/fb3m to ./fb3m and run the following command:

python get_embeddings.py

Hyperparameter Tuning

python task.py --model <model_name> --eval <max_number_of_search> --runs <number_of_runs_per_setting>

Valid values for model_name are listed in model_param_space.py. You can also define your own search space there.

Evaluation

python eval.py --model <model_name> --prefix <file_prefix> --runs <number_of_runs>

Valid values for model_name are listed in model_param_space.py. To replicate our results, use best_complex_hrere as the model_name. The script runs the model multiple times and computes the means and standard deviations of P@N, which are logged in ./log. The predicted probabilities and labels of the first run are stored in plot/output for plotting PR curves.
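For readers who want to see how precision-recall points are derived from predicted probabilities and labels, here is a dependency-free sketch. It is not the repository's plotting code (see final_plot.py for that), and it does not assume anything about the file format in plot/output: the inputs are plain Python lists, and the function name `pr_curve` is hypothetical.

```python
def pr_curve(probs, labels):
    """Return (recall, precision) points, one per rank cutoff.

    probs  -- predicted probabilities, one per candidate
    labels -- 1 for a correct prediction, 0 otherwise
    Ties in probs are not merged; each candidate gets its own cutoff.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank candidates from most to least confident.
    ranked = sorted(zip(probs, labels), key=lambda pair: pair[0], reverse=True)

    points = []
    true_pos = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        recall = true_pos / total_pos
        precision = true_pos / rank
        points.append((recall, precision))
    return points


# Tiny made-up example, not real model output.
points = pr_curve([0.9, 0.8, 0.3, 0.6], [1, 0, 1, 1])
for recall, precision in points:
    print(f"recall={recall:.3f} precision={precision:.3f}")
```

The resulting points can be passed to any plotting library with recall on the x-axis and precision on the y-axis.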

Results

After replicating the results, we found that the P@N(%) results reported in the paper appear to be somewhat over-optimistic due to variance across runs. According to our replication based on 5 runs (./log/replication.log), the results are P@10% (0.849 ± 0.019), P@30% (0.728 ± 0.019), P@50% (0.636 ± 0.013). We also report our scores to NLP Progress based on this replication.
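A mean and spread across runs, as in the figures above, can be computed along the following lines. This is a sketch under stated assumptions, not the logic of eval.py: it assumes P@N% means precision among the top N% highest-scored predictions, and it uses the population standard deviation; the repository may define either differently. The function names are hypothetical.

```python
from statistics import mean, pstdev


def precision_at_percent(scores, labels, percent):
    """Precision among the top `percent`% highest-scored predictions."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * percent / 100))  # keep at least one item
    return sum(label for _, label in ranked[:k]) / k


def summarize_runs(runs, percent):
    """Mean and population std of P@percent over (scores, labels) runs."""
    values = [precision_at_percent(s, l, percent) for s, l in runs]
    return mean(values), pstdev(values)


# Two made-up runs sharing the same scores, not real model output.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
run_a = (scores, [1, 1, 0, 1, 0, 0, 1, 0, 0, 0])
run_b = (scores, [1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

avg, std = summarize_runs([run_a, run_b], percent=50)
print(f"P@50%: {avg:.3f} +- {std:.3f}")
```

With only a handful of runs the spread estimate is itself noisy, which is one reason single-run numbers can look better or worse than the average.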

Cite

If you found this codebase or our work useful, please cite:

@InProceedings{xu2019connecting,
  author = {Xu, Peng and Barbosa, Denilson},
  title = {Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction},
  booktitle = {The 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2019)},
  month = {June},
  year = {2019},
  publisher = {ACL}
}
