This repository contains the implementation of two methods for background linking: IR-BERT and Weighted BM25.
- `./src/path.cfg`
  - Ignore the following variables:
    - topics19
    - entities
    - entities19
    - eqrels
  - All the background linking related files (the dataset, topics, and qrels) go in the path given by the "DataPath" variable
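As a sketch of how these settings might be consumed (assuming `path.cfg` uses an INI-style layout; the `[Paths]` section name and the sample contents below are hypothetical, not taken from the repo):

```python
import configparser

# Hypothetical contents standing in for src/path.cfg; the [Paths]
# section name and INI layout are assumptions.
SAMPLE_CFG = """
[Paths]
DataPath = /data/wapo/
"""

cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE_CFG)

# All background-linking files (dataset, topics, qrels) live under DataPath.
data_path = cfg["Paths"]["DataPath"]
print(data_path)
```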
- Result files are created by the main scripts of both models, IR-BERT and Weighted BM25
- These result files can then be evaluated directly with the background linking eval script
- Set appropriate paths in src/path.cfg
- Run merge.py in wapo/WashingtonPost/data. You will need the files listed in "filenames" in this directory alongside the merge script.
- Start the Elasticsearch server with the command "elasticsearch". (In case of a port mismatch, check "http.port" in elasticsearch.yml.)
- Run Preprocess.py
- Run IR-BERT.py
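For the port-mismatch case mentioned above, one way to check which port Elasticsearch is configured on is to scan `elasticsearch.yml` for an uncommented `http.port` line; a minimal sketch (the sample YAML contents are illustrative, not from a real installation):

```python
def read_http_port(yml_text, default=9200):
    """Scan elasticsearch.yml-style text for an uncommented http.port line."""
    for line in yml_text.splitlines():
        line = line.strip()
        if line.startswith("http.port"):
            return int(line.split(":", 1)[1].strip())
    return default  # Elasticsearch's default HTTP port

# Illustrative config snippet; the commented-out line is skipped.
sample_yml = """
cluster.name: wapo-cluster
#http.port: 9201
http.port: 9250
"""
print(read_http_port(sample_yml))
```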
The data processing code performs the following steps:
- lowercase all the text
- stemming and lemmatization
- remove stop words, using "stopwords.txt" as the dictionary of words
- filter the articles based on their kicker field
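The steps above can be sketched as follows; the crude suffix-stripper and the inline stopword set are toy stand-ins for the repo's real stemmer/lemmatizer and for "stopwords.txt":

```python
import re

# Tiny stand-in for the words loaded from stopwords.txt
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in"}

def simple_stem(token):
    # Crude suffix stripping; the repo presumably uses a real
    # stemmer/lemmatizer (e.g. from NLTK) instead.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    # 1. lowercase  2. tokenize  3. drop stop words  4. stem
    tokens = re.findall(r"[a-z]+", text.lower())
    return [simple_stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("The Reporters Were Covering the Elections"))
```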
- We propose IR-BERT, which combines the retrieval power of BM25 with the contextual understanding gained through a BERT-based model. It has the following components:
  - Elasticsearch BM25
  - RAKE for keyword extraction
  - Sentence-BERT for semantic similarity
- Our model outperforms the TREC median as well as the highest scoring model of 2018 in terms of the nDCG@5 metric.
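In this pipeline, BM25 retrieves candidate articles and Sentence-BERT embeddings are used to rank them by semantic similarity to the query. A minimal sketch of that final re-ranking step, with toy 2-d vectors standing in for real Sentence-BERT embeddings:

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rerank(query_vec, candidates):
    # candidates: (doc_id, embedding) pairs from the BM25 retrieval stage;
    # in the real model the embeddings would come from Sentence-BERT.
    return sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)

# Toy vectors, not real sentence embeddings
query = [1.0, 0.0]
docs = [("a", [0.0, 1.0]), ("b", [1.0, 0.1]), ("c", [0.5, 0.5])]
ranked = [doc_id for doc_id, _ in rerank(query, docs)]
print(ranked)  # most similar first
```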