TemporalReferencing

Data and code for the experiments in

Haim Dubossarsky, Simon Hengchen, Nina Tahmasebi and Dominik Schlechtweg. 2019. Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Florence, Italy. Association for Computational Linguistics.

If you use this software for academic research, please cite the paper above and make sure you give appropriate credit to the below-mentioned software this repository strongly depends on.

The code heavily relies on DISSECT (modules/composes). For aligning embeddings (SGNS) we used VecMap (alignment/map_embeddings). For alignment of the PPMI matrices and measuring cosine distance we relied on code from LSCDetection. We used hyperwords for training SGNS and PPMI on the extracted word-context pairs.

Usage Note

The scripts should be run directly from the main directory. If you wish to do otherwise, you may have to change the path you add to the path attribute in sys.path.append('./modules/') in the scripts. All scripts can be run directly from the command line, e.g.:

python corpus_processing/extract-pairs.py <windowSize> <corpDir> <outDir> <lowerBound> <upperBound> <testset> <freqset> <minfreq>

We recommend you to run the scripts with Python 2.7.15, only for VecMap Python 3 is needed. You will have to install some additional packages such as: docopt, gensim, i.a. Those that aren't available from the Anaconda installer can be installed via EasyInstall, or by running pip install -r requirements.txt.

Pipeline

In scripts/run_test.sh you find an example of a full pipeline for the models on a small test corpus. Assuming you are working on a UNIX-based system, first make the script executable with

chmod 755 scripts/run_test.sh

Then run it with

bash -e scripts/run_test.sh

Make sure matrices/ is empty each time you run it. You can take a look at the code to understand how it makes use of the different scripts in the repository. It first reads the gzipped test corpus in corpora/test/testcorpus.gz containing many duplicate sentences for the time bins 1920, 1930 and 1940 with each line in the following format:

year [tab] word1 word2 word3...

It then extracts

regular word-context pairs for each time bin (for alignment) and
temporally referenced word-context pairs for specified target words (target_year)

with corpus_processing/extract-pairs.py. Then it creates basic matrices for PPMI and SGNS using the scripts under hyperwords/ and aligns the time-binned matrices with the scripts under alignment/. Finally, it extracts cosine distances (displacement) for each pair of adjacent time bins and nearest neighbors for each time bin to results/ using the scripts under measures/. The script measures/displacement.py needs a different input for each model: for the alignment models it takes the file testsets/test/testset-pairs.csv in the following format:

target1 [tab] target1
target2 [tab] target2

For Temporal Referencing it takes the files testsets/test/testset-1920-1930-pairs.csv and testsets/test/testset-1930-1940-pairs.csv as input with the following respective formats:

target1_year1 [tab] target1_year2
target2_year1 [tab] target2_year2

and

target1_year2 [tab] target1_year3
target2_year2 [tab] target2_year3

Data

Under data/ we give a spreadsheet with experimental results on the Word Sense Change testset and lists of the nearest neighbors for the test words found by the different models when trained on COHA (1920-1970).

BibTex

@inproceedings{Dubossarskyetal19,
title = {{Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change}},
author = {Dubossarsky, Haim and Hengchen, Simon and Tahmasebi, Nina and Schlechtweg, Dominik},
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics"
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

alignment

alignment

corpora/test

corpora/test

corpus_processing

corpus_processing

data

data

hyperwords

hyperwords

measures

measures

modules

modules

scripts

scripts

space_creation

space_creation

testsets/test

testsets/test

LICENSE.txt

LICENSE.txt

README.md

README.md

requirements.txt

requirements.txt

Repository files navigation

TemporalReferencing

Usage Note

Pipeline

Data

BibTex

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
alignment		alignment
corpora/test		corpora/test
corpus_processing		corpus_processing
data		data
hyperwords		hyperwords
measures		measures
modules		modules
scripts		scripts
space_creation		space_creation
testsets/test		testsets/test
LICENSE.txt		LICENSE.txt
README.md		README.md
requirements.txt		requirements.txt

License

lzjtt2017/TemporalReferencing

Folders and files

Latest commit

History

Repository files navigation

TemporalReferencing

Usage Note

Pipeline

Data

BibTex

About

Resources

License

Stars

Watchers

Forks

Languages