Official code for the paper "Deep Contextualized Self-training for Low Resource Dependency Parsing".
If you use this code, please cite our paper.
Requirements:

- Python 3.7
- Pytorch 1.1.0
- Cuda 10.0

To install the remaining dependencies, simply run:

    pip install -r requirements.txt
The data is preprocessed in CoNLL format. The data folder can be obtained from here.
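As a quick sanity check on downloaded files, here is a minimal, hypothetical reader sketch for tab-separated, blank-line-delimited dependency files (the exact column layout of the preprocessed files may differ; this is not the repository's own loading code):

```python
# Minimal CoNLL-style reader: one token per line, tab-separated columns,
# sentences separated by blank lines. (Illustrative sketch only.)
def read_conll(path):
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                # Blank line closes the current sentence.
                if current:
                    sentences.append(current)
                    current = []
            else:
                current.append(line.split("\t"))
    if current:  # file may not end with a blank line
        sentences.append(current)
    return sentences
```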
Set `word_path="./data/morph.word.200.vec"` and `char_path="./data/morph.char.30.vec"`. The embeddings can be found here.
Possible word embedding options: `['random', 'fasttext']`. The multilingual word embeddings (`.vec` extension) should be placed under the `data/multilingual_word_embeddings` folder.
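A `.vec` file in the fastText text format begins with a header line (`<vocab_size> <dim>`) followed by one token and its vector per line. A minimal loader sketch, assuming that format (the repository has its own loading code; this is only illustrative):

```python
def load_vec(path):
    """Load word vectors from a fastText-style .vec text file
    into a dict mapping word -> list of floats. (Sketch only.)"""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        header = f.readline().split()
        dim = int(header[1])  # vector dimensionality from the header
        for line in f:
            parts = line.rstrip().split(" ")
            word = parts[0]
            vectors[word] = [float(x) for x in parts[1:1 + dim]]
    return vectors
```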
In order to run the low-resource in-domain experiments, there are three steps to follow:
- Running the base Biaffine parser
- Running the sequence tagger(s)
- Running the combined DCST parser
Create an empty `saved_model` folder to store the new models.
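The folder can be created from the command line, or equivalently in Python:

```python
import os

# Create the (initially empty) folder that will hold the trained models.
# exist_ok=True makes the call safe to repeat.
os.makedirs("saved_model", exist_ok=True)
```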
If you want to run the complete model, simply run the bash script `run_dcst.sh`; otherwise, refer to the corresponding section in `run_dcst.sh` to run the corresponding segments.
- Without POS tags: don't use the `--use_pos` flag at any stage (base model, auxiliary tasks, final ensembled model).
- With coarse-level tags: use the input files from the `data` folder with the `--use_pos` flag.
- With POS-level tags: swap the 2nd and 3rd columns of all the files in the `data` folder.
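The column swap above can be sketched as follows, assuming tab-separated files with blank lines between sentences (a hypothetical helper, not part of the repository; verify the delimiter against the actual data files):

```python
def swap_columns(in_path, out_path, i=1, j=2):
    """Swap two 0-indexed tab-separated columns (defaults swap the
    file's 2nd and 3rd columns). Blank lines pass through unchanged."""
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            stripped = line.rstrip("\n")
            if stripped:
                cols = stripped.split("\t")
                cols[i], cols[j] = cols[j], cols[i]
                fout.write("\t".join(cols) + "\n")
            else:
                fout.write("\n")
```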
Add the pretrained model file from here to the path `./utils/morph_tagger/cwlm_lstm_crf_cas_2.model`.
- Pretrained morph tagger layer: run `run_dcst.sh`.
- Pretrained morph tagger layer with freezing: comment this and then run `run_dcst.sh`.
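In PyTorch, freezing a pretrained module amounts to disabling gradients for its parameters. A generic sketch (the tagger's actual attribute names in this repository may differ, so the usage line is hypothetical):

```python
import torch.nn as nn

def freeze(module: nn.Module) -> None:
    """Disable gradient updates for all parameters of a module,
    so the optimizer leaves its pretrained weights untouched."""
    for param in module.parameters():
        param.requires_grad = False

# Hypothetical usage, assuming `model.morph_tagger` holds the
# pretrained tagger layer:
# freeze(model.morph_tagger)
```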