Department of Computer Science
University of Illinois at Urbana-Champaign (UIUC)
Ahmed El-Kishky
Chia-Wei, Chen (Jack Chen)
Daniel You
(1) Tune the model for the KBP dataset, using an indicator in place of a separate stream for the mention
(2) Get some results on the KBP dataset
Put smaller.tsv into the data/ directory
$ pip install -r requirements.txt
$ ./prepare_corpus.sh
$ ./generate_pickle.sh
$ python src/train.py
- Test and tune the model on the KBP dataset
- Incorporate subword embeddings (mention = average pooling over its corresponding subwords)
- Write a supervisor module for easier and faster experiments
- Clean up the workspace and make it more understandable
- Documentation on all methods
- Documentation on separate works
- Use UTF-8 encoding to support multiple languages in the future
- Revise the input arguments of separate modules to achieve better modularity
- Merge some modules or modularize frequently used methods
- Remove LaTeX commands from the vocabulary list
- Add scripts for processing the dataset
- Pipeline the processes
- (TBD)
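
A minimal sketch of the subword average pooling item above, assuming a plain dict from subword to vector. The names `mention_embedding` and `toy_table` are hypothetical and not part of this repository, and the zero-vector fallback for unknown subwords is an assumption rather than a decided design.

```python
from typing import Dict, List, Sequence


def mention_embedding(
    subwords: Sequence[str],
    table: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the vectors of the subwords that have an entry in `table`."""
    vectors = [table[s] for s in subwords if s in table]
    if not vectors:
        # Assumption: fall back to a zero vector when no subword is known.
        return [0.0] * dim
    # zip(*vectors) iterates over dimensions, so each column is averaged.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy lookup table (hypothetical values, for illustration only).
toy_table = {"ill": [1.0, 0.0], "ino": [0.0, 1.0], "is": [2.0, 2.0]}
pooled = mention_embedding(["ill", "ino", "is"], toy_table, dim=2)
print(pooled)
```

In the real model the same pooling would presumably run over framework tensors instead of Python lists.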
- Unstable training when applying the same pretrained embedding (word2vec/FastText) to both mention and context (loss gets stuck in a local minimum)
- Vocabulary list generation threading issue: threads fail to join, cause unknown
- Parametrize some hyperparameters shared throughout the entire work
- Bugs when displaying tqdm; consider writing a progress bar manually
- Better threading code structure
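
For the tqdm item above, a hand-written progress bar could look roughly like the sketch below. It uses only the standard library; `render_progress` and `show_progress` are hypothetical names, and the bar layout is an arbitrary choice.

```python
import sys


def render_progress(done: int, total: int, width: int = 20) -> str:
    """Return a text bar such as '[#####-----] 5/10'."""
    # Treat an empty job as complete to avoid dividing by zero.
    fraction = done / total if total else 1.0
    filled = int(width * fraction)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {done}/{total}"


def show_progress(done: int, total: int, stream=sys.stderr) -> None:
    """Redraw the bar in place on a single terminal line."""
    # The carriage return moves the cursor back to the start of the line.
    stream.write("\r" + render_progress(done, total))
    if done >= total:
        stream.write("\n")
    stream.flush()


if __name__ == "__main__":
    for step in range(1, 11):
        show_progress(step, 10)
```

Keeping the string rendering separate from the terminal write makes the bar easy to test without a terminal.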