CMPUT 651 Project at University of Alberta in Fall 2019
- Ensure `torch` and `sklearn` (scikit-learn) are installed.
- Set up the project:
$ pip install -e .
$ pip install torchtext
$ pip install spacy
$ python3 -m spacy download en
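Before moving on, it can help to confirm that the packages from this step can be imported. The sketch below is a minimal, hypothetical helper (`missing_modules` and `REQUIRED` are not part of the project). It checks import names, so scikit-learn appears as `sklearn`.

```python
import importlib.util

# Import names (not pip package names) of the dependencies listed above.
REQUIRED = ["torch", "sklearn", "torchtext", "spacy"]


def missing_modules(names):
    """Return the names in `names` that cannot be located on sys.path."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

`find_spec` only locates each module and does not import it, so the check stays fast even for large packages such as `torch`.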
- Download the pre-trained GloVe word vectors and place the Common Crawl vectors (840B tokens, 2.2M vocab, cased, 300d, 2.03 GB download) in the `/data/glove/` directory.
- Download the trial and training datasets, unzip the .zip files, and move the image directories into `/data`.
- Download the InferSent model trained with GloVe into `/data` (run the commands below from inside that directory):
$ mkdir encoder
$ curl -Lo encoder/infersent1.pkl https://dl.fbaipublicfiles.com/infersent/infersent1.pkl
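The GloVe-trained InferSent model works alongside the GloVe vectors from `/data/glove/`, which are stored as plain text: each line holds a token followed by 300 space-separated floats. The following is a minimal sketch of a reader for that format, not the project's own loader; `load_glove` is a hypothetical name. It treats the last `dim` fields as the vector and everything before them as the token, because a few lines in the 840B file are reported to contain spaces inside the token.

```python
from typing import Dict, Iterable, List, Optional


def load_glove(lines: Iterable[str], dim: int = 300,
               vocab: Optional[set] = None) -> Dict[str, List[float]]:
    """Parse GloVe text lines into a {token: vector} dict.

    If `vocab` is given, only tokens in it are kept, which saves a lot of
    memory for the 2.2M-entry Common Crawl file.
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # empty or malformed line: no room for token + vector
        # Join leading fields so tokens containing spaces survive intact.
        word = " ".join(parts[:-dim])
        if vocab is not None and word not in vocab:
            continue
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors
```

Usage, assuming the unzipped file is named `glove.840B.300d.txt`: open it with `open("/data/glove/glove.840B.300d.txt", encoding="utf-8")` and pass the file object to `load_glove`, optionally with a `vocab` set built from the dataset.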
- Clone the Facebook AI Research Sequence-to-Sequence Toolkit (fairseq, which provides the PyTorch implementation of RoBERTa) into the project directory:
$ git clone git@github.com:pytorch/fairseq.git
- Download the pretrained RoBERTa model `roberta.large` here. Decompress the file and place the resulting folder in the project directory.
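A quick path check can confirm that the setup steps left every resource where it is expected. This is a minimal sketch resting on several assumptions: the `/data` directory in these steps is taken to be `data/` under the project root, the InferSent commands are assumed to have been run from inside it, the GloVe archive is assumed to unpack to `glove.840B.300d.txt`, and the `roberta.large/` folder is assumed to contain `model.pt`. `EXPECTED` and `missing_paths` are hypothetical names; adjust the list to the actual layout.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical layout relative to the project root (see assumptions above).
EXPECTED = (
    "data/glove/glove.840B.300d.txt",
    "data/encoder/infersent1.pkl",
    "fairseq",
    "roberta.large/model.pt",
)


def missing_paths(root, expected: Iterable[str] = EXPECTED) -> List[str]:
    """Return the entries of `expected` that do not exist under `root`."""
    base = Path(root)
    return [p for p in expected if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    for path in missing:
        print("Missing:", path)
    if not missing:
        print("All expected resources found.")
```

Run it from the project directory; an empty report means every assumed path is present.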