vic85821/dialogue_model_nlp

Dialogue Model using NLP

This project tackles dialogue question answering: given a sequence of utterances, the model selects the best answer from 100 candidates. Three models are implemented: an RNN without attention, an RNN with attention, and the best-performing model. The report compares different RNN variants (e.g. LSTM, GRU) and describes the implementation details.
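The selection step described above can be sketched as follows. This is only an illustration, assuming the model has already produced one relevance score per candidate; the function names `rank_candidates` and `select_best` are hypothetical and do not come from this repository.

```python
from typing import List, Sequence


def rank_candidates(scores: Sequence[float]) -> List[int]:
    """Return candidate indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def select_best(candidates: Sequence[str], scores: Sequence[float]) -> str:
    """Pick the candidate with the highest score (hypothetical helper)."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    if len(candidates) != len(scores):
        raise ValueError("each candidate needs exactly one score")
    return candidates[rank_candidates(scores)[0]]
```

In the actual task the list would hold 100 candidates per dialogue, and the scores would come from one of the three models.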

Requirements

    conda env create -f nlp.yml

Training

Prepare data

The data folder should contain the following files:

    ./data/config.json # config setting
    ./data/train.json # training data
    ./data/valid.json # validation data
    ./data/test.json # testing data
    ./data/crawl-300d-2M.vec # English word vectors
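For orientation, here is a minimal sketch of reading a word-vector file in the textual `.vec` layout: a header line with the word count and dimension, then one word per line followed by its values. It assumes `crawl-300d-2M.vec` follows that layout. The function `load_vec` and its `vocab` filter are hypothetical and are not the repository's preprocessing code.

```python
import io
from typing import Dict, List, Optional, Set, TextIO


def load_vec(handle: TextIO, vocab: Optional[Set[str]] = None) -> Dict[str, List[float]]:
    """Read word vectors from an open .vec text file (hypothetical helper).

    If vocab is given, only words in it are kept, which saves memory
    when the file holds millions of entries.
    """
    header = handle.readline().split()
    dim = int(header[1])  # header is "<word count> <dimension>"
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if vocab is not None and word not in vocab:
            continue
        if len(values) != dim:
            continue  # skip malformed lines
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory example instead of the real 2M-word file.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 1 2 3\n")
embeddings = load_vec(sample, vocab={"hello"})
```

With the real file, the handle would come from `open("./data/crawl-300d-2M.vec", encoding="utf-8")`, and passing the training vocabulary as `vocab` avoids loading every vector.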

Train the model

  • Create the models folder.
  • Create an experiment folder inside it, e.g. lstm.
  • Add a config.json containing the experiment settings to the experiment folder:
    ./models/lstm/config.json
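The folder setup above could also be scripted. The sketch below is a hedged illustration: `create_experiment` is a hypothetical helper, and the settings passed to it are placeholders, since the keys the training code actually expects are not documented here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def create_experiment(models_dir: str, name: str, settings: Dict[str, Any]) -> Path:
    """Create <models_dir>/<name>/config.json and return its path."""
    exp_dir = Path(models_dir) / name
    exp_dir.mkdir(parents=True, exist_ok=True)  # also creates models_dir if missing
    config_path = exp_dir / "config.json"
    config_path.write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return config_path
```

For example, `create_experiment("./models", "lstm", settings)` would produce `./models/lstm/config.json`, where `settings` holds whatever experiment settings the training script reads.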

Run the training process:

    cd src
    bash preprocess.sh # convert the JSON data to pickle
    bash train.sh model_path cuda_device

Pre-trained model

Use the gdrive package (https://github.com/gdrive-org/gdrive) to download the pre-trained model:

    bash download.sh

Testing

Run the script for the model to evaluate (rnn.sh, attention.sh, or best.sh):

    bash rnn.sh ${1} ${2}

  • ${1}: path to the test JSON file
  • ${2}: path to the predictions file

Attention Score Plot

  • The models folder should contain a best folder.
  • The data must be preprocessed into the pkl format.
  • An embedding.pkl file containing the English word embeddings must be prepared.

    cd src
    python visual.py data_path embed_path
    
    # example
    python visual.py ../data/valid.pkl ./embedding.pkl
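Before running the plot script, the three prerequisites listed above can be checked with a short script. This is a sketch only: `missing_prerequisites` is a hypothetical helper, and the location of the models folder relative to `src` (for example `../models`) is an assumption.

```python
from pathlib import Path
from typing import List


def missing_prerequisites(models_dir: str, data_path: str, embed_path: str) -> List[str]:
    """Return the required paths that do not exist yet (empty list if all are present)."""
    missing = []
    best_dir = Path(models_dir) / "best"
    if not best_dir.is_dir():
        missing.append(str(best_dir))  # the best model folder
    for file_path in (Path(data_path), Path(embed_path)):
        if not file_path.is_file():
            missing.append(str(file_path))  # preprocessed data or embedding pickle
    return missing
```

Calling `missing_prerequisites("../models", "../data/valid.pkl", "./embedding.pkl")` from `src` would list whatever still has to be prepared.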
