
FAST AND ACCURATE READING COMPREHENSION WITHOUT RECURRENT NETWORKS

A TensorFlow implementation of Google's Fast and Accurate Reading Comprehension model from ICLR 2018. Without RNNs, the model computes relatively quickly compared to R-net (about 5 times faster in a naive implementation). After 12 epochs of training, our model reaches dev EM/F1 = 57/72.


Dataset

The dataset used for this task is the Stanford Question Answering Dataset (SQuAD). Pretrained GloVe word embeddings, trained on Common Crawl (840B tokens), are used.
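
As a rough illustration of the embedding step, the sketch below builds an embedding matrix from a GloVe text file. The file name, the vocabulary argument, and the zero-initialization of out-of-vocabulary words are assumptions for illustration, not necessarily what this repo's process.py does.

import io
import numpy as np

def load_glove(path, vocab, dim=300):
    # Build a (len(vocab), dim) matrix of GloVe vectors for the given
    # vocabulary; words missing from the GloVe file stay zero-initialized.
    word_to_id = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), dim), dtype=np.float32)
    with io.open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in word_to_id and len(values) == dim:
                matrix[word_to_id[word]] = np.asarray(values, dtype=np.float32)
    return matrix

# e.g. embeddings = load_glove("glove.840B.300d.txt", vocab)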

Requirements

  • Python 2.7
  • NumPy
  • tqdm
  • TensorFlow (1.2 or higher)
  • spaCy

Downloads and Setup

The preprocessing step is identical to R-net's. Once you clone this repo, run the following commands from bash just once to process the dataset (SQuAD).

$ pip install -r requirements.txt
$ bash setup.sh
$ python process.py --process True --reduce_glove True

Training / Testing / Debugging / Demo

You can change the hyperparameters in the params.py file to fit the model on your GPU. To train the model, run the following line. To test or debug your model after training, change mode = "train" to mode = "test" or mode = "debug" in params.py and run the model.

$ python model.py

A working real-time demo is available in demo.py. To use the web interface for the live demo, set mode = "demo" and batch_size = 1 in params.py. (The demo code is taken from R-net.)
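
For reference, the relevant settings in params.py might then look like the snippet below; the attribute names here are inferred from the description above and may differ from the actual file.

mode = "demo"      # one of "train", "test", "debug", "demo"
batch_size = 1     # the live demo expects single-example batches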

TODOs

  • Training and testing the model
  • Add trilinear function to Context-to-Query attention (see the sketch after this list)
  • Convergence testing
  • Apply dropouts + stochastic depth dropout
  • Realtime Demo
  • Query-to-context attention
  • Data augmentation by paraphrasing
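
For the two attention items above, here is a minimal NumPy sketch of BiDAF-style trilinear attention: the trilinear similarity for context-to-query attention, plus a max-over-similarity variant of query-to-context attention. Function and variable names are illustrative, not the repo's API, and the paper's exact query-to-context formulation may differ.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def trilinear_attention(C, Q, w):
    # C: context encodings (n, d); Q: question encodings (m, d)
    # w: learned weights (3d,), split across the [c; q; c*q] features
    d = C.shape[1]
    w_c, w_q, w_cq = w[:d], w[d:2 * d], w[2 * d:]
    # S[i, j] = w_c.C_i + w_q.Q_j + w_cq.(C_i * Q_j), shape (n, m),
    # computed without materializing the concatenated features
    S = np.dot(C, w_c)[:, None] + np.dot(Q, w_q)[None, :] + np.dot(C * w_cq, Q.T)
    A = np.dot(softmax(S, axis=1), Q)           # context-to-query, (n, d)
    b = softmax(S.max(axis=1))                  # weights over context, (n,)
    B = np.tile(np.dot(b, C), (C.shape[0], 1))  # query-to-context, (n, d)
    return A, B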

Tensorboard

Run TensorBoard for visualisation.

$ tensorboard --logdir=./
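
The training script presumably writes event files into this log directory. As an illustration only (the summary names and logging loop are assumptions, not this repo's code), scalar summaries in TensorFlow 1.x are written roughly like this:

import tensorflow as tf

loss = tf.placeholder(tf.float32, name="loss_value")
tf.summary.scalar("train/loss", loss)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter("./", sess.graph)
    for step in range(3):
        summary = sess.run(merged, feed_dict={loss: 1.0 / (step + 1)})
        writer.add_summary(summary, step)
    writer.close()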


Note

2/02/18 The model quickly reaches EM/F1 = 55/69 on the dev set, but never gets beyond that even with strong regularization. Also, the training speed (1.8 batches per second on a GTX 1080) is slower than the paper suggests (3.2 batches per second on a P100).

28/01/18 The model reaches dev set performance of EM/F1 = 44/58 one hour into training without dropout. The next goal is to train with dropout every 2 layers.

04/11/17 Currently the model is not optimized and there is a memory leak, so I strongly suggest training only if you have more than 16 GB of memory. Also, I haven't done convergence testing yet. Training is 5~6x faster in this naive implementation compared to R-net.
