
shubhampachori12110095/Question-Answering

 
 


Question Answering and Reading Comprehension
============================================

This package contains code for exploring various deep neural networks for question answering.

Running and Implementation details

The config directory holds a separate config for each model. Pick one to train the corresponding model on the SQuAD dataset.

To train a model, run `python train.py name_of_config` (give the config name without the .py extension). Before training, download the SQuAD data and tokenize it with the tokenize_data function in utils.py. That file also contains functions to create a vocabulary, compute answer length coverage, etc.
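The preprocessing step above (tokenize, then build a vocabulary) can be sketched as follows. This is a minimal illustration, not the code in utils.py: the names `simple_tokenize`, `build_vocab` and `encode`, the regex tokenizer, and the `<pad>`/`<unk>` special tokens are all assumptions made for this example.

```python
import re
from collections import Counter


def simple_tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens.

    Hypothetical stand-in for a real tokenizer; illustrative only.
    """
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(token_lists, min_count=1, specials=("<pad>", "<unk>")):
    """Map each sufficiently frequent token to an integer id.

    Special tokens get the first ids; the rest are ordered by
    descending frequency, with ties broken alphabetically.
    """
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {tok: idx for idx, tok in enumerate(specials)}
    for tok, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Convert tokens to ids, mapping unknown tokens to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in tokens]
```

Raising `min_count` shrinks the vocabulary at the cost of sending more rare tokens to `<unk>`, which matters for SQuAD because many answers are rare named entities.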

All the models are in the model directory, ranging from a basic seq2seq model with attention to the Match-LSTM model with pointer networks.
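To illustrate how a pointer network can produce an answer, the sketch below picks the most probable answer span from a start distribution and an end distribution over paragraph positions. It assumes a boundary-style pointer that predicts only the two span ends; the models in this repository may decode differently, and `best_span` and `max_len` are hypothetical names used only here.

```python
def best_span(start_probs, end_probs, max_len=None):
    """Return ((start, end), score) maximizing start_probs[i] * end_probs[j].

    Only spans with i <= j are considered. If max_len is given, spans
    longer than max_len tokens are skipped.
    """
    best = (0, 0)
    best_score = -1.0
    n = len(end_probs)
    for i, p_start in enumerate(start_probs):
        # Exclusive upper bound on the end index for this start.
        last = n if max_len is None else min(n, i + max_len)
        for j in range(i, last):
            score = p_start * end_probs[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best, best_score
```

The exhaustive search is quadratic in the paragraph length, which is usually acceptable for SQuAD-sized paragraphs, and `max_len` bounds the cost further.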

Fuel is used for the data pipeline. Iterator and Dataset classes for the SQuAD and CNN/Daily Mail datasets are available in data.py, along with a toy dataset for debugging.

Evaluation extensions for the models are implemented in lmu_extensions.py and are designed to be used as Blocks extensions. For seq2seq models, average recall, average precision, macro F1, average F1 and exact-match accuracy are reported.
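The reported metrics can be sketched with token-overlap scoring, as below. This is not the code in lmu_extensions.py. In particular, it assumes that "average F1" is the mean of per-example F1 scores and that "macro F1" is the F1 computed from the averaged precision and recall; the repository may define them differently. All function names are hypothetical.

```python
from collections import Counter


def precision_recall_f1(pred_tokens, gold_tokens):
    """Token-overlap precision, recall and F1 for a single example."""
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def evaluate(pairs):
    """Aggregate metrics over (predicted_tokens, gold_tokens) pairs."""
    if not pairs:
        raise ValueError("need at least one (prediction, gold) pair")
    n = len(pairs)
    scores = [precision_recall_f1(pred, gold) for pred, gold in pairs]
    avg_p = sum(s[0] for s in scores) / n
    avg_r = sum(s[1] for s in scores) / n
    avg_f1 = sum(s[2] for s in scores) / n
    # Assumed definition: F1 of the averaged precision and recall.
    macro_f1 = 2 * avg_p * avg_r / (avg_p + avg_r) if avg_p + avg_r else 0.0
    exact = sum(1 for pred, gold in pairs if pred == gold) / n
    return {
        "average_precision": avg_p,
        "average_recall": avg_r,
        "average_f1": avg_f1,
        "macro_f1": macro_f1,
        "exact_match": exact,
    }
```

For example, predicting ["in", "1066"] against the gold answer ["1066"] gives full recall but a precision of only 0.5, and it does not count as an exact match.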

Some cool heatmaps of the Match-LSTM model

The heatmaps show the attention paid to each token of the question at each step of encoding the paragraph.

Heatmap1

Heatmap2
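The matrix behind such a heatmap has one row per paragraph step and one column per question token, with each row summing to 1. The sketch below builds it under a simplifying assumption: plain dot-product scores between fixed vectors stand in for the learned scoring function that Match-LSTM conditions on its hidden state. `attention_matrix` and its inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(paragraph_vecs, question_vecs):
    """Return one attention row per paragraph vector.

    Each row is a distribution over question tokens, scored here by
    dot product (an assumption made to keep the example short).
    """
    rows = []
    for p_vec in paragraph_vecs:
        scores = [sum(a * b for a, b in zip(p_vec, q_vec))
                  for q_vec in question_vecs]
        rows.append(softmax(scores))
    return rows
```

The resulting list of rows can be handed to a plotting routine such as matplotlib's `imshow` to render it as a heatmap.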

Accumulated Answer Length Coverage

As the plot shows, the SQuAD dataset contains some very long answers, which makes sequence-to-sequence solutions challenging for this task.

Answer length

(The horizontal axis is the answer length and the vertical axis is the accumulated coverage.)
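The curve described above can be computed as the fraction of answers whose length is at most a given value. The following is a minimal sketch, assuming answer lengths are measured in tokens; `accumulated_length_coverage` is a hypothetical name and not the function in utils.py.

```python
from collections import Counter


def accumulated_length_coverage(lengths):
    """Return (length, coverage) pairs sorted by length.

    coverage is the fraction of answers with length <= the given length.
    An empty input yields an empty list.
    """
    counts = Counter(lengths)
    total = len(lengths)
    coverage = []
    running = 0
    for length in sorted(counts):
        running += counts[length]
        coverage.append((length, running / total))
    return coverage


# Four answers of lengths 1, 1, 2 and 4 tokens.
print(accumulated_length_coverage([1, 1, 2, 4]))
# [(1, 0.5), (2, 0.75), (4, 1.0)]
```

Reading the curve at a candidate maximum decoding length shows what share of gold answers a decoder capped at that length could reproduce in full.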

Acknowledgments

We would like to thank the developers of Theano, Blocks and Fuel at MILA for their excellent work.
