Tran Le, Stella (Seoyeon) Lee
Grinnell College, CSC 395 Information Retrieval
This repository contains the implementation of a question-answering model whose primary goal is to understand sequential context.
Dependencies
sys
os
pickle
csv
numpy
time
re
tensorflow 2.0.0
tensorflow.keras
This folder contains fetch_glove.sh and get_bAbi.sh, which are used to download the GloVe pre-trained word embeddings and the Facebook bAbi dataset.
- preprocessing contains Python files to preprocess the bAbi dataset
- baseline_model contains Python files to train and test LSTM models
- model contains Python files for custom Keras layers and models
- train.py is the Python file used to train our custom QA models
get_glove.py contains functions to load GloVe embeddings and the embedding matrix
- load_vectors(filepath): Load a GloVe text file into a dictionary of {word: embedding}
- load_embedding_matrix(data_folder, dims): Load embedding vectors into an embedding matrix and create the word-index mapping
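
The two loading steps described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: it assumes the usual GloVe text layout of one word followed by its space-separated vector per line, and build_embedding_matrix is a hypothetical helper that takes an already loaded dictionary instead of the data_folder argument used by load_embedding_matrix. Reserving row 0 for padding is also an assumption.

```python
import os
import tempfile

import numpy as np


def load_vectors(filepath):
    """Read a GloVe-format text file into a {word: vector} dictionary."""
    vectors = {}
    with open(filepath, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors


def build_embedding_matrix(vectors, dims):
    """Hypothetical helper: stack vectors into a matrix and map words to rows.

    Assumption: row 0 stays all zeros so index 0 can act as the padding index.
    """
    word_index = {word: i + 1 for i, word in enumerate(vectors)}
    matrix = np.zeros((len(word_index) + 1, dims), dtype="float32")
    for word, idx in word_index.items():
        matrix[idx] = vectors[word]
    return matrix, word_index


# Demo on a tiny made-up file that follows the "word v1 v2 ..." layout.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_vectors.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
    vectors = load_vectors(path)

matrix, word_index = build_embedding_matrix(vectors, dims=3)
```

A matrix built this way could be passed as the initial weights of a Keras Embedding layer, with word_index used to turn tokens into row numbers.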
process_bAbi.py contains functions to save each of the tasks into Context, Question, and Answer
preprocessing.py
- transform(input, max_len, tokenizer): Returns the padded sequence obtained by transforming input with tokenizer.texts_to_sequences
- main(dim, embedding_folder, data_folder): Saves the embedding matrix and tokenizer
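
As a rough illustration of what transform does, the sketch below converts texts to integer sequences through tokenizer.texts_to_sequences and pads them to max_len. It is an assumption-laden sketch, not the repository's implementation: the padding side (zeros on the left, keeping the last max_len tokens) is chosen to mirror the Keras pad_sequences defaults, and ToyTokenizer is a hypothetical stand-in so the example runs without TensorFlow. A fitted Keras Tokenizer exposes the same texts_to_sequences method and could be passed instead.

```python
import numpy as np


def transform(texts, max_len, tokenizer):
    """Convert texts to integer sequences and pad them to max_len (>= 1).

    Assumption: zeros are added on the left and over-long sequences keep
    their last max_len tokens, mirroring the Keras pad_sequences defaults.
    """
    sequences = tokenizer.texts_to_sequences(texts)
    padded = np.zeros((len(sequences), max_len), dtype="int32")
    for row, seq in enumerate(sequences):
        seq = seq[-max_len:]  # drop tokens from the front if too long
        if seq:
            padded[row, -len(seq):] = seq
    return padded


class ToyTokenizer:
    """Hypothetical stand-in exposing a texts_to_sequences method."""

    def __init__(self, word_index):
        self.word_index = word_index

    def texts_to_sequences(self, texts):
        # Words missing from the vocabulary are skipped.
        return [
            [self.word_index[w] for w in text.lower().split() if w in self.word_index]
            for text in texts
        ]


tokenizer = ToyTokenizer({"mary": 1, "went": 2, "to": 3, "the": 4, "kitchen": 5})
padded = transform(["Mary went to the kitchen", "the kitchen"], 4, tokenizer)
```

Padding every Context and Question to a fixed length lets them be batched as a single integer array before they reach the embedding layer.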