MSMARCO with S-NET extraction

A CNTK (Microsoft's deep learning toolkit) implementation of "S-Net: From Answer Extraction to Answer Generation for Machine Reading Comprehension", with some modifications.
This project is designed for the MSMARCO dataset.
The code structure is based on the CNTK BiDAF example.
It supports MSMARCO V1 and V2.

Requirements

The following tools and libraries are required for training.

General

python3.6
cuda-9.0 (required by CNTK)
openmpi-1.10 (required by CNTK)
gcc >= 6 (required by CNTK)
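
Before training, it can help to confirm that the tools listed above are visible on the PATH. The following is a minimal sketch, not part of this repository: the executable names (nvcc, mpirun, gcc) are assumptions about how CUDA, OpenMPI and GCC are installed, and the check only tests for presence, not for the specific versions listed.

```python
import shutil
import sys

# Executable names are assumptions about how the requirements
# (CUDA, OpenMPI, GCC) show up on PATH; adjust for your system.
REQUIRED_TOOLS = ("nvcc", "mpirun", "gcc")


def check_environment(tools=REQUIRED_TOOLS):
    """Return a dict mapping each requirement to True (found) or False."""
    report = {"python>=3.6": sys.version_info >= (3, 6)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print("{}: {}".format(name, "found" if ok else "MISSING"))
```

A MISSING entry only means the executable was not found on the PATH; the library itself may still be installed elsewhere.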

Python

Please refer to requirements.txt.

Usage

Preprocess

MSMARCO V1

Download the MSMARCO v1 dataset and the GloVe embeddings.

cd data
python3.6 download.py v1

Convert the raw data to TSV format

python3.6 convert_msmarco.py --threads=`nproc`
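
To illustrate what a raw-to-TSV conversion involves, here is a minimal sketch. It is not the logic of convert_msmarco.py: it assumes records shaped like the MSMARCO v1 JSON lines (keys query_id, query, passages, answers), and the column order and the record_to_tsv_rows helper are hypothetical.

```python
import json


def record_to_tsv_rows(record):
    """Flatten one record into tab-separated rows, one row per passage.

    Hypothetical column order:
    query_id, query, passage_text, is_selected, first answer.
    """
    rows = []
    answer = record["answers"][0] if record.get("answers") else ""
    for passage in record.get("passages", []):
        fields = [
            str(record["query_id"]),
            record["query"],
            passage["passage_text"],
            str(passage.get("is_selected", 0)),
            answer,
        ]
        # Collapse tabs/newlines inside a field so they cannot break the row.
        rows.append("\t".join(" ".join(f.split()) for f in fields))
    return rows


# A made-up record used only for illustration.
sample = json.loads(
    '{"query_id": 7, "query": "what is cntk",'
    ' "passages": [{"passage_text": "CNTK is a\\ttoolkit.", "is_selected": 1}],'
    ' "answers": ["A toolkit."]}'
)
for row in record_to_tsv_rows(sample):
    print(row)
```

Normalizing whitespace inside each field matters because a stray tab in a passage would otherwise shift every following column.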

Convert the TSV files to CTF (CNTK text format, the CNTK input format) and build the vocabulary dictionary

python3.6 tsv2ctf.py
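
For readers unfamiliar with CTF: each line carries a sequence id followed by one sample per named stream, and sparse one-hot samples are written as index:value. The sketch below shows the idea for word indices only. It is not the logic of tsv2ctf.py; the stream names qw and pw and both helper functions are hypothetical.

```python
def build_vocab(token_lists):
    """Assign an integer id to every distinct token; id 0 is reserved for <UNK>."""
    vocab = {"<UNK>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_ctf(seq_id, query, passage, vocab):
    """Emit CTF lines for one example.

    Line i holds the i-th query token (stream qw) and the i-th passage
    token (stream pw) as sparse one-hot samples; a stream is omitted
    once its sequence is exhausted.
    """
    lines = []
    for i in range(max(len(query), len(passage))):
        parts = [str(seq_id)]
        if i < len(query):
            parts.append("|qw {}:1".format(vocab.get(query[i], 0)))
        if i < len(passage):
            parts.append("|pw {}:1".format(vocab.get(passage[i], 0)))
        lines.append(" ".join(parts))
    return lines


query = ["what", "is", "cntk"]
passage = ["cntk", "is", "a", "toolkit"]
vocab = build_vocab([query, passage])
print("\n".join(to_ctf(0, query, passage, vocab)))
```

Lines that share a sequence id belong to the same example, which is how variable-length queries and passages fit into one file.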

Generate ELMo embeddings

sh elmo.sh

MSMARCO V2

Download the MSMARCO v2 dataset and the GloVe embeddings.

cd data
python3.6 download.py v2

Convert the raw data to TSV format

python3.6 convert_msmarco.py --threads=`nproc` --ratio=0.1

Convert the TSV files to CTF (CNTK text format, the CNTK input format) and build the vocabulary dictionary

python3.6 tsv2ctf.py

Generate ELMo embeddings

sh elmo.sh

Train (Same for V1 and V2)

cd ../script
mkdir log
sh run.sh

Evaluate on the development set

MSMARCO V1

cd Evaluation
sh eval.sh v1

MSMARCO V2

cd Evaluation
sh eval.sh v2

Performance

Paper

Model	ROUGE-L	BLEU-1
S-Net (Extraction)	41.45	44.08
S-Net (Extraction, Ensemble)	42.92	44.97

This implementation

Model	ROUGE-L	BLEU-1
MSMARCO v1 w/o ELMo	38.43	39.14
MSMARCO v1 w/ ELMo	39.42	39.47
MSMARCO v2 w/ ELMo	43.66	44.44
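
As a reminder of what the ROUGE-L column measures, the sketch below computes a sentence-level ROUGE-L F-score from the longest common subsequence (LCS) of two token lists. It is an illustration only, not the evaluation code behind the numbers above; the default beta=1.2 and whitespace tokenization are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score in [0, 1]; beta > 1 weights recall above precision."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


candidate = "the cat sat on mat".split()
reference = "the cat is on the mat".split()
print(lcs_length(candidate, reference), rouge_l(candidate, reference))
```

Because the score is built on a subsequence rather than contiguous n-grams, an extracted span can still score well when it skips a few words of the reference answer.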
