Sense disambiguation of discourse connectives for PDTB-style shallow discourse parsing.
This package provides core functionality for sense disambiguation of explicit and implicit discourse connectives for PDTB-like discourse parsing. It has been created for the CoNLL-2016 shared task.
The main package dsenser currently comprises the following classifiers which can be trained either separately or bundled into ensembles:
- dsenser.major.MajorSenser
a simplistic classifier which returns the conditional probabilities of senses given the connective;
- dsenser.wang.WangSenser
an optimized reimplementation of Wang et al.'s sense classification system using the LinearSVC classifier;
- dsenser.xgboost.XGBoostSenser
an optimized reimplementation of Wang et al.'s sense classification system using the XGBoost decision-forest classifier;
- dsenser.svd.SVDSenser
a neural network classifier which uses the SVD decomposition of word embedding matrices of the arguments;
- dsenser.lstm.LSTMSenser
a neural network classifier which uses an LSTM recurrence with Bayesian dropout (cf. Yarin Gal, 2016).
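To illustrate the simplest of these models, a majority-sense baseline in the spirit of dsenser.major.MajorSenser can be sketched in a few lines of plain Python. This is a schematic re-creation for illustration only, not the package's actual implementation or API:

```python
from collections import Counter, defaultdict

class MajoritySenser:
    """Toy baseline: estimates P(sense | connective) from raw counts."""

    def __init__(self):
        # connective -> Counter of observed senses
        self._counts = defaultdict(Counter)

    def train(self, pairs):
        """pairs: iterable of (connective, sense) tuples."""
        for conn, sense in pairs:
            self._counts[conn.lower()][sense] += 1

    def predict(self, conn):
        """Return (sense, probability) pairs, most probable first."""
        cnt = self._counts[conn.lower()]
        total = sum(cnt.values()) or 1
        return [(sense, c / total) for sense, c in cnt.most_common()]

senser = MajoritySenser()
senser.train([("but", "Comparison.Contrast"),
              ("but", "Comparison.Contrast"),
              ("but", "Expansion.Conjunction")])
print(senser.predict("but"))  # -> [('Comparison.Contrast', 2/3), ('Expansion.Conjunction', 1/3)]
```

The real MajorSenser works on the task's feature representation rather than bare strings, but the underlying statistic is the same conditional frequency.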
To install this package, you need to check out this Git project together with its submodules by running the following commands in sequence:
git clone git@github.com:WladimirSidorenko/DiscourseSenser.git
cd DiscourseSenser
git submodule init
git submodule update
# download the Skip-gram Neural Word Embeddings from
# https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing
# and store the unpacked archive at
# `dsenser/data/GoogleNews-vectors-negative300.bin`
pip install -r requirements.txt -e . --user
Note that this package does not include any pre-trained models. Due to the large size of the serialized files, we cannot add them to the git project or to the default source distribution, but feel free to contact the author of this program to obtain the PDTB models directly. We plan to upload these models to a separate location later.
After installation, you can import the module in your Python scripts, e.g.:
from dsenser import DiscourseSenser
...
senser = DiscourseSenser(None)
senser.train(train_set, dsenser.WANG | dsenser.XGBOOST | dsenser.LSTM,
path_to_model, dev_set)
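The second argument to train() combines the classifiers to be bundled into the ensemble via bitwise OR, as in dsenser.WANG | dsenser.XGBOOST | dsenser.LSTM above. A minimal sketch of how such bit-mask flags compose (the flag values below are illustrative, not the package's actual constants):

```python
# Hypothetical flag values, mirroring the dsenser.WANG | dsenser.XGBOOST | dsenser.LSTM idiom.
MAJOR, WANG, XGBOOST, SVD, LSTM = 1, 2, 4, 8, 16

def active_models(mask):
    """Return the names of the models selected by a bit mask, in flag order."""
    names = {MAJOR: "major", WANG: "wang", XGBOOST: "xgboost",
             SVD: "svd", LSTM: "lstm"}
    return [name for flag, name in sorted(names.items()) if mask & flag]

ensemble = WANG | XGBOOST | LSTM
print(active_models(ensemble))  # -> ['wang', 'xgboost', 'lstm']
```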
Alternatively, you can use the provided front-end script pdtb_senser
to process your input data, e.g.:
pdtb_senser train --type=2 --type=8 path/to/train_dir
pdtb_senser test path/to/input_dir path/to/output_dir
The data in the specified folders should be in the CoNLL JSON format and include the files parses.json and relations.json for training, and parses.json and relations-no-senses.json for testing. Alternatively, you can specify a different input file with relations whose senses should be predicted by using the option pdtb_senser test --rel-file=REL_FILE INPUT_DIR OUTPUT_DIR.
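For reference, the two files are structured differently in the CoNLL shared-task format: parses.json is a single JSON object keyed by document id, while relations.json stores one JSON object per line. A minimal loader sketch (the field names follow the shared-task format, but the sample relation below is invented for illustration):

```python
import json
import os
import tempfile

def load_relations(path):
    """relations.json is newline-delimited JSON: one relation object per line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]

def load_parses(path):
    """parses.json is a single JSON object mapping DocID -> parse information."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

# Tiny invented example written to a temporary directory:
tmpdir = tempfile.mkdtemp()
rel_path = os.path.join(tmpdir, "relations.json")
with open(rel_path, "w", encoding="utf-8") as fh:
    fh.write(json.dumps({"DocID": "doc1", "Type": "Explicit",
                         "Sense": ["Comparison.Contrast"]}) + "\n")

relations = load_relations(rel_path)
print(relations[0]["Type"])  # -> Explicit
```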
We gratefully acknowledge the contribution of
- Tatjana Scheffler, who extended the original feature set of Wang et al.