
Discourse Senser

MIT License


Sense disambiguation of discourse connectives for PDTB-style shallow discourse parsing.

Description

This package provides core functionality for sense disambiguation of explicit and implicit discourse connectives for PDTB-like discourse parsing. It has been created for the CoNLL-2016 shared task.

The main package dsenser currently comprises the following classifiers which can be trained either separately or bundled into ensembles:

dsenser.major.MajorSenser

a simplistic classifier which returns the conditional probabilities of senses given the connective (a minimal sketch of this idea is shown after the list);

dsenser.wang.WangSenser

an optimized reimplementation of Wang et al.'s sense classification system using the LinearSVC classifier;

dsenser.xgboost.XGBoostSenser

an optimized reimplementation of Wang et al.'s sense classification system using the XGBoost decision forest classifier;

dsenser.svd.SVDSenser

a neural network classifier which uses the singular value decomposition (SVD) of the word embedding matrices of the arguments;

dsenser.lstm.LSTMSenser

a neural network classifier which uses an LSTM recurrence with Bayesian dropout (cf. Yarin Gal, 2016).
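
To illustrate the idea behind dsenser.major.MajorSenser, the sketch below reduces it to counting how often each sense co-occurs with each connective and predicting the most frequent one. This is an expository sketch, not the package's actual implementation, and the back-off sense for unseen connectives is a made-up placeholder:

from collections import Counter, defaultdict


def train_majority(pairs):
    """Estimate P(sense | connective) from (connective, sense) pairs."""
    counts = defaultdict(Counter)
    for connective, sense in pairs:
        counts[connective.lower()][sense] += 1
    return {conn: {sense: cnt / float(sum(senses.values()))
                   for sense, cnt in senses.items()}
            for conn, senses in counts.items()}


def predict_majority(probs, connective, default="Expansion.Conjunction"):
    """Return the most probable sense for a connective, backing off to a
    placeholder default sense for connectives unseen in training."""
    dist = probs.get(connective.lower())
    return max(dist, key=dist.get) if dist else default


# toy usage
model = train_majority([("but", "Comparison.Contrast"),
                        ("but", "Comparison.Concession"),
                        ("but", "Comparison.Contrast")])
print(predict_majority(model, "but"))  # -> Comparison.Contrast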

Installation

To install this package, you need to check out this git project together with its submodules by running the following commands:

git clone git@github.com:WladimirSidorenko/DiscourseSenser.git
cd DiscourseSenser
git submodule init
git submodule update

# download the Skip-gram Neural Word Embeddings from
# https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing
# and store the unpacked archive at
# `dsenser/data/GoogleNews-vectors-negative300.bin`

pip install -r requirements.txt -e . --user
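
As an optional sanity check that the embedding file is in place and readable, you can load it as shown below. This assumes gensim is available in your environment, which is not necessarily a dependency of this package:

# optional check of the downloaded embedding file (assumes gensim is installed)
from gensim.models import KeyedVectors

vecs = KeyedVectors.load_word2vec_format(
    "dsenser/data/GoogleNews-vectors-negative300.bin", binary=True)
print(vecs.vector_size)  # expected: 300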

Note that this package does not include any pre-trained models. Due to the large size of the serialized files, we cannot add them to the git project or the default source distribution, but feel free to contact the author of this program to obtain the PDTB models directly. We plan to upload these models to a separate location later.

Usage

After installation, you can import the module in your Python scripts, e.g.:

import dsenser
from dsenser import DiscourseSenser

...

senser = DiscourseSenser(None)
senser.train(train_set, dsenser.WANG | dsenser.XGBOOST | dsenser.LSTM,
             path_to_model, dev_set)

Alternatively, you can use the provided front-end script pdtb_senser to process your input data, e.g.:

pdtb_senser train --type=2 --type=8 path/to/train_dir

pdtb_senser test path/to/input_dir path/to/output_dir

The data in the specified folders should be in the CoNLL JSON format and include the files parses.json and relations.json for training, and parses.json and relations-no-senses.json for testing. Alternatively, you can specify a different input relations file whose senses should be predicted by using the option pdtb_senser test --rel-file=REL_FILE INPUT_DIR OUTPUT_DIR.
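
For orientation, the sketch below shows one common way to read these files. It reflects the usual layout of the CoNLL shared-task data (parses.json as a single JSON object keyed by document id, relations.json with one JSON object per line) and is meant as an illustration rather than an authoritative specification:

import json

# parses.json: one JSON object mapping document ids to parse information
with open("path/to/input_dir/parses.json") as pfile:
    parses = json.load(pfile)       # {doc_id: {"sentences": [...]}, ...}

# relations.json: one relation object per line
relations = []
with open("path/to/input_dir/relations.json") as rfile:
    for line in rfile:
        rel = json.loads(line)      # keys include "Arg1", "Arg2",
        relations.append(rel)       #   "Connective", "Sense", "Type"

print("read %d relations" % len(relations))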

Acknowledgment

We gratefully acknowledge the contribution of
