GitHub - meetleilei/2-CRF-MWE: This is the implementation of the double chained CRF used for predicting MWE and supersenses.

This is the implementation of the double chained CRF used for predicting Multiword Expressions (MWE) and supersenses.

UW-CSE at SemEval-2016 Task 10: Detecting multiword expressions and supersenses using double-chained conditional random fields. Mohammad Javad Hosseini, Noah A. Smith, and Su-In Lee. In Proceedings of the NAACL Workshop on Semantic Evaluations (SemEval 2016), San Diego, CA, June 2016.

We participated in SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM). Our submitted models ranked first overall in the competition.

We have implemented a Conditional Random Field and a Double-Chained Conditional Random Field model for joint learning of multiword expressions and supersenses.

Feature extraction is based on AMALGrAM 2.0 (A Machine Analyzer of Lexical Groupings And Meanings), and the software dependencies are the same as for AMALGrAM 2.0.

Software

Python 2.7
Cython (tested on 0.21.1)
NLTK 3.0.2+ with the WordNet resource installed

Running:

After downloading the code and installing the software listed above, you can run the code from the scripts folder to replicate the paper's results and/or test on new data. The script for the best model is Double_CRF_open.sh.

Tagging Scheme

Multiword Expressions:

The annotation for MWEs extends the conventional BIO scheme to include gappy MWEs with one level of nesting. Segmentations are represented using six tags; the lower-case variants indicate that an expression is within another MWE’s gap.

O and o: a single-word expression
B and b: the first word of a MWE
I and i: a word continuing a MWE
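
To make the six-tag scheme concrete, here is a minimal sketch that groups token positions into expressions from a tag sequence. It is not taken from this repository: the function name `decode_mwe_tags` is hypothetical, and it only checks the basic constraints implied by the description above (an I needs an open B, lower-case tags need an open outer MWE to sit in, an i needs an open b). It does not verify every rule in TAGSET.md.

```python
def decode_mwe_tags(tags):
    """Group token indices into expressions from a six-tag MWE sequence.

    Hypothetical helper for illustration; not part of the repository code.
    Returns a list of index lists, ordered by the first token of each group.
    """
    groups = []    # every expression found so far, as a list of token indices
    outer = None   # the open top-level MWE (upper-case B/I), if any
    inner = None   # the open MWE inside a gap (lower-case b/i), if any
    for idx, tag in enumerate(tags):
        if tag == "O":
            # A top-level single-word expression closes anything open.
            outer = inner = None
            groups.append([idx])
        elif tag == "B":
            inner = None
            outer = [idx]
            groups.append(outer)
        elif tag == "I":
            if outer is None:
                raise ValueError("I at position %d has no preceding B" % idx)
            inner = None          # returning to the outer MWE closes the gap
            outer.append(idx)
        elif tag in ("o", "b", "i"):
            if outer is None:
                raise ValueError("%s at position %d is not inside a gap" % (tag, idx))
            if tag == "o":
                inner = None
                groups.append([idx])
            elif tag == "b":
                inner = [idx]
                groups.append(inner)
            else:
                if inner is None:
                    raise ValueError("i at position %d has no preceding b" % idx)
                inner.append(idx)
        else:
            raise ValueError("unknown tag %r at position %d" % (tag, idx))
    return groups


# Tokens 1 and 5 form a gappy MWE; tokens 3 and 4 form a MWE inside its gap.
print(decode_mwe_tags(["O", "B", "o", "b", "i", "I", "O"]))
# [[0], [1, 5], [2], [3, 4], [6]]
```

Tracking one outer and one inner group is enough here because the scheme allows only one level of nesting.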

Supersenses:

Each noun or verb expression is also annotated with a supersense; there are 26 supersenses for nouns and 15 for verbs. Only the first word of a MWE receives a supersense tag.
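
The rule that only the first word of a MWE carries a supersense can be checked mechanically. The sketch below is an illustration under stated assumptions, not code from this repository: `check_supersense_placement` is a hypothetical name, a missing supersense is assumed to be an empty string, and the labels in the example are placeholders.

```python
def check_supersense_placement(mwe_tags, supersenses):
    """Return positions where a continuation token (I or i) has a supersense.

    Hypothetical helper for illustration. Assumes an absent supersense is
    represented by an empty string (or None).
    """
    if len(mwe_tags) != len(supersenses):
        raise ValueError("tag and supersense sequences differ in length")
    return [
        idx
        for idx, (tag, sense) in enumerate(zip(mwe_tags, supersenses))
        if tag in ("I", "i") and sense
    ]


# Well-formed: the supersense sits on the B token only.
print(check_supersense_placement(["B", "I", "O"], ["v.change", "", "n.artifact"]))
# []

# Violation: the continuation token at position 1 also has a supersense.
print(check_supersense_placement(["B", "I"], ["v.change", "v.change"]))
# [1]
```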

The input must be sentence and word tokenized and part-of-speech tagged (with the Penn Treebank POS tagset).

Please refer to dimsum-data-1.5/TAGSET.md for more details.

Data:

The datasets are in the folder dimsum-data-1.5. There is a readme file in the folder explaining the format. For prediction on new data, input should be formatted as described there. Our original submission is in the folder submitted_results.

Please email the first author (hosseini@cs.washington.edu) in case of any questions and/or requests.
