University of Turku in SemEval 2016 Task 10: DiMSUM

This code implements the experiments of the University of Turku IT Department for SemEval 2016 Task 10, Detecting Minimal Semantic Units and their Meanings (DiMSUM).

Experiment Results

Our submitted DiMSUM task results can be found under data/results. The three packages correspond to the three conditions of the DiMSUM task.

Required Libraries and Data

Dependencies

The dependencies are listed in requirements.txt. scikit-learn must be exactly version 0.16.0, because our extensions depend on that release. For the other libraries, other versions may also work.

Data files

Data files used in our experiments, a copy of the DiMSUM corpus and our submitted results can be found in the data directory.

NOTE: You cannot run the experiments for tasks 2 and 3 without the Yelp academic dataset. To get the dataset, first apply for permission from Yelp. Once you have permission, download the dataset and use setupYelp.py to build the database, e.g. python setupYelp.py -i yelp_academic_dataset.json.gz.

dimsum-data-1.5: A copy of the DiMSUM 2016 shared task data.
wikipedia: Data files derived from the English Wikipedia, used in our entry for the open condition.
yelp: A placeholder for the Yelp database generated from the Yelp academic dataset. To access the dataset, request a license from the Yelp website.
results: Our submitted results for the DiMSUM challenge.

Running the Experiments

The DiMSUM task consists of three subtasks: 1) the supervised closed condition, 2) the semi-supervised closed condition, and 3) the open condition. For more information on these tasks, please see the DiMSUM home page. To run our experimental code for any of the three tasks, use the following command:

python run.py -e [TASK] -o [OUTPUT] -c ensemble.ExtraTreesClassifier -r "n_estimators=[2];random_state=[1]" --hidden

When running an experiment, replace [OUTPUT] with your own output directory and replace [TASK] with one of the experiments: Task1, Task2, or Task3.
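To run all three tasks in one go, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `all_commands` and the `output` directory are assumptions made for illustration, and only the flags shown in the command above are used. It prints the commands rather than executing them.

```python
import shlex

# Classifier settings copied verbatim from the command shown above.
CLASSIFIER = "ensemble.ExtraTreesClassifier"
CLASSIFIER_ARGS = "n_estimators=[2];random_state=[1]"
TASKS = ("Task1", "Task2", "Task3")


def build_command(task, output_dir):
    """Return the run.py argument list for one task and output directory."""
    return [
        "python", "run.py",
        "-e", task,
        "-o", output_dir,
        "-c", CLASSIFIER,
        "-r", CLASSIFIER_ARGS,
        "--hidden",
    ]


def all_commands(output_root):
    """Build one command per task, each with its own output subdirectory."""
    return {task: build_command(task, output_root + "/" + task) for task in TASKS}


if __name__ == "__main__":
    # "output" is a hypothetical directory name; replace it with your own.
    for task, command in all_commands("output").items():
        # shlex.join (Python 3.8+) quotes the argument string for the shell.
        print(shlex.join(command))
```

Keeping the arguments as a list means each command can also be passed directly to `subprocess.run` without shell quoting issues.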

The Experiment System

All experiments can be run using the program run.py. The experimental code uses a three-step system: build, classify, and analyse. One or more of these actions can be performed using the command line option --action or --a.

Generating examples for machine learning

Examples are generated using the build action. A class is derived from src.Experiment to define the rules and limits for example generation. Classes for the experiments described in the paper are defined in the file experiments.py.
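The subclassing pattern described above can be illustrated with a small stand-in. This sketch does not reproduce the real src.Experiment interface, which is not documented in this README: the base class below and the method names `accept`, `get_features`, and `build_examples` are all hypothetical. It only shows how a derived class might restrict which items become examples (the limits) and how their features are produced (the rules).

```python
class Experiment(object):
    """Hypothetical stand-in base class; NOT the real src.Experiment."""

    def accept(self, token):
        """Limit: decide whether a token becomes an example (default: always)."""
        return True

    def get_features(self, token):
        """Rule: turn a token into a feature dictionary."""
        return {"word=" + token.lower(): 1}

    def build_examples(self, tokens):
        """Generate one feature dictionary per accepted token."""
        return [self.get_features(t) for t in tokens if self.accept(t)]


class CapitalizedOnly(Experiment):
    """Hypothetical experiment that keeps only capitalized tokens."""

    def accept(self, token):
        return token[:1].isupper()

    def get_features(self, token):
        # Extend the base features with the token length.
        features = Experiment.get_features(self, token)
        features["length"] = len(token)
        return features


if __name__ == "__main__":
    print(CapitalizedOnly().build_examples(["University", "of", "Turku"]))
```

For the actual hooks and their signatures, refer to the classes in experiments.py.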

Classification

Examples are classified using the classify action. A class can be derived from src.Classification to customize the overall approach for classifying the examples. For most experiments, src.Classification can be used as is. To define the scikit-learn classifier and its parameters, use the --classifier and --classifierArguments options of run.py.
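This README does not document the syntax of the classifier arguments string, such as "n_estimators=[2];random_state=[1]" in the example command. The sketch below shows one plausible reading, assuming that each semicolon-separated `name=[...]` pair lists candidate values for a classifier parameter, as in a parameter grid. The function names are hypothetical and this is not the parser used by run.py.

```python
import ast
import itertools


def parse_classifier_arguments(text):
    """Parse 'a=[1,2];b=[3]' into {'a': [1, 2], 'b': [3]} (assumed format)."""
    params = {}
    for item in text.split(";"):
        if not item.strip():
            continue  # tolerate empty segments such as a trailing ';'
        name, value = item.split("=", 1)
        # literal_eval only accepts Python literals, so no code is executed.
        params[name.strip()] = ast.literal_eval(value.strip())
    return params


def expand_grid(params):
    """Return every combination of the candidate values as keyword dicts."""
    names = sorted(params)
    value_lists = [
        params[n] if isinstance(params[n], list) else [params[n]] for n in names
    ]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


if __name__ == "__main__":
    grid = parse_classifier_arguments("n_estimators=[2];random_state=[1]")
    for kwargs in expand_grid(grid):
        print(kwargs)
```

Under this reading, each resulting dictionary could be passed as keyword arguments to a scikit-learn classifier constructor such as `sklearn.ensemble.ExtraTreesClassifier(**kwargs)`.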

Analyses

Classified examples can be used for various analyses with the analyse action. Analysis classes can be derived from src.analyse.Analysis to define such analyses, and the --analyses option of run.py can be used to choose which analyses to run for the experiment.

jbjorne/DiMSUM2016

Folders and files

Latest commit

History

Repository files navigation

University of Turku in SemEval 2016 Task 10: DiMSUM

Experiment Results

Required Libraries and Data

Dependencies

Data files

Running the Experiments

The Experiment System

Generating examples for machine learning

Classification

Analyses

About

Resources

Stars

Watchers

Forks

Languages