
This repository shows an approach to address the SemEval 2012 task 6, Semantic Textual Similarity


Danfoa/SemEval-2012-task6-project


This repository holds an approach to Task 6 of the SemEval 2012 competition, Semantic Textual Similarity. Please note that I did not participate in the original competition; this project was developed as an academic assignment.

The implementation details are presented in a Jupyter Notebook.

The set of features used in the final models is displayed below in a correlation matrix.

image
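As a rough illustration of how such a feature correlation matrix can be computed, here is a minimal sketch using NumPy. The feature names and the random values are hypothetical stand-ins, not the features or data used in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 100  # number of sentence pairs in this toy example

# Hypothetical feature columns; the real features live in the notebook.
feature_names = ["jaccard_overlap", "glove_cosine", "length_diff"]
jaccard = rng.random(n_pairs)
# Deliberately built to correlate with the first feature.
glove_cos = 0.8 * jaccard + 0.2 * rng.random(n_pairs)
length_diff = rng.random(n_pairs)

# Rows are sentence pairs, columns are features.
X = np.column_stack([jaccard, glove_cos, length_diff])

# rowvar=False tells NumPy that each column (not each row) is a variable.
corr = np.corrcoef(X, rowvar=False)

for name, row in zip(feature_names, corr):
    print(f"{name:>16}", np.round(row, 2))
```

Pairs of features with a correlation close to 1 or -1 carry largely redundant information, which is one reason to prune the feature set before training.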

If you want to recompute the features, you need to install CoreNLP and configure it as a server. Additionally, you need to download the GloVe 300-dimensional model of your preference (download it here) and reference it in the features.py file.
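For orientation, the sketch below shows one common way to read GloVe vectors in their plain-text format (one token per line followed by its vector components) and to compare two sentences by the cosine of their averaged word vectors. It is not the code in features.py: the function names are hypothetical, and the toy 3-dimensional vectors stand in for the real 300-dimensional file.

```python
import io
import math


def load_glove(lines):
    """Parse GloVe text format: a token, then its space-separated components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def sentence_vector(tokens, vectors):
    """Average the vectors of the known tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy stand-in for the downloaded file. With the real model you would pass an
# open file handle instead, e.g. open(GLOVE_PATH, encoding="utf-8"), where
# GLOVE_PATH is a hypothetical name for wherever you saved the vectors.
toy_file = io.StringIO("cat 1.0 0.0 0.0\ndog 1.0 0.0 0.0\ncar 0.0 1.0 0.0\n")
glove = load_glove(toy_file)

same = cosine(sentence_vector(["cat"], glove), sentence_vector(["dog"], glove))
different = cosine(sentence_vector(["cat"], glove), sentence_vector(["car"], glove))
print(same, different)
```

Splitting on single spaces keeps the parser short, but it assumes tokens contain no spaces, which may not hold for every published GloVe file.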

The training dataset was obtained from this repository. The high accuracy obtained in this implementation relies on the fact that this augmented training dataset combines the training sets of the same task from the 2012 to 2017 editions of the competition.

The performance of the resulting models is displayed in the figure below. Each model uses a subset of the relevant features, selected by hand tuning, recursive feature elimination, or both (more details in the notebook).

image
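To make the recursive feature elimination step concrete, here is a minimal sketch using scikit-learn's `RFE` on synthetic data. The estimator, the number of features to keep, and the data are all assumptions chosen for illustration; the notebook documents what was actually used.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 6 candidate features.
X = rng.normal(size=(200, 6))
# Only features 0 and 3 influence the target; the rest are noise.
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 0.1 * rng.normal(size=200)

# RFE fits the estimator, drops the feature with the smallest squared
# coefficient, and repeats until n_features_to_select remain.
selector = RFE(LinearRegression(), n_features_to_select=2, step=1)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)
print("kept feature indices:", selected)
print("elimination ranking:", selector.ranking_)  # 1 marks a kept feature
```

Because the ranking relies on coefficient magnitudes, features should be on comparable scales before running this with a linear model.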

(BoW): Denotes models that use, as one of their features, the output of a regressor trained only on BoW tf-idf embeddings.
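One way to build such a BoW feature is sketched below with scikit-learn. It is an assumption-laden illustration, not the notebook's code: the sentence pairs and scores are made up, each pair is represented by the absolute difference of its two tf-idf vectors, and Ridge is an arbitrary choice of regressor. Out-of-fold predictions are used so that the new feature for a pair never comes from a regressor that saw that pair's gold score.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

# Made-up sentence pairs with made-up similarity scores on the 0-5 STS scale.
pairs = [
    ("a man is playing a guitar", "a man plays the guitar"),
    ("a dog runs in the park", "a cat sleeps on the sofa"),
    ("the child is eating an apple", "a child eats an apple"),
    ("a woman is cooking dinner", "a plane is taking off"),
    ("two men are talking", "two men are speaking"),
    ("the sun is shining", "a train arrives at the station"),
]
gold = np.array([4.6, 0.4, 4.8, 0.2, 4.2, 0.1])

# Fit one tf-idf vocabulary over every sentence, then encode both sides.
vectorizer = TfidfVectorizer()
vectorizer.fit([sentence for pair in pairs for sentence in pair])
left = vectorizer.transform([p[0] for p in pairs])
right = vectorizer.transform([p[1] for p in pairs])

# Assumed pair representation: element-wise absolute difference (stays sparse).
X_bow = abs(left - right)

# Out-of-fold predictions of the BoW-only regressor become a single feature.
bow_feature = cross_val_predict(Ridge(alpha=1.0), X_bow, gold, cv=3)

# Placeholder for the other hand-crafted features of the final model.
other_features = np.zeros((len(pairs), 2))
X_final = np.column_stack([other_features, bow_feature])
print(X_final.shape)
```

Note that fitting the vectorizer on all sentences lets vocabulary and IDF statistics cross fold boundaries; a stricter setup would refit it inside each fold.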
