dbalchev/paraphrase

Paraphrase identification in Python 3 using scikit-learn

A solution to the SemEval paraphrase identification task.

It's a reimplementation and extension of ASOBEK.

Currently it extends ASOBEK in only two ways: it adds one feature, the dot product of the sums of the word2vec vectors of the two tweets, and it replaces SVC with AdaBoost-ed decision trees.
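
As a rough illustration of the dot-product feature, here is a minimal sketch rather than the code from this repository. It assumes each tweet is already tokenized and that unknown words are simply skipped. The names `TOY_VECTORS`, `sum_vector` and `dot_product_feature` are hypothetical, and the 3-dimensional vectors are a toy stand-in for the real word2vec lookup.

```python
import numpy as np

# Toy stand-in for the word2vec lookup (hypothetical 3-d vectors;
# the real pre-trained vectors are much larger).
TOY_VECTORS = {
    "cat": np.array([1.0, 0.0, 2.0]),
    "sat": np.array([0.0, 1.0, 0.0]),
    "dog": np.array([1.0, 1.0, 1.0]),
}


def sum_vector(tokens, vectors, dim):
    """Sum the vectors of all known tokens; unknown tokens are skipped."""
    total = np.zeros(dim)
    for token in tokens:
        if token in vectors:
            total += vectors[token]
    return total


def dot_product_feature(tokens_a, tokens_b, vectors, dim=3):
    """Dot product of the two summed word-vector representations."""
    sum_a = sum_vector(tokens_a, vectors, dim)
    sum_b = sum_vector(tokens_b, vectors, dim)
    return float(np.dot(sum_a, sum_b))


# "unknown" has no vector, so only "dog" contributes to the second sum.
feature = dot_product_feature(["cat", "sat"], ["dog", "unknown"], TOY_VECTORS)
print(feature)  # 4.0
```

The resulting scalar would be appended to the other features of a tweet pair before training the classifier.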

The performance of the method is currently unstable: it sometimes yields an F1 score of 0.6903 (beating ASOBEK's 0.674) and sometimes one as low as 0.63.
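
One way to quantify this kind of instability is to train the classifier several times with different random seeds and report the spread of F1 scores. The sketch below assumes the variation comes from randomness in training, which the README does not confirm. It uses synthetic data from `make_classification` in place of the SemEval features, and scikit-learn's `AdaBoostClassifier` with its default base estimator (a depth-1 decision tree). The seed count and `n_estimators` value are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the real tweet-pair features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = []
for seed in range(5):
    # The default base estimator is a depth-1 decision tree.
    clf = AdaBoostClassifier(n_estimators=50, random_state=seed)
    clf.fit(X_train, y_train)
    scores.append(f1_score(y_test, clf.predict(X_test)))

print("F1 min: %.4f  max: %.4f" % (min(scores), max(scores)))
```

On this toy data the scores may well be identical across seeds. The point is only the reporting pattern: a minimum and maximum (or a mean and standard deviation) over several runs instead of a single number.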

The word2vec database is "pre-trained vectors trained on part of Google News" from the word2vec website.
