aronis92/Data-Mining-Techniques-Project

Data-Mining-Techniques-Project

Problem Definition

This group project was implemented as part of the Data Mining Techniques course offered by the M.Sc. in Data Science of the Department of Informatics, School of Sciences, National and Kapodistrian University of Athens.

This repository addresses three problems:

  • Classifying articles into their respective categories
  • Detecting duplicate articles
  • Creating a word cloud for each category
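
The README does not say how duplicates are detected. As a rough illustration only, one common approach is to compare articles pairwise with cosine similarity and flag pairs above a threshold. The sketch below assumes bag-of-words counts, a hypothetical `find_duplicates` helper, and an arbitrary threshold of 0.8; none of these come from the project itself.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicates(docs: List[str], threshold: float = 0.8) -> List[Tuple[int, int, float]]:
    """Return (i, j, similarity) for every pair at or above the threshold.

    The threshold is an assumed value, not one taken from the project.
    """
    bags = [Counter(d.lower().split()) for d in docs]
    pairs = []
    for i, j in combinations(range(len(docs)), 2):
        sim = cosine_similarity(bags[i], bags[j])
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs


# Made-up articles, for illustration only.
articles = [
    "stocks rally as markets open higher",
    "stocks rally as markets open higher today",
    "local team wins the championship final",
]
duplicate_pairs = find_duplicates(articles)
print(duplicate_pairs)
```

Comparing every pair is quadratic in the number of articles, so for a dataset of this size a vectorized similarity computation over a sparse matrix would likely be a better fit than this plain-Python loop.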

Dataset

The dataset is provided in train_set.zip. It contains 12,265 articles, each with a title, its content, and the category it belongs to.

Processing & Classifiers

To apply the classifiers (SVM and Random Forest), the articles were vectorized using, in turn, a TF-IDF vectorizer, Singular Value Decomposition (SVD), or an average word vector (Word2Vec). Stop words were removed and the Snowball stemmer was applied. Each classifier was used on each vectorization.
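
Of the three representations, the average word vector is the easiest to show without external libraries: each article becomes the mean of the embeddings of its remaining tokens. The following is a minimal sketch, not the project's code. The embedding table, the stop-word set, and the `average_word_vector` function are all hypothetical; a real run would look vectors up in a trained Word2Vec model and stem tokens with the Snowball stemmer first.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# A real run would take these from a trained Word2Vec model.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "market": [1.0, 0.0, 0.0],
    "stocks": [0.0, 1.0, 0.0],
    "match": [0.0, 0.0, 1.0],
}

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"the", "a", "of", "and", "in"}


def average_word_vector(
    text: str, embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Return the mean embedding of the known, non-stop-word tokens in `text`."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No usable token: fall back to the zero vector.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


doc_vector = average_word_vector("The market and the stocks", TOY_EMBEDDINGS, dim=3)
print(doc_vector)  # [0.5, 0.5, 0.0]
```

The resulting fixed-length vector can be passed to either classifier in the same way as a TF-IDF or SVD representation.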
