ercanse/NewsClassification

NewsClassification

What is it about?

This project trains and evaluates several classifiers on news articles collected from NU.nl, with the goal of predicting each article's popularity as measured by the number of comments it receives.

What are its components?

  • crawling: a script that collects articles from the news site, saves them to a database, and updates them with the number of comments they have received.
  • preprocessing: a script that preprocesses all text in the collected articles.
  • learning: scripts that transform the collected data into input for the classifiers, and a script that trains and evaluates classifiers on the data.
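
The flow through these three components can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the table schema, the function names (`save_article`, `update_comments`, `preprocess`, `popularity_class`), the example URL, and the comment-count thresholds are all assumptions made for the example.

```python
import re
import sqlite3

# Hypothetical schema; the project's real tables and columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url      TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    body     TEXT NOT NULL,
    comments INTEGER NOT NULL DEFAULT 0
)
"""


def save_article(conn, url, title, body):
    """Store an article once; crawling the same URL again is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO articles (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )


def update_comments(conn, url, comments):
    """Record the latest comment count for an already stored article."""
    conn.execute("UPDATE articles SET comments = ? WHERE url = ?", (comments, url))


def preprocess(text):
    """Lowercase the text and keep only runs of letters (assumed preprocessing)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def popularity_class(comments, thresholds=(10, 100)):
    """Map a comment count to a class index; the thresholds are illustrative."""
    return sum(comments >= t for t in thresholds)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Crawling: save the article, then update its comment count later.
save_article(conn, "https://example.com/a1", "Voorbeeld",
             "Dit is een Voorbeeld-artikel, versie 2.")
save_article(conn, "https://example.com/a1", "Voorbeeld", "duplicate crawl")
update_comments(conn, "https://example.com/a1", 42)

# Preprocessing and learning input: token lists paired with a class label.
rows = conn.execute("SELECT body, comments FROM articles").fetchall()
dataset = [(preprocess(body), popularity_class(n)) for body, n in rows]
```

Turning the comment count into a small number of classes is what makes this a classification problem rather than a regression problem; where to place the class boundaries is a design choice that this sketch does not settle.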

What about results?

Currently, when trained on a thousand articles, the multinomial Naive Bayes classifier classifies 50% of the articles correctly, while the linear Support Vector Machine scores around 48%.
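
A comparison of this kind could be set up as in the sketch below. It assumes scikit-learn, which this README does not name as the library in use, and it runs on a handful of made-up English sentences rather than on the collected articles, so the accuracies it produces say nothing about the figures above. The `evaluate` helper and the label names are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data; the real input would be the preprocessed articles.
texts = [
    "election debate parliament vote",
    "minister resigns after debate",
    "parliament vote on budget",
    "election results spark protest",
    "local museum opens exhibit",
    "weather sunny this weekend",
    "museum exhibit draws visitors",
    "weekend market opens early",
]
labels = ["high"] * 4 + ["low"] * 4


def evaluate(classifier, texts, labels, seed=0):
    """Train on 75% of the data and return accuracy on the held-out 25%."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    # Bag-of-words counts feed directly into the classifier.
    model = make_pipeline(CountVectorizer(), classifier)
    model.fit(x_train, y_train)
    return accuracy_score(y_test, model.predict(x_test))


scores = {
    name: evaluate(classifier, texts, labels)
    for name, classifier in [
        ("MultinomialNB", MultinomialNB()),
        ("LinearSVC", LinearSVC()),
    ]
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Accuracy is easier to interpret next to a baseline such as always predicting the most frequent class, since how good 50% is depends on the number of classes and how balanced they are.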

What next?

Some of the ideas for trying to improve classification performance are:

  • Collecting more data
  • Applying feature selection
  • Investigating the effects of training the classifiers with different parameters
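
The last two ideas can be explored together. The sketch below, again assuming scikit-learn and toy data, puts chi-squared feature selection in front of a multinomial Naive Bayes classifier and uses a grid search to compare a few parameter settings. The step names, the candidate values for `k` and `alpha`, and the two-fold cross-validation are illustrative choices, not the project's configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-in data; the real input would be the preprocessed articles.
texts = [
    "election debate parliament vote",
    "minister resigns after debate",
    "parliament vote on budget",
    "election results spark protest",
    "local museum opens exhibit",
    "weather sunny this weekend",
    "museum exhibit draws visitors",
    "weekend market opens early",
]
labels = ["high"] * 4 + ["low"] * 4

pipeline = Pipeline([
    ("vectorize", CountVectorizer()),   # text -> word counts
    ("select", SelectKBest(chi2)),      # keep the k highest-scoring words
    ("classify", MultinomialNB()),
])

# Candidate settings: how many features to keep and how much smoothing to apply.
param_grid = {
    "select__k": [3, "all"],
    "classify__alpha": [0.1, 1.0],
}

# Feature selection is refit inside each fold, so held-out data never
# influences which words are kept.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_, round(search.best_score_, 2))
```

Keeping the selection step inside the pipeline matters: selecting features on the full data set before cross-validation would leak information from the test folds into training.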

Further details?

See the wiki.

About

Predicting popularity for news articles using machine learning techniques.
