A supervised classification model that can predict what community a Reddit comment came from

JimShu716/Reddit-Comments-Classification


Machine Learning Project

Investigation of feature extraction and model selection in Natural Language Processing

Contributors:

Hao Shu, Gengyi Sun, Han Zhou

Abstract:

In this project, we investigated approaches to natural language processing and text classification. A data set of 100,000 comments extracted from Reddit was used to train the models and validate their accuracy. Several Python classes were used to help with feature extraction and model building. Unigram bag-of-words counts served as the features extracted from the corpus. After investigating various classifiers, we constructed an ensemble of classifiers that can classify comments drawn from a limited set of subreddits. The prediction accuracy stays above 58.7%, with no additional data required, and we placed 9th among 98 teams in the Kaggle competition. We also recorded which classifiers performed especially well on the matrices produced from natural-language text.
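As a rough illustration of the two ideas in the abstract (unigram bag-of-words features and an ensemble of classifiers), here is a minimal, standard-library-only sketch. It is not the project's code: the names `bag_of_words`, `ToyNaiveBayes`, and `majority_vote`, the regex tokenizer, the hard-voting rule, and the toy comments and labels are all assumptions made for this example. The ensemble members here are the same toy Naive Bayes model with different smoothing values, chosen only to show the voting step; the classifiers actually used in the project are described in the report.

```python
import math
import re
from collections import Counter


def bag_of_words(comment):
    """Return unigram counts for one comment (lowercased, simple regex tokens)."""
    return Counter(re.findall(r"[a-z0-9']+", comment.lower()))


class ToyNaiveBayes:
    """Multinomial Naive Bayes over unigram counts; a stand-in ensemble member."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, comments, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for comment, label in zip(comments, labels):
            self.word_counts[label].update(bag_of_words(comment))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, comment):
        features = bag_of_words(comment)
        n_comments = sum(self.class_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(self.class_counts[label] / n_comments)
            for word, n in features.items():
                if word in self.vocab:  # ignore words never seen in training
                    score += n * math.log((counts[word] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def majority_vote(predictions):
    """Hard voting: return the most frequently predicted label."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data; the real data set has 100,000 Reddit comments.
train_comments = [
    "the gpu runs this game at high fps",
    "new game patch fixes gpu crash",
    "the team scored in the last minute",
    "coach says the team played a great match",
]
train_labels = ["gaming", "gaming", "sports", "sports"]

# Three members that differ only in smoothing strength (an arbitrary choice).
ensemble = [
    ToyNaiveBayes(alpha=a).fit(train_comments, train_labels)
    for a in (0.1, 1.0, 10.0)
]

votes = [member.predict("gpu crash in the game") for member in ensemble]
prediction = majority_vote(votes)
print(prediction)
```

In practice a library vectorizer and a set of different classifier families would replace these toy pieces; the sketch only shows how per-comment word counts feed several models whose predictions are then combined by vote.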

Report available: here
