Machine Learning Project

Investigation of feature extraction and model selection in Natural Language Processing

Contributors:

Hao Shu, Gengyi Sun, Han Zhou

Abstract:

In this project, we investigated approaches to natural language processing and text classification. A data set of 100,000 comments extracted from Reddit was used to train the models and validate their accuracy. Python classes were used for feature extraction and model building. Unigram bag-of-words features were extracted from the corpus. After investigating various classifiers, we constructed an ensemble of classifiers that can classify comments drawn from a limited set of subreddits. The prediction accuracy stays above 58.7%, with no additional data required, and we placed 9th among 98 teams in the Kaggle competition. The classifiers that performed best on the feature matrices produced from natural language have been recorded.
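To make the two ideas named in the abstract concrete, here is a minimal sketch of unigram bag-of-words extraction and of combining several classifiers' predictions by majority vote. It is not the project's code: the function names, the whitespace tokenizer, the toy comments, and the hard-voting rule are all assumptions made for illustration, and the actual ensemble may combine its models differently.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def build_vocabulary(comments):
    """Map each distinct unigram in the corpus to a column index, in sorted order."""
    tokens = {tok for comment in comments for tok in tokenize(comment)}
    return {tok: i for i, tok in enumerate(sorted(tokens))}


def vectorize(comment, vocabulary):
    """Return the unigram count vector for one comment; unseen words are ignored."""
    row = [0] * len(vocabulary)
    for tok, n in Counter(tokenize(comment)).items():
        if tok in vocabulary:
            row[vocabulary[tok]] = n
    return row


def majority_vote(predictions):
    """Hard-voting ensemble: return the label predicted most often.

    Ties go to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy corpus, not taken from the project's data set.
corpus = ["the puck hit the post", "great goal"]
vocab = build_vocabulary(corpus)
features = [vectorize(c, vocab) for c in corpus]
```

Each row of `features` has one column per vocabulary word, which is the kind of matrix a classifier would be trained on. `majority_vote` would then be applied per comment to the labels predicted by the individual classifiers.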

Report available: here

JimShu716/Reddit-Comments-Classification

Folders and files

Latest commit

History

Repository files navigation

Machine Learning Project

Investigation of feature extraction and model selection in Natural Language Processing

Contributors:

Abstract:

About

Resources

Stars

Watchers

Forks

Languages