A summary of representative text classification methods, implemented with Keras and scikit-learn.


guokeda/text_classification

 
 


Text Classification

Text classification is a fundamental task in natural language processing and machine learning.
Many methods exist for classifying text. More recently, deep learning methods have been applied to the task.
This repository covers classification with CNN and LSTM, two representative deep learning models.
It also covers SVM and Naive Bayes, two classic methods.

Run

python execute.py --method naive_bayes --dataset yelp_review_polarity
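
The source of `execute.py` is not shown here, so the following is only a sketch of how the two flags in the command above could be parsed with `argparse`. The function name `build_parser` and every choice value other than `naive_bayes` and `yelp_review_polarity` are hypothetical.

```python
import argparse

# Only "naive_bayes" and "yelp_review_polarity" appear in the command above.
# The remaining names are illustrative guesses, not the repository's values.
METHODS = ["cnn", "lstm", "character_level_cnn", "svm", "naive_bayes"]
DATASETS = ["ag_news", "yelp_review_polarity"]


def build_parser():
    """Build a parser that accepts --method and --dataset."""
    parser = argparse.ArgumentParser(
        description="Run one text classification experiment."
    )
    parser.add_argument("--method", required=True, choices=METHODS,
                        help="classifier to train and evaluate")
    parser.add_argument("--dataset", required=True, choices=DATASETS,
                        help="dataset to train and evaluate on")
    return parser


# Parse the same arguments as the command above, passed as an explicit list.
args = build_parser().parse_args(
    ["--method", "naive_bayes", "--dataset", "yelp_review_polarity"]
)
print(args.method, args.dataset)
```

Restricting both flags with `choices` makes a misspelled method or dataset name fail at parse time instead of partway through a run.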

Experiments

Classifiers

| Classifier | Link |
| --- | --- |
| CNN | Paper |
| LSTM | Keras |
| Character Level CNN | Paper |
| SVM | scikit-learn |
| Naive Bayes | scikit-learn |
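
To give a rough idea of what the two word-level deep learning classifiers could look like in Keras, here is a minimal sketch. It is not the repository's code: the function names `build_cnn` and `build_lstm`, the vocabulary size, the sequence length, and the layer widths are all assumptions chosen for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative assumptions, not values taken from the repository.
VOCAB_SIZE = 20000  # number of distinct token ids
SEQ_LEN = 200       # tokens per padded document
NUM_CLASSES = 4     # for example, AG's News has 4 classes


def build_cnn():
    """1D convolution over word embeddings, then max pooling over time."""
    model = keras.Sequential([
        keras.Input(shape=(SEQ_LEN,), dtype="int32"),
        layers.Embedding(VOCAB_SIZE, 128),
        layers.Conv1D(128, 5, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def build_lstm():
    """A single LSTM layer over word embeddings."""
    model = keras.Sequential([
        keras.Input(shape=(SEQ_LEN,), dtype="int32"),
        layers.Embedding(VOCAB_SIZE, 128),
        layers.LSTM(128),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


build_cnn().summary()
```

Both models take integer token ids and output one probability per class. A character-level CNN would follow the same pattern with character ids as input instead of word ids.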

Result

AG's News

  • Classes: 4
  • Train Data Size: 120,000
  • Test Data Size: 7,600

| Classifier | Validation loss | Validation accuracy |
| --- | --- | --- |
| CNN | 0.2994 | 0.9055 |
| LSTM | 0.2587 | 0.9106 |
| Character Level CNN | 0.3692 | 0.8709 |
| SVM | - | 0.9007 |
| Naive Bayes | - | 0.9182 |
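
The SVM and Naive Bayes rows list accuracy only. As a hedged illustration of how such an accuracy figure can be obtained with scikit-learn, the sketch below trains both classifiers on TF-IDF features. The toy sentences, the helper name `evaluate`, and the choice of `TfidfVectorizer`, `MultinomialNB`, and `LinearSVC` are assumptions; the repository may use different features and estimators.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny made-up corpus for illustration only (0 = sports, 1 = business).
train_texts = [
    "the team won the match in the final minute",
    "the striker scored two goals last night",
    "the coach praised the players after the game",
    "shares fell after the company cut its forecast",
    "the bank reported higher quarterly profit",
    "investors sold stocks as interest rates rose",
]
train_labels = [0, 0, 0, 1, 1, 1]
test_texts = [
    "the players celebrated the win",
    "the company shares rose on strong profit",
]
test_labels = [0, 1]


def evaluate(classifier):
    """Fit TF-IDF + classifier on the training texts, return test accuracy."""
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    pipeline.fit(train_texts, train_labels)
    return pipeline.score(test_texts, test_labels)


for name, classifier in [("Naive Bayes", MultinomialNB()),
                         ("SVM", LinearSVC())]:
    print(name, evaluate(classifier))
```

`Pipeline.score` returns mean accuracy for classifiers, which is the kind of figure shown in the accuracy column. No loss value is produced by this call, which would be consistent with the dashes in the loss column.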
