Japkeerat/Learning-NLTK

Learning-NLTK

This repository contains the source code for some small applications written while learning natural language processing in Python with NLTK (Natural Language Toolkit).

Dependencies

  • scikit-learn (sklearn)
  • SciPy
  • NLTK

Files Information

CustomTokenizer.py builds a custom sentence tokenizer with NLTK's PunktSentenceTokenizer, which is trained by unsupervised learning on raw text.
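
As a rough illustration of the idea (not the code in CustomTokenizer.py), the sketch below trains a PunktSentenceTokenizer on a short inline string and uses it to split another string. Both strings are made up for this example; a real run would train on a much larger corpus.

```python
from nltk.tokenize import PunktSentenceTokenizer

# Hypothetical training text; Punkt learns sentence-boundary statistics
# (for example, likely abbreviations) from raw, unlabelled text.
train_text = (
    "Natural language processing is fun. It has many applications. "
    "Tokenizers split text into smaller units. Sentences are one such unit."
)

# Passing text to the constructor trains the tokenizer on it.
tokenizer = PunktSentenceTokenizer(train_text)

# Hypothetical text to split into sentences.
sample_text = "NLTK ships several tokenizers. This one was trained above."
sentences = tokenizer.tokenize(sample_text)

for sentence in sentences:
    print(sentence)
```

Constructing the class directly avoids loading NLTK's pretrained Punkt model, so no corpus download is needed for this sketch.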

PartOfSpeechTagging.py contains code that tags nouns in sentences with a set of predefined classes.

NaiveBayesForMovieReviews.py trains a supervised Naive Bayes model on a movie reviews dataset to classify reviews as positive or negative. Its accuracy is highly volatile: it varies between 60% and 90% from one test run to the next, even when no parameters are changed.
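
A minimal sketch of this kind of classifier, assuming bag-of-words features, is shown below. It is not the repository's script: the four training sentences and the `word_features` helper are invented for illustration, whereas the actual file learns from a movie reviews dataset.

```python
import nltk


def word_features(text):
    """Hypothetical helper: map each lowercased word to True (bag of words)."""
    return {word: True for word in text.lower().split()}


# Tiny made-up training set of (features, label) pairs.
train_set = [
    (word_features("a wonderful moving film"), "pos"),
    (word_features("great acting and a wonderful story"), "pos"),
    (word_features("a dull boring film"), "neg"),
    (word_features("terrible acting and a boring story"), "neg"),
]

# Train NLTK's built-in Naive Bayes classifier.
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Classify an unseen review.
prediction = classifier.classify(word_features("wonderful film"))
print(prediction)
```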

SkLearnNaiveBayes.py uses several more supervised machine learning models from the scikit-learn library and combines them with NLTK. The accuracy of these models varies from 60% to 75%.
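
One way to combine the two libraries is NLTK's `SklearnClassifier` wrapper, which lets a scikit-learn estimator be trained on NLTK-style feature dictionaries. The sketch below assumes that approach; the choice of estimators and the toy data are assumptions for illustration, not necessarily what SkLearnNaiveBayes.py uses.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Tiny made-up training set in NLTK's (feature dict, label) format.
train_set = [
    ({"wonderful": True, "film": True}, "pos"),
    ({"great": True, "story": True}, "pos"),
    ({"boring": True, "film": True}, "neg"),
    ({"terrible": True, "story": True}, "neg"),
]

# Assumed selection of scikit-learn estimators, each wrapped for NLTK.
estimators = {
    "MultinomialNB": MultinomialNB(),
    "BernoulliNB": BernoulliNB(),
    "LogisticRegression": LogisticRegression(),
}

# train() fits the wrapped estimator and returns the wrapper itself.
models = {
    name: SklearnClassifier(estimator).train(train_set)
    for name, estimator in estimators.items()
}

# Every wrapped model exposes the same classify() method as NLTK classifiers.
test_features = {"wonderful": True, "story": True}
predictions = {name: model.classify(test_features) for name, model in models.items()}
print(predictions)
```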

EnsemblingModels.py builds a custom model from all the models created in SkLearnNaiveBayes.py and runs tests on it. On average, this gives an accuracy of 70%.
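
The README does not say how the models are combined. A common choice is majority voting, sketched below under that assumption. `VoteClassifier` and `KeywordClassifier` are hypothetical names; the keyword stubs only stand in for trained models so the example runs without any training data.

```python
from collections import Counter


class KeywordClassifier:
    """Hypothetical stand-in for a trained model: votes "pos" on a keyword hit."""

    def __init__(self, positive_words):
        self._positive_words = set(positive_words)

    def classify(self, features):
        return "pos" if self._positive_words & set(features) else "neg"


class VoteClassifier:
    """Combine classifiers by majority vote (assumed ensembling strategy)."""

    def __init__(self, *classifiers):
        self._classifiers = classifiers

    def _votes(self, features):
        return Counter(c.classify(features) for c in self._classifiers)

    def classify(self, features):
        # Label with the most votes wins.
        return self._votes(features).most_common(1)[0][0]

    def confidence(self, features):
        # Fraction of classifiers that agreed with the winning label.
        top_count = self._votes(features).most_common(1)[0][1]
        return top_count / len(self._classifiers)


ensemble = VoteClassifier(
    KeywordClassifier({"great"}),
    KeywordClassifier({"wonderful"}),
    KeywordClassifier({"great", "fun"}),
)

features = {"great": True, "film": True}
label = ensemble.classify(features)
agreement = ensemble.confidence(features)
print(label, agreement)
```

Using an odd number of classifiers avoids ties in a two-class problem; any object with a `classify` method, including NLTK classifiers, could be passed in place of the stubs.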
