akshat2412/naive-bayes-classifier-

Document Classification

We developed a document classification web service: a user uploads a document, and the service classifies it with a naive Bayes classifier.

Dataset

We used the 20 Newsgroups dataset for classification.

20 newsgroup dataset

Workflow

  1. Dataset was combined into one file.
  2. Stopwords were removed.
  3. Lemmatization was performed on the datasets.
  4. The probability of each word given each category was calculated and stored in the database.
  5. Classification was performed based on these probabilities (naive Bayes theorem).
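
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names (`tokenize`, `train`, `classify`), the tiny stopword list, and the use of Laplace smoothing with log probabilities are all assumptions. Lemmatization is omitted, and plain dictionaries stand in for the database.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption; the project's real list is not shown).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords (no lemmatization here)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def train(labeled_docs, alpha=1.0):
    """Estimate log P(category) and log P(word | category).

    labeled_docs: iterable of (category, text) pairs.
    alpha: Laplace smoothing constant (assumed; avoids zero probabilities).
    """
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of documents
    vocab = set()
    for category, text in labeled_docs:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)

    total_docs = sum(doc_counts.values())
    log_prior = {c: math.log(n / total_docs) for c, n in doc_counts.items()}

    log_likelihood = {}
    for c in doc_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood


def classify(text, log_prior, log_likelihood):
    """Return the category with the highest posterior log score.

    Words never seen during training are ignored.
    """
    tokens = tokenize(text)
    scores = {
        c: log_prior[c]
        + sum(log_likelihood[c][w] for w in tokens if w in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)
```

Summing log probabilities rather than multiplying raw probabilities avoids numeric underflow on long documents; the ranking of categories is unchanged.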

Results

Bag of Words

The probability of each word was calculated

Website and Document Upload

Classification