GitHub - cisco00/Sentimental-Analysis-on-threat

Sentimental_Analysis

About
Uses a Naive Bayes model to analyse product reviews in order to determine whether a product is well received, so that the product can be improved.

Setup
Install Python 3
Create a virtual environment
Install JupyterLab
Install the libraries:
pandas
numpy
nltk
string (part of the Python standard library, no installation needed)
scikit-learn
spaCy
matplotlib


Imports

import pandas as pd
import numpy as np
import nltk
import string

from nltk import sent_tokenize, word_tokenize
from nltk.stem import WordNetLemmatizer

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import Pipeline

from matplotlib import pyplot as plt
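
As a rough illustration of how some of these imports fit together, the sketch below chains TfidfVectorizer and ComplementNB in a Pipeline. The toy reviews, the labels (1 = positive, 0 = negative), and the step names are assumptions made for this example; they are not taken from imdb_labelled.txt or from the notebooks.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import Pipeline

# Hypothetical toy reviews, not the repository's data; 1 = positive, 0 = negative.
texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "terrible plot and awful acting",
    "boring, slow and a waste of time",
]
labels = [1, 1, 0, 0]

# Turn raw text into TF-IDF features, then classify with Complement Naive Bayes.
model = Pipeline([
    ("vectorizer", TfidfVectorizer()),
    ("classifier", ComplementNB()),
])
model.fit(texts, labels)

# Classify two unseen snippets.
predictions = model.predict(["great film", "awful and boring"])
print(predictions)
```

Swapping TfidfVectorizer for CountVectorizer, or ComplementNB for another Naive Bayes class, only changes the corresponding pipeline step.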

Contents
We compare three Naive Bayes algorithms and pick the best one based on their accuracy scores. The algorithms are:
MultinomialNB (Multinomial Naive Bayes)

With CountVectorizer, this algorithm obtained the following metrics:
Maximum accuracy: 77.85%,
Minimum accuracy: 70.67%,
Mean accuracy: about 75% (reported as 7.5e+01%),
Std accuracy: 2.45%

With TF-IDF (TfidfVectorizer), this algorithm obtained the following metrics:
Maximum accuracy: 82.55%,
Minimum accuracy: 75.17%,
Mean accuracy: 78.34%,
Std accuracy: 2.62%

BernoulliNB (Bernoulli Naive Bayes)

With CountVectorizer, this algorithm obtained the following metrics:
Maximum accuracy: 79.19%,
Minimum accuracy: 71.33%,
Mean accuracy: 75.13%,
Std accuracy: 3.08%

With TF-IDF (TfidfVectorizer), this algorithm obtained the following metrics:
Maximum accuracy: 79.19%,
Minimum accuracy: 71.33%,
Mean accuracy: 75.13%,
Std accuracy: 3.08%
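
The two BernoulliNB runs report identical figures. One plausible explanation, assuming the notebooks used scikit-learn's default settings, is that BernoulliNB binarizes its input (default binarize=0.0), so it only sees whether a word is present, and CountVectorizer and TfidfVectorizer agree on that. The sketch below checks this on hypothetical toy data; it is not the repository's experiment.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy reviews; 1 = positive, 0 = negative.
texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "terrible plot and awful acting",
    "boring, slow and a waste of time",
]
labels = [1, 1, 0, 0]

count_features = CountVectorizer().fit_transform(texts)
tfidf_features = TfidfVectorizer().fit_transform(texts)

# Both vectorizers build the same vocabulary, and a cell is non-zero in one
# matrix exactly when it is non-zero in the other.
same_presence = np.array_equal(
    count_features.toarray() > 0, tfidf_features.toarray() > 0
)

# BernoulliNB thresholds features at 0 before fitting, so both models learn
# the same per-class word-presence probabilities.
count_model = BernoulliNB().fit(count_features, labels)
tfidf_model = BernoulliNB().fit(tfidf_features, labels)
same_parameters = np.allclose(
    count_model.feature_log_prob_, tfidf_model.feature_log_prob_
)

print(same_presence, same_parameters)
```

If the notebooks changed the binarize parameter or the vectorizer options, this explanation may not apply.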

ComplementNB (Complement Naive Bayes)

With CountVectorizer, this algorithm obtained the following metrics:
Maximum accuracy: 78.67%,
Minimum accuracy: 70.67%,
Mean accuracy: about 74% (reported as 7.4e+01%),
Std accuracy: 2.55%

With TF-IDF (TfidfVectorizer), this algorithm obtained the following metrics:
Maximum accuracy: 81.21%,
Minimum accuracy: 75.17%,
Mean accuracy: about 78% (reported as 7.8e+01%),
Std accuracy: 2.13%
From the results above, Complement Naive Bayes with TF-IDF has a mean accuracy of about 78%, on par with Multinomial Naive Bayes with TF-IDF (78.34%), and the lowest standard deviation of all the runs (2.13%), which suggests it predicts more consistently than the others. We therefore use Complement Naive Bayes to check whether a text is positive or negative.
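
A comparison like the one above could be produced along the following lines. This is a minimal sketch, not the notebooks' code: the helper name summarize_accuracy, the toy data, the number of folds, the random_state, and the use of cross_val_score are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB, ComplementNB, MultinomialNB
from sklearn.pipeline import Pipeline


def summarize_accuracy(classifier, vectorizer, texts, labels, n_splits=5):
    """Return max, min, mean and std of the per-fold accuracy, in percent."""
    pipeline = Pipeline([("vectorizer", vectorizer), ("classifier", classifier)])
    # Stratified folds keep the positive/negative ratio the same in every fold.
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = 100 * cross_val_score(
        pipeline, texts, labels, cv=folds, scoring="accuracy"
    )
    return {
        "max": float(scores.max()),
        "min": float(scores.min()),
        "mean": float(scores.mean()),
        "std": float(scores.std()),
    }


# Hypothetical toy data; the real experiment would load the labelled reviews.
positive = [
    "a wonderful and moving film",
    "great acting and a great story",
    "loved every minute of it",
    "brilliant cast and a wonderful script",
]
negative = [
    "terrible plot and awful acting",
    "boring, slow and a waste of time",
    "hated every minute of it",
    "awful script and a terrible cast",
]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# Evaluate every classifier with every vectorizer.
results = {}
for classifier_cls in (MultinomialNB, BernoulliNB, ComplementNB):
    for vectorizer_cls in (CountVectorizer, TfidfVectorizer):
        key = (classifier_cls.__name__, vectorizer_cls.__name__)
        results[key] = summarize_accuracy(
            classifier_cls(), vectorizer_cls(), texts, labels, n_splits=2
        )

for key, stats in results.items():
    print(key, stats)
```

With only eight toy reviews the numbers themselves are meaningless; the point is the shape of the comparison, where each classifier and vectorizer pair yields one set of maximum, minimum, mean, and standard deviation figures.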

Reference
https://github.com/Semicolon-Tech/sentiment_analysis/blob/main/README.md