This is a school project I did for the course "Data Mining and Data Warehousing".
It is a desktop application implementing a generic classifier based on the Naive Bayes algorithm, with probabilities estimated using the m-estimate (m=2): https://en.wikipedia.org/wiki/Naive_Bayes_classifier
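The m-estimate avoids zero probabilities for attribute values that never appear with a class in the training set. Each conditional probability is estimated as P(a_i = v | c) = (n_c,v + m*p) / (n_c + m), where n_c is the number of training records of class c, n_c,v is how many of those have value v for attribute a_i, and p is a prior estimate. The sketch below is a minimal illustration, not the code in Prog.py. It assumes categorical (already discretized) attributes, records given as tuples, and a uniform prior p = 1/k, where k is the number of distinct values of the attribute. The function names `train`, `classify` and `m_estimate` are hypothetical. The sketch also derives each attribute's set of values from the training data, whereas the application presumably takes them from Structure.txt.

```python
from collections import Counter, defaultdict

M = 2  # equivalent sample size; the project uses m=2


def m_estimate(n_cv, n_c, num_values, m=M):
    """(n_cv + m*p) / (n_c + m), assuming a uniform prior p = 1/num_values."""
    p = 1.0 / num_values
    return (n_cv + m * p) / float(n_c + m)


def train(records, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    domains = defaultdict(set)           # attribute index -> distinct values seen (assumption)
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def classify(model, record):
    """Return the class maximizing P(c) * product of m-estimated P(a_i = v | c)."""
    class_counts, value_counts, domains = model
    total = float(sum(class_counts.values()))
    best_class, best_score = None, -1.0
    for label, n_c in class_counts.items():
        score = n_c / total  # class prior P(c)
        for i, value in enumerate(record):
            # unseen values get count 0 but still a non-zero m-estimate
            score *= m_estimate(value_counts[(i, label)][value], n_c, len(domains[i]))
        if score > best_score:
            best_class, best_score = label, score
    return best_class
```

Usage on a toy data set: `model = train([("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")], ["no", "no", "yes", "yes"])`, after which `classify(model, ("rain", "mild"))` returns `"yes"`. With many attributes, summing log-probabilities instead of multiplying would avoid floating-point underflow.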
- Constructing the model structure from the Structure.txt file
- Data pre-processing (data cleaning): filling in missing values, identifying outliers and smoothing noisy data (using equal-width partitioning discretization), and correcting inconsistent data
- Loading the training set
- Building the classifier from the training set
- Loading the test set
- Classifying the test records with the Naive Bayes classifier using the m-estimate (m=2)
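Two of the cleaning steps listed above can be sketched as follows. This is a minimal illustration, not the project's code. The fill strategy (column mean for numeric attributes, most frequent value for categorical ones) is an assumption, because the project does not document its own. The function names `fill_missing` and `equal_width_bins` are hypothetical. Equal-width discretization splits the range [min, max] of a numeric attribute into the chosen number of bins of width (max - min) / bins and replaces each value with its bin index.

```python
from collections import Counter


def fill_missing(column, numeric):
    """Replace None with the column mean (numeric) or the most common value (categorical).

    The strategy is an assumption made for this sketch.
    """
    present = [v for v in column if v is not None]
    if numeric:
        fill = sum(present) / float(len(present))
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


def equal_width_bins(column, num_bins, lo=None, hi=None):
    """Map numeric values to bin indices 0..num_bins-1 of equal width.

    Pass lo/hi computed on the training set to discretize the test set with the same edges.
    """
    lo = min(column) if lo is None else lo
    hi = max(column) if hi is None else hi
    width = (hi - lo) / float(num_bins)
    bins = []
    for v in column:
        idx = int((v - lo) / width) if width > 0 else 0
        # clamp: the maximum lands exactly on the upper edge, and test values
        # may fall outside the training range
        bins.append(max(0, min(idx, num_bins - 1)))
    return bins
```

For example, `fill_missing([22, None, 35, 58, 40], numeric=True)` yields `[22, 38.75, 35, 58, 40]`, and splitting that column into 3 bins (width 12) gives `[0, 1, 1, 2, 1]`. pandas users can get equal-width bins with `pandas.cut` by passing an integer `bins`.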
The project includes the following data files for testing the classifier:
- Dataset general info.txt - general information about the database from which the data was taken
- Structure.txt - description of the data set attributes
- train.csv - the training set
- test.csv - the test set
Install Python 2.7. Since the project uses the pandas library, the Anaconda distribution is recommended; it can be downloaded here: https://www.anaconda.com/download/
Then run the application:
python Prog.py
- Browse to the directory containing the Structure.txt, train.csv and test.csv files
- Type the desired number of discretization bins
- Click Build
- Click Classify
The classification results are written to output.txt.