Final Project for COGS 118A: An Empirical Comparison of Supervised ML Algorithms Across Various Binary Classification Problems
All datasets are taken from Kaggle:
- Income dataset
- https://www.kaggle.com/mastmustu/income?select=train.csv
- 14 features
- target: income_>50k
- Phishing website detector
- https://www.kaggle.com/eswarchandt/phishing-website-detector?select=phishing.csv
- 31 features
- target: 1 (phishing) / -1 (legitimate)
- Airline passenger satisfaction
- https://www.kaggle.com/teejmahal20/airline-passenger-satisfaction?select=test.csv
- 24 features
- target: neutral or dissatisfied / satisfied
- Surgical Complications dataset
- https://www.kaggle.com/omnamahshivai/surgical-dataset-binary-classification
- 24 features
- target: complication / no complication
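Each dataset above is a CSV download from Kaggle, so loading reduces to a pandas read plus a feature/target split. A minimal sketch for the income dataset, assuming its `train.csv` with target column `income_>50k`; the inline CSV here is a toy stand-in so the snippet runs without the Kaggle download, and the feature names in it are illustrative only.

```python
import io
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the Kaggle income train.csv (real file: URL above).
csv_text = """age,hours-per-week,income_>50k
39,40,0
50,13,0
38,40,1
53,45,1
28,40,0
37,50,1
"""

df = pd.read_csv(io.StringIO(csv_text))  # for the real data: pd.read_csv("train.csv")
X = df.drop(columns=["income_>50k"])     # the real dataset has 14 features
y = df["income_>50k"]

# Hold out a test set before any hyperparameter search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)
```

The same read/drop/split pattern applies to the other three datasets, with their own target columns.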
Models I will use for hyperparameter search and classification:
- Logistic Regression
- SVM
- Random Forest
- Artificial Neural Network
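Each model's hyperparameters can be tuned by cross-validated grid search. A sketch using scikit-learn (an assumed library choice; the outline does not name one) on synthetic data, with a small Logistic Regression grid as the example; the SVM, Random Forest, and ANN searches follow the same pattern with their own `param_grid`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for any of the four binary classification datasets.
X, y = make_classification(n_samples=300, n_features=14, random_state=0)

# Scale inside the pipeline so cross-validation folds are not contaminated.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Illustrative grid; values here are assumptions, not the project's final grid.
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1, 10, 100],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`search.best_estimator_` is then refit on the full training split and evaluated on the held-out test set.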
Metrics I will report for evaluation
- Accuracy
- F1 score
- AUC (area under the ROC curve)
- Precision
- Recall
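All five metrics are available in `sklearn.metrics`; the one wrinkle is that AUC needs predicted scores or probabilities rather than hard labels. A sketch on synthetic data (scikit-learn assumed, as above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)               # hard labels: accuracy/F1/precision/recall
y_score = clf.predict_proba(X_te)[:, 1]  # probabilities: AUC

metrics = {
    "accuracy":  accuracy_score(y_te, y_pred),
    "f1":        f1_score(y_te, y_pred),
    "auc":       roc_auc_score(y_te, y_score),
    "precision": precision_score(y_te, y_pred),
    "recall":    recall_score(y_te, y_pred),
}
print(metrics)
```

For the phishing dataset's 1/-1 labels, `f1_score` and friends need `pos_label=1` (or the labels remapped to 0/1) so the positive class is unambiguous.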
- Heatmap plots of hyperparameter search results for Logistic Regression, Random Forest, and ANN (the SVM search spans four hyperparameter dimensions, too many combinations to show in a single heatmap)
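A heatmap over a two-hyperparameter search reduces to pivoting `GridSearchCV.cv_results_` into a 2-D score matrix, one cell per hyperparameter pair. A sketch using a Random Forest grid (the specific grid values are assumptions); the resulting matrix can then be passed to e.g. matplotlib's `plt.imshow` or seaborn's `heatmap`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Two-axis grid -> one heatmap cell per (max_depth, n_estimators) pair.
param_grid = {"max_depth": [2, 4, 8], "n_estimators": [10, 50, 100]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

results = pd.DataFrame(search.cv_results_)
scores = results.pivot(
    index="param_max_depth", columns="param_n_estimators", values="mean_test_score"
)
print(scores)  # 3x3 matrix: rows = max_depth, columns = n_estimators
# Plot with e.g.: plt.imshow(scores.values, cmap="viridis"); plt.colorbar()
```

For Logistic Regression and the ANN, the same pivot works over whichever two hyperparameters each grid varies.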