Resources, exercises, and projects from the 'Applied ML and Data Science with Python' course (March 2020) at Emory University, taught by Sridhar Palle, Ph.D. (spalle@emory.edu).
These are primarily Jupyter notebooks run with Anaconda.
We will use a marketing/banking dataset obtained from the UCI Machine Learning repository - https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
The dataset is related to the phone call marketing campaigns of a Portuguese banking institution.
The goal is to find the most accurate model that predicts whether a client will subscribe to a term deposit. The target variable, y, is a yes/no value.
We will use sklearn for:
- pre-processing
- splitting data into train/test sets
- comparing 4 models (DummyClassifier, LogisticRegression, DecisionTreeClassifier, RandomForestClassifier)
- comparing metrics (confusion matrix, accuracy, recall, f1, precision, auc)
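
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the course notebooks. So that it runs without downloading anything, it uses a small synthetic table built by a hypothetical helper, `make_toy_bank_data`, in place of the real UCI file. The helper, the function name `compare_models`, the four feature columns, the 75/25 split, and the model settings are all assumptions made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier


def make_toy_bank_data(n_rows=1000, seed=0):
    """Hypothetical stand-in for the bank marketing table (NOT the real data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(18, 90, n_rows),
        "balance": rng.normal(1500, 3000, n_rows),
        "marital": rng.choice(["married", "single", "divorced"], n_rows),
        "contact": rng.choice(["cellular", "telephone"], n_rows),
    })
    # Synthetic yes/no target, loosely tied to the features and mostly "no".
    score = 0.0004 * df["balance"] + 0.8 * (df["contact"] == "cellular") - 2.0
    prob_yes = 1 / (1 + np.exp(-score))
    df["y"] = np.where(rng.random(n_rows) < prob_yes, "yes", "no")
    return df


def compare_models(df, target="y", seed=42):
    """Fit the four classifiers and return one row of metrics per model."""
    X = df.drop(columns=[target])
    y = (df[target] == "yes").astype(int)  # encode yes/no as 1/0

    # Pre-processing: scale numeric columns, one-hot encode the rest.
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])

    # Train/test split, stratified so both sets keep the same yes/no ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    models = {
        "DummyClassifier": DummyClassifier(strategy="most_frequent"),
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "DecisionTreeClassifier": DecisionTreeClassifier(random_state=seed),
        "RandomForestClassifier": RandomForestClassifier(random_state=seed),
    }

    rows = []
    for name, model in models.items():
        pipe = Pipeline([("prep", preprocess), ("model", model)])
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        proba = pipe.predict_proba(X_test)[:, 1]  # probability of "yes"
        print(name)
        print(confusion_matrix(y_test, pred))
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
            "f1": f1_score(y_test, pred, zero_division=0),
            "auc": roc_auc_score(y_test, proba),
        })
    return pd.DataFrame(rows).set_index("model")


results = compare_models(make_toy_bank_data())
print(results.round(3))
```

The DummyClassifier row serves as a baseline: when most clients say "no", a model that always predicts "no" can still score high on accuracy, which is why recall, f1, and AUC are reported alongside it.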

- numpy.ipynb: A walkthrough of common numpy features
- pandas.ipynb: A walkthrough of common pandas features
- numpy_pandas_in_practice.ipynb: A few samples of what numpy and pandas can do
- sml-classification-exercise.ipynb: Classification of a diabetes dataset
- sml-classification.ipynb: Explore a breast cancer dataset with sklearn and supervised ML
- sml-classification.ipynb: Explore a sklearn diabetes dataset with regression