This repository hosts all my Udacity Data Analyst Nanodegree projects.
A Python project to find out whether there is a statistically significant difference in NYC subway ridership between rainy and non-rainy days. It uses basic statistical tests, linear regression, and visualization techniques.
A Python+MongoDB project to clean up part of the OpenStreetMap data. The region used in this project is Hong Kong.
A project using R to do some basic exploratory data analysis on a red wine dataset.
A Python machine learning project that trains a model on the Enron fraud dataset to predict whether an individual was a "Person of Interest" (POI) in the fraud, based on their financial and email data.
An HTML+D3.js project to explore a dataset containing 113,937 loan records from Prosper. The visualization focuses on the relationship between the original loan amount, the state of the borrower's address, and the year the loan originated.
A project that demonstrates the considerations involved in running an A/B test, including selecting invariant and evaluation metrics, sizing and power, sanity checks, effect size tests, and sign tests.
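
As a rough illustration of the sign test mentioned above, here is a minimal sketch in plain Python. It is not taken from the project itself: the function names `count_wins` and `sign_test_p_value` are hypothetical, and it assumes you have paired daily values of one metric for the control and experiment groups, with tied days dropped.

```python
from math import comb


def count_wins(control, experiment):
    """Count days where experiment beats control; tied days are dropped.

    Returns (wins, trials). Both inputs are assumed to be equal-length
    sequences of daily metric values (hypothetical input format).
    """
    if len(control) != len(experiment):
        raise ValueError("control and experiment must have the same length")
    pairs = [(c, e) for c, e in zip(control, experiment) if c != e]
    wins = sum(1 for c, e in pairs if e > c)
    return wins, len(pairs)


def sign_test_p_value(wins, trials):
    """Two-sided exact sign test p-value under H0: P(win) = 0.5."""
    if not 0 <= wins <= trials:
        raise ValueError("wins must be between 0 and trials")
    # Binomial(trials, 0.5) is symmetric, so double the smaller tail.
    k = min(wins, trials - wins)
    tail = sum(comb(trials, i) for i in range(k + 1)) / 2 ** trials
    return min(1.0, 2 * tail)
```

A typical use would be `wins, trials = count_wins(control_daily, experiment_daily)` followed by `sign_test_p_value(wins, trials)`, comparing the result against the chosen significance level. The test only uses the direction of each daily difference, so it makes a useful cross-check on an effect size test rather than a replacement for it.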