Predicting Amsterdam house / real estate prices using Linear, KNN, Lasso, Ridge, Polynomial, Support Vector (SVR), Decision Tree, Random Forest, and Neural Network (MLP) Regression.
- load a Pandas DataFrame containing (Dec-17) housing data retrieved by means of the following scraper, supplemented with longitude and latitude coordinates mapped from zip codes (via GeoPy)
- do some simple data exploration / visualisation
- remove non-numeric data, NaNs and outliers, and normalise the data
- define the explanatory variables (surface, rooms, latitude, longitude) and the dependent variable (price in EUR)
- split the data into train and test sets for later use
- find the optimal model parameters using scikit-learn's GridSearchCV
- fit the model using GridSearchCV's optimal parameters
- evaluate estimator performance by means of 10-fold 'shuffled' cross-validation
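The parameter-search and evaluation steps above can be sketched as follows. This is a minimal, illustrative example on synthetic data (the scraped dataset is not included here); the grid values are placeholders, not the grids actually searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, ShuffleSplit, train_test_split

# Synthetic stand-in for the housing data: 4 features
# (surface, rooms, latitude, longitude) and a price-like target.
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = 3e5 * X[:, 0] + 1e4 * X[:, 1] + rng.normal(0, 1e4, size=200)

# Split the data into train and test sets for later use.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Find optimal model parameters with GridSearchCV, evaluated via
# 10-fold 'shuffled' cross-validation (ShuffleSplit).
cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [None, 4]},
    cv=cv,
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)  # best parameters and mean R²
```

The same `GridSearchCV` / `ShuffleSplit` pattern applies to each of the estimators listed above; only the estimator class and `param_grid` change.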
Results on (Dec-17) Amsterdam house / real estate price data retrieved by means of the following scraper:
|   | surface | rooms_new | zipcode_new | price_new | latitude  | longitude  |
|---|---------|-----------|-------------|-----------|-----------|------------|
| 0 | 138.0   | 4.0       | 1060        | 420000    | 40.804672 | -73.963420 |
| 1 | 130.0   | 5.0       | 1087        | 550000    | 52.355590 | 5.000561   |
| 2 | 116.0   | 5.0       | 1061        | 425000    | 52.373044 | 4.837568   |
| 3 | 92.0    | 5.0       | 1035        | 349511    | 52.416895 | 4.906767   |
| 4 | 127.0   | 4.0       | 1013        | 1050000   | 52.396789 | 4.876607   |
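The cleaning and normalisation steps might look like the sketch below. The column names are taken from the sample above; the toy values, the 3-standard-deviation outlier cut-off, and the choice of `StandardScaler` are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Small frame mimicking the sample above (toy values, one NaN).
df = pd.DataFrame({
    "surface": [138.0, 130.0, None, 92.0, 127.0],
    "rooms_new": [4.0, 5.0, 5.0, 5.0, 4.0],
    "price_new": [420000, 550000, 425000, 349511, 1050000],
})

# Remove NaNs, then drop outliers beyond 3 standard deviations on price.
df = df.dropna()
z = (df["price_new"] - df["price_new"].mean()) / df["price_new"].std()
df = df[z.abs() < 3]

# Normalise the explanatory variables to zero mean, unit variance.
X = StandardScaler().fit_transform(df[["surface", "rooms_new"]])
```

Note that row 0 in the sample above carries New York-area coordinates (40.80, -73.96) rather than Amsterdam ones, a reminder that zip-to-coordinate lookups can misfire and are worth sanity-checking at this stage.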
- Random Forest Regression (n_estimators=20, max_depth=None, max_features=4): 0.866
- Polynomial Regression (degree=4): 0.810
- Decision Tree Regression (max_depth=4, min_samples_leaf=6): 0.737
- Neural Network MLP Regression (hidden_layer_sizes=(3,3), alpha=5, solver='lbfgs'): 0.721
- KNN Regression (n_neighbors=15): 0.704
- Ordinary Least-Squares Regression: 0.695
- Ridge Regression (alpha=0.1): 0.695
- Support Vector Regression (kernel='linear', gamma=0.001, C=10): 0.690
- Lasso Regression (alpha=0.25): 0.614
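Refitting the best performer with its grid-searched parameters can be sketched like this. The data here is synthetic (via `make_regression`) as a stand-in for the four housing features, so the printed score will not reproduce the 0.866 above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic stand-in for the four features (surface, rooms, lat, lon).
X, y = make_regression(n_samples=200, n_features=4, noise=10, random_state=0)

# Best performer above: Random Forest with the grid-searched parameters.
model = RandomForestRegressor(
    n_estimators=20, max_depth=None, max_features=4, random_state=0)

# Evaluate with 10-fold 'shuffled' cross-validation.
cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(round(scores.mean(), 3))  # mean R² across the 10 shuffled folds
```

The scores in the table are R² values (the scikit-learn default for regressors), so 1.0 would mean a perfect fit and 0.0 a model no better than predicting the mean price.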