
# Goal: This project aims to predict residential house prices in Ames, Iowa, USA.

# Dataset: Kaggle's House Prices knowledge competition (Ames housing dataset)

# Factors that affect house prices:

  • Area of the house
  • Age of the house
  • Location of the house
  • Proximity to markets
  • Transport connectivity of the location
  • Number of floors
  • Construction materials used
  • Water and electricity availability
  • Play areas/parks for kids (if any)
  • Whether a terrace is available
  • Whether car parking is available
  • Whether security is available

# Data Exploration:

  1. Dataset shape

The train data has 1460 rows and 81 columns

The test data has 1459 rows and 80 columns
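
A minimal sketch of this check, assuming the standard Kaggle file names `train.csv` and `test.csv`:

```python
import pandas as pd

# Assumed file names: the Kaggle competition ships train.csv and test.csv.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

print(f"The train data has {train.shape[0]} rows and {train.shape[1]} columns")
print(f"The test data has {test.shape[0]} rows and {test.shape[1]} columns")
```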

  2. Columns with missing values

['LotFrontage', 'Alley', 'MasVnrType', 'MasVnrArea', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Electrical', 'FireplaceQu', 'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageQual', 'GarageCond', 'PoolQC', 'Fence', 'MiscFeature']
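
The same list can be reproduced with a pandas one-liner (continuing from the `train` DataFrame loaded above):

```python
# Names of all columns that contain at least one missing value.
missing_cols = train.columns[train.isnull().any()].tolist()
print(missing_cols)
```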

  3. Correlation of columns with SalePrice

Most positively correlated with SalePrice:

  • SalePrice --> 1.000000
  • OverallQual --> 0.790982
  • GrLivArea --> 0.708624
  • GarageCars --> 0.640409
  • GarageArea --> 0.623431
  • TotalBsmtSF --> 0.613581
  • 1stFlrSF --> 0.605852
  • FullBath --> 0.560664
  • TotRmsAbvGrd --> 0.533723
  • YearBuilt --> 0.522897
  • YearRemodAdd --> 0.507101
  • GarageYrBlt --> 0.486362
  • MasVnrArea --> 0.477493
  • Fireplaces --> 0.466929
  • BsmtFinSF1 --> 0.386420

Most negatively correlated with SalePrice:

  • YrSold --> -0.028923
  • OverallCond --> -0.077856
  • MSSubClass --> -0.084284
  • EnclosedPorch --> -0.128578
  • KitchenAbvGr --> -0.135907
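
These scores come from a plain Pearson correlation against the target, roughly like the following (`numeric_only=True` requires pandas ≥ 1.5):

```python
# Correlation of every numeric column with SalePrice, strongest first.
corr = train.corr(numeric_only=True)["SalePrice"].sort_values(ascending=False)
print(corr.head(16))  # strongest positive correlations
print(corr.tail(5))   # strongest negative correlations
```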

  4. Data Pre-processing
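
The README doesn't detail this step, so here is a minimal sketch under a common assumption for this dataset: most NA values mean a feature is absent (no alley, no pool), so categorical gaps are filled with a "None" label and numeric gaps with the column median.

```python
# Illustrative imputation only; the repo's actual pre-processing may differ.
for col in train.columns[train.isnull().any()]:
    if train[col].dtype == "object":
        train[col] = train[col].fillna("None")   # absence of the feature
    else:
        train[col] = train[col].fillna(train[col].median())
```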

  5. Feature Engineering

Most categorical variables have a near-zero variance distribution, meaning one category accounts for more than 90% of the values. For these we create binary variables depicting the presence or absence of a category; the new features contain only 0 or 1 values (see the sketch below).
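
A sketch of that transformation (the 90% threshold and the `Has`-prefixed column names are illustrative, not the repo's exact choices):

```python
# Find categorical columns where one category covers >90% of the rows
# (near-zero variance), then encode each as a 0/1 presence flag.
cat_cols = train.select_dtypes(include="object").columns
for col in cat_cols:
    filled = train[col].fillna("None")
    if filled.value_counts(normalize=True).iloc[0] > 0.90:
        dominant = filled.value_counts().index[0]
        # 1 when the row deviates from the dominant category, else 0
        train["Has" + col] = (filled != dominant).astype(int)
```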

  6. Model training and evaluation:

Here we use three algorithms: XGBoost, a neural network, and Lasso regression. Scoring each model's RMSE on the Kaggle leaderboard gave the following results:

  • XGBoost : 0.12507
  • Lasso Regression : 0.11859
  • Neural Network : 1.35346

Lasso regression achieves the lowest RMSE, so it is the best fit for our predictions.
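
A sketch of the evaluation setup, assuming the leaderboard metric for this competition (RMSE between the logs of predicted and actual prices) and an illustrative `alpha`:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Numeric features only, for brevity; the full pipeline would also encode
# the categorical columns engineered above.
X = train.select_dtypes(include=[np.number]).drop(columns=["SalePrice"]).fillna(0)
y = np.log1p(train["SalePrice"])  # Kaggle scores RMSE on log(SalePrice)

model = make_pipeline(StandardScaler(), Lasso(alpha=0.0005, max_iter=50000))
scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_root_mean_squared_error")
print("CV RMSE (log scale):", -scores.mean())
```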
