Aaditya Bansal

California Housing Price Prediction - 5. Deployment

Housing Price Prediction


Welcome to my very first Machine Learning / Data Science Project.


This post is a continuation of Part 1 - Data Extraction, Part 2 - EDA and Visualization, Part 3 - Preprocessing, and Part 4 - Modeling of this project; please check them out if you haven't already.


I will be sharing the process and updates through these blog posts.


In this Blog Post I give an Overview of the project and focus on the last but not least part of a Machine Learning / Data Science Project: Deployment!!


 

Overview

This Project Notebook covers all the necessary steps to complete the Machine Learning Task of Predicting Housing Prices on the California Housing Dataset available in scikit-learn. We perform the following steps to successfully create a model for house price prediction:

1. Data Extraction (See Details in Previous Blog)

  • Import libraries

  • Import Dataset from scikit-learn

  • Understanding the given Description of Data and the problem Statement

  • Take a look at different Inputs and details available with the dataset.

  • Storing the obtained dataset into a Pandas Data frame

2. EDA (Exploratory Data Analysis) and Visualization (See Details in Previous Blog)

  • Getting a closer Look at obtained Data

  • Exploring different Statistics of the Data (Summary and Distributions)

  • Looking at Correlations (between individual features and between Input features and Target)

  • Geospatial Data / Coordinates - Longitude and Latitude features

3. Preprocessing (See Details in Previous Blog)

  • Dealing with Duplicate and Null (NaN) values

  • Dealing with Categorical features (e.g. Dummy coding)

  • Dealing with Outlier values

    • Visualization (Box-Plots)

    • Using IQR

    • Using Z-Score

  • Separating Target and Input Features

  • Target feature Normalization (Plots and Tests)

  • Splitting Dataset into train and test sets

  • Feature Scaling (Feature Transformation)

4. Modeling (See Details in Previous Blog)

  • Specifying Evaluation Metric R squared (using Cross-Validation)

  • Model Training - trying multiple models and hyperparameters:

    • Linear Regression

    • Polynomial Regression

    • Ridge Regression

    • Decision Trees Regressor

    • Random Forests Regressor

    • Gradient Boosted Regressor

    • eXtreme Gradient Boosting (XGBoost) Regressor

    • Support Vector Regressor

  • Model Selection (by comparing evaluation metrics)

  • Learn Feature Importance and Relations

  • Prediction
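The end-to-end flow outlined above (scale the features, train a model, compare candidates with cross-validated R squared) can be sketched with scikit-learn. This is a minimal illustration only: it uses synthetic data from make_regression as a stand-in for the California Housing features, so the numbers are not the project's actual results.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 8 California Housing input features
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = GradientBoostingRegressor(random_state=42)

# Cross-validated R^2 on the training data guides model selection
cv_r2 = cross_val_score(model, X_train_scaled, y_train, cv=5, scoring="r2")
model.fit(X_train_scaled, y_train)
print(f"mean CV R^2: {cv_r2.mean():.3f}")
print(f"test R^2:    {model.score(X_test_scaled, y_test):.3f}")
```

The same loop can be repeated over the other regressors listed above (Linear, Ridge, Random Forest, XGBoost, SVR, ...) to pick the best performer by mean cross-validated R squared.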

 

5. Deployment

 

Exporting the trained model to be used for later predictions (by storing the model object as a byte file - Pickling). A pickle file is a serialized form of the model that can be deployed on web servers.

pickle.dump(gradient_boosting_model, open("gradient_boosting_model.pkl", 'wb'))  # here wb = write bytes

Load and use the pickle file (model object):

pickled_model = pickle.load(open("gradient_boosting_model.pkl", "rb"))  # here rb = read bytes

Making a prediction using the loaded pickled_model:

new_data = scaler.transform(cal_housing_dataset.data.loc[0].values.reshape(1, -1))
new_data

Output: array([[ 2.35209557, 0.98489275, 0.5903677 , -0.14943682, -0.98178766, -0.04651314, 1.0518404 , -1.32288385]])

## Prediction
pickled_model.predict(new_data)

Output: array([4.50939797])

The prediction is the same as the one made before pickling.
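A quick way to sanity-check the exported file is a round trip: pickle a fitted model, reload it, and confirm the predictions match exactly. The sketch below is self-contained on toy data (the variable names and the toy dataset are illustrative, not the project's actual objects), and uses with-blocks so the file handles are closed promptly, unlike the one-liner open() calls above.

```python
import pickle

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy data standing in for the scaled housing features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=200)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Serialize the fitted model to a byte file
with open("gradient_boosting_model.pkl", "wb") as f:  # wb = write bytes
    pickle.dump(model, f)

# Reload it, as a web server would at startup
with open("gradient_boosting_model.pkl", "rb") as f:  # rb = read bytes
    restored = pickle.load(f)

# The reloaded model must reproduce the original predictions exactly
assert np.array_equal(model.predict(X[:5]), restored.predict(X[:5]))
```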


We can download these trained model and scaler files and use them for deployment on a server.
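On the server side, a small web framework such as Flask or FastAPI is the usual choice for exposing the pickled model behind a prediction endpoint. As a dependency-free sketch of the idea, Python's standard library alone can serve JSON predictions; the /predict route, the payload shape, and the placeholder predict function below are illustrative assumptions, and in a real deployment you would pickle.load() the trained model and scaler at startup and call scaler.transform followed by model.predict inside the handler.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Placeholder "model": in the real app, unpickle the trained model
    # and scaler once at startup, then scale + predict here.
    return sum(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = predict(payload["features"])
        body = json.dumps({"prediction": result}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To run the server:
#   HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

A client would then POST {"features": [...]} to /predict and read back the prediction from the JSON response.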

 

Thank you for your time!! The following file contains the progress of the project so far.

 





