This project is based on Chapter 2 of a book by Aurelien Geron.
The author also provided a GitHub link for the notebook.

I followed the notebook, recreated it, and added annotations for my own understanding, so about 95% of the work follows Geron’s. My follow-along notebook is here.

Introduction

The dataset we are using is the California Housing Prices dataset, based on the 1990 California census (see Figure below). The goal is to predict the median house price in each district using regression.

In summary, it is an end-to-end ML project:

Results:

As seen above, RandomForest with GridSearch tuning has the lowest RMSE. With this model, the RMSE on the test set is $47.7k.
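
For reference, RMSE is the square root of the mean squared prediction error, so it comes out in the same unit as the target (dollars here). Below is a minimal sketch of the calculation. The function name `rmse` and the sample prices are made up for illustration; they are not taken from the notebook or the dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    # Square each error so that large misses weigh more than small ones.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    # Average the squared errors, then take the root to get back to dollars.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical house values in dollars (illustrative only).
actual = [200_000, 350_000, 120_000, 500_000]
predicted = [230_000, 310_000, 150_000, 460_000]

error = rmse(actual, predicted)
print(f"RMSE: ${error:,.0f}")  # prints "RMSE: $35,355"
```

Read this way, a test RMSE of $47.7k means the model's predictions are typically off by a few tens of thousands of dollars, with large misses counting more than small ones.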

What I learnt

What can I say? I learnt a lot, as this was my first time going through the whole process.

Insights:

What I found confusing in this tutorial:

Next step?