
Predicting UK Housing Price using Machine Learning Algorithms

Author(s): G. A. Ogundeji, D. A. Pitts, Y. Sun, M. Ghafoor
Abstract:

Because the housing market is among the most complex markets in which to value property, and prices continue to fluctuate, reliable algorithms for predicting house prices are a constant need for socio-economic advancement and the welfare of citizens. In this paper, we develop machine learning algorithms for forecasting UK housing prices and identify an optimal algorithm that forecasts housing prices accurately in the presence of many features or covariates. Correlation analysis is first applied to remove correlated variables, avoiding multicollinearity and thereby increasing statistical power. A novel method is then adopted in which regression analysis is used to understand and select the statistically significant features for the various regions of England, grouped according to the North-South divide. These features are then used in the machine learning algorithms to further increase their statistical power, improve the accuracy of each, and ultimately improve their predictive value.

The model construction involves three stages: (1) correlation analysis to identify and remove correlated variables, thereby avoiding multicollinearity and increasing the statistical power of the linear regression; (2) linear regression to determine which variables are statistically significant; and (3) construction of the machine learning algorithms from the variables found to be statistically significant in the linear regression. A comprehensive dataset of UK house prices paid from 2010 to 2019 was linked to a number of other datasets to generate a total of 21 variables or features for the models. CatBoost, Gradient Boosting, Bagging, Random Forest and Extra Trees all performed excellently in every region considered. A comparison of the seven models showed that the Extra Trees algorithm consistently achieved the highest accuracy in all regions. K-Nearest Neighbours (KNN) was the only algorithm with an accuracy below 50%. Notably, the statistically insignificant variables differed from region to region, implying that although many variables are statistically significant in all regions, regional differences affect the modelling and prediction of housing prices. This study validates the practicability of a machine learning methodology for predicting housing prices and offers a reference for future house price prediction based on machine learning.
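The second stage, screening variables by the significance of their linear regression coefficients, can be sketched in pure Python with ordinary least squares. This is a hedged illustration rather than the authors' implementation: the 5% significance level, the large-sample normal approximation used for p-values (statistical packages normally use Student's t with n - k degrees of freedom), and the toy feature names and prices are all assumptions.

```python
from statistics import NormalDist


def invert(m: list[list[float]]) -> list[list[float]]:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: remove collinear features first")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def significant_features(features: dict[str, list[float]], y: list[float],
                         alpha: float = 0.05) -> list[str]:
    """Fit OLS with an intercept and keep features whose coefficient p-value < alpha."""
    names = list(features)
    n, k = len(y), len(names) + 1  # k counts the intercept column too
    X = [[1.0] + [float(features[name][i]) for name in names] for i in range(n)]
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [y[r] - sum(b * x for b, x in zip(beta, X[r])) for r in range(n)]
    s2 = sum(e * e for e in residuals) / (n - k)  # residual variance estimate
    kept = []
    for idx, name in enumerate(names, start=1):  # index 0 is the intercept
        se = (s2 * xtx_inv[idx][idx]) ** 0.5
        t = beta[idx] / se
        # Assumed large-sample normal approximation to the t distribution.
        p = 2 * (1 - NormalDist().cdf(abs(t)))
        if p < alpha:
            kept.append(name)
    return kept


# Hypothetical toy data: price depends on floor_area_band only.
toy_regressors = {
    "floor_area_band": [1, 2, 3, 4, 5, 6, 7, 8],
    "irrelevant_flag": [1, -1, -1, 1, 1, -1, -1, 1],
}
toy_prices = [5.5, 7.5, 10.5, 14.5, 16.5, 20.5, 23.5, 25.5]
print(significant_features(toy_regressors, toy_prices))  # ['floor_area_band']
```

The surviving names would then be the inputs to the stage 3 models. Running the correlation filter first matters in practice: with perfectly collinear features, the inversion of the cross-product matrix in this sketch raises an error instead of returning coefficients.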


© 2024 The Author(s). Published by RITHA Publishing. This article is distributed under the terms of the CC BY 4.0 licence, which permits further distribution in any medium, provided the original work is properly cited.


How to cite:

Ogundeji, G. A., Pitts, D. A., Sun, Y., & Ghafoor, M. (2024). Predicting UK Housing Price using Machine Learning Algorithms. Journal of Research, Innovation and Technologies, Volume III, 1(5), 67-85. https://doi.org/10.57017/jorit.v3.1(5).05 

