
A Machine Learning Approach to Synthetic Gini Coefficient Estimation in Colombian Municipalities

Author(s): Riveros-Gavilanes, J. M.
Abstract:

This paper presents two synthetic estimations of the Gini coefficient at the municipality level for Colombia for the years 2000–2020. The methodology compares several machine learning models to select the best model for imputing the missing data. This results in two Random Forest models: the first is characterised by Dominant Fixed Effects, while the second contains a set of Dominant Varying Factors. Based on these estimations, the Synthetic Gini Coefficients from both models are inspected, and public links are provided to access them. The Dominant Fixed Effects model is rather “stiff” in contrast to the Varying Factors model. Hence, researchers are recommended to use the Synthetic Gini Coefficient with Varying Factors, because it shows greater variability across time than the Dominant Fixed Effects model.
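
The pipeline summarised in the abstract (compare candidate models on the observed data, keep the best one for imputation, then compare how much each synthetic series moves over time) can be sketched in a few lines of plain Python. This is a hypothetical illustration, not the author's code: the paper uses Random Forest models, whereas this sketch swaps in two simple stand-in predictors. One is a per-municipality mean, which, like a model dominated by fixed effects, is constant over time. The other is a per-municipality linear trend in year, which varies over time. The panel values, function names, holdout rule and the choice to treat the synthetic series as the model prediction for every municipality-year are all assumptions.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical panel layout: (municipality, year) -> Gini, or None if missing.
Key = Tuple[str, int]
Panel = Dict[Key, Optional[float]]
Predictor = Callable[[str, int], float]


def fit_unit_mean(obs: Dict[Key, float]) -> Predictor:
    """Stand-in for a fixed-effects-dominated model: one constant per municipality."""
    by_unit: Dict[str, List[float]] = {}
    for (unit, _year), g in obs.items():
        by_unit.setdefault(unit, []).append(g)
    means = {u: mean(v) for u, v in by_unit.items()}
    return lambda unit, year: means[unit]


def fit_unit_trend(obs: Dict[Key, float]) -> Predictor:
    """Stand-in for a time-varying model: an OLS line in year per municipality."""
    by_unit: Dict[str, List[Tuple[int, float]]] = {}
    for (unit, year), g in obs.items():
        by_unit.setdefault(unit, []).append((year, g))
    coefs = {}
    for unit, pts in by_unit.items():
        xbar = mean(x for x, _ in pts)
        ybar = mean(y for _, y in pts)
        sxx = sum((x - xbar) ** 2 for x, _ in pts)
        sxy = sum((x - xbar) * (y - ybar) for x, y in pts)
        slope = sxy / sxx if sxx else 0.0  # a single observation gives a flat line
        coefs[unit] = (ybar - slope * xbar, slope)
    return lambda unit, year: coefs[unit][0] + coefs[unit][1] * year


def holdout_rmse(panel: Panel, fit, k: int = 3) -> float:
    """Mask every k-th observed value per municipality, refit, and score RMSE."""
    train: Dict[Key, float] = {}
    test: Dict[Key, float] = {}
    for unit in sorted({u for u, _ in panel}):
        years = sorted(y for (u, y), g in panel.items() if u == unit and g is not None)
        for i, year in enumerate(years):
            # Index 0 is never masked, so every municipality stays in training.
            (test if i % k == k - 1 else train)[(unit, year)] = panel[(unit, year)]
    predict = fit(train)
    return sqrt(mean((predict(u, y) - g) ** 2 for (u, y), g in test.items()))


def synthetic(panel: Panel, fit) -> Dict[Key, float]:
    """Assumption: the synthetic series is the model prediction for every cell."""
    predict = fit({key: g for key, g in panel.items() if g is not None})
    return {key: predict(*key) for key in panel}


def temporal_variability(series: Dict[Key, float]) -> float:
    """Average, over municipalities, of the standard deviation across years."""
    by_unit: Dict[str, List[float]] = {}
    for (unit, _year), g in series.items():
        by_unit.setdefault(unit, []).append(g)
    return mean(pstdev(v) for v in by_unit.values())


# Invented toy data: two municipalities, 2000-2005, with gaps.
panel: Panel = {
    ("A", 2000): 0.50, ("A", 2001): 0.52, ("A", 2002): None,
    ("A", 2003): 0.56, ("A", 2004): 0.58, ("A", 2005): None,
    ("B", 2000): 0.40, ("B", 2001): None, ("B", 2002): 0.44,
    ("B", 2003): 0.46, ("B", 2004): None, ("B", 2005): 0.50,
}
candidates = {"unit_mean": fit_unit_mean, "unit_trend": fit_unit_trend}
scores = {name: holdout_rmse(panel, fit) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
variability = {name: temporal_variability(synthetic(panel, fit))
               for name, fit in candidates.items()}
print(best, scores, variability)
```

Masking values that are actually observed mimics the imputation task, so the holdout error is a reasonable selection criterion in this sketch. The constant-per-municipality predictor has zero variability across years by construction, which is one simple way to see the kind of “stiffness” the abstract attributes to the Dominant Fixed Effects model.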


© The Author(s) 2025. Published by RITHA Publishing. This article is distributed under the terms of the CC BY 4.0 license, which permits further distribution in any medium, provided the original work is properly cited, with attribution to the author(s), the title of the work, the journal citation, and the URL/DOI.


How to cite:

Riveros-Gavilanes, J. M. (2025). A machine learning approach to synthetic Gini Coefficient estimation in Colombian municipalities. Journal of Research, Innovation and Technologies, Volume IV, 1(7), 7–24. https://doi.org/10.57017/jorit.v4.1(7).01 

References:

Abdel-Rahman, H. M. & Wang, P. (1997). Social welfare and income inequality in a system of cities. Journal of Urban Economics, 41(3), 462–483. https://doi.org/10.1006/juec.1996.2013


Alwateer, M., Atlam, E.-S., Abd El-Raouf, M. M., Ghoneim, O. A., & Gad, I. (2024). Missing data imputation: A comprehensive review. Journal of Computer and Communications, 12(11), 53–75. https://doi.org/10.4236/jcc.2024.1211004 


Caravaggio, N., Resce, G., & Vaquero-Piñeiro, C. (2025). Predicting policy funding allocation with Machine Learning. Socio-Economic Planning Sciences, 98, 102175. https://doi.org/10.1016/j.seps.2025.102175


Castelló-Climent, A. & Doménech, R. (2021). Human capital and income inequality revisited. Education Economics, 29(2), 194–212. https://doi.org/10.1080/09645292.2020.1870936 


CEDE (2023). Panel municipal [Municipal panel]. Centro de Estudios sobre el Desarrollo Económico, Universidad de los Andes. https://datoscede.uniandes.edu.co/catalogo-de-datos/


Clark, C. M. & Kavanagh, C. (1996). Basic income, inequality, and unemployment: rethinking the linkage between work and welfare. Journal of Economic Issues, 30(2), 399–406. https://doi.org/10.1080/00213624.1996.11505803 


Coady, D., D’Angelo, D., & Evans, B. (2022). Fiscal redistribution, social welfare and income inequality: ‘doing more’ or ‘more to do’? Applied Economics, 54(21), 2416–2429. https://doi.org/10.1080/00036846.2021.1990840 


Coburn, D. (2015). Income inequality, welfare, class and health: A comment on Pickett and Wilkinson. Social Science & Medicine, 146, 228–232. https://doi.org/10.1016/j.socscimed.2015.09.002 


Combes, P.-P., Gobillon, L., & Zylberberg, Y. (2022). Urban economics in a historical perspective: Recovering data with machine learning. Regional Science and Urban Economics, 94, 103711. https://doi.org/10.1016/j.regsciurbeco.2021.103711


Dagum, C. (1990). On the relationship between income inequality measures and social welfare functions. Journal of Econometrics, 43(1-2), 91–102. https://doi.org/10.1016/0304-4076(90)90109-7


Gao, Q.-L., Zhong, C., Yue, Y., Cao, R., & Zhang, B. (2024). Income estimation based on human mobility patterns and machine learning models. Applied Geography, 163, 103179. https://doi.org/10.1016/j.apgeog.2023.103179


Gelman, A. & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. https://doi.org/10.1017/CBO9780511790942


Gond, V. K., Dubey, A., & Rasool, A. (2021). A survey of machine learning-based approaches for missing value imputation. In 2021 3rd International Conference on Inventive Research in Computing Applications (ICIRCA), 1–8. IEEE. https://doi.org/10.1109/ICIRCA51532.2021.9544957 


Hong, S. & Lynn, H. S. (2020). Accuracy of random-forest-based imputation of missing data in the presence of non-normality, non-linearity, and interaction. BMC Medical Research Methodology, 20, 1–12. https://doi.org/10.1186/s12874-020-01080-1 


Kim, K.-t. (2017). The relationships between income inequality, welfare regimes and aggregate health: A systematic review. The European Journal of Public Health, 27(3), 397–404. https://doi.org/10.1093/eurpub/ckx055


Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), 1–26. https://doi.org/10.18637/jss.v028.i05 


Kühn, M. (2015). Peripheralization: Theoretical concepts explaining socio-spatial inequalities. European Planning Studies, 23(2), 367–378. https://doi.org/10.1080/09654313.2013.862518 


Lakshminarayan, K., Harp, S. A., Goldman, R. P., Samad, T., et al. (1996). Imputation of missing data using machine learning techniques. In KDD, Volume 96. https://cdn.aaai.org/KDD/1996/KDD96-023.pdf 


Lee, J.-W. & Lee, H. (2018). Human capital and income inequality. Journal of the Asia Pacific Economy, 23(4), 554–583. https://doi.org/10.1080/13547860.2018.1515002 


Lee, K.-K. & Vu, T. V. (2020). Economic complexity, human capital and income inequality: A cross-country analysis. The Japanese Economic Review, 71(4), 695–718. https://doi.org/10.1007/s42973-019-00026-7 


Lin, W.-C. & Tsai, C.-F. (2020). Missing value imputation: a review and analysis of the literature (2006–2017). Artificial Intelligence Review, 53, 1487–1509. https://doi.org/10.1007/s10462-019-09709-4 


Lin, W.-C., Tsai, C.-F., & Zhong, J. R. (2022). Deep learning for missing value imputation of continuous data and the effect of data discretization. Knowledge-Based Systems, 239, 108079. https://doi.org/10.1016/j.knosys.2021.108079 


Ma, X., Hao, Y., Li, X., Liu, J., & Qi, J. (2023). Evaluating global intelligence innovation: An index based on machine learning methods. Technological Forecasting and Social Change, 194, 122736. https://doi.org/10.1016/j.techfore.2023.122736


Oppido, S., Ragozino, S., & Esposito De Vita, G. (2023). Peripheral, marginal, or noncore areas? Setting the context to deal with territorial inequalities through a systematic literature review. Sustainability, 15(13), 10401. https://doi.org/10.3390/su151310401 


Paas, T. & Schlitte, F. (2008). Regional income inequality and convergence processes in the EU-25. Scienze Regionali: Italian Journal of Regional Science, 7(Supplemento 2), 29–49. https://www.francoangeli.it/riviste/articolo/33743 


Rácz, A., & Gere, A. (2025). Comparison of missing value imputation tools for machine learning models based on product development cases studies. LWT, 117585. https://doi.org/10.1016/j.lwt.2025.117585


Rey, S. J. (2004). Spatial analysis of regional income inequality. Spatially Integrated Social Science, 1, 280–299. https://doi.org/10.1093/oso/9780195152708.003.0014


Ridgeway, G. (2007). Generalized Boosted Models: A guide to the GBM package. Update, 1(1). https://cran.r-project.org/web/packages/gbm/vignettes/gbm.pdf 


Riveros-Gavilanes, J. M. (2021). Estimation of Amartya Sen's social welfare function for Latin America. Ensayos de Economía, 31(59), 13–40. https://doi.org/10.15446/ede.v31n59.88235 


Riveros-Gavilanes, J. M. (2023). On the empirics of violence, inequality, and income. Journal of Economics and Management, 45(1), 102–136. https://doi.org/10.22367/jem.2023.45.06


Riveros-Gavilanes, J. M., Al Akayleh, F., Oduniyi, O., Samuel, A. H., & Hassan, S. M. (2022). On the welfare trends: A view from the Sen’s social welfare function. Technical Report, M & S Research Hub institute. https://ideas.repec.org/p/ris/msrwps/2022_003.html 


Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons. ISBN 0-471-08705-X. https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470316696.fmatter 


Salvati, L. (2016). The dark side of the crisis: disparities in per capita income (2000–12) and the urban-rural gradient in Greece. Tijdschrift voor economische en sociale geografie, 107(5), 628–641. https://doi.org/10.1111/tesg.12203 


Schafer, J. L. & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177. https://pubmed.ncbi.nlm.nih.gov/12090408/ 


Seu, K., Kang, M.-S., & Lee, H. (2022). An intelligent missing data imputation technique: A review. International Journal on Informatics Visualization, 6(1-2), 278–283. http://dx.doi.org/10.30630/joiv.6.1-2.935 


Silva, T. C., Wilhelm, P. V. B., & Amancio, D. R. (2024). Machine learning and economic forecasting: The role of international trade networks. Physica A: Statistical Mechanics and Its Applications, 649, 129977. https://doi.org/10.1016/j.physa.2024.129977


Sologon, D. M., Doorley, K., & O’Donoghue, C. (2023). Drivers of income inequality: what can we learn using microsimulation? Handbook of Labor, Human Resources and Population Economics, 1–37. https://doi.org/10.1007/978-3-319-57365-6_392-1 


Sullivan, T. R., Lee, K. J., Ryan, P., & Salter, A. B. (2017). Multiple imputation for handling missing outcome data when estimating the relative risk. BMC Medical Research Methodology, 17, 1–10. https://doi.org/10.1186/s12874-017-0414-5 

Sun, Y., Li, J., Xu, Y., Zhang, T., & Wang, X. (2023). Deep learning versus conventional methods for missing data imputation: A review and comparative study. Expert Systems with Applications, 227, 120201. https://doi.org/10.1016/j.eswa.2023.120201 


Teng, W., Mamman, S. O., Xiao, C., & Abbas, S. (2024). Impact of natural resources on income equality in Gulf Cooperation Council: Evidence from machine learning approach. Resources Policy, 88, 104427. https://doi.org/10.1016/j.resourpol.2023.104427


Therneau, T., Atkinson, B., & Ripley, B. (2015). Package ‘rpart’. https://cran.r-project.org/web/packages/rpart/rpart.pdf


Wang, S., Li, B., Yang, M., & Yan, Z. (2019). Missing data imputation for machine learning. In IoT as a Service: 4th EAI International Conference, IoTaaS 2018, Xian, China, November 17–18, 2018, Proceedings 4, 67–72. Springer. http://dx.doi.org/10.1007/978-3-030-14657-3_7 


Wickham, H. (2011). ggplot2. Wiley Interdisciplinary Reviews: Computational Statistics, 3(2), 180–185. http://dx.doi.org/10.1002/wics.147 


Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686 


Wildowicz-Szumarska, A. (2022). Is redistributive policy of EU welfare state effective in tackling income inequality? A panel data analysis. Equilibrium. Quarterly Journal of Economics and Economic Policy, 17(1), 81–101. https://doi.org/10.24136/eq.2022.004 


Xue, J. (2023). Review on data imputation methods in machine learning. Journal of Physics: Conference Series, 2646, 012034. IOP Publishing. https://doi.org/10.1088/1742-6596/2646/1/012034 


Yang, X. & Tang, W. (2023). Additional social welfare of environmental regulation: The effect of environmental taxes on income inequality. Journal of Environmental Management, 330, 117095. https://doi.org/10.1016/j.jenvman.2022.117095 


Yarberry, W. (2021). CRAN Recipes: DPLYR, Stringr, Lubridate, and RegEx in R (pp. 1–58). Apress, Berkeley, CA. ISBN: 978-1-4842-6875-9; eBook ISBN: 978-1-4842-6876-6. https://doi.org/10.1007/978-1-4842-6876-6 


Zhan, C., Liu, Y., Wu, Z., Zhao, M., & Chow, T. W. S. (2023). A hybrid machine learning framework for forecasting house price. Expert Systems with Applications, 233, 120981. https://doi.org/10.1016/j.eswa.2023.120981


Zhu, J., & Huang, T. (2024). Public debt and welfare with machine learning. Finance Research Letters, 69, 106164. https://doi.org/10.1016/j.frl.2024.106164