A Machine Learning Approach to Synthetic Gini Coefficient Estimation in Colombian Municipalities

Article DOI: https://doi.org/10.57017/jorit.v4.1(7).01

Author(s):

John Michael RIVEROS-GAVILANES, Faculty of Administration and Economics, University College of Cundinamarca, Colombia

Abstract:

This paper presents two synthetic estimations of the Gini coefficient at the municipal level for Colombia for the years 2000-2020. The methodology compares several machine learning models in order to select the best one for imputing the data. This yields two Random Forest models: the first is characterised by Dominant Fixed Effects, while the second contains a set of Dominant Varying Factors. Based on these estimations, the Synthetic Gini Coefficients from both models are inspected, and public links are provided to access them. The Dominant Fixed Effects model is rather “stiff” in contrast to the Varying Factors model. Hence, researchers are recommended to use the Synthetic Gini Coefficient with Varying Factors, because it shows greater variability across time than the one from the Dominant Fixed Effects model.
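The selection step described in the abstract (compare candidate models, keep the best one, then impute) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy panel, the holdout split, and the two simple candidate predictors, which stand in for the machine learning models compared in the paper. It is not the author's Random Forest specification; it only shows the generic "score on held-out observed values, then impute the gaps with the winner" pattern.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy panel: (municipality, year) -> observed Gini, None where missing.
panel = {
    ("A", 2000): 0.50, ("A", 2001): 0.52, ("A", 2002): None, ("A", 2003): 0.51,
    ("B", 2000): 0.40, ("B", 2001): None, ("B", 2002): 0.42, ("B", 2003): 0.41,
}

def fit_global_mean(train):
    """Candidate 1: predict the overall mean for every cell."""
    overall = mean(train.values())
    return lambda key: overall

def fit_municipality_mean(train):
    """Candidate 2: predict each municipality's own mean (a crude fixed effect)."""
    overall = mean(train.values())
    groups = {}
    for (muni, _year), value in train.items():
        groups.setdefault(muni, []).append(value)
    muni_means = {muni: mean(values) for muni, values in groups.items()}
    # Fall back to the overall mean for municipalities unseen in training.
    return lambda key: muni_means.get(key[0], overall)

def holdout_rmse(fit, observed, holdout_keys):
    """Fit on the observed cells outside the holdout, score RMSE on the holdout."""
    train = {k: v for k, v in observed.items() if k not in holdout_keys}
    predict = fit(train)
    squared_errors = [(predict(k) - observed[k]) ** 2 for k in holdout_keys]
    return sqrt(mean(squared_errors))

observed = {k: v for k, v in panel.items() if v is not None}
holdout_keys = [("A", 2003), ("B", 2003)]  # assumed validation cells

candidates = {
    "global_mean": fit_global_mean,
    "municipality_mean": fit_municipality_mean,
}
scores = {name: holdout_rmse(fit, observed, holdout_keys)
          for name, fit in candidates.items()}

# Keep the candidate with the lowest holdout error and refit it on all observed data.
best_name = min(scores, key=scores.get)
best_predict = candidates[best_name](observed)

# Impute only the missing cells; observed values are left untouched.
imputed = {k: (v if v is not None else best_predict(k)) for k, v in panel.items()}

print(best_name, scores)
```

In practice the candidates would be the actual learners under comparison and the score would come from cross-validation rather than a single fixed holdout, but the control flow stays the same.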

© The Author(s) 2025. Published by RITHA Publishing. This article is distributed under the terms of the CC-BY 4.0 license, which permits any further distribution in any medium, provided the original work is properly cited, maintaining attribution to the author(s) and the title of the work, journal citation and URL DOI.

How to cite:

Riveros-Gavilanes, J. M. (2025). A machine learning approach to synthetic Gini Coefficient estimation in Colombian municipalities. Journal of Research, Innovation and Technologies, Volume IV, 1(7), 7-24. https://doi.org/10.57017/jorit.v4.1(7).01
