A Machine Learning Approach to Synthetic Gini Coefficient Estimation in Colombian Municipalities
John Michael RIVEROS-GAVILANES, Faculty of Administration and Economics, University College of Cundinamarca, Colombia
This paper presents two synthetic estimations of the Gini coefficient at the municipality level for Colombia over the years 2000-2020. The methodology compares several machine learning models to select the best one for imputing the data. This yields two Random Forest models: the first is characterised by Dominant Fixed Effects, while the second contains a set of Dominant Varying Factors. Based on these estimations, the Synthetic Gini Coefficients from both models are inspected, and public links are provided to access them. The Dominant Fixed Effects model is rather “stiff” in contrast to the Varying Factors model. Hence, researchers are advised to use the Synthetic Gini Coefficient with Varying Factors, because it exhibits greater variability across time than the Dominant Fixed Effects model.
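The select-then-impute idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: a mean baseline and a k-nearest-neighbours regressor stand in for the candidate models (the paper's final models are Random Forests), the toy data are synthetic, and every function and variable name here is hypothetical.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def fit_mean(X, y):
    """Baseline candidate: always predict the training mean."""
    mu = sum(y) / len(y)
    return lambda x: mu

def fit_knn(X, y, k=3):
    """Stand-in candidate: k-nearest-neighbours regressor (Euclidean distance)."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict

def cv_rmse(fit, X, y, folds=5, seed=0):
    """Average out-of-fold RMSE of one candidate over shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(rmse([y[i] for i in test], [model(X[i]) for i in test]))
    return sum(scores) / len(scores)

def impute(X, y, candidates):
    """Pick the candidate with the lowest CV error on observed rows,
    refit it on all observed rows, and fill the missing (None) targets."""
    obs = [i for i, v in enumerate(y) if v is not None]
    X_obs = [X[i] for i in obs]
    y_obs = [y[i] for i in obs]
    scores = {name: cv_rmse(fit, X_obs, y_obs) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    model = candidates[best](X_obs, y_obs)
    filled = [v if v is not None else model(X[i]) for i, v in enumerate(y)]
    return best, scores, filled

# Toy data (synthetic): one covariate in [0, 1] and a Gini-like target,
# with every fourth target value treated as missing.
rng = random.Random(42)
X = [(i / 59,) for i in range(60)]
y = [0.3 + 0.4 * x[0] + rng.gauss(0, 0.01) for x in X]
y = [None if i % 4 == 0 else v for i, v in enumerate(y)]

best, scores, filled = impute(X, y, {"mean": fit_mean, "knn": fit_knn})
print("selected model:", best)
print("cross-validated RMSE:", scores)
```

Observed values are left untouched and only the missing entries are replaced by predictions, so the imputed series coincides with the original wherever data exist.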
© The Author(s) 2025. Published by RITHA Publishing. This article is distributed under the terms of the CC-BY 4.0 license, which permits any further distribution in any medium, provided the original work is properly cited, maintaining attribution to the author(s) and the title of the work, journal citation and URL DOI.
Riveros-Gavilanes, J. M. (2025). A machine learning approach to synthetic Gini Coefficient estimation in Colombian municipalities. Journal of Research, Innovation and Technologies, Volume IV, 1(7), 7-24. https://doi.org/10.57017/jorit.v4.1(7).01