Explainable Machine Learning for Poverty Prediction in Central Java Regencies and Cities

Authors

  • Wahyu Fhaldian, Fakultas Ilmu Komputer, Universitas Dian Nuswantoro, Semarang, Indonesia
  • Amiq Fahmi, Fakultas Ilmu Komputer, Universitas Dian Nuswantoro, Semarang, Indonesia

DOI:

10.33395/sinkron.v9i4.15312

Keywords:

Poverty, XGBoost, Random Forest, SHAP, Central Java

Abstract

Poverty remains a multidimensional challenge in Central Java, necessitating robust data-driven approaches to identify its socioeconomic determinants. This study applied six machine learning models, namely Extreme Gradient Boosting (XGBoost), Random Forest, CatBoost, LightGBM, Elastic Net Regression, and a Stacking ensemble, to district-level data from Statistics Indonesia covering demographics, education, labor, infrastructure, and household welfare. Model evaluation combined an 80:20 hold-out split, 10-fold cross-validation, and noise perturbation tests. Results show that XGBoost achieved the best individual performance (MAE = 2,180.01; RMSE = 3,512.07; R² = 0.931), while the Stacking ensemble surpassed all single learners on RMSE and R² (MAE = 2,640.99; RMSE = 3,202.79; R² = 0.942), although its MAE was higher than that of XGBoost. Interpretability was assessed with SHapley Additive exPlanations (SHAP), Partial Dependence Plots (PDP), and Accumulated Local Effects (ALE), which consistently identified Number of Households, Per Capita Expenditure, and Uninhabitable Houses as the most influential predictors. Counterfactual simulations indicated that increasing per capita expenditure by 10% could reduce the poverty index by 9.9%, while reducing household size by 10% could lower it by 11.3%. Robustness checks revealed Brebes as an influential district shaping model stability. Overall, the findings demonstrate that boosting and stacking ensembles, when combined with explainable AI tools, not only enhance predictive accuracy but also provide transparent, policy-relevant evidence to strengthen poverty alleviation programs in Central Java. This study contributes both methodological advances in explainable machine learning and practical insights for targeted poverty reduction strategies.
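
The three error metrics reported in the abstract, and the general shape of a counterfactual simulation, can be illustrated with a minimal sketch. This is not the authors' code: `toy_predict`, the feature names `households` and `expenditure`, and all numbers in `rows` are hypothetical placeholders chosen only to make the example runnable, and the toy formula stands in for a fitted model such as XGBoost or the Stacking ensemble.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def counterfactual_change(predict, rows, feature, pct):
    """Percentage change in the mean prediction when `feature` is scaled by (1 + pct)."""
    base = [predict(r) for r in rows]
    shifted = [predict({**r, feature: r[feature] * (1 + pct)}) for r in rows]
    mean_base = sum(base) / len(base)
    mean_shifted = sum(shifted) / len(shifted)
    return 100.0 * (mean_shifted - mean_base) / mean_base


def toy_predict(row):
    """Hypothetical stand-in for a fitted model (assumption, not the study's model)."""
    return 50.0 * row["households"] / row["expenditure"]


# Hypothetical district records; values are illustrative only.
rows = [
    {"households": 300.0, "expenditure": 10.0},
    {"households": 450.0, "expenditure": 12.0},
    {"households": 200.0, "expenditure": 9.0},
]

if __name__ == "__main__":
    print("MAE :", mae([1, 2, 3], [1, 2, 4]))
    print("RMSE:", rmse([1, 2, 3], [1, 2, 4]))
    print("R2  :", r2([1, 2, 3], [1, 2, 4]))
    print("+10% expenditure:", counterfactual_change(toy_predict, rows, "expenditure", 0.10))
    print("-10% households :", counterfactual_change(toy_predict, rows, "households", -0.10))
```

In practice `predict` would wrap the trained model's prediction call, and the perturbation would be applied to the real district-level feature table. Because MAE and RMSE weight errors differently, one model can rank ahead of another on one metric and behind it on the other, as with the XGBoost and Stacking results reported here.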

How to Cite

Fhaldian, W., & Fahmi, A. (2025). Explainable Machine Learning for Poverty Prediction in Central Java Regencies and Cities. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 9(4), 2080-2097. https://doi.org/10.33395/sinkron.v9i4.15312