Machine Learning Analysis of Jakarta Bay Water Quality: Comparing Models

Authors

  • Aura Savira, Sistem Informasi, Fakultas Teknologi Komunikasi dan Informatika, Universitas Nasional
  • Andrianingsih, Sistem Informasi, Fakultas Teknologi Komunikasi dan Informatika, Universitas Nasional

DOI:

10.33395/sinkron.v10i1.15540

Keywords:

Jakarta Bay, Water Quality Classification, LightGBM, CatBoost, Explainable AI (SHAP)

Abstract

Jakarta Bay experiences persistent anthropogenic pressures that produce spatially heterogeneous water-quality conditions. This study develops a regulation-aligned, explainable classification framework using a 2024 in-situ dataset collected at 53 stations across two sampling periods (March and August). After preprocessing (unit harmonization, outlier screening, missing-value imputation, and treatment of below-detection-limit measurements), the dataset yielded 104 complete samples classified into Good (n=46), Lightly Polluted (n=28), and Moderately Polluted (n=34) categories based on KEPMEN LH No. 51/2004. Three ensemble algorithms (LightGBM, CatBoost, and Random Forest) were evaluated using stratified cross-validation to preserve class proportions in each fold and prevent spatial leakage. CatBoost achieved the best overall performance (Accuracy = 0.8338; F1 = 0.8257), followed by Random Forest, while LightGBM showed the highest variability across folds. Class-level metrics indicate that CatBoost produced the most balanced predictions, particularly for the borderline Lightly Polluted class. SHAP analysis identified turbidity/TSS, nutrients, dissolved oxygen, salinity, and spatial gradients as the dominant predictors, enabling transparent interpretation of model decisions. The resulting framework provides a reproducible and operationally deployable approach for rapid screening, hotspot detection, and decision support in Jakarta Bay’s water-quality management.
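
The stratified evaluation mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the fold count (k = 5), the random seed, the round-robin fold assignment, and the use of macro-averaged F1 are assumptions made for illustration, and the helper names are hypothetical. The class counts are those listed in the abstract. Grouping samples by station, which would be needed to keep both visits to a station in the same fold, is not shown.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold has similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # rotate the starting fold so overall fold sizes stay even
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Class counts as listed in the abstract.
class_counts = {"Good": 46, "Lightly Polluted": 28, "Moderately Polluted": 34}
labels = [name for name, n in class_counts.items() for _ in range(n)]

folds = stratified_folds(labels, k=5, seed=0)
for i, fold in enumerate(folds):
    # Each fold serves once as the held-out test set; the rest is training data.
    print(i, dict(Counter(labels[idx] for idx in fold)))
```

In each round, one fold would be held out for testing while a classifier is fitted on the remaining folds, and `accuracy` and `macro_f1` would then be averaged over the rounds. Libraries such as scikit-learn provide an equivalent splitter (`StratifiedKFold`).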

How to Cite

Savira, A., & Andrianingsih, A. (2026). Machine Learning Analysis of Jakarta Bay Water Quality: Comparing Models. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 10(1), 110-120. https://doi.org/10.33395/sinkron.v10i1.15540