Comparing BDD and TDD: Machine Learning Analysis of Software Quality with SHAP Interpretability

Authors

  • Gregorius Airlangga, Information System Study Program, Atma Jaya Catholic University of Indonesia

DOI:

10.33395/sinkron.v8i4.14201

Keywords:

Behavior-Driven Development (BDD), Test-Driven Development (TDD), Software Quality, Machine Learning Models, SHAP Interpretability

Abstract

This study evaluates the impact of Behavior-Driven Development (BDD) and Test-Driven Development (TDD) on software quality using machine learning models, including Random Forest, XGBoost, and LightGBM. Key metrics such as bug detection, test coverage, and development time were analyzed using a dataset from multiple software projects. Polynomial feature expansion captured non-linear interactions, while SHapley Additive exPlanations (SHAP) enhanced interpretability. Results indicate that Random Forest achieved the best predictive accuracy, with an average RMSE of 7.64 and MAE of 6.39, outperforming XGBoost (average RMSE: 8.63, MAE: 7.37) and LightGBM (average RMSE: 6.89, MAE: 5.38). However, negative R² values across all models reveal challenges in generalization. SHAP analysis highlights the critical influence of higher-order interactions, particularly between test coverage and development time. These findings underscore the complexity of predicting software quality and suggest the need for additional features and advanced techniques to enhance model performance. This study provides a comprehensive, interpretable framework for assessing the comparative effectiveness of BDD and TDD in improving software quality.
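The abstract reports RMSE and MAE alongside negative values of the coefficient of determination. As a minimal sketch of how those three metrics relate, the Python below computes them from their standard definitions. The numbers are hypothetical and chosen only for illustration; they are not taken from the study's dataset, and the function names are assumptions rather than the author's code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets with little spread, and predictions that miss by 5 each time.
y_true = [50.0, 52.0, 48.0, 51.0, 49.0]
y_pred = [55.0, 47.0, 53.0, 46.0, 54.0]

print("RMSE:", rmse(y_true, y_pred))
print("MAE: ", mae(y_true, y_pred))
print("R2:  ", r2(y_true, y_pred))

# Baseline that always predicts the mean of y_true scores exactly R2 = 0.
baseline = [sum(y_true) / len(y_true)] * len(y_true)
print("Baseline R2:", r2(y_true, baseline))
```

In this toy case the errors look moderate in absolute terms, yet R² is negative because the predictions do worse than always predicting the mean of the targets. A negative R² on held-out data is therefore a statement about performance relative to that constant baseline, which is why it can coexist with RMSE and MAE values that seem acceptable.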

How to Cite

Airlangga, G. (2024). Comparing BDD and TDD: Machine Learning Analysis of Software Quality with SHAP Interpretability. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 8(4), 2615-2625. https://doi.org/10.33395/sinkron.v8i4.14201