Comparative analysis of resampling techniques on Machine Learning algorithm


  • Tri Suci Amelia Universitas Labuhanbatu
  • Mila Nirmala Sari Hasibuan Universitas Labuhanbatu, Indonesia
  • Rahmadani Pane Universitas Labuhanbatu, Indonesia




Classification algorithms in data science generally assume that the classes in the training data are equally distributed. Real-world datasets, however, often have an unbalanced class distribution, consisting of a majority class and a minority class. The minority class is usually the one of greater interest and the more important to identify, so correctly classifying a minority class sample is more valuable than correctly classifying a majority class sample. An unbalanced class distribution makes it difficult for a classification algorithm to classify minority class samples correctly. When a model performs well on majority class samples but poorly on minority class samples, the imbalance is a crucial problem to address. Commonly proposed solutions are oversampling the minority class and/or undersampling the majority class. In this study, the authors applied various resampling techniques and tested them with various machine learning classification algorithms to find the combination of resampling technique and algorithm that achieves high recall on minority class samples while still taking majority class classification into account.
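The abstract does not name the specific resampling techniques or classifiers that were compared. As a minimal sketch only, the following assumes the two simplest variants, random oversampling and random undersampling, and shows how recall is computed per class. The function names, the toy data, and the always-majority "classifier" are hypothetical illustrations, not the authors' method.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes until
    every class is as large as the largest one (assumed variant)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class
    (assumed variant)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def recall(y_true, y_pred, label):
    """Fraction of samples whose true class is `label` that were
    also predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    if not relevant:
        return 0.0
    return sum(p == label for p in relevant) / len(relevant)


# Hypothetical toy data: 8 majority samples (class 0), 2 minority (class 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print("oversampled class counts: ", dict(Counter(y_over)))
print("undersampled class counts:", dict(Counter(y_under)))

# A model that always predicts the majority class looks fine on class 0
# but never finds a minority sample.
y_pred = [0] * len(y)
print("majority recall:", recall(y, y_pred, 0))  # 1.0
print("minority recall:", recall(y, y_pred, 1))  # 0.0
```

The last two lines illustrate why the study reports recall on the minority class: a model can classify every majority sample correctly and still miss every minority sample. In an actual experiment, resampling would normally be applied to the training split only, so that the test set keeps the original class distribution.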






How to Cite

Amelia, T. S., Hasibuan, M. N. S., & Pane, R. (2022). Comparative analysis of resampling techniques on Machine Learning algorithm. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 6(2), 628–634.
