Human Age Estimation Through Audio Utilising MFCC and RNN

Authors

  • Ken Ken Universitas Mikroskil
  • Osfredo Quinn Universitas Mikroskil
  • Irpan Adiputra Pardosi Universitas Mikroskil
  • Wenripin Chandra Universitas Pelita Harapan

DOI:

10.33395/sinkron.v8i3.12656

Keywords:

Classification, Age Estimation, Audio, MFCC, RNN

Abstract

Age is one of the main human attributes and an important factor in improving the communication experience. Age estimation has been used in several applications to improve user experience, so an approach is needed to estimate a user's age; one such approach is through audio. In this study, Mel Frequency Cepstral Coefficients (MFCC) and a Recurrent Neural Network (RNN) are used to estimate age from audio. MFCC extracts features from the audio data, while the RNN estimates age. The dataset was taken from the corpus of user speech data on the Common Voice website. The study shows that the MFCC and RNN methods are able to estimate human age from audio, with the highest accuracy reaching 0.5647 with SimpleRNN and 0.7087 with LSTM.
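
As a rough illustration of the MFCC-to-RNN idea described in the abstract, the following minimal sketch runs a simple (Elman-style) recurrent layer over a sequence of per-frame feature vectors and turns the final hidden state into age-group probabilities. It is not the authors' implementation: the layer sizes, the number of age groups, the helper names (simple_rnn_classify, random_matrix), the untrained random weights, and the random stand-in for real MFCC frames are all assumptions made for illustration only.

```python
import math
import random


def simple_rnn_classify(frames, w_xh, w_hh, b_h, w_hy, b_y):
    """Run an Elman-style recurrent layer over feature frames, then softmax."""
    hidden = [0.0] * len(b_h)
    for frame in frames:
        # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_xh[j], frame))
                + sum(w * h for w, h in zip(w_hh[j], hidden))
                + b_h[j]
            )
            for j in range(len(b_h))
        ]
    # Class scores are computed from the final hidden state only.
    logits = [
        sum(w * h for w, h in zip(w_hy[k], hidden)) + b_y[k]
        for k in range(len(b_y))
    ]
    # Numerically stable softmax over the age-group scores.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical helper: matrix of small uniform random values."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


# Assumed sizes (not from the paper): 13 coefficients per frame,
# 8 hidden units, 4 age groups, 20 frames per clip.
N_MFCC, N_HIDDEN, N_CLASSES, N_FRAMES = 13, 8, 4, 20
rng = random.Random(0)

# Placeholder for real MFCC frames extracted from an audio clip.
frames = random_matrix(N_FRAMES, N_MFCC, rng, scale=1.0)

# Untrained random weights; a real model would learn these from data.
w_xh = random_matrix(N_HIDDEN, N_MFCC, rng)
w_hh = random_matrix(N_HIDDEN, N_HIDDEN, rng)
w_hy = random_matrix(N_CLASSES, N_HIDDEN, rng)
b_h = [0.0] * N_HIDDEN
b_y = [0.0] * N_CLASSES

probs = simple_rnn_classify(frames, w_xh, w_hh, b_h, w_hy, b_y)
predicted_group = max(range(N_CLASSES), key=probs.__getitem__)
```

In this sketch probs holds one probability per assumed age group and predicted_group is the index of the most likely one. An LSTM, the better-performing variant reported in the abstract, would replace the single tanh update with gated updates to a separate cell state.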

How to Cite

Ken Ken, Quinn, O., Pardosi, I. A., & Chandra, W. (2023). Human Age Estimation Through Audio Utilising MFCC and RNN. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 7(3), 1852-1862. https://doi.org/10.33395/sinkron.v8i3.12656