Human Age Estimation Through Audio Utilising MFCC and RNN
DOI: 10.33395/sinkron.v8i3.12656
Keywords: Classification, Age Estimation, Audio, MFCC, RNN
Abstract
Age is one of the main human attributes and an important factor in improving the communication experience. Age estimation has been used in several applications to improve user experience, so an approach is needed to estimate a user's age; one such approach is through audio. In this study, Mel Frequency Cepstral Coefficients (MFCC) and a Recurrent Neural Network (RNN) are used to estimate age from audio. MFCC extracts features from the audio data, and the RNN estimates age from those features. The dataset was taken from the corpus of user speech data on the Common Voice website. The study shows that MFCC and RNN can estimate human age from audio, with a highest accuracy of 0.5647 for SimpleRNN and 0.7087 for LSTM.
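The pipeline described in the abstract (per-frame MFCC features fed to a recurrent network that outputs an age class) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The layer sizes, the number of age classes, the random weights, and the random placeholder frames standing in for real MFCC vectors are all assumptions made for illustration, and biases and training are omitted. The `hz_to_mel` helper shows the commonly used mel-scale mapping that underlies MFCC extraction.

```python
import math
import random


def hz_to_mel(freq_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simple_rnn_classify(frames, w_in, w_rec, w_out):
    """Elman-style recurrence over the frames; classify from the last hidden state.

    Biases are omitted to keep the sketch short.
    """
    hidden = [0.0] * len(w_rec)
    for frame in frames:
        pre = [a + b for a, b in zip(matvec(w_in, frame), matvec(w_rec, hidden))]
        hidden = [math.tanh(v) for v in pre]
    return softmax(matvec(w_out, hidden))


rng = random.Random(0)

# Hypothetical sizes: none of these values come from the paper.
N_MFCC, N_HIDDEN, N_CLASSES, N_FRAMES = 13, 8, 4, 20

# Placeholder frames standing in for real MFCC vectors (one vector per audio frame).
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_MFCC)] for _ in range(N_FRAMES)]

w_in = init_matrix(N_HIDDEN, N_MFCC, rng)
w_rec = init_matrix(N_HIDDEN, N_HIDDEN, rng)
w_out = init_matrix(N_CLASSES, N_HIDDEN, rng)

probs = simple_rnn_classify(frames, w_in, w_rec, w_out)
predicted_class = max(range(N_CLASSES), key=probs.__getitem__)
print(predicted_class, probs)
```

In practice the MFCC frames would come from an audio library and the weights would be learned from labelled speech. An LSTM replaces the plain `tanh` recurrence with a gated cell, which is the second variant the abstract reports.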
License
Copyright (c) 2023 Wenripin Chandra, Ken Ken, Osfredo Quinn, Irpan Adiputra Pardosi
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.