Implementation of Mel-Frequency Cepstral Coefficient as Feature Extraction using K-Nearest Neighbor for Emotion Detection Based on Voice Intonation

Revanto Alif Nawasta; Nur Heri Cahyana; Heriyanto Heriyanto

doi:10.31315/telematika.v20i1.9518


Keywords:

Mel-Frequency Cepstral Coefficient, K-Nearest Neighbor, Emotion detection, Signal processing

Abstract

Purpose: To determine emotions from voice intonation by implementing Mel-Frequency Cepstral Coefficients (MFCC) as the feature extraction method and K-Nearest Neighbor (KNN) as the emotion detection method.

Design/methodology/approach: The data used in this study were downloaded from several video podcasts on YouTube. The methods used are pitch shifting for data augmentation; MFCC for feature extraction from the audio data; basic statistics (mean, median, minimum, maximum, and standard deviation) computed for each coefficient; a min-max scaler for normalization; and KNN for classification.
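The later stages of this pipeline (per-coefficient statistics, min-max normalization, and KNN classification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy two-coefficient "MFCC" frames, the use of population standard deviation, Euclidean distance, and k = 3 are all hypothetical choices made for the example. Pitch shifting and the MFCC computation itself are omitted and would normally come from an audio library.

```python
import math
import statistics
from collections import Counter


def summarize_coefficients(mfcc_frames):
    """Collapse a clip's frames into one vector of per-coefficient statistics.

    mfcc_frames is a list of frames, each a list of coefficients.
    Returns mean, median, min, max, and (population) standard deviation
    for every coefficient, concatenated into a flat list.
    """
    features = []
    for coeff in zip(*mfcc_frames):  # one tuple per coefficient, across frames
        features.extend([
            statistics.fmean(coeff),
            statistics.median(coeff),
            min(coeff),
            max(coeff),
            statistics.pstdev(coeff),  # assumption: population std
        ])
    return features


def fit_min_max(rows):
    """Learn per-feature minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(row, lows, highs):
    """Scale each feature to [0, 1] relative to the training range."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, lo, hi in zip(row, lows, highs)
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: four clips, two frames each, two coefficients per frame.
clips = [
    ([[1.0, 2.0], [3.0, 4.0]], "neutral"),
    ([[1.5, 2.5], [2.5, 3.5]], "neutral"),
    ([[8.0, 9.0], [10.0, 11.0]], "angry"),
    ([[8.5, 9.5], [9.5, 10.5]], "angry"),
]
raw = [summarize_coefficients(frames) for frames, _ in clips]
labels = [label for _, label in clips]

# The scaler is fitted on training data only, then reused for new clips.
lows, highs = fit_min_max(raw)
train_x = [apply_min_max(row, lows, highs) for row in raw]

new_clip = [[8.2, 9.2], [9.8, 10.8]]
query = apply_min_max(summarize_coefficients(new_clip), lows, highs)
prediction = knn_predict(train_x, labels, query, k=3)
print(prediction)
```

With five statistics per coefficient, a clip with n coefficients yields a vector of length 5n regardless of how many frames it contains, which is what lets clips of different durations be compared by a distance-based classifier such as KNN.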

Findings/result: Because testing was carried out separately for each gender, there are two classification models. The male model reached a highest accuracy of 88.8% and is considered a good fit. The female model reached a highest accuracy of 92.5% but was unable to correctly classify emotions in new data, a condition known as overfitting. Testing showed that the cause was that pitch-shifting augmentation by one tone on the female data could not compensate for a training set that was too small and did not contain enough samples to accurately represent all possible input values.

Originality/value/state of the art: The data used in this study have not been used in previous studies, because they were obtained by downloading from YouTube and then processed until ready for use in the research.

