
Emotion Recognition for Persian Speech Using Convolutional Neural Network and Support Vector Machine
Control and Optimization in Applied Mathematics
Article 6, Volume 8, Issue 2, Esfand 2023, Pages 85-105 | Full Text PDF (1.25 MB)
Article Type: Research Article
DOI: 10.30473/coam.2023.66718.1226
Authors
Saeed Hashemi; Saeed Ayat*
Department of Computer Engineering and Information Technology, Payame Noor University (PNU), Tehran, Iran
Abstract
The paper discusses the limitations of emotion recognition in Persian speech due to inefficient feature extraction and classification tools. To address this, we propose a new method for detecting hidden emotions in Persian speech with higher recognition accuracy. The method involves four steps: preprocessing, feature description, feature extraction, and classification. The input signal is normalized in the preprocessing step using single-channel vector conversion and signal resampling. Feature descriptions are performed using Mel-Frequency Cepstral Coefficients and Spectro-Temporal Modulation techniques, which produce separate feature matrices. These matrices are then merged and used for feature extraction through a Convolutional Neural Network. Finally, a Support Vector Machine with a linear kernel function is used for emotion classification. The proposed method is evaluated using the Sharif Emotional Speech dataset and achieves an average accuracy of 80.9% in classifying emotions in Persian speech.
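For orientation, the sketch below wires together the stages named in the abstract — single-channel conversion and resampling, MFCC and a modulation-style descriptor, and a linear-kernel SVM — using librosa and scikit-learn. It is a minimal sketch, not the authors' implementation: the 2D-FFT modulation descriptor is a rough stand-in for the paper's spectro-temporal modulation analysis, the corpus list is a placeholder for ShEMO utterances, and the CNN feature-extraction stage is omitted.

```python
# Minimal sketch of the pipeline described in the abstract (assumptions noted below).
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def describe(path, sr=16000, n_mfcc=13):
    """Preprocess one utterance and return a fixed-length feature vector."""
    # Preprocessing: convert to a single channel and resample to a common rate.
    y, sr = librosa.load(path, sr=sr, mono=True)

    # Descriptor 1: Mel-Frequency Cepstral Coefficients (coefficients x frames).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    # Descriptor 2: a rough modulation-spectrum stand-in, taken as the magnitude
    # of the 2D FFT of the log-mel spectrogram (not the paper's exact analysis).
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
    mod = np.abs(np.fft.fft2(mel))

    # Merge the two descriptors via simple per-row statistics; in the paper the
    # merged matrices instead feed a CNN that learns the final features.
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        mod.mean(axis=1),  mod.std(axis=1),
    ])

# Placeholder corpus: replace with real (wav_path, emotion_label) pairs, e.g. from ShEMO.
corpus = [("utt_001.wav", "anger"), ("utt_002.wav", "neutral")]
X = np.stack([describe(p) for p, _ in corpus])
y = [label for _, label in corpus]

# Classification: linear-kernel SVM, as in the final stage of the method.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```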
Keywords
Emotion recognition in speech; Mel-Frequency cepstral coefficients; Convolutional neural network; Support vector machine