|Control and Optimization in Applied Mathematics|
|Article 6, Volume 8, Issue 2, Esfand 2023, Pages 85-105|
|Article Type: Research Article|
|DOI: 10.30473/coam.2023.66718.1226|
|Saeed Hashemi; Saeed Ayat*|
|Department of Computer Engineering and Information Technology, Payame Noor University (PNU), Tehran, Iran|
|Emotion recognition in Persian speech is limited by inefficient feature extraction and classification tools. To address this, we propose a method for detecting hidden emotions in Persian speech with higher recognition accuracy. The method has four steps: preprocessing, feature description, feature extraction, and classification. In preprocessing, the input signal is normalized by converting it to a single-channel vector and resampling it. Feature description uses Mel-Frequency Cepstral Coefficients (MFCC) and Spectro-Temporal Modulation (STM), each of which produces a separate feature matrix. These matrices are merged and passed to a Convolutional Neural Network (CNN) for feature extraction. Finally, a Support Vector Machine (SVM) with a linear kernel classifies the emotions. Evaluated on the Sharif Emotional Speech dataset, the proposed method achieves an average accuracy of 80.9% in classifying emotions in Persian speech.|
|Emotion recognition in speech; Mel-frequency cepstral coefficients; Convolutional neural network; Support vector machine|
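The preprocessing step named in the abstract (single-channel conversion followed by resampling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `to_mono`, `resample_linear`, and `preprocess` are hypothetical, averaging the channels is an assumed way of producing a single-channel vector, and linear interpolation stands in for whatever resampler the paper used. The abstract does not state a target sampling rate, so it is left as a parameter.

```python
from typing import List, Sequence


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse multi-channel frames into one channel by averaging (assumed method)."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(signal: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation.

    Illustration only: no anti-aliasing filter is applied, so a real
    pipeline would use a proper band-limited resampler when downsampling.
    """
    if not signal:
        return []
    n_out = max(1, round(len(signal) * dst_rate / src_rate))
    last = len(signal) - 1
    out: List[float] = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional index into the source signal
        lo = min(int(pos), last)           # sample at or before pos
        hi = min(lo + 1, last)             # next sample, clamped at the end
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def preprocess(frames: Sequence[Sequence[float]], src_rate: int, dst_rate: int) -> List[float]:
    """Single-channel conversion followed by resampling, in that order."""
    return resample_linear(to_mono(frames), src_rate, dst_rate)
```

For example, `preprocess(stereo_frames, 44100, 16000)` would return a mono signal at 16 kHz; both rates here are illustrative values, not figures taken from the paper.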