
A Statistical Analysis of Persian Fake News on COVID-19 | ||
Sociolinguistics: A Scientific-Research Quarterly | ||
Article 4, Volume 5, Issue 4 - Serial Number 20, Mehr 1401 (September-October 2022), Pages 43-60 | ||
Article type: Research article | ||
Digital Object Identifier (DOI): 10.30473/il.2023.63989.1537 | ||
Author | ||
Masood Ghayoomi* | ||
Assistant Professor of Linguistics, Institute for Humanities and Cultural Studies, Tehran, Iran | ||
Abstract | ||
This study uses statistical analysis to examine the characteristics of Persian fake news related to COVID-19. To this end, a corpus containing reliable and fake news about the coronavirus is first compiled. The linguistic patterns of the two data sets, together with two statistical analyses of reliable and fake news (information content and readability), are then examined and compared. According to the information extracted and the experimental results obtained from the fake news corpus, the two data sets share common linguistic patterns. Moreover, the information content of reliable news, measured by entropy and surprisal, is higher than that of fake news. The readability level of the fake news was assessed with readability formulas, and the result is that fake news, compared with reliable news, is mostly simple rather than difficult. In the automatic labeling of reliable and fake news by difficulty level, a large share of the fake news was identified as simple, and only a small number of reliable news items had a difficult language level. In addition to this result and the statistical examination of linguistic features based on the information content and readability of fake news, the practical use of this statistical information for detecting fake news with machine learning methods was studied. | ||
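The abstract refers to readability formulas without naming them here. As a minimal sketch only, the following uses the Automated Readability Index (Smith & Senter, 1967) as a stand-in; the choice of formula, the function name `automated_readability_index`, the naive tokenizer, and the toy English sentences are all assumptions for illustration, not the study's method or data.

```python
import re


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    # Sentence boundaries: Latin end punctuation plus the Persian question mark.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    # Naive tokenizer (an assumption): each run of word characters is a word.
    # Persian text with diacritics or zero-width non-joiners would need a
    # proper tokenizer, because those characters split the runs.
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


# Toy inputs (hypothetical, not taken from the corpus).
simple = "The cat sat. The dog ran away."
dense = ("Epidemiological misinformation proliferates considerably "
         "throughout interconnected communities.")

# A higher score indicates harder text under this formula.
print(automated_readability_index(simple) < automated_readability_index(dense))
```

Scores like these could be thresholded to label a text as simple or difficult, which is the kind of automatic labeling the abstract describes; any threshold would have to be calibrated for Persian.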
Keywords | ||
Media Language; Persian Fake News; COVID-19; Information Theory; Entropy; Surprisal; Readability | ||
Article Title [English] | ||
A Statistical Analysis of Persian Fake News on COVID-19 | ||
Authors [English] | ||
Masood Ghayoomi | ||
Assistant Professor of Linguistics, Institute for Humanities and Cultural Studies, Tehran, Iran | ||
Abstract [English] | ||
This research investigates the characteristics of Persian fake news related to COVID-19 by means of statistical analysis. To this end, a corpus containing reliable and fake Persian news about the coronavirus is first prepared. Then the language patterns of the two data sets, as well as two statistical analyses of reliable and fake news (the amount of information and the readability), are examined and compared with each other. According to the extracted information and the experimental results obtained from the developed corpus of COVID-19 fake news, the two data sets share common language patterns. Moreover, the amount of information in reliable news is greater than in fake news according to two measures, entropy and surprisal. The readability level of the fake news is measured with readability formulas, and the results show that the text of fake news is simpler than that of real news. In the process of automatically labeling reliable and fake news by level of difficulty, most news items are recognized as simple texts, and fake news is mostly simple rather than difficult compared with reliable news. In addition to this statistical study of the linguistic properties of fake news based on the amount of information and readability, the applicability of this statistical information to detecting fake news with machine learning methods was studied. | ||
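To make the two information measures named in the abstract concrete, here is a minimal sketch assuming a unigram (word frequency) model. The function names, the unigram assumption, and the toy token lists are hypothetical; the abstract does not say which language model or tokenization the study used.

```python
import math
from collections import Counter


def unigram_probs(tokens):
    """Relative frequency of each token (maximum-likelihood unigram model)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs.values())


def mean_surprisal(tokens, probs):
    """Average surprisal in bits per token, where surprisal(w) = -log2 p(w)."""
    return sum(-math.log2(probs[t]) for t in tokens) / len(tokens)


# Toy token lists (hypothetical, not taken from the corpus).
varied = ["a", "b", "c", "d"]      # four equally likely words
repetitive = ["a", "a", "a", "b"]  # one word dominates

h_varied = entropy(unigram_probs(varied))
h_repetitive = entropy(unigram_probs(repetitive))

# The more varied text carries more information per word.
print(h_varied > h_repetitive)
```

When surprisal is averaged over the same tokens used to estimate the probabilities, it equals the entropy; the two measures diverge once the model is estimated on one corpus and applied to another.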
Keywords [English] | ||
Media Language, Persian Fake News, COVID-19, Information Theory, Entropy, Surprisal, Readability | ||