Predicting content-based political inclinations of Iranian Twitter users using BERT and deep learning

Document Type: Original Article

Authors

Faculty of Computer Engineering, Malek-Ashtar University of Technology, Tehran, Iran

Abstract

With the advent of social networks such as Twitter, politicians, the media, and ordinary citizens regularly turn to these platforms to share their thoughts and feelings, including their political views. This article analyzes the political ideology of Iranian Twitter users with deep learning, combining LSTM and CNN layers with BERT. This makes it possible to identify sympathizers and opponents of the Islamic Republic of Iran, two groups of particular interest to political scientists. We trained a model to predict whether a tweet was written by a sympathizer or an opponent, using a novel Twitter dataset that contains tweets from both groups. The trained model is then used to identify the ideology of individual users. The results show that the proposed model classifies tweets with an F1-score of 75.68% and classifies individuals by political orientation with an F1-score of 93.18%.
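The abstract does not say how tweet-level predictions are turned into a user-level label. The sketch below illustrates one simple possibility, a majority vote over each user's tweets. The vote rule, the function name `classify_users`, the label strings, and the tie handling are all assumptions made for illustration and are not taken from the article.

```python
from collections import Counter, defaultdict


def classify_users(tweet_predictions):
    """Aggregate per-tweet labels into one label per user by majority vote.

    tweet_predictions: iterable of (user_id, label) pairs. The labels
    "sympathizer" and "opponent" are hypothetical names for the two classes.
    A tie is reported as None instead of being forced to one side.
    """
    # Count how many tweets of each label every user has.
    votes = defaultdict(Counter)
    for user_id, label in tweet_predictions:
        votes[user_id][label] += 1

    user_labels = {}
    for user_id, counts in votes.items():
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            user_labels[user_id] = None  # tie: leave the user undecided
        else:
            user_labels[user_id] = ranked[0][0]
    return user_labels


# Toy per-tweet predictions, as a tweet classifier might produce them.
predictions = [
    ("user_a", "opponent"), ("user_a", "opponent"), ("user_a", "sympathizer"),
    ("user_b", "sympathizer"), ("user_b", "sympathizer"),
    ("user_c", "opponent"), ("user_c", "sympathizer"),
]
result = classify_users(predictions)
```

Under a rule of this kind, a few misclassified tweets can be outvoted by a user's other tweets, which is one plausible reason a user-level score could exceed the tweet-level score.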
