Today, language models exist for a wide variety of languages, but comparing output across models can be difficult. One reason for this is the models' reliance on training data, which may stem from different sources and different genres of language (news articles, interview transcripts, social media). Another reason is that specific models for specific languages may have been developed completely independently of other models, leading to differences in implementation. This poses challenges for the multilingual comparison of discourse and often leads to analyses that restrict data to predominantly English-language content, despite the object and topic of study being of worldwide relevance.
Using social media discussions surrounding the Syrian refugee crisis in Europe as a case study, the aim of this project is to assess and evaluate the comparability of results from available computational linguistic methods. We analyse social media data from different platforms in nine languages using state-of-the-art natural language processing technologies and language resources in order to address two of the largest challenges of computational text analysis: longitudinal change and multilingual comparison.
Anamaria Dutceac Segesten, European Studies, Center for Languages and Literature, Lund University
Johannes Bjerva, Department of Computer Science, Aalborg University
Kristian Gade Kjelmann, Department of Sociology and Social Work, Aalborg University
Yiyi Chen, Department of Computer Science, Aalborg University