March of data

The March of Data: English Linguistics across Disciplinary Borders

Organizers: Jukka Tyrkkö (Linnaeus University), Steven Coats (University of Oulu), Veronika Laippala (University of Turku)

In recent years, new approaches that blur disciplinary boundaries have emerged in English linguistics. They have been facilitated by increased access to large data sets (e.g. from social media or digitization projects), the application of new methods of data analysis and visualization, the sharing of code and data on platforms such as CLARIN or GitHub, and continual advances in technologies for data storage, retrieval, and processing.
In this workshop we invite papers that focus on new developments and cross-disciplinary approaches in the landscape of contemporary English linguistics. Potential topics and thematic fields may include:

  1. Computer-mediated communication (CMC): An ever-increasing proportion of human interaction is mediated by digital technologies. We invite papers that focus on CMC data, especially in the context of computational sociolinguistics (Nguyen et al. 2016). Relevant sources include forums, blogs, newsgroups, SMS and WhatsApp messages, text chats, wiki discourse, social media platforms such as Twitter, YouTube, or LinkedIn, pseudo-anonymous “chans” such as 4chan, and streaming platforms such as Twitch or webcam sites (see, e.g., Coats 2017, Gonçalves et al. 2018, Hiippala et al. 2019).
  2. Digital humanities: Language data give us insight into processes and developments in specifically linguistic domains such as lexis, grammar, or syntax, but can also shed light on language-mediated aspects of human experience such as culture, history, politics, and economic behavior (see, e.g., Alexander 2016). We welcome papers that use English language data to investigate questions in the digital humanities.
  3. Big data, maps, and visualization: Researchers in English corpus linguistics now often work with large data sets that are annotated with geographic or other metadata, allowing the creation of language maps (see, e.g., Grieve et al. 2017), as well as many other types of data visualization (see, e.g., Hilpert 2011). We invite papers that report on the collection and visualization of English language data, particularly papers that discuss interactive visualization and mapping tools or innovative methods such as virtual reality and augmented reality technologies (see, e.g., Alissandrakis et al. forthcoming).
  4. Machine learning: How can machine learning and AI be used for the study of English, for example in genre or register classification, or in the preparation and annotation of multimodal language data? What are the current best practices (see, e.g., Gries 2019), and where are we heading? We encourage the submission of papers that use machine learning or neural network approaches for the identification, analysis, and classification of language data in the form of text, sound, image, or video.

The conveners plan to edit a publication based on the papers presented at the workshop.

For references, please see here.


Click here for the book of abstracts.


The workshop presentations will be made available for viewing prior to the conference. On Wednesday, 2 June, the workshop participants will hold a panel discussion of the papers from 10:00 to 11:30 am (UTC+3). The workshop organizers welcome everyone to watch the videos and to attend the panel discussion!

Steven Coats: Multiple modals in the wild: A study of 24,530 multiple modal sequences in naturalistic North American speech
Veronika Laippala, Jesse Egbert, Douglas Biber & Aki-Juhani Kyröläinen: Using machine learning to predict keywords
Gerold Schneider: The visualisation and evaluation of semantic and conceptual maps
Masoud Fatemi & Mikko Laitinen: Size matters: An algorithm-based approach to social networks
Severi Luoto: Sexual dimorphism in language, and the gender shift hypothesis of homosexuality
Jukka Tyrkkö: Exploring the potential uses of sentiment analysis in historical linguistics