More about the thinking behind the LiTra project

We know remarkably little about the early history that formed the known European languages and cultures. While advances in genetics have led to new debates about population histories, they cannot tell us about the languages spoken. In the absence of written records, language appears ephemeral: changes through time seemingly erase all traces of earlier speech, leaving only abstract ‘proto-languages’ to be reconstructed.
Such erasures are not, however, complete, as language change always leaves behind a tail of residual forms. Very little research has dealt with the low-frequency variation that forms part of all natural language, even though it is known that residual forms are often resistant to change and may show stable geographical patterns. This project explores the potential of such patterns to provide linguistic ‘fingerprints’ allowing the reconstruction of much earlier linguistic configurations. It also addresses the general question of low-frequency variation as a carrier of meaning: how can the systematic study of minor variants refine current views of linguistic variation and change?
We will study the spread and interactions of linguistic and cultural groups in early England (Celtic, Scandinavian and West Germanic) through geographically coherent patterns of minority variants (‘micro-patterns’) in a set of historical, purpose-built text corpora. Such an approach has not been attempted before, and is made possible by combining philological expertise with corpus annotation methods based on deep learning technology. The project breaks new ground in the study of linguistic variation, where small-scale patterning has largely been ignored. Linguistic traces, combined with the findings of archaeology and genetics, are expected to form a powerful means of reconstructing the past, throwing light on past linguistic areas and interactions as well as on the maintenance of local and regional identities.
The LiTra project expands and develops A Corpus of Middle English Local Documents, which forms the main material for the study. The corpus is uniquely suitable for this approach, as the texts included are precisely localized and written in a highly variable language which reflects geographical variation. To make efficient searches possible, the corpus is annotated using deep learning technology, a process that until very recently was not practicable for texts as variable as those in Middle English.

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.