Språkforum: Deep Humanities

Mike Kestemont from the University of Antwerp is coming to Språkforum to give the following talk: "Deep Humanities: representation learning, word embeddings and their application in computational text analysis".

The contributor at this session of Språkforum is Mike Kestemont from the University of Antwerp. Kestemont conducts research in Computational Linguistics and Digital Humanities, and has previously taught digital text analysis, computer programming for the humanities, and medieval philology. His research includes authorship attribution by means of stylometry, as well as Deep Learning, which is also the topic of the talk.

Deep Humanities: representation learning, word embeddings and their application in computational text analysis

'Deep Learning' is an increasingly popular branch of Artificial Intelligence. In this field, computers are trained to automatically perform complex tasks, such as face recognition in photographs on social media. To this end, Deep Learning uses 'neural networks', an architecture loosely inspired by the workings of the human brain.

Deep Learning is also known as 'representation learning' because neural networks excel at learning how best to represent data. Representation learning has recently led to important breakthroughs in a number of fields, such as computer vision, speech recognition and audio analysis. In this talk, I will briefly survey the state of the art in the field of Deep Learning, and discuss its application to research in the Humanities, with an emphasis on text analysis.

In the Humanities, broadly defined as the study of the products of the human mind, increasing attention is being paid to computational methodologies. 'Distant Reading', for instance, refers to the application of methods for computational text analysis to large bodies of (literary) texts. So far, Deep Learning has only been sparingly applied in Humanities research, although its potential for the advancement of the field is vast.

In this talk, I will focus on the concept of 'word embeddings', a highly successful application of Deep Learning in computational linguistics. Word embeddings are numeric representations of words (vectors), which capture both semantic and syntactic qualities of words. Such vectors can be used to model semantic relationships ('dogs' are more like 'cats' than like 'buildings') or even to solve complex analogies via plain vector arithmetic ('Russia'+'river' = 'Volga'; 'king'-'man'+'woman' = 'queen'). However powerful, such word representations can be cheaply obtained by applying neural networks to unstructured corpora, without the need for any manual annotation. In order to illustrate the potential of word embeddings for textual analysis in the Humanities, I will present a number of recent applications of word embeddings in the field of literary and linguistic studies.
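As a rough illustration of the similarity and analogy examples above, the following minimal sketch uses a handful of hand-written toy vectors. The vectors, their five dimensions, and the helper names `cosine` and `analogy` are assumptions made up for this example; real word embeddings are learned from large corpora by a neural network and typically have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors, written by hand for illustration only.
# The axes loosely stand for (royalty, male, female, animal, object);
# learned embeddings do not have such neatly interpretable axes.
VECTORS = {
    "king":     [1.0, 1.0, 0.0, 0.0, 0.0],
    "queen":    [1.0, 0.0, 1.0, 0.0, 0.0],
    "man":      [0.0, 1.0, 0.0, 0.0, 0.0],
    "woman":    [0.0, 0.0, 1.0, 0.0, 0.0],
    "dog":      [0.0, 0.1, 0.0, 0.9, 0.1],
    "cat":      [0.0, 0.0, 0.1, 0.9, 0.0],
    "building": [0.0, 0.0, 0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word whose vector is closest to a - b + c.

    The three input words are excluded from the candidates, as is
    common practice when evaluating analogies.
    """
    target = [x - y + z
              for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


if __name__ == "__main__":
    print("dog ~ cat:     ", round(cosine(VECTORS["dog"], VECTORS["cat"]), 2))
    print("dog ~ building:", round(cosine(VECTORS["dog"], VECTORS["building"]), 2))
    print("king - man + woman =", analogy("king", "man", "woman"))
```

With these toy vectors, 'dog' is far closer to 'cat' than to 'building', and the analogy lookup returns 'queen'. With learned embeddings the same arithmetic applies, but the nearest neighbour is only approximately the expected word.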