Course

Information Retrieval and Text Mining (DAT640)

Facts

Course code DAT640

Credits (ECTS) 10

Semester tuition start Autumn

Language of instruction English

Number of semesters 1

Exam semester Autumn

Time table View course schedule

Literature The syllabus can be found in Leganto

Introduction

The course offers an introduction to techniques and methods for processing, mining, and searching in massive text collections. The course considers a broad variety of applications and provides an opportunity for hands-on experimentation with state-of-the-art algorithms using existing software tools and data collections.

Content

NB! This is an elective course and may be cancelled if fewer than 10 students are enrolled by August 20th.

  • Text preprocessing, indexing
  • Representation learning (word embeddings)
  • Text categorization
  • Search engine architecture
  • Retrieval models (vector-space model, probabilistic models, learning to rank, neural models)
  • Search engine evaluation
  • Query modeling, relevance feedback
  • Web search (link analysis)
  • Semantic search (knowledge bases, entity retrieval, entity linking)
  • Conversational information access
  • Transformers and large language models

Learning outcome

Knowledge:

  • Theory and practice of concepts, methods, and techniques for managing and analyzing large amounts of text data.

Skills:

  • Process and prepare large-scale textual data collections for retrieval and mining.
  • Apply retrieval, classification, and clustering methods to a range of information access problems.
  • Conduct performance evaluation and error analysis.

General competencies:

  • Understanding of the strengths and limitations of modern information retrieval and text mining techniques. Ability to identify promising business applications and to participate in and lead such projects.

Required prerequisite knowledge

None

Exam

Project work and written exam

Weight 1/1

Marks Letter grades

Withdrawal deadline 19.11.2025

The project is a combination of individual and group assignments. The project groups are set up by the course instructor.

There is no re-sit option for the project. A student who fails the project must retake this part the next time the course is offered.

Digital written exam.

Both assessment parts must be passed in order to achieve an overall grade in the course.

Method of work

6 hours of lectures/lab exercises each week.

Overlapping courses

Course Web Search and Data Mining (DAT630_1), Information Retrieval and Text Mining (DAT640_1)

Reduction (SP) 5

Open for

  • Data Science - Master
  • Data Science - Master (Part-Time)
  • Computer Science - Master
  • Computer Science - Master (Part-Time)
  • Exchange programme at The Faculty of Science and Technology

Admission requirements

Must meet the admission requirements of one of the study programmes the course is open for.

Course assessment

The faculty decides whether early dialogue will be held in all courses or in selected groups of courses. The aim is to collect student feedback for improvements during the semester. In addition, a digital course evaluation must be conducted at least every three years to gather students’ experiences.
The course description is retrieved from FS (Felles studentsystem). Version 1