Information retrieval and text mining DAT640

The course offers an introduction to techniques and methods for processing, mining, and searching in massive text collections. The course considers a broad variety of applications and provides an opportunity for hands-on experimentation with state-of-the-art algorithms using existing software tools and data collections.


Course description for study year 2021-2022

Facts
Course code

DAT640

Version

1

Credits (ECTS)

10

Semester tuition start

Autumn

Number of semesters

1

Exam semester

Autumn

Language of instruction

English

Learning outcome

Knowledge:

  • Theory and practice of concepts, methods, and techniques for managing and analyzing large amounts of text data.

Skills:

  • Process and prepare large-scale textual data collections for retrieval and mining.
  • Apply retrieval, classification, and clustering methods to a range of information access problems.
  • Conduct performance evaluation and error analysis.

General competencies:

  • Understanding of the strengths and limitations of modern information retrieval and text mining techniques. Ability to identify promising business applications, and to participate in and lead such projects.
Content
  • Search engine architecture
  • Text preprocessing and indexing
  • Retrieval models (vector-space model, probabilistic models, learning to rank, neural models)
  • Search engine evaluation
  • Query modeling, relevance feedback
  • Web search (crawling, indexing, link analysis)
  • Semantic search (knowledge bases, entity retrieval, entity linking)
  • Text clustering
  • Text categorization
Required prerequisite knowledge
None
Exam

Project work and written exam

Form of assessment   Weight   Duration   Marks           Aid
Project work         2/5                 Letter grades
Written exam         3/5      4 hours    Letter grades

The project is a combination of individual and group assignments. The project groups are set up by the course instructor. There is no re-sit option for the project. A student who fails the project has to retake this part the next time the course is taught. All assessment parts must be passed in order to achieve an overall grade in the course. Permitted aid at the exam: all written and printed material, and a basic calculator.

Course teacher(s)
Course coordinator: Krisztian Balog
Coordinator laboratory exercises: Ivica Kostric
Head of Department: Tom Ryen
Method of work
6 hours of lectures/lab exercises each week.
Open for
  • Admission to Single Courses at the Faculty of Science and Technology
  • Computer Science - Master's Degree Programme
  • Industrial Automation and Signal Processing - Master's Degree Programme - 5 year
  • Exchange programme at Faculty of Science and Technology
Course assessment
Form and/or discussion.
Overlapping courses
Course                                Reduction (SP)
Web Search and Data Mining (DAT630)   5
Literature
The syllabus can be found in Leganto