Information Retrieval and Text Mining (DAT640)
The course offers an introduction to techniques and methods for processing, mining, and searching in massive text collections. The course considers a broad variety of applications and provides an opportunity for hands-on experimentation with state-of-the-art algorithms using existing software tools and data collections.
Course description for study year 2023-2024. Please note that changes may occur.
Semester tuition start
Number of semesters
Language of instruction
- Search engine architecture
- Text preprocessing, indexing, representation learning
- Retrieval models (vector-space model, probabilistic models, learning to rank, neural models)
- Search engine evaluation
- Query modeling, relevance feedback
- Web search (crawling, indexing, link analysis)
- Semantic search (knowledge bases, entity retrieval, entity linking)
- Text categorization and clustering
- Theory and practice of concepts, methods, and techniques for managing and analyzing large amounts of text data.
- Process and prepare large-scale textual data collections for retrieval and mining.
- Apply retrieval, classification, and clustering methods to a range of information access problems.
- Conduct performance evaluation and error analysis.
- Understand the strengths and limitations of modern information retrieval and text mining techniques. Identify promising business applications, and participate in and lead such projects.
Required prerequisite knowledge
Project work and written exam
|Form of assessment||Weight||Duration||Marks||Aid|
|Project work||2/5||Letter grades|
|Written exam||3/5||4 hours||Letter grades||All aids are permitted; collaborating with or getting help from other people on the exam task is not permitted|
The project is a combination of individual and group assignments. The project groups are set up by the course instructor. There is no re-sit option for the project. A student who fails the project must retake that part the next time the course is taught. All assessment parts must be passed in order to achieve an overall grade in the course.
Course coordinator: Krisztian Balog
Course teacher: Krisztian Balog
Head of Department: Tom Ryen
Method of work
|Web Search and Data Mining (DAT630_1)||5|