INFORMATION RETRIEVAL AND WEB SEARCH
- Academic year
- 2024/2025
- Official course title
- INFORMATION RETRIEVAL AND WEB SEARCH
- Course code
- CM0473 (AF:513731 AR:286762)
- Modality
- On campus classes
- ECTS credits
- 6
- Degree level
- Master's Degree Programme (DM270)
- Educational sector code
- INF/01
- Period
- 2nd Semester
- Course year
- 1
- Where
- VENEZIA
Contribution of the course to the overall degree programme goals
The field of Information Retrieval (IR) has changed considerably in recent years with the expansion of the World Wide Web, the birth of Web search engines, and the advent of cloud platforms for data storage and distributed computing.
During the last decade, relentless optimization of information retrieval efficiency and effectiveness has driven web search engines to new quality levels. The field of IR has thus moved from being a primarily academic discipline to being the basis underlying most people’s preferred means of information access. The course aims to present the scientific underpinnings of this field and some practical issues.
In addition, the course presents machine learning techniques and algorithms applied to text mining, to the ranking of search engine results, and to Web network analysis. Recent developments in generative AI and Large Language Models (LLMs) are also addressed, up to their application in modern Neural IR, where neural LLMs are used for ranking and retrieval.
Expected learning outcomes
- Knowledge and understanding of the retrieval models, and the methods and indexes for processing queries
- Knowledge and understanding of the components of a search engine, and the techniques and algorithms to get the right compromise between efficiency and effectiveness of the retrieval
- Knowledge and understanding of the methods of analysis of networks, including the Web
- Knowledge of environments and libraries for large-scale software development, capable of handling and processing large volumes of data
- Knowledge of programming environments and algorithms for Artificial Intelligence
- Knowledge and understanding of Machine Learning methods to classify and cluster texts, and to rank the retrieval results
- Knowledge of the potential ethical, social and legal implications of secure information processing
Applying knowledge and understanding:
- Ability to implement algorithms to index and compress texts and process queries
- Ability to choose and evaluate machine learning methods to classify and cluster text corpora, and to rank the retrieval results
- Ability to identify tools for network analysis, including the Web
- Ability to use advanced programming techniques in the areas of high-performance computing, and algorithms to handle high data volumes
- Ability to verify functional and non-functional requirements of a computer system based on machine learning
- Ability to study scientific literature to identify potential solutions to problems with innovative state-of-the-art methods
Prerequisites
Machine Learning knowledge and skills
Contents
Vector representation of text
Basic tokenization
Indexing and implementation of vector-space retrieval
Evaluation of IR Systems
Neural IR
Web Search: Crawling, Link-based algorithms
Scalability issues of IR systems
Reference texts
- Nicola Tonellotto. Neural IR. 2022: https://arxiv.org/pdf/2207.13443.pdf
- Jimmy Lin, Rodrigo Nogueira, and Andrew Yates. Pretrained Transformers for Text Ranking: BERT and Beyond. 2021: https://arxiv.org/pdf/2010.06467.pdf
- Lecture notes and scientific papers.
Assessment methods
The exam is divided into two parts. The first part is written, and contributes 60% of the final grade. It aims to test knowledge and the ability to apply and evaluate solutions in application contexts of modern information retrieval by means of open questions. The assessment of the first part of the examination is formulated according to this scheme: (1) knowledge and ability to apply knowledge in the answers given (range 40%), (2) detail and completeness of answers (range 40%), (3) communication skills (range 20%).
The second part of the examination, which contributes 40% of the final grade, concerns the critical reading and public presentation of scientific articles on course topics. It aims to assess analytical ability and the degree of understanding of the text (range 60%), as well as synthesis and communication skills (range 40%).
The second part of the examination may also be taken by developing a software project whose written report will be discussed orally with the instructor. In this case, the project will be assessed according to the following scheme: analytical ability of the candidate in tackling the project (range 20%), efficiency of the software project (range 50%), and completeness of the report and the experimental analysis, as well as communication skills (range 30%).