INFORMATION RETRIEVAL AND WEB SEARCH

Academic year: 2024/2025

Official course title: INFORMATION RETRIEVAL AND WEB SEARCH

Course code: CM0473 (AF:513731 AR:286762)

Teaching language: English

Modality: On campus classes

ECTS credits: 6

Degree level: Master's Degree Programme (DM270)

Academic Discipline: INF/01

Period: 2nd Semester

Course year: 1

Where: VENEZIA

Contribution of the course to the overall degree programme goals

The course is compulsory within the Artificial Intelligence and Data Engineering (AIDE) curriculum and introduces students to the topics of Information Retrieval and Web Search.
The field of Information Retrieval (IR) has changed considerably in recent years with the expansion of the World Wide Web, the birth of web search engines, and the advent of data and distributed computing clouds.
During the last decade, relentless optimization of information retrieval efficiency and effectiveness has driven web search engines to new quality levels. The field of IR has thus moved from being a primarily academic discipline to being the basis underlying most people’s preferred means of information access. The course aims to present the scientific underpinnings of this field and some practical issues.
In addition, the course presents machine learning techniques and algorithms applied to text mining, the ranking of search engine results, and Web network analysis. Recent developments in generative AI and Large Language Models (LLMs) are also addressed, up to their application in modern Neural IR, where LLM-based neural models are used for ranking and retrieval.

Expected learning outcomes

Knowledge and understanding:

- Knowledge and understanding of the retrieval models, and the methods and indexes for processing queries
- Knowledge and understanding of the components of a search engine, and the techniques and algorithms to get the right compromise between efficiency and effectiveness of the retrieval
- Knowledge and understanding of the methods of analysis of networks, including the Web
- Knowledge of environments and libraries for large-scale software development, capable of handling and processing large volumes of data
- Knowledge of programming environments and algorithms for Artificial Intelligence
- Knowledge and understanding of the methods of Machine Learning to classify and group texts, and to sort the retrieval results
- Knowledge of the potential ethical, social and legal implications of secure information processing

Applying knowledge and understanding:

- Ability to implement algorithms to index and compress texts and process queries
- Ability to choose and evaluate machine learning methods to classify and cluster text corpora, and to sort the retrieval results
- Ability to identify tools for network analysis, including the Web
- Ability to use advanced programming techniques in the areas of high-performance computing, and algorithms to handle high data volumes
- Ability to verify functional and non-functional requirements of a computer system based on machine learning
- Ability to study scientific literature to identify potential solutions to problems with innovative state-of-the-art methods

Prerequisites

Data structures and algorithms; basics of linear algebra and probability theory.
Machine Learning knowledge and skills.

Contents

Basic IR Models
Vector representation of text
Basic tokenization
Indexing, and Implementation of Vector-Space Retrieval
Evaluation of IR Systems
Neural IR
Web Search: Crawling, Link-based algorithms
Scalability issues of IR systems

Reference texts

- Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press. 2008: https://nlp.stanford.edu/IR-book/
- Nicola Tonellotto. Neural IR. 2022: https://arxiv.org/pdf/2207.13443.pdf
- Jimmy Lin, Rodrigo Nogueira, and Andrew Yates. Pretrained Transformers for Text Ranking: BERT and Beyond. 2021: https://arxiv.org/pdf/2010.06467.pdf
- Lecture notes and scientific papers.

Assessment methods

The exam is divided into two parts. The first part is written and contributes 60% of the final grade. It uses open questions to test knowledge and the ability to apply and evaluate solutions in modern information retrieval application contexts. The first part is assessed according to this scheme: (1) knowledge and ability to apply knowledge in the answers given (weight 40%), (2) detail and completeness of answers (weight 40%), (3) communication skills (weight 20%).

The second part of the exam, which contributes 40% of the final grade, consists of the critical reading and public presentation of scientific articles on course topics. It aims to assess analytical ability and the degree of understanding of the text (weight 60%), as well as synthesis and communication skills (weight 40%).
Alternatively, the second part may be taken by developing a software project whose written report is discussed orally with the instructor. In this case, the project is assessed according to the following scheme: analytical ability of the candidate in tackling the project (weight 20%), efficiency of the software project (weight 50%), completeness of the report and the experimental analysis, as well as communication skills (weight 30%).

Type of exam

written and oral

Teaching methods

Theoretical and practical lectures.

Definitive programme.

Last update of the programme: 31/01/2025
