INFORMATION RETRIEVAL AND WEB SEARCH

Academic year
2019/2020
Official course title
INFORMATION RETRIEVAL AND WEB SEARCH
Course code
CM0473 (AF:306544 AR:166112)
Modality
On campus classes
ECTS credits
6
Degree level
Master's Degree Programme (DM270)
Educational sector code
INF/01
Period
2nd Semester
Course year
1
Where
VENEZIA
The course is compulsory within the "Data Management and Analytics" curriculum and introduces students to Information Retrieval and Web Search.
The field of Information Retrieval (IR) has changed considerably in recent years with the expansion of the World Wide Web, the birth of web search engines, and the advent of data and distributed computing clouds.
During the last decade, relentless optimization of information retrieval efficiency and effectiveness has driven web search engines to new quality levels. The field of IR has thus moved from being a primarily academic discipline to being the basis underlying most people’s preferred means of information access. The course aims to present the scientific underpinnings of this field and some practical issues.
In addition, the course presents machine learning techniques and algorithms applied to text mining and to the ranking of search engine results, as well as techniques for Web network analysis.
Knowledge and understanding:

- Knowing and understanding the retrieval models, and the methods and indexes for processing queries
- Knowing and understanding the components of a search engine, and the techniques and algorithms to strike the right trade-off between retrieval efficiency and effectiveness
- Knowing and understanding the machine learning methods to classify and cluster texts, and to rank the retrieval results
- Knowing and understanding the methods of analysis of networks, including the Web

Applying knowledge and understanding:

- Ability to implement algorithms to index and compress texts and process queries
- Ability to choose and evaluate machine learning methods to classify and cluster text corpora, and to rank the retrieval results
- Ability to identify tools for network analysis, including the Web
Data structures and algorithms; basics of linear algebra and probability theory.
Basic IR Models
Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval
Evaluating IR Systems
Text Representation
Web Search: Crawling, Link-based algorithms and Scalability issues
Web and text mining
Information Extraction and Integration
Lecture notes.
C. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008.
The exam is divided into two parts. The first part is written and uses open questions to test the student's knowledge and ability to apply and evaluate solutions in modern application contexts of information retrieval. The second part consists of the critical reading and public presentation of scientific articles on the course topics, and aims to evaluate the candidate's analytical ability, together with summarizing and communication skills. Alternatively, the second part can be taken by developing a software project, whose written report is discussed orally with the instructor.
Theoretical and practical lectures.
English
written and oral
Definitive programme.
Last update of the programme: 14/04/2019