DEEP LEARNING FOR NATURAL LANGUAGE PROCESSING

Academic year
2024/2025
Official course title
DEEP LEARNING FOR NATURAL LANGUAGE PROCESSING
Course code
CM0624 (AF:451273 AR:286742)
Modality
On campus classes
ECTS credits
6
Degree level
Master's Degree Programme (DM270)
Educational sector code
INF/01
Period
1st Semester
Course year
2
Where
VENEZIA
The course is part of the Computer Science and Information Technology curriculum and focuses on the latest techniques for automatic natural language analysis. It progresses gradually through the levels of language analysis, from morphology and syntax to semantics and pragmatics. Each of these levels underpins specific problems in Natural Language Processing (NLP), such as syntactic parsing, word embedding, semantic parsing, question answering, and the use of generative language models to develop chatbots such as ChatGPT.

The approaches presented are based on neural architectures, but important alternative approaches will also be covered to put the state of the art in the discipline into context.

The course aims to provide a broad knowledge of modern natural language analysis techniques and of the fields in which they are applied.
At the end of the course, the student will be able to:
- Know and use the fundamental algorithms for natural language analysis
- Implement and train models for automatic text analysis
- Choose the most suitable models for specific applications
Basic knowledge of linear algebra and statistics is recommended. Knowledge of Python is required for practical work. Familiarity with the PyTorch and Transformers libraries is a plus.
Introduction
- The NLP pipeline
- Morphology
- Syntax
- Semantics
- Pragmatics
- Tokenization
- Lemmatization and stemming
- Word-based analysis
- Sentence-based analysis

NLP Tasks
NLP Benchmarks

Embedding Models
- Word embedding
- Sentence embedding
- Sense embedding
- Entity embedding

Deep Learning for Sequences
- Recurrent networks and language models
- Backpropagation through time
- LSTM
- GRU

Attention Mechanisms
- Self-Attention
- Transformers

(Large) Language Models
- Encoder models
- Decoder models
- Encoder-Decoder models
- Masked Language Modeling
- Autoregressive Models

Applications
- Text classification (sentiment analysis, language classification, intent classification)
- Named Entity Recognition
- Machine Translation: seq2seq
- Question Answering
- Text Summarization
- Topic Modeling
All study materials will be provided through Moodle.
Assessment is based on a Python project (individual or group, at the student's discretion) that puts into practice the knowledge acquired during the course and addresses a specific NLP problem. The evaluation is based on three main aspects:

1. Design ability: The project should reflect a clear understanding of the theoretical concepts and methodologies learned. It will be important to demonstrate a structured plan and a critical approach in carrying out the work.
2. Work organization: The ability to manage the various phases of the project, from ideation to implementation, will be evaluated. This includes time management, task division, and collaboration (if applicable).
3. Mastery of tools: During the presentation, the student must demonstrate full mastery of the tools and technologies used and a thorough knowledge of the concepts introduced during the course.

The evaluation criteria are as follows:

A. Scores in the 18-22 range will be awarded in the presence of:
- Sufficient knowledge and ability to structure the project;
- Limited ability to justify implementation choices;
- Sufficient communication skills, especially in relation to the use of course-specific language.

B. Scores in the 23-26 range will be awarded in the presence of:
- Fair knowledge and ability to structure the project;
- Fair ability to collect and/or interpret data, proposing effective implementation solutions;
- Fair communication skills, especially in relation to the use of course-specific language.

C. Scores in the 27-30 range will be awarded in the presence of:
- Good or excellent knowledge and ability to structure the project;
- Good or excellent ability to collect and/or interpret data, proposing innovative implementation solutions;
- Fully appropriate communication skills, especially in relation to the use of course-specific language.

D. Honors (lode) will be awarded in the presence of excellent knowledge and applied understanding of the program, judgment skills, and communication abilities.
The course consists of lectures and practical classroom activities to consolidate the concepts learned. Slides and scientific articles will be provided as study material.
English
oral

This subject deals with topics related to the macro-area "Climate change and energy" and contributes to the achievement of one or more goals of the U.N. Agenda for Sustainable Development.

Definitive programme.
Last update of the programme: 11/10/2024