COMPUTATIONAL LINGUISTICS MOD. 2
- Academic year
- 2019/2020
- Official course title
- COMPUTATIONAL LINGUISTICS MOD. 2
- Course code
- LMJ070 (AF:314082 AR:167829)
- Modality
- On campus classes
- ECTS credits
- 6 out of 12 of COMPUTATIONAL LINGUISTICS
- Degree level
- Master's Degree Programme (DM270)
- Educational sector code
- L-LIN/01
- Period
- 2nd Semester
- Course year
- 2
Contribution of the course to the overall degree programme goals
The main goals of this course are:
- to provide students with the basic technical tools for the computational analysis of textual data
- to introduce the student to the Python programming language
- to strengthen the student's ability to reflect on the properties of language
- to stimulate critical thinking and the ability to think outside the box
Expected learning outcomes
1. Knowledge and understanding
- familiarity with the Python programming language and with the NLTK package
- ability to design and implement simple algorithms
- familiarity with the main distributional semantics approaches
- learning of the basic techniques for the extraction of linguistic knowledge from corpora
- knowledge of the principal levels of linguistic annotation
2. Applying knowledge and understanding
- knowledge of the features and limitations of the most common computational linguistics tools and approaches, so as to be able to pick the most appropriate solution for a given linguistic research issue
- use of Python for the implementation of scripts for the quantitative and computational analysis of text
- ability to advance and test original and sound hypotheses (relevant for the non-attending students only)
3. Making judgements
- ability to implement self-development strategies to improve technical skills
- awareness of the technical and ethical issues connected to the automatic treatment of language
- ability to retrieve the most relevant literature and to use it critically (relevant for the non-attending students only)
- ability to select a suitable theoretical framework to answer a research question of interest (relevant for the non-attending students only)
- ability to compare competing hypotheses (relevant for the non-attending students only)
4. Communication skills
- ability to write a report describing the process, progress and results of an original piece of scientific research (relevant for the non-attending students only)
- ability to interact with researchers with a different scientific background (including computational linguists and cognitive scientists)
- ability to interact with the other students and the professor
5. Learning skills
- ability to learn novel scripting languages (including R, Perl, MATLAB and JavaScript)
- ability to acquire technical knowledge pertaining to issues only indirectly linked to the automatic treatment of language (e.g. statistical analysis, the creation of web pages, the management of a database)
- ability to learn novel technical tools for the automatic treatment of language (e.g. annotation tools, corpora management and query tools)
Prerequisites
Basic mathematics skills
Basic familiarity with computers, but no special experience with programming or software is expected
Contents
2. Python programming basics / Strings
3. Functions
4. Lists
5. Working with Files
6. Dictionaries, sets and more
7. Regular expressions
8. Writing structured programs
9. Modules and packages/Searching Text With Python
10. Recap: Python programming basics
11. Corpora and Their Annotation / Working with Tagged Corpora
12. Text (pre-)processing using NLTK
13. Measuring the Association Between Words
14. Vector Semantics
15. Recap: Introduction to Natural Language Processing with Python
Reference texts
MANDATORY READINGS:
- S. Bird, E. Klein and E. Loper (2016) Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, Updated 1st edition, O’Reilly (ch. 2.1, 2.2, 3.2, 3.4, 3.5, 4.2-4.8, 5.1-5.4, 8). Available online at: https://www.nltk.org/book/
- A. B. Downey (2015) Think Python: How to Think Like a Computer Scientist, 2nd edition, O’Reilly (ch. 1, 2, 3, 5, 10, 11.1-11.5, 12.1-12.3, 14.1-14.4). Available online at: https://www.greenteapress.com/thinkpython/thinkpython.html
- D. Jurafsky and J. H. Martin (2008/2019) Speech and Language Processing, 2nd or 3rd edition (ch. 2.1). The draft version of the relevant chapter from the 3rd edition is available online at: https://web.stanford.edu/~jurafsky/slp3/2.pdf
(SUGGESTED) SUPPLEMENTARY READINGS:
- S. Evert (2009) Corpora and collocations. In A. Lüdeling and M. Kytö (eds.), Corpus linguistics: An international handbook, Vol. 2, Mouton de Gruyter: 1212-1248 (sections 1-4). The extended version is available online at: http://www.stefan-evert.de/PUB/Evert2007HSK_extended_manuscript.pdf
- A. Lenci (2018) Distributional Models of Word Meaning, Annual Review of Linguistics, 4: 151-171. Available online at: http://colinglab.humnet.unipi.it/wp-content/uploads/2012/12/annurev-linguistics-030514-125254.pdf
Assessment methods
ATTENDING STUDENTS
Students attending at least 70% of the classes qualify as "attending" students. Their learning is assessed through three sets of exercises, with one set assigned every 4-5 weeks. Each assignment must be submitted electronically by the due date.
The final grade will be calculated as follows:
- first assignment: 25% of the final grade
- second assignment: 35% of the final grade
- third assignment: 30% of the final grade
- in-class participation: in laboratory sessions, short programming exercises will be given as homework and briefly discussed during the following lab. Prior to the beginning of each lab session, all students are required to submit the exercises assigned in the previous session. Students who did not attempt at least 50% of the exercises will be penalized at a rate of 2% of the maximum final grade for each notebook that is either insufficient or not submitted on time (up to a maximum of 10% of the maximum final grade).
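As a worked illustration of the weighting above, here is a minimal Python sketch of how a final grade could be computed. It rests on assumptions that are not stated in this syllabus: each assignment is scored on a 0-100 scale, and the participation component is treated as 10 percentage points from which the penalties are subtracted. The function name `final_grade` and its parameters are hypothetical.

```python
# Weights (in percentage points) taken from the grading scheme above.
# ASSUMPTION: assignment scores are expressed on a 0-100 scale.
ASSIGNMENT_WEIGHTS = (25, 35, 30)
# ASSUMPTION: participation is worth the remaining 10 points, minus penalties.
PARTICIPATION_MAX = 10
PENALTY_PER_NOTEBOOK = 2  # points lost per insufficient or late notebook


def final_grade(assignment_scores, penalized_notebooks):
    """Return the final grade as a percentage of the maximum grade."""
    if len(assignment_scores) != len(ASSIGNMENT_WEIGHTS):
        raise ValueError("expected exactly three assignment scores")
    # Weighted sum of the three assignments (at most 90 points).
    weighted = sum(
        w * s for w, s in zip(ASSIGNMENT_WEIGHTS, assignment_scores)
    ) / 100
    # The total penalty is capped at 10% of the maximum final grade.
    penalty = min(PENALTY_PER_NOTEBOOK * penalized_notebooks, PARTICIPATION_MAX)
    return weighted + (PARTICIPATION_MAX - penalty)


# Scores of 80, 90 and 70 with two penalized notebooks.
print(final_grade([80, 90, 70], penalized_notebooks=2))  # prints 78.5
```

Under these assumptions, a student with two penalized notebooks loses 4 points of participation, and any number of penalized notebooks beyond five has no further effect.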
NON-ATTENDING STUDENTS
Non-attending students are required to carry out a programming project that must be described in detail in a written report and discussed face to face with the instructor during the oral exam. The aim of the project is to build an automatically annotated corpus and to use Python to extract the linguistic information needed to perform an innovative quantitative linguistic analysis. Note that the specific topic of the project must be agreed upon with the instructor in advance. The final report must be submitted electronically at least one week prior to the exam.
The project will be graded as follows:
- quality of the code: 40% of the final grade
- knowledge of the relevant literature and of the state-of-the-art: 30% of the final grade
- quality of the report: 20% of the final grade
- one‐on‐one discussion with the instructor: 10% of the final grade
Teaching methods
- discussion of some programming exercises from the past homework
- overview of the session key concepts and principles
- work on the programming exercises in the relevant Jupyter notebook available on the university e-learning platform