COMPUTATIONAL LINGUISTICS
- Academic year
- 2023/2024
- Official course title
- COMPUTATIONAL LINGUISTICS
- Course code
- LM5860 (AF:459905 AR:250844)
- Modality
- On campus classes
- ECTS credits
- 6
- Degree level
- Master's Degree Programme (DM270)
- Educational sector code
- L-LIN/01
- Period
- 2nd Semester
- Course year
- 1
Contribution of the course to the overall degree programme goals
The main goals of this course are:
- to provide students with the basic methodological tools to perform linguistic annotation and quantitative analysis of textual data
- to introduce the student to the relevant scientific literature
- to strengthen the student's ability to reflect on the properties of language
- to encourage the student to combine insights and approaches belonging to relatively independent research fields such as theoretical linguistics, computational linguistics and cognitive psychology
- to stimulate critical thinking and the ability to think outside the box
- to practice scientific writing
Expected learning outcomes
1. Knowledge and understanding
- familiarity with the basic methods for text processing
- familiarity with the basic terminology and understanding of the relevant scientific literature
- knowledge of the mathematical foundations of Natural Language Processing
- familiarity with the most commonly used techniques of (morphosyntactic) linguistic annotation
- familiarity with the main distributional semantics approaches
2. Applying knowledge and understanding
- knowledge of the features and limitations of the most common computational linguistics tools and approaches, so as to be able to pick the most appropriate solution for a given linguistic research issue
- ability to propose insightful ideas
3. Making judgements
- ability to retrieve the most relevant literature and to use it critically
- ability to select a suitable theoretical framework to answer a research question of interest
- awareness of the technical and ethical issues connected to the automatic treatment of language
- ability to compare competing hypotheses
4. Communication skills
- ability to write an insightful essay on an innovative research topic
- ability to interact with researchers from different scientific backgrounds (including computational linguists and cognitive scientists)
5. Learning skills
- ability to learn new technical tools for the automatic treatment of language (e.g. annotation tools, corpus management and query tools)
Prerequisites
Basic mathematical skills
Contents
2. Corpus linguistics: the basics
3. Distributions in text
4. Language and probability
5. Language and probability II
6. Linguistic annotation
7. The annotation process and its evaluation
8. Classification
9. Regular Expressions
10. Computational Lexical Semantics
11. Distributional Semantics: collocations and association measures
12. Distributional Semantics: semantic similarity and applications
Reference texts
- M. Baroni (2009) Distributions in text. In A. Lüdeling and M. Kytö (eds.), Corpus linguistics: An international handbook, Vol. 2, Mouton de Gruyter: 803-821. Available online at: http://sslmit.unibo.it/~baroni/publications/hsk_39_dist_rev2.pdf
- M. Davies (2015) Corpora: An introduction. In D. Biber and R. Reppen (eds.), The Cambridge Handbook of English Corpus Linguistics, Cambridge University Press: 11-31.
- S. Evert (2009) Corpora and collocations. In A. Lüdeling and M. Kytö (eds.), Corpus linguistics: An international handbook, Vol. 2, Mouton de Gruyter: 1212-1248 (sections 1-4). Available online at: http://www.stefan-evert.de/PUB/Evert2007HSK_extended_manuscript.pdf
- S.T. Gries and A. L. Berez (2017) Linguistic Annotation in/for Corpus Linguistics. In N. Ide and J. Pustejovsky (eds.), Handbook of Linguistic Annotation, Springer: 379-409. Available online at: http://www.stgries.info/research/2017_STG-ALB_LingAnnotCorpLing_HbOfLingAnnot.pdf
- S.T. Gries and J. Newman (2010) Creating and Using Corpora. In R. J. Podesva and D. Sharma (eds.), Research Methods in Linguistics, Cambridge University Press: 257-287. Available online at: http://www.stgries.info/research/2013_STG-JN_CreatingUsingCorpora_ResMethLing.pdf
- D. Jurafsky and J. H. Martin (2008) Speech and Language Processing, 2nd edition, Prentice Hall (ch. 1, 2, 4, 19.1-19.4, 20.1, 20.6)
- D. Jurafsky and J. H. Martin (2020) Speech and Language Processing, 3rd edition draft, Prentice Hall (ch. 4). Available online at: https://web.stanford.edu/~jurafsky/slp3/
- A. Lenci (2018) Distributional Models of Word Meaning, Annual Review of Linguistics, 4: 151-171. Available online at: http://colinglab.humnet.unipi.it/wp-content/uploads/2012/12/annurev-linguistics-030514-125254.pdf
- C. Manning and H. Schütze (1999) Foundations of Statistical Natural Language Processing, MIT Press (ch. 1.1-1.3)
- M. Poesio, J. Chamberlain and U. Kruschwitz (2018) Crowdsourcing. In N. Ide and J. Pustejovsky (eds.), Handbook of Linguistic Annotation, Springer: 277-296.
Assessment methods
THE ORAL EXAM
The oral exam consists of a set of questions aimed at verifying students' knowledge of the theoretical issues discussed in class, together with exercises assessing students' mastery of the most important methodological constructs addressed in the course (e.g. association measures, first-order predicate logic (FOPL) formulas).
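Association-measure exercises of the kind mentioned above usually start from co-occurrence counts. As an illustration only (the actual exam exercises may use other measures or notation), here is a minimal sketch of two measures surveyed in Evert (2009), pointwise mutual information (PMI) and the t-score, written with his O11 / E11 notation. The function names and the toy counts are hypothetical.

```python
import math

def expected_frequency(r1: int, c1: int, n: int) -> float:
    """E11: co-occurrence frequency expected if the two words were independent.
    r1 and c1 are the marginal frequencies of the two words, n the sample size."""
    return r1 * c1 / n

def pmi(o11: int, r1: int, c1: int, n: int) -> float:
    """Pointwise mutual information: log2(O11 / E11)."""
    return math.log2(o11 / expected_frequency(r1, c1, n))

def t_score(o11: int, r1: int, c1: int, n: int) -> float:
    """t-score: (O11 - E11) / sqrt(O11)."""
    return (o11 - expected_frequency(r1, c1, n)) / math.sqrt(o11)

# Hypothetical toy counts: the pair co-occurs 16 times, word 1 has marginal
# frequency 100, word 2 has marginal frequency 200, sample size N = 10,000.
print(expected_frequency(100, 200, 10_000))  # 2.0
print(pmi(16, 100, 200, 10_000))             # 3.0
print(t_score(16, 100, 200, 10_000))         # 3.5
```

A PMI of 0 means the pair co-occurs exactly as often as chance predicts. PMI is known to overrate rare pairs, which is one reason why several measures are usually compared.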
THE GROUP PRESENTATION
Students will organize into small groups and prepare a 20-minute presentation on a Computational Linguistics or Natural Language Processing topic. Students are encouraged to focus on an application domain or a scientific question in which they have a genuine interest. Note that the specific topic of the presentation must be agreed upon with the instructor in advance.
Each group presentation will be graded as follows:
- teamwork: 20% of the presentation grade
- delivery (verbal and non-verbal skills): 20% of the presentation grade
- visual aids: 20% of the presentation grade
- content: 40% of the presentation grade
THE ESSAY
Students are required to write an essay of at least 3,000 words on a Computational Linguistics or Natural Language Processing topic. Students are encouraged to focus on an application domain or a scientific question in which they have a genuine interest. Note that the specific topic of the essay must be agreed upon with the instructor in advance. The following resources can be used as a reference list of possible domains or topics:
- R. Mitkov (2023, ed.) The Oxford Handbook of Computational Linguistics, 2nd edition, Oxford University Press.
- A. Clark, C. Fox and S. Lappin (2010, eds.) The Handbook of Computational Linguistics and Natural Language Processing, Wiley Blackwell.
The final essay will be graded as follows:
- mastery of the essay topic and critical use of the relevant literature: 50% of the essay grade
- depth of thought: 20% of the essay grade
- overall readability of the essay: 30% of the essay grade
GRADE BREAKDOWN
The final grade will be calculated as follows:
- oral exam: 50% of the final grade
- final essay or group presentation: 50% of the final grade
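Because the percentages are nested, a short worked computation may help. The sketch below is illustrative only. It assumes that the criterion percentages for the presentation and the essay refer to that component's own grade, and that every score is on the same scale (the hypothetical numbers use a 30-point scale). It ignores any rounding, pass-threshold or honours rules that may apply.

```python
import math

# Weights as listed in this syllabus; the dictionary keys are hypothetical labels.
PRESENTATION_WEIGHTS = {"teamwork": 0.2, "delivery": 0.2, "visual_aids": 0.2, "content": 0.4}
ESSAY_WEIGHTS = {"mastery_and_literature": 0.5, "depth_of_thought": 0.2, "readability": 0.3}
ORAL_WEIGHT = 0.5
ESSAY_OR_PRESENTATION_WEIGHT = 0.5

# Sanity check: each set of weights should sum to 1 (up to float rounding).
assert math.isclose(sum(PRESENTATION_WEIGHTS.values()), 1.0)
assert math.isclose(sum(ESSAY_WEIGHTS.values()), 1.0)

def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-criterion scores into one component grade (essay or presentation)."""
    if set(scores) != set(weights):
        raise ValueError(f"expected criteria {sorted(weights)}, got {sorted(scores)}")
    return sum(scores[name] * weight for name, weight in weights.items())

def final_grade(oral: float, essay_or_presentation: float) -> float:
    """Final grade: 50% oral exam, 50% essay or group presentation."""
    return ORAL_WEIGHT * oral + ESSAY_OR_PRESENTATION_WEIGHT * essay_or_presentation

# Hypothetical scores on a 30-point scale.
essay = weighted_score(
    {"mastery_and_literature": 28, "depth_of_thought": 25, "readability": 30},
    ESSAY_WEIGHTS,
)
print(round(essay, 2))                   # 28.0
print(round(final_grade(26, essay), 2))  # 27.0
```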