LITERARY AND LINGUISTIC COMPUTING
- Academic year: 2024/2025
- Official course title: LITERARY AND LINGUISTIC COMPUTING
- Course code: FM0484 (AF:508198 AR:285010)
- Modality: On campus classes
- ECTS credits: 6
- Degree level: Master's Degree Programme (DM270)
- Educational sector code: L-LIN/01
- Period: 2nd Semester
- Course year: 1
- Where: VENEZIA
Contribution of the course to the overall degree programme goals
The main goals of this course are:
- to provide the students with the basic technical tools for the computational treatment of textual data
- to introduce the students to the fundamental linguistic annotation techniques and tools
- to strengthen the students' knowledge of the Python programming language and to introduce them to some of its NLP libraries, such as Stanza and gensim
- to stimulate critical thinking and the ability to think outside the box
Expected learning outcomes
1. Knowledge and understanding
- familiarity with the Python programming language and with some of its NLP/text mining packages (Stanza, gensim)
- familiarity with the most commonly used techniques of (morphosyntactic) linguistic annotation
- familiarity with the basic techniques for extracting linguistic knowledge from corpora
- knowledge of the principal levels of linguistic annotation
- familiarity with the most commonly used techniques for the representation of structured information extracted from text
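To make the annotation levels above more concrete, the sketch below reads one sentence in CoNLL-U, the tab-separated format of the Universal Dependencies treebanks on which Stanza's models are trained. The choice of format, the sample sentence and the `parse_conllu` and `parse_feats` helpers are assumptions for illustration, not course material. Each token line stacks several levels of annotation: lemma, part of speech, morphological features, and a dependency head and relation.

```python
from dataclasses import dataclass

# Hypothetical sample sentence in CoNLL-U (10 tab-separated columns:
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC).
SAMPLE = "\n".join([
    "# text = The cats sleep.",
    "1\tThe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t2\tdet\t_\t_",
    "2\tcats\tcat\tNOUN\tNNS\tNumber=Plur\t3\tnsubj\t_\t_",
    "3\tsleep\tsleep\tVERB\tVBP\tMood=Ind|Tense=Pres|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_",
])

@dataclass
class Token:
    id: int
    form: str
    lemma: str
    upos: str
    feats: dict
    head: int
    deprel: str

def parse_feats(field):
    """Turn 'Number=Plur|Person=3' into {'Number': 'Plur', 'Person': '3'}; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))

def parse_conllu(text):
    """Return the word lines of one CoNLL-U sentence as Token objects.

    Comment lines ('#'), multiword-token ranges ('1-2') and empty nodes ('1.1')
    are skipped in this minimal sketch.
    """
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            parse_feats(cols[5]), int(cols[6]), cols[7]))
    return tokens

tokens = parse_conllu(SAMPLE)
for tok in tokens:
    # Word IDs run consecutively from 1, so head n is tokens[n - 1]; head 0 is the root.
    head = "ROOT" if tok.head == 0 else tokens[tok.head - 1].form
    print(f"{tok.deprel}({head}, {tok.form})")
```

A complete reader would also split a file into sentences at blank lines. Stanza can produce this same kind of annotation directly, so a reader like this is mainly useful for inspecting existing treebank files.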
2. Applying knowledge and understanding
- knowledge of the features and limitations of the most common computational linguistics tools and approaches, so as to be able to choose the most appropriate solution for a given linguistic research question
- use of Python to implement scripts for the quantitative and computational analysis of text
- ability to formulate and test original and sound hypotheses
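Here is a small instance of the quantitative scripts mentioned above. The sketch counts word frequencies with `collections.Counter` and computes the type-token ratio, which is distinct word forms divided by running words. The letters-only tokenization pattern, the helper names and the sample sentence are illustrative assumptions, not course material.

```python
import re
from collections import Counter

def word_tokens(text):
    """Lowercase the text and extract runs of letters (a deliberately naive tokenizer).

    [^\\W\\d_] matches any Unicode letter, so accented forms like 'città' stay whole.
    """
    return re.findall(r"[^\W\d_]+", text.lower())

def frequency_profile(text):
    """Return (word frequencies, type-token ratio) for a text."""
    tokens = word_tokens(text)
    freqs = Counter(tokens)
    ttr = len(freqs) / len(tokens) if tokens else 0.0
    return freqs, ttr

sample = "The cat sat on the mat. The dog sat on the cat."
freqs, ttr = frequency_profile(sample)
for rank, (word, count) in enumerate(freqs.most_common(3), start=1):
    print(rank, word, count)
print(f"types={len(freqs)} tokens={sum(freqs.values())} TTR={ttr:.2f}")
```

The type-token ratio falls as texts get longer, so it is only meaningful when comparing texts of similar length.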
3. Making judgements
- ability to implement self-development strategies to improve technical skills
- awareness of the technical and ethical issues connected to the automatic treatment of language
- ability to retrieve the most relevant literature and to use it critically
- ability to compare competing hypotheses
4. Communication skills
- ability to write a report describing the process, progress and results of an original piece of scientific research
- ability to interact with fellow students and the instructor
5. Learning skills
- ability to learn new programming languages (e.g. R, Perl, MATLAB, JavaScript)
- ability to acquire technical knowledge on topics only indirectly linked to the automatic treatment of language (e.g. statistical analysis)
- ability to learn new technical tools for the automatic treatment of language (e.g. annotation tools)
Pre-requirements
Contents
- Optical Character Recognition with Python
- Regular Expressions
- Automatic corpus annotation
- Distributional semantics
- Topic modeling
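The regular-expressions topic listed above has a classic corpus-linguistics application: the keyword-in-context (KWIC) concordance. The sketch below is a minimal version using only Python's `re` module. The `kwic` helper, its `width` parameter and the sample text are hypothetical and chosen for illustration.

```python
import re

def kwic(text, pattern, width=20):
    """Return keyword-in-context (concordance) lines for every match of `pattern`.

    `width` is the number of characters of left and right context shown.
    """
    flat = " ".join(text.split())  # collapse newlines and runs of spaces
    lines = []
    for match in re.finditer(pattern, flat, flags=re.IGNORECASE):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so that the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

sample = """She walked home after work.
They walk to school every day, and Walking is healthy."""

# \b anchors the match at a word boundary; \w* accepts any ending (walked, walking, ...).
for line in kwic(sample, r"\bwalk\w*"):
    print(line)
```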
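Distributional semantics rests on the idea that words occurring in similar contexts have similar meanings. The simplest count-based version represents each word by the counts of words appearing within a fixed window around it. It then compares words by the cosine of the angle between their vectors. The sketch below assumes whitespace-like tokenization, a symmetric window of 2 and a toy corpus, all chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of context words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[word] for word, count in u.items())  # missing keys count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "she reads a book",
    "he reads a newspaper",
]
vectors = cooccurrence_vectors(corpus, window=2)
print(f"cat~dog:  {cosine(vectors['cat'], vectors['dog']):.3f}")
print(f"cat~book: {cosine(vectors['cat'], vectors['book']):.3f}")
```

Raw counts are dominated by frequent function words such as "the". Realistic models therefore reweight the counts (e.g. with positive pointwise mutual information) or learn dense embeddings instead, as gensim's `Word2Vec` does.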
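Topic modeling is usually done with a library (gensim provides an `LdaModel` class, which uses variational inference). The idea behind Latent Dirichlet Allocation, surveyed in Blei (2012), can nonetheless be sketched in plain Python with a different inference method, collapsed Gibbs sampling. The `lda_gibbs` function, its hyperparameter defaults and the toy documents are assumptions for illustration. With so few documents the resulting topics are not guaranteed to be meaningful.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents (lists of words) with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): the topic proportions of each document and,
    for each topic, a dict mapping every vocabulary word to its probability.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]               # tokens of doc d assigned to topic k
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # occurrences of word w assigned to topic k
    n_k = [0] * n_topics                                # tokens assigned to topic k overall
    z = []                                              # current topic of every token

    # Start from a random topic assignment.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts, then resample its topic t with probability
                # proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    doc_topic = [[(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (n_kw[t][w] + beta) / (n_k[t] + V * beta) for w in vocab}
                  for t in range(n_topics)]
    return doc_topic, topic_word

docs = [text.split() for text in [
    "cat dog pet cat dog",
    "dog pet cat pet",
    "stock market price stock",
    "market price trade stock",
]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for t, dist in enumerate(topic_word):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {t}: {top_words}")
```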
Reference texts
- M. Baroni (2009) Distributions in text. In A. Lüdeling and M. Kytö (eds.), *Corpus linguistics: An international handbook*, Vol. 2, Mouton de Gruyter: 803-821.
- D. M. Blei (2012) Probabilistic topic models. *Communications of the ACM*, 55 (4): 77-84.
- M. Davies (2015) Corpora: An introduction. In D. Biber and R. Reppen (eds.), *The Cambridge Handbook of English Corpus Linguistics*, Cambridge University Press: 11-31.
- M.-C. de Marneffe and J. Nivre (2019) Dependency Grammar. *Annual Review of Linguistics*, 5: 197-218.
- S. T. Gries and A. L. Berez (2017) Linguistic Annotation in/for Corpus Linguistics. In N. Ide and J. Pustejovsky (eds.), *Handbook of Linguistic Annotation*, Springer: 379-409.
- M. Hammond (2020) *Python for Linguists*. Cambridge University Press.
- D. Hovy (2021) *Text Analysis in Python for Social Scientists: Discovery and Exploration*. Cambridge University Press.
- D. Jurafsky and J. H. Martin (2020) *Speech and Language Processing*, 3rd edition, draft (ch. 4, 6).
- A. Lenci (2018) Distributional Models of Word Meaning. *Annual Review of Linguistics*, 4: 151-171.
- T. Neal, K. Sundararajan, A. Fatima, Y. Yan, Y. Xiang and D. Woodard (2017) Surveying stylometry techniques and applications. *ACM Computing Surveys (CSUR)*, 50 (6): 1-36.
Assessment methods
The project will be graded as follows:
- quality of the code: 40% of the final grade
- knowledge of the relevant literature and of the state of the art: 20% of the final grade
- quality of the report: 30% of the final grade
- one-on-one discussion with the instructor: 10% of the final grade
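The four weights above sum to 100%, so the final grade is a weighted average of the component scores. Below is a minimal sketch of that arithmetic. It rests on assumptions that the syllabus does not state: each component is scored on the 30-point scale used in Italian universities, scores are combined linearly, and no rounding rule or honours (lode) is applied. The component keys and example scores are hypothetical.

```python
# Weights of the four assessment components, as listed in the syllabus.
WEIGHTS = {
    "code": 0.40,
    "literature": 0.20,
    "report": 0.30,
    "discussion": 0.10,
}

def final_grade(scores):
    """Weighted average of component scores, all assumed to be on the same scale."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the four components")
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

# Hypothetical component scores on a 30-point scale.
example = {"code": 27, "literature": 24, "report": 28, "discussion": 30}
print(round(final_grade(example), 2))  # 0.4*27 + 0.2*24 + 0.3*28 + 0.1*30 → 27.0
```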
Teaching methods
- discussion of selected programming exercises from previous homework assignments
- overview of the key concepts and principles of each session
- work on the programming exercises in the relevant Jupyter notebook available on [the university e-learning platform](https://moodle.unive.it/)