COMPUTATIONAL PHILOLOGY: DATA STRUCTURES AND ALGORITHMS

Academic year: 2022/2023

Official course title: COMPUTATIONAL PHILOLOGY: DATA STRUCTURES AND ALGORITHMS

Course code: FM0488 (AF:354986 AR:208518)

Teaching language: English

Modality: On campus classes

ECTS credits: 6

Degree level: Master's Degree Programme (DM270)

Academic Discipline: L-LIN/01

Period: 3rd Term

Course year: 2

Where: VENEZIA

Contribution of the course to the overall degree programme goals

As part of the curriculum of the Master's Degree in Digital and Public Humanities, this course aims to give students a working knowledge of the basic techniques for the computational annotation and analysis of written text.

The main goals of this course are:

- to provide the students with the basic technical tools for the computational treatment of textual data
- to introduce the students to the fundamental linguistic annotation techniques and tools
- to strengthen the students' knowledge of the Python programming language and to introduce them to some of its NLP modules, including Stanza and gensim
- to stimulate critical thinking and the ability to think outside the box

Expected learning outcomes

1. Knowledge and understanding
- familiarity with the Python programming language and with some of its NLP/text mining packages (Stanza, gensim)
- familiarity with the most commonly used techniques of (morphosyntactic) linguistic annotation
- knowledge of the basic techniques for the extraction of linguistic knowledge from corpora
- knowledge of the principal levels of linguistic annotation
- familiarity with the most commonly used techniques for the representation of structured information extracted from text

2. Applying knowledge and understanding
- knowledge of the features and limitations of the most common computational linguistics tools and approaches, so as to be able to pick the most appropriate solution for a given linguistic research issue
- use of Python for the implementation of scripts for the quantitative and computational analysis of text
- ability to advance and test original and sound hypotheses

3. Making judgements
- ability to implement self-development strategies to improve technical skills
- awareness of the technical and ethical issues connected to the automatic treatment of language
- ability to retrieve the most relevant literature and to use it critically
- ability to compare competing hypotheses

4. Communication skills
- ability to write a report describing the process, progress and results of an original piece of scientific research
- ability to interact with the other students and the professor

5. Learning skills
- ability to learn novel scripting languages (including R, Perl, MATLAB and JavaScript)
- ability to acquire technical knowledge pertaining to issues only indirectly linked to the automatic treatment of language (e.g. statistical analysis)
- ability to learn novel technical tools for the automatic treatment of language (e.g. annotation tools)

Prerequisites

Basic knowledge of the Python programming language

Contents

- Week 1: Text manipulation with Python
- Week 2: Automatic corpus annotation
- Week 3: Distributional semantics
- Week 4: Topic modeling
- Week 5: Stylometry & authorship attribution
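
As an informal illustration of the kind of exercise the first week's topic (text manipulation with Python) might involve, the following minimal sketch tokenizes a short text, counts word frequencies and computes a type/token ratio. It uses only the standard library; the sample sentence and the helper names `tokenize` and `type_token_ratio` are assumptions made for this example and are not taken from the course notebooks.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical sample text, used only for illustration.
sample = "The cat sat on the mat. The dog sat too."
tokens = tokenize(sample)
freqs = Counter(tokens)

print(freqs.most_common(2))      # [('the', 3), ('sat', 2)]
print(type_token_ratio(tokens))  # 0.7
```

Simple counts of this kind are also a common starting point for the frequency-based features used in stylometry.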

Reference texts

Together with the Jupyter notebooks available on [the university e-learning platform](https://moodle.unive.it/), the following background readings give the student an in-depth explanation of the key concepts of the course:

- M. Baroni (2009) *Distributions in text*. In A. Lüdeling and M. Kytö (eds.), Corpus linguistics: An international handbook, Vol. 2, Mouton de Gruyter: 803-821. Available online at: http://sslmit.unibo.it/~baroni/publications/hsk_39_dist_rev2.pdf
- D.M. Blei (2012) *Probabilistic topic models*. Communications of the ACM, 55 (4): 77-84. Available online at: http://www.cs.columbia.edu/~blei/papers/Blei2012.pdf
- M. Davies (2015) Corpora: An introduction. In D. Biber and R. Reppen (eds.), The Cambridge Handbook of English Corpus Linguistics, Cambridge University Press: 11-31.
- M. C. de Marneffe and J. Nivre (2019) Dependency Grammar. Annual Review of Linguistics 5: 197-218.
- S.T. Gries and A. L. Berez (2017) Linguistic Annotation in/for Corpus Linguistics. In N. Ide and J. Pustejovsky (eds.), Handbook of Linguistic Annotation, Springer: 379-409. Available online at: http://www.stgries.info/research/2017_STG-ALB_LingAnnotCorpLing_HbOfLingAnnot.pdf
- M. Hammond (2020) Python for Linguists. Cambridge University Press.
- D. Hovy (2021) Text Analysis in Python for Social Scientists: Discovery and Exploration. Cambridge University Press.
- D. Jurafsky and J. H. Martin (2020) Speech and Language Processing, 3rd edition, DRAFT (ch. 4, 6). Available online at: https://web.stanford.edu/~jurafsky/slp3/
- A. Lenci (2018) Distributional Models of Word Meaning, Annual Review of Linguistics, 4: 151-171.
- T. Neal, K. Sundararajan, A. Fatima, Y. Yan, Y. Xiang and D. Woodard (2017) Surveying stylometry techniques and applications. ACM Computing Surveys (CSUR), 50 (6): 1-36. Available online at: https://dl.acm.org/doi/abs/10.1145/3132039

Assessment methods

Students are required to carry out a programming project, describe it in detail in a written report, and discuss it face to face with the instructor during the oral exam. The aim of the project is to build an automatically annotated corpus and to use Python to extract the linguistic information needed to perform an innovative quantitative linguistic analysis. The specific topic of the project must be agreed upon with the instructor in advance. The final report must be submitted electronically at least one week prior to the exam.
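
To give a rough idea of what extracting linguistic information from an annotated corpus can look like, here is a minimal sketch. It assumes, purely for illustration, that the corpus is stored in CoNLL-U, a tab-separated format commonly used for Universal Dependencies annotation; the syllabus does not prescribe a format. The three-token sample and the helper `read_tokens` are hypothetical.

```python
from collections import Counter

# Hand-written CoNLL-U sample (assumption: not real course data).
# Each token line has 10 tab-separated columns:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
])


def read_tokens(conllu_text):
    """Yield (form, lemma, upos, deprel) for each regular token line."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        yield cols[1], cols[2], cols[3], cols[7]


# Count part-of-speech tags and collect the lemmas of nominal subjects.
pos_counts = Counter(upos for _, _, upos, _ in read_tokens(SAMPLE))
subject_lemmas = [lemma for _, lemma, _, deprel in read_tokens(SAMPLE)
                  if deprel == "nsubj"]

print(pos_counts)
print(subject_lemmas)
```

An actual project would replace the sample with the output of an annotation pipeline and with whatever counts the agreed research question calls for.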

The project will be graded as follows:
- quality of the code: 40% of the final grade
- knowledge of the relevant literature and of the state-of-the-art: 20% of the final grade
- quality of the report: 30% of the final grade
- one-on-one discussion with the instructor: 10% of the final grade
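
The weighting above amounts to a weighted average of the four components. The sketch below shows the arithmetic; the 0-100 scale and the example component scores are assumptions chosen for illustration, since the syllabus states only the weights.

```python
# Weights in percent, as listed in the grading scheme above.
WEIGHTS = {"code": 40, "literature": 20, "report": 30, "discussion": 10}


def weighted_score(scores):
    """Combine per-component scores (all on the same scale) into one score."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the four graded components")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical component scores on an assumed 0-100 scale.
example = {"code": 90, "literature": 80, "report": 70, "discussion": 100}
print(weighted_score(example))  # 83.0
```

With these example scores the result is (40·90 + 20·80 + 30·70 + 10·100) / 100 = 83.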

Type of exam

Oral

Teaching methods

Lecture-style presentations and lab sessions structured as follows:
- discussion of some programming exercises from the previous homework
- overview of the session's key concepts and principles
- work on the programming exercises in the relevant Jupyter notebook available on [the university e-learning platform](https://moodle.unive.it/)

Definitive programme.

Last update of the programme: 21/12/2022
