Introduction to Programming for Statistics

Academic year
2024/2025
Official course title
Introduction to Programming for Statistics
Course code
PHD176 (AF:545181 AR:311562)
Modality
On campus classes
ECTS credits
3 out of 6 of Introduction to Programming for Statistics and Machine Learning
Degree level
PhD programme (Corso di Dottorato, D.M. 226/2021)
Educational sector code
SECS-S/01
Period
1st Semester
Course year
1
Where
VENEZIA
Course objectives
Statistical analysis is a powerful tool in environmental studies. Using appropriate statistical methods and tools can help us understand the data and infer potential causal relationships. Moreover, recent years have seen unprecedented growth in machine learning approaches in the environmental (including climate) and data sciences. The course consists of two blocks: the first introduces students to the R programming environment, and the second focuses on Python. Teaching hours are split equally between the two blocks (7.5 hours each). The main objective of the course is to help students develop skills in both programming languages to analyse climate data, with a focus on (i) exploratory data analysis, (ii) geospatial analysis and (iii) regression techniques. Given the limited teaching hours, the course is designed to give students a foundation in both R and Python, with further skills to be developed in the two Data Lab modules in the second trimester.
Expected learning outcomes
Students are expected to attain a basic understanding of: (i) for R: installing and managing packages using R and RStudio (Posit), performing basic arithmetic and statistical operations, exploratory data analysis (including data cleaning and transformation), plotting, and reading/writing data; (ii) for Python: installing and using the Anaconda/Conda package and environment manager, command-line tools, integrated development environments (IDEs) such as Posit and Spyder, Google Colab, performing basic arithmetic, and exploratory data analysis (including data cleaning and transformation). In addition, students are expected to understand the basic file formats used in Earth observation and common approaches to reading, processing and analysing such data.
Prerequisites
A basic understanding of any programming language would be useful but is not required. An undergraduate-level understanding of linear algebra and statistics would also be useful. The most important criterion for enjoying this course is not getting bored reading roughly 1,000 lines of code.
Contents
Introduction to R/RStudio; basic data structures (vectors, matrices, data frames); arithmetic and statistical operations in R; plotting and handling data in R (input/output); and some recent packages for summary statistics. Raster operations, NetCDF files in R and advanced geospatial operations are likely to be covered only in the second trimester (in the Data Lab module). Similarly, the Python sessions will begin with an introduction to Python, covering the difference between compiled and interpreted languages, the installation of Anaconda, Posit, Jupyter Notebook, etc., and coding in different environments (Google Colab, Jupyter Notebook/Lab, etc.). More complex libraries, such as Xarray, which is designed to handle Earth observation data (multi-dimensional files), will be introduced in the Data Lab sessions in the second trimester.
Reference material
In addition to the material provided in each lecture (slides, data and scripts), the following websites provide useful additional information:

For R:
http://www.statmethods.net/ (Excellent to begin learning R)
https://cran.r-project.org/doc/contrib/ (Very useful resources)
https://cran.r-project.org/doc/manuals/R-intro.pdf (Quick Intro to R)
https://www.r-bloggers.com/tag/rwiki/ (Advanced)

For Python:
https://www.python.org
https://jupyter.org/
https://colab.research.google.com
https://www.w3schools.com/python/python_intro.asp
Assessment methods
During the course, students will be asked to take part in interactive coding sessions and will be graded on their active engagement and general understanding of programming concepts. Attendance and class participation account for 100% of the final grade.
Teaching methods
Each session combines a frontal lecture with in-class activities (hands-on sessions using sample data and analysis scripts prepared in R and Python). These activities allow students to become familiar with the methods and tools introduced in the course for analysing environmental and geospatial data.
Teaching language
English
Further information
Details about readings, required data and software installation, including practical exercises, will be communicated at the beginning of the course and published on Moodle.
Type of exam
Oral
Definitive programme.
Last update of the programme: 13/09/2024