Advanced Topics in Data Analysis
Graduate School of Health and Medical Sciences at University of Copenhagen
This is a generic course. This means that the course is reserved for PhD students at the Graduate School of Health and Medical Sciences at UCPH.
Anyone can apply for the course, but if you are not a PhD student at the Graduate School, you will be placed on the waiting list until the enrolment deadline. After the enrolment deadline, available seats will be allocated to applicants on the waiting list.
The course is free of charge for PhD students at Danish universities (except Copenhagen Business School), and for PhD students at NorDoc member faculties. All other participants must pay the course fee.
Learning objectives
A student who has met the objectives of the course will be able to:
1. Understand the probabilistic principles behind statistical analysis of large-scale datasets in the life, earth and environmental sciences
2. Identify which types of statistical methods are appropriate for different types of large-scale datasets
3. Analyze data in an efficient manner using R or a similar statistical language
4. Diagnose and assess the results of statistical methods used in life, earth and environmental sciences, accounting for the assumptions underlying each test
Content
This course offers exposure to state-of-the-art statistical techniques commonly used in the life, environmental and earth sciences. It is a natural follow-up to the course on Fundamentals in Large-Scale Data Analysis offered within the “Life, Earth and Environmental Sciences” Programme. In the first half of the course, attendees will learn about the philosophy and techniques behind Bayesian thinking and inference, and apply these methods to practical, real-world examples using the R programming language. They will build, run and evaluate models, covering topics such as posterior predictive checks, confounders, model evaluation and causal inference. In the second half of the course, students will be introduced to machine learning techniques through a broad overview of methods, including random forests, support vector machines and deep learning, as used in various scientific applications. Finally, we will discuss various aspects of being a good scientific data scientist, including the ethical and ecological implications of high-intensity scientific computing, data management and sharing.
Participants
The course is broadly meant for students in life, earth and/or environmental sciences who aim to develop their statistical and computational toolbox, in order to be able to tackle large-scale datasets. Students should have some background in basic probability, statistical inference and/or data science.
Course prerequisites
1. A basic understanding of probability theory and distributions.
2. The student must have taken the “Fundamentals in Large-Scale Data Analysis” course, or must have obtained a waiver by demonstrating knowledge of the contents of that course.
3. The student must have a working familiarity with the R environment or a similar language. The student must also be familiar with basic commands on the Unix command line.
Please take a look at the learning objectives of the “Fundamentals in Large-Scale Data Analysis” course for details on what skills the student is expected to have at the end of that course.
Relevance to graduate programmes
The course is relevant to PhD students from the following graduate programmes at the Graduate School of Health and Medical Sciences, UCPH:
• Life, Earth and Environmental Sciences
• Biostatistics and Bioinformatics
Language
English
Form
Lectures interspersed with discussions and group work involving computational exercises in R and the Unix console.
Course directors
Fernando Racimo, Associate Professor, University of Copenhagen, fracimo@sund.ku.dk
Shyam Gopalakrishnan, Associate Professor, University of Copenhagen, shyam.gopalakrishnan@sund.ku.dk
Teachers
Shyam Gopalakrishnan (course director)
Fernando Racimo (course director)
Martin Sikora, Associate Professor, KU
Moiz Khan Sherwani, Postdoc, KU
Dates
Block 4, 2 weeks (weeks 22 and 23): May 26th to June 6th, 2025 (weekdays, 9:00 AM to 2:30 PM)
Course location
Teaching rooms in Kommunehospital Bldg 7.
Registration
Please register before April 28th, 2025.
Expected frequency
The course will be offered at the beginning of block 4 every year.
Seats to PhD students from other Danish universities will be allocated on a first-come, first-served basis and according to the applicable rules.
Applications from other participants will be considered after the last day of enrolment.
Note: All applicants are asked to submit invoice details in case of no-show, late cancellation or obligation to pay the course fee (typically non-PhD students). If you are a PhD student, your participation in the course must be in agreement with your principal supervisor.