Reproducible Quantitative Data Science
PhD School at the Faculty of SCIENCE at University of Copenhagen
This is a toolbox course where 80% of the seats are reserved for PhD students enrolled at the Faculty of SCIENCE at UCPH and 20% of the seats are reserved for PhD students from other Danish universities/faculties (except CBS). Seats will be allocated on a first-come, first-served basis and according to the applicable rules.
Anyone can apply for the course, but if you are not a PhD student at a Danish university (except CBS), you will be placed on the waiting list until the enrollment deadline. After the enrollment deadline, available seats will be allocated to applicants on the waiting list.
Aim and Content
The Reproducible Quantitative Data Science course introduces key concepts, tools and analysis methods for reproducible data analysis in any type of quantitative research study. It is meant as a hands-on crash course in reproducible data analysis for PhD students.
In the course, we first cover research data management and best practices for handling data, then introduce the concepts of reproducible designs, protocols and pre-registration of research studies. Next, we focus on literate programming and good coding practices, and on how students can improve their own code to make it more reproducible. This includes using version control and encapsulating code using containers. We then turn to issues in the actual data analysis and address computational analysis methods such as permutation, bootstrap, cross-validation and out-of-sample generalization. We finish the course by introducing the topic of reproducible publishing.
Learning outcomes
Intended learning outcomes for students who complete the course:
Knowledge:
• Understand the concepts of reproducible designs, protocols and pre-registration of research studies
• Understand good coding practices
• Understand computational analysis methods such as permutation, bootstrap, cross-validation and out-of-sample generalization
Skills:
• Version control and social coding
• Develop literate programming and good coding practices
• Encapsulate code for reproducibility using containers
Competences:
• Propose measures to increase reproducibility in their own PhD research data analysis
• Prepare a manuscript in a reproducible fashion
Target Group
PhD students from the natural sciences and/or students using quantitative analysis, especially from UCPH SCIENCE and SUND.
We expect students to join the course several months after starting their PhD, so that they already have data and some code to which they can apply the concepts taught.
Recommended Academic Qualifications
We assume that students have some programming experience, since analyses cannot be reproduced using a graphical interface, only using code. We will try to be as language-agnostic as possible, but prior exposure to Bash/Git, MATLAB or Python is a plus.
During the course, active participation is expected, including sharing an example of the student's own code for code review.
Research Area
All disciplines related to natural sciences and/or using quantitative analysis can benefit from our course.
Teaching and Learning Methods
The students need to prepare background information before the course by going through the provided reading material.
During the physical meeting days, we intersperse lectures with exercises. A full overview of our teaching materials is publicly available on GitHub:
https://github.com/CPernet/ReproducibleQuantitativeDataScience
Between the physical meetings, the students will work individually on exercises.
Type of Assessment
Active participation during lectures
Homework assignments and presentation on the final day
Literature
A Zotero group with all the course literature already exists. E-mail ganz@di.ku.dk to be added to the group.
Course coordinator
Melanie Ganz-Benjaminsen, Associate Professor, DIKU
Guest Lecturers
Possibly physical or remote lectures by:
Prof. Michael Hanke (https://www.fz-juelich.de/profile/hanke_m), who will contribute to the computational reproducibility session and provide lectures and exercises
Prof. Nikola Stikov (https://neuro.polymtl.ca/team/faculty/nikola-stikov), who will contribute to the session on reproducible publishing and provide lectures and exercises
Dates
22.+23. June 2026
21.+22. September 2026
23. November 2026
All days 9-16
Expected frequency
Annually.
The course starts in June 2026 and runs through December 2026. Lectures take place in the re-exam weeks in June, September and November so that they do not conflict with the PhD students' other teaching duties and so that rooms on Nørre Campus are easier to access.
Course location
Nørre Campus
Course fee
• Participant fee: DKK 0
• PhD student enrolled at SCIENCE: DKK 0
• PhD student from Danish PhD school Open market: DKK 0
• PhD student from Danish PhD school not Open market: DKK 3.000
• PhD student from foreign university: DKK 3.000
• Master's student from Danish university: DKK 0
• Master's student from foreign university: DKK 3.000
• Non-PhD student employed at a university (e.g., postdocs): DKK 3.000
• Non-PhD student not employed at a university (e.g., from a private company): DKK 8.400
Cancellation policy
• Cancellations made up to two weeks before the course starts are free of charge.
• Cancellations made less than two weeks before the course starts will be charged a fee of DKK 3.000
• Participants with less than 80% attendance cannot pass the course and will be charged a fee of DKK 5.000
• No-show will result in a fee of DKK 5.000
• Participants who fail to hand in any mandatory exams or assignments cannot pass the course and will be charged a fee of DKK 5.000
Course fee and participant fee
PhD courses offered at the Faculty of SCIENCE have course fees corresponding to different participant types.
In addition to the course fee, there might also be a participant fee.
If the course has a participant fee, this will apply to all participants regardless of participant type, in addition to the course fee.