Statistical methods in bioinformatics
Graduate School of Health and Medical Sciences at University of Copenhagen
This is a generic course. This means that the course is reserved for PhD students at the Graduate School of Health and Medical Sciences at UCPH.
Anyone can apply for the course, but if you are not a PhD student at the Graduate School, you will be placed on the waiting list until the enrolment deadline. After the enrolment deadline, available seats will be allocated to applicants on the waiting list.
The course is free of charge for PhD students at Danish universities (except Copenhagen Business School), and for PhD students at NorDoc member faculties. All other participants must pay the course fee.
Learning objectives
Bioinformatics is concerned with the study of the inherent structure of biological information, and statistical methods are the workhorses in many of these studies. Some of this inherent structure is obvious and can be observed directly through correlations of patterns in high-dimensional data, while other patterns arise through more complicated underlying relationships.
This course covers some of the basic and novel statistical models and methods suitable for analysing high-dimensional data, in particular data whose analysis relies heavily on statistical methods. The course consists of equal parts theory and application, delivered over five full days of teaching and computer lab exercises. The intention is that participants will gain a thorough understanding of the statistical methods and be able to apply them in practice after completing the course.
A student who has met the objectives of the course will be able to:
1. Analyse data from a bioinformatics experiment using the methods described below and draw valid conclusions based on the results obtained.
2. Understand the advantages/disadvantages of the methods presented and be able to discuss potential pitfalls from using these methods.
3. Develop new methods that can be used to analyse novel types of bioinformatics data.
Content
1. Brief overview of molecular data. Introduction to statistical methods for high-dimensional data, linear models and regularization methods
- Big-p small-n problems
- Multiple testing techniques (inference correction, false discovery rates, q-values)
- The correlation vs. causation and prediction vs. hypothesis differences
- Penalized regression approaches, principal component regression
2. Analysis of mapped reads from mRNA data
- General assembly
- Dynamic programming of pairwise alignment
- Alignment methods for mRNA data
- Poisson methods for expression quantification and transcript distribution
3. Genome-wide association studies
- Multiple testing problems
- Imputation
- Common variants vs. rare variants. Sequence Kernel Association Test
- Regularization methods, SVM
- Enrichment approaches, gene-set analyses
4. Network biology
- Quality assessment and heterogeneous data integration
- Biomedical text mining (named entity recognition & co-occurrence analysis)
- Network analysis with STRING and Cytoscape
5. Integrative data analysis
- Zero-inflated and hurdle models (microbiome data and RNA-seq revisited)
- Compositional data analysis
- Gene expression analyses
- Combining data and making inference from multiple platforms and experiments
Participants
The course is tailored for PhD students with experience in mathematics, statistics, or bioinformatics who wish to learn more about the statistical methods underlying the approaches used for common problems in bioinformatics.
A basic knowledge of statistics, including a little exposure to calculus, is required. However, little or no previous exposure to the topics covered is assumed. Students from applied fields are welcome on the course but should expect extra focus on the statistical methodology.
Relevance to graduate programmes
The course is relevant to PhD students from the following graduate programmes at the Graduate School of Health and Medical Sciences, UCPH:
All graduate programmes
Language
English
Form
The course will consist of 5 full days, with lectures before lunch and hands-on computer exercises after lunch each day.
Course director
Claus Thorn Ekstrøm, Professor, Section of Biostatistics, Department of Public Health, University of Copenhagen, ekstrom@sund.ku.dk
Teachers
Claus Thorn Ekstrøm, Professor, Section of Biostatistics, University of Copenhagen.
Stefan Seeman, Associate Professor, Animal Genetics, Bioinformatics and Breeding, University of Copenhagen.
Nadezhda Tsankova Doncheva, Assistant Professor, Novo Nordisk Foundation Center for Protein Research, Disease Systems Biology, University of Copenhagen.
Dates
28, 29, 30 April, 1, 2 May 2025, all days 8-15
Course location
CSS
Registration
Please register before 24 March 2025
Seats for PhD students from other Danish universities will be allocated on a first-come, first-served basis and according to the applicable rules.
Applications from other participants will be considered
Note: All applicants are asked to submit invoice details in case of no-show, late cancellation or obligation to pay the course fee (typically non-PhD students). If you are a PhD student, your participation in the course must be in agreement with your principal supervisor.