Data Science Projects (generic course)
PhD School at the Faculty of SCIENCE at the University of Copenhagen
Data Science covers both Machine Learning and Statistics. This generic course provides a platform to develop and work on projects with the student’s own data using Machine Learning methods, Statistical data analysis, or a combination of the two, with supervisory support from the course teachers. The data sources can range from designed experiments, observational data, and surveys in text or digital formats, to pictures, scans, videos or graphs, all related to some scientific investigation, typically from the PhD student’s own work.
Depending on the primary scope of each data analysis problem, either an expert in Machine Learning or an expert in Statistics will supervise the project. Typically, Machine Learning projects will use Python, and Statistics projects will use R. However, other software platforms are possible depending on the student’s preferences.
Typical analyses within the scope of Machine Learning could be automated quantification of objects of interest in the data (for example, image analysis), combining different types of data to address a common research question (like combining text and measurements), or building predictive models.
A typical analysis within the scope of Statistics is modelling of experimental data in order to establish an associative or causal relation between an outcome of interest and some explanatory variables, e.g. the application of different treatments. The subsequent aim is to quantify relations that cannot be explained by biological variation, but must be attributed to real effects.
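As a small illustration of separating a real treatment effect from random variation, the sketch below runs a two-sample permutation test in Python using only the standard library. It is not part of the course material: the function name `permutation_test`, the measurements, and the number of permutations are all hypothetical and chosen only for the example.

```python
import random
from statistics import mean


def permutation_test(control, treated, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (treated minus control) and an
    approximate p-value based on random relabelling of the observations.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(treated) - mean(control)
    pooled = list(control) + list(treated)
    n_control = len(control)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled data.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements (invented for illustration only).
control = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2]
treated = [4.9, 5.1, 4.7, 5.3, 4.8, 5.0]

effect, p = permutation_test(control, treated)
print(f"Observed difference in means: {effect:.2f}")
print(f"Approximate permutation p-value: {p:.4f}")
```

A small p-value suggests that the observed difference would be unlikely if the treatment labels were arbitrary. In an actual project, the choice of model and test would depend on the experimental design and would be settled together with the supervisor.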
The course report will be a manuscript written as a draft of a research paper that can ideally be completed and submitted to a journal after the course.
No later than one week prior to the course, the participants should submit a synopsis with a short draft description of their data and the desired outcome. This will allow us to consider plenary lectures on specific analysis methods and to plan the project supervision.
Formal requirements
The students must have some experience in either Statistics or Machine Learning, corresponding to the Statistical Methods for SCIENCE or Machine Learning for SCIENCE PhD toolbox courses.
If in doubt about the prerequisites, please email the course organizer, Erik Bjørnager Dam (erikdam@di.ku.dk).
The number of participants is limited to 20.
PhD students from outside UCPH SCIENCE may attend for a fee if seats are available.
Learning outcome
After course completion, the students are expected to be able to:
Knowledge:
- Describe the analysis methods used by others for similar problems.
- Describe relevant, alternative approaches for solving the problem.
Skills:
- Develop/adapt/extend a computer-based software method for quantification and/or analysis of their own data.
Competences:
- Formulate scientific questions from their PhD project in terms of research hypotheses.
- Interpret the results of their computer-based analysis in relation to their PhD project.
Literature
This depends on the individual project.
For potential background literature, see the course pages for Introduction to Python, Introduction to R, Statistical Methods for Science (SMS), and Machine Learning for Science (MLS), all listed on the Data Science Lab homepage.
Remarks
For details on this and other Data Science Lab courses, see: http://datalab.science.ku.dk/english/course/