PhD Courses in Denmark

AI Alignment, safety and security: Applications in Mental Health and Human-Centered Systems

DTU Department of Informatics and Mathematical Modeling

General course objectives:

The course trains early-career researchers in AI safety and security, with applications in mental health and human-centered systems. It offers a technically grounded exploration of AI alignment and safety, moving beyond harm prevention toward systems that exhibit context awareness, robustness, and adherence to human and societal values. Using mental health as a motivating domain, it introduces methods and evaluation frameworks for building aligned, trustworthy, and human-centered AI systems that are also applicable to other high-stakes areas. The emphasis is on critical and technical reflection, connecting advances in alignment research with the question of how AI systems can perform appropriately, transparently, and responsibly in human contexts.



Learning objectives:

A student who has met the objectives of the course will be able to:

  • Understand the theoretical and practical foundations of AI alignment and safety.
  • Formalize alignment objectives and implement them in frameworks.
  • Apply and compare evaluation techniques for assessing robustness, bias, uncertainty, failure modes, and risks in sensitive applications.
  • Examine deployment considerations, including clinician- or human-in-the-loop design, monitoring, and oversight.
  • Critically reflect on the transition from safe to sensible AI: systems that act appropriately, not just safely, in human and societal contexts.
  • Analyse misalignment failure modes in large language and generative models, and understand how these arise from training data, objectives, and sociotechnical context.
  • Integrate qualitative and quantitative evaluation methods to assess model behaviour, appropriateness, and potential harms in human-centered or mental-health–related applications.
  • Collaborate in interdisciplinary settings, effectively communicating alignment challenges and insights across machine learning, psychology, clinical practice, and human–computer interaction.

Contents:

The course comprises two main parts: a) technical foundations of AI alignment, safety and security, and b) development and evaluation of aligned AI for mental-health and other human-centered applications.

Indicative topics include:

  1. Epistemological foundations of machine learning and the nature of alignment.
  2. Safety, robustness, and security in large language and generative models, including adversarial behavior and control.
  3. Value alignment in data, objectives, and feedback-driven learning.
  4. Evaluation methodologies for trustworthy and human-aligned AI, including quantitative and qualitative approaches.
  5. Cognitive and societal dimensions of alignment in human-AI interaction and decision-making.
  6. Case studies from mental health and from assistive or decision-support systems.
  7. Clinical, behavioral, ethical and philosophical perspectives linking technical alignment to human judgment and sensibility.