PhD Courses in Denmark

Social data science – machine learning in the humanities and social sciences (MASSHINE Summerschool)

The Doctoral School of Social Sciences and Humanities at Aalborg University

Course organizer (name, department and research group):


Title and date of the course:

“Social data science – machine learning in the humanities and social sciences”. 26 – 30 August 2024

This course is supported by the Danish Data Science Academy (DDSA).


Place:
Hotel Højgaarden, Slettestrandvej 50, 9690 Fjerritslev


Associate Professor Roman Jurowetzki (Aalborg University Business School)

Associate Professor Rolf Lyneborg Lund (Department of Sociology and Social Work, Aalborg University)

Professor Birger Larsen (Department of Communication and Psychology, Aalborg University)

Assistant Professor Mathieu Jacomy (Department of Culture and Learning, Aalborg University)


Number of seats:
21. After the registration deadline, we will contact you to let you know whether you have been admitted to the course or placed on the waiting list.

Registration deadline:
1 July 2024

Course description, incl. learning objectives and prerequisites:

Developments in computer science and the growing amount of accessible data present a range of new methodological opportunities for the social sciences and humanities.

Data from websites, social media, and electronic devices (often referred to as ‘Big Data’) allow for new approaches and perspectives on issues relevant to both the social sciences and humanities. Meanwhile, increasing computational power and the development of artificial intelligence algorithms provide the means for accessing, combining, and analyzing a variety of data types (numerical, textual, relational) in new and meaningful ways.

This course is a hands-on, practical introduction to applying computer science techniques (such as programming and machine learning) in humanities and social science research, and it has no prerequisites. It will cover a broad range of techniques and methods representing the latest methodological innovations in applications of machine learning and artificial intelligence in the social sciences and humanities.

Some techniques include:

  • Collecting data from the web using web scraping methods and APIs
  • Processing textual data for quantitative analysis (Natural Language Processing)
  • Working with and visualizing networks (network analysis)
  • Dimensionality reduction and clustering techniques (topic models and k-means clustering)
  • Visualization techniques for text data and networks
  • Building and understanding machine learning classifiers

This course is meant as a hands-on tools course focusing on the practical use of these methods; it will not go into depth on their mathematical and theoretical foundations. Instead, it will provide a broad overview of the data science ecosystem and toolbox and enable participants to apply the methods immediately.

Preliminary Program:

Monday: Foundations of Data Science and Machine Learning

• An Introduction to Python and Data Science: A brief overview aimed at refreshing or introducing participants to the fundamental Python programming concepts and data science principles. This session sets the stage for more advanced topics by ensuring a common baseline of knowledge.

• Introduction to Machine Learning and Exploratory Techniques: This session will delve into the core concepts of machine learning and cover exploratory data analysis techniques for uncovering patterns and insights in data, which are essential for any data-driven research.

• Clustering - a world of patterns: Participants will explore clustering algorithms, learning how to identify natural groupings in data. This technique is crucial for pattern recognition and is widely applicable in social science research.
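k-means is one of the clustering techniques the course description lists. As a hedged illustration only, the sketch below implements plain k-means (Lloyd's algorithm) in standard-library Python: each point is assigned to its nearest centroid, each centroid is then moved to the mean of its members, and this repeats until no assignment changes. The function name `kmeans`, the first-k-points initialisation and the toy data are assumptions made for this example; in a notebook one would more likely use a library implementation such as scikit-learn's `KMeans`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Plain k-means (Lloyd's algorithm): returns a cluster label per point and the final centroids."""
    # Deterministic start for the sake of the example: the first k points become the centroids.
    # Library implementations usually use smarter seeding (e.g. k-means++) and several restarts.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: every point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Toy 2-D data with two obvious groups (purely illustrative numbers).
data = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(data, k=2)
print(labels)     # → [0, 1, 0, 0, 1, 1]
print(centroids)  # one mean point per cluster
```

Because the result depends on the starting centroids, real analyses typically run k-means several times from different starts and keep the best solution.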


Tuesday: Diving Deeper into Machine Learning

• Introduction to Supervised Machine Learning: Building on the previous day’s foundation, this session focuses on supervised learning models, their applications, and how they can be utilized in humanities and social sciences research.

• Explaining Machine Learning Models: A crucial aspect of machine learning in research is the ability to interpret and explain models. This session aims to equip researchers with techniques to demystify complex models.

• Working with Geospatial Data: An introduction to the integration and analysis of geospatial data within machine learning frameworks, highlighting its importance in sociogeographical modelling.

• Case Example: A practical demonstration of applying supervised machine learning techniques in research, with a focus on register-based studies.
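To make the supervised workflow concrete (learn from labelled cases, predict labels for unseen cases, measure accuracy on held-out data), here is a minimal sketch using a nearest-centroid classifier. This model is a deliberately simple, swapped-in choice for brevity and not necessarily one taught in the course; the function names `fit_nearest_centroid`, `predict` and `accuracy` and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def fit_nearest_centroid(X: Sequence[Vector], y: Sequence[str]) -> Dict[str, Vector]:
    """'Training': compute one centroid (mean feature vector) per class label."""
    rows_by_label: Dict[str, List[Vector]] = defaultdict(list)
    for features, label in zip(X, y):
        rows_by_label[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in rows_by_label.items()
    }


def predict(model: Dict[str, Vector], features: Vector) -> str:
    """Predict the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(features, model[label]))


def accuracy(model: Dict[str, Vector], X: Sequence[Vector], y: Sequence[str]) -> float:
    """Share of held-out rows whose predicted label matches the true label."""
    hits = sum(predict(model, features) == label for features, label in zip(X, y))
    return hits / len(y)


# Hypothetical toy data: two numeric features per case and a categorical outcome label.
X_train = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (6.0, 9.0)]
y_train = ["low", "low", "high", "high"]
X_test = [(1.2, 1.9), (5.5, 8.5)]
y_test = ["low", "high"]

model = fit_nearest_centroid(X_train, y_train)
print(predict(model, (1.2, 1.9)))       # → low
print(accuracy(model, X_test, y_test))  # → 1.0
```

Evaluating on rows the model never saw during fitting is the key habit here; accuracy on the training rows alone says little about how well a model generalizes.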


Wednesday: Network Analysis and Visualization

• Introduction to Network Analysis: This session introduces network analysis concepts, emphasizing their applicability in exploring social structures and relationships.

• Curating Networks (TANT-Lab session): Participants will learn about the curation and management of network data, preparing it for analysis and visualization.

• Visual Network Analysis: This session explores techniques for the visual representation of networks that make them easier to interpret and help generate insights.

• The Core Principle of VNA: This session focuses on the foundational principles of visual network analysis (VNA), emphasizing their critical evaluation and application.
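A first step in network analysis is turning an edge list into a graph and computing simple structural measures. The sketch below, assuming an undirected graph and a hypothetical toy edge list, computes degree centrality (degree divided by n - 1, the normalisation networkx also uses for `degree_centrality`) and finds connected components with breadth-first search. The function names are illustrative assumptions, not part of any course material.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list (e.g. who interacts with whom)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Breadth-first search from every unvisited node; each search yields one component."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        component: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical toy edge list with two separate groups.
edges = [("Ann", "Bo"), ("Ann", "Cy"), ("Bo", "Cy"), ("Cy", "Di"), ("Eve", "Fay")]
graph = build_graph(edges)
print(degree_centrality(graph)["Cy"])                        # → 0.6
print(sorted(len(c) for c in connected_components(graph)))  # → [2, 4]
```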


Thursday: Natural Language Processing (NLP) and Its Applications

• Intro to NLP and String Manipulation: An overview of NLP fundamentals, including text manipulation techniques, setting the groundwork for more advanced NLP applications.

• Supervised ML and NLP: Exploring the intersection of supervised machine learning and NLP, showcasing how these tools can be combined to extract meaning and insights from textual data.

• NLP and Unsupervised ML, Getting Tweets, Semantic Search, SBERT Embeddings: A series of sessions aimed at demonstrating the breadth of NLP applications, from analyzing social media data to implementing semantic search technologies using state-of-the-art embeddings.
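Semantic search ranks documents by the similarity between a query vector and document vectors. As a hedged, dependency-free stand-in, the sketch below uses simple word-count vectors and cosine similarity; with SBERT, the vectors would instead be dense sentence embeddings (commonly produced with the `sentence-transformers` package), while the cosine ranking step stays the same. The tokenizer, the function names and the mini-corpus are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of word characters (a deliberately crude tokenizer)."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, documents: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Rank documents by cosine similarity to the query and return the top_k (score, text) pairs."""
    query_vector = Counter(tokenize(query))
    scored = [(cosine(query_vector, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical mini-corpus standing in for, e.g., a set of posts or articles.
docs = [
    "Housing prices rose in the city",
    "The football match ended in a draw",
    "City council debates new housing policy",
]
for score, doc in search("housing policy", docs):
    print(f"{score:.3f}  {doc}")
```

Because word counts only reward exact word overlap, a query such as "housing policy" cannot match a document that speaks of "residential regulation"; capturing that kind of paraphrase is what sentence embeddings add.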


Friday: Methodological Outlook and Future Directions

• Introduction to Web Scraping in Python: Participants will learn the techniques for programmatically collecting web data, an essential skill for researchers in the digital age.

• Examples Using APIs and Article Scraping: Practical demonstrations of how to leverage APIs and scrape articles for research purposes, providing a window into the vast potential of web data for social science research.
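Web scraping usually means downloading a page and then extracting the relevant parts of its HTML. The sketch below covers only the extraction step, using Python's built-in `html.parser` module on a hard-coded, hypothetical page so that it runs offline; the class name `ArticleParser` and the choice to extract the `<h1>` headline and link targets are assumptions. Third-party libraries such as requests (for downloading) and Beautiful Soup (for parsing) are common alternatives.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ArticleParser(HTMLParser):
    """Collects the <h1> headline text and every link target (href) on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self._headline_parts: List[str] = []
        self._in_h1 = False

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag: str) -> None:
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data: str) -> None:
        # Text inside nested tags (e.g. <em> within <h1>) still belongs to the headline.
        if self._in_h1:
            self._headline_parts.append(data)

    @property
    def headline(self) -> str:
        # Join the text fragments and collapse runs of whitespace.
        return " ".join("".join(self._headline_parts).split())


# Hypothetical page; in practice the HTML would first be downloaded, e.g. with urllib.request.
SAMPLE_HTML = """
<html><body>
  <h1>Local <em>elections</em> draw record turnout</h1>
  <p>Read the <a href="/politics">politics section</a> or
     <a href="https://example.org/archive">the archive</a>.</p>
</body></html>
"""

parser = ArticleParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.headline)  # → Local elections draw record turnout
print(parser.links)     # → ['/politics', 'https://example.org/archive']
```

Relative links such as `/politics` would need to be resolved against the page's URL (for example with `urllib.parse.urljoin`) before they can be followed.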


Teaching methods:

Each day will consist of a mixture of lectures and exercises in interactive online notebooks, allowing participants to try out the various methods as they are being taught.

Participants are expected to work on a portfolio during the week. Each day includes hours dedicated to portfolio work, with the possibility of sparring with the course lecturers. During these hours, participants will apply the methods and techniques presented to a variety of cases.

Description of paper requirements, if applicable:

The course teaches the methods in Python, using Jupyter notebooks on Google Colab.

Prior knowledge of Python is not a prerequisite: access to relevant introductory courses will be provided, and the first day of the course gives the necessary introduction.

Participants are expected to complete assigned introductory e-courses (e.g. on DataCamp or other selected platforms) before the course. Access to DataCamp will be provided 4 weeks in advance. 

Number of ECTS: