PhD Courses in Denmark

Introduction to digital humanities and to the digital treatment of language data

PhD School at the Faculty of Humanities at University of Copenhagen

Dates and time: 28 - 31 May 2024 from 9:00 to 16:00

Are you passionate about exploring the intersection of digital technologies and the humanities? Join our PhD course ‘Introduction to digital humanities and to the digital treatment of language data’, designed specifically for PhD students who want to apply digital methods to their humanities language data and thereby answer new kinds of research questions on large amounts of such data. In this course, we will equip you with a basic understanding of the methods, standards, and tools essential for conducting digital humanities research on language data.

The course focuses on the digital processing of text and speech, exemplified by, among other resources, the extensive digital corpora available through the Department of Nordic Studies and Linguistics. You will gain hands-on experience with the corpus tool Korp, which lets you search and analyse large amounts of textual data. Additionally, we will look into natural language processing techniques, providing you with a practical introduction to standard tools such as NLTK (Natural Language Toolkit) and spaCy, and to machine learning with Python. You will also get the opportunity to annotate your own data and to apply these tools to it.

As part of the course, you will also be introduced to CLARIN, a digital research infrastructure for language data. Moreover, we recognize the importance of text standards and FAIR (Findable, Accessible, Interoperable, and Reusable) data in today's scholarly landscape. The course will introduce you to these concepts, equipping you with the knowledge to ensure that your research adheres to the standards of data integrity and accessibility.

Academic Aim:
 - To introduce and discuss the concept of digital humanities and what it means with respect to research questions and methodology.
 - To introduce relevant digital methods and tools for the treatment of language data.
 - To apply a relevant selection of methods and tools to the students’ own data.

Target group:
Early- and late-stage PhD students who work with, or are interested in applying, empirical methods to digital language data.

Course lecturers: 
Bolette Sandford Pedersen, Professor, Department of Nordic Studies and Linguistics, University of Copenhagen.
Manex Aguirrezabal Zabaleta, Associate Professor, Department of Nordic Studies and Linguistics, University of Copenhagen.
Ali Mohammed Ali Al-Laith, Postdoc, Department of Nordic Studies and Linguistics, University of Copenhagen.
Philip Diderichsen, Special Consultant, Department of Nordic Studies and Linguistics, University of Copenhagen.
Costanza Navarretta, Senior Researcher, Department of Nordic Studies and Linguistics, University of Copenhagen.

Course organisers: 
Bolette Sandford Pedersen, Professor, Department of Nordic Studies and Linguistics, University of Copenhagen.
Manex Aguirrezabal Zabaleta, Associate Professor, Department of Nordic Studies and Linguistics, University of Copenhagen.
 

Programme:

Day 1: Part one: Introduction to the course, to methods and research questions in Digital Humanities, and to the annotation of data (Bolette Pedersen). Part two: Introduction to and hands-on with digital handling of text and speech, starting from the corpus tool Korp (Philip Diderichsen).

Day 2: Introduction to and hands-on with NLP, data preparation, data analysis, visualisation, and machine learning with Python, NLTK and spaCy (Manex Aguirrezabal and Ali Al-Laith).

Day 3: Presentations by the students of their own project data and of their progress in processing it. Continued hands-on work with NLP, data analysis and machine learning with Python, NLTK and spaCy on the students’ data (Manex Aguirrezabal and Ali Al-Laith).

Day 4: Introduction to and hands-on exercises with the digital platform CLARIN and FAIR data (Costanza Navarretta). Introduction to relevant language resources for NLP (Bolette Pedersen). Wrapping up and evaluation (all).

After the course: Students submit a two- to three-page report (subject to approval) showing how they have applied, or can apply, the methods and tools from the course to their own data. The report must be submitted by email to phd@hrsc.ku.dk no later than 14 June 2024.
 

Language: English


ECTS: 4 ECTS for participation.


Max. number of participants: 25


Written preparation: Participants must submit a half-page abstract about their project and the digital language data they work with. Please submit the abstract by email to phd@hrsc.ku.dk no later than 15 February 2024.


Registration: Please register via the link in the box no later than 26 January 2024.


Further information: For more information about the PhD course, please contact the PhD Administration (phd@hrsc.ku.dk).


Tentative literature:

 - To be selected in 2024: A primer on Digital Humanities

 - Basics in NLP: Dan Jurafsky and James H. Martin (2023). Speech and Language Processing (3rd edition draft), selected chapters.

 - Basics on the corpus tool Korp: Lars Borin, Markus Forsberg and Johan Roxendal (2012). Korp – the corpus infrastructure of Språkbanken. Proceedings of LREC 2012. Istanbul: ELRA, pages 474–478.

 - Example of a DH project at the Department of Nordic Studies and Linguistics using Korp and NLP: Ali Al-Laith, Kirstine Nielsen Degn, Alexander Conroy, Bolette S. Pedersen, Jens Bjerring-Hansen and Daniel Hershcovich (2023). Sentiment Classification of Historical Danish and Norwegian Literary Texts. Nodalida Conference 2023, Tórshavn, The Faroe Islands.

 - Selected chapters on the CLARIN infrastructure: Darja Fišer and Andreas Witt (eds.). CLARIN: The Infrastructure for Language Resources. Volume 1 in the series Digital Linguistics.