Distributed Data Processing with Dataflow Systems (2025)
The Technical Doctoral School of IT and Design at Aalborg University
Description: In today’s world, data is at the heart of decision-making processes across various domains. Dataflow is a programming paradigm and execution model that underpins many modern distributed data processing systems. In this model, developers create programs by defining sequences of functional transformations on input data. The system runtime then manages the execution of these programs across distributed computing infrastructures, abstracting away complexities related to development, distribution, communication, and fault tolerance.
This course delves into the fundamental concepts of dataflow systems, covering both programming models and implementation details. Starting with basic constructs for analyzing static and streaming data, the course progresses to more advanced topics such as iterations, time-based computations, and user-defined functions. We will explore and compare different approaches to implementing these constructs, highlighting their respective advantages and disadvantages.
Throughout the course, students will work with examples from modern dataflow systems and take part in hands-on sessions that complement the theory.
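As an informal illustration of the model described above, where a program is a sequence of functional transformations on input data, the following sketch counts words using only the standard Java Streams API. It is not taken from the course material: the class name `WordCountSketch` and the method `countWords` are hypothetical, and a single-JVM stream only stands in for a distributed runtime, which would additionally handle partitioning, communication, and fault tolerance.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical example class; the names are illustrative, not from the course.
public class WordCountSketch {

    // Expresses word counting as a chain of transformations:
    // split lines into words, drop empty tokens, group by word, count per group.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // TreeMap keeps the result sorted by word for readable output.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "to stream or to batch");
        System.out.println(countWords(lines));
    }
}
```

The program states only what to compute at each step; how the steps are scheduled is left to the runtime, which is the separation of concerns that dataflow systems extend to clusters of machines.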
Prerequisites: Familiarity with Java
Learning objectives:
On successful completion of this course, students will be expected to be able to:
1. Understand the dataflow paradigm, its significance in distributed data processing systems, and the use cases it suits.
2. Design and implement dataflow programs that efficiently process large volumes of data in real time, mastering both basic constructs for static and streaming data analysis and advanced topics such as iterations, time-based computations, and user-defined functions.
3. Evaluate dataflow systems, understand the relevant performance metrics, and design and execute sound experiments.
4. Compare existing dataflow frameworks and understand their relative advantages and disadvantages.
Organizer: Daniele Dell'Aglio
Lecturers: Alessandro Margara, Politecnico di Milano
ECTS: 2.0
Time: 9 - 10 June 2025
Place: Aalborg University
Zip code: 9220
City: Aalborg
Maximum number of participants: 25
Deadline: 19 May 2025