LIDA: Data Science Infrastructures

Context

The programme supports the construction of an advanced engineering platform at the confluence of HPC, Cloud and Big Data. The platform will leverage large-scale, geographically distributed resources from existing HPC infrastructure, employ Big Data analytics solutions and augment them with federated Cloud services. Of particular interest is how to build on the best available data management solutions to offer common data services in support of LIDA, through a geographically distributed, resilient network connecting general-purpose data centres and community-specific data repositories.

People

Two new academic staff have joined the programme:

Dr Arash Bozorgchenani is a Lecturer in Intelligent Networks at the School of Computer Science, University of Leeds. His research spans future networked systems (B5G/6G), cloud and edge computing, wireless communications, and the application of AI to networking problems. He has contributed to several national and European projects, including the GAUChO project on fog computing integration and the H2020 SANCUS project focused on security versus QoS trade-offs in intrusion detection and response systems.

 

Dr Antonio Alberti is an Associate Professor in Software Engineering at the School of Computer Science, University of Leeds. With over 20 years of experience at the National Institute of Telecommunications (Inatel) in Brazil, he brings deep expertise in distributed systems, service-oriented architectures, cloud/edge computing, and future network technologies including 5G/6G. Beyond academia, he founded the Renascidade social entrepreneurship movement and authored the book Novos Renascimentos. His research explores reimagining ICT foundations for greater autonomy, security, and societal impact.

Partnerships

We are partners in the Communications Hub for Empowering Distributed clouD computing Applications and Research (CHEDDAR). CHEDDAR is a major UK research initiative led by Imperial College London, focused on shaping the future of communications and distributed computing systems. It is one of three national Future Telecoms Research Hubs funded by UKRI’s EPSRC and the Department for Science, Innovation and Technology (DSIT).

Achievements

Community events completed:

  • Seminar: Serverless computing for big data. Serverless computing, or Function-as-a-Service (FaaS), is a major innovation in cloud computing that offers elastic scalability, fine-grained billing, and infrastructure management abstraction. While it is mostly employed for lightweight and event-driven workloads, its suitability for compute- and data-intensive applications remains uncertain because of cold-start latency, execution time limitations, and partial resource provisioning. Through a video object detection case study, the feasibility of serverless platforms for big-data processing is evaluated on Microsoft Azure.
  • Seminar: High-Performance Exascale Support for Data Science. As data grows in scale and complexity across scientific disciplines, the ability to extract timely insights hinges on computational infrastructures that can keep pace. This talk explores the convergence of High-Performance Computing (HPC) and data science at the exascale frontier, where performance, scalability, and adaptability redefine what's possible. We highlight the architectural advances and software frameworks enabling massive parallelism, efficient I/O, and intelligent scheduling to support end-to-end data-intensive workflows. Case studies from climate modelling and real-time analytics illustrate how exascale platforms are accelerating discovery. We also examine persistent challenges, including system heterogeneity, energy efficiency, and the co-design of algorithms with evolving hardware. Ultimately, this talk not only outlines the power of exascale computing but also demonstrates its true usability for data science at unprecedented scales.
  • Annual workshop: the workshop took place on 26 June 2025, had two guest speakers and was attended by 35 people.

Papers published:

  1. Tuning LLM-based Code Optimization via Meta-Prompting: An Industrial Perspective, Jingzhi Gong, Rafail Giavrimis, Paul Brookes, Vardan Voskanyan, Fan Wu, Mari Ashiga, Matthew Truscott, Michail Basios, Leslie Kanthan, Jie Xu, Zheng Wang, 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025) - Industry Showcase Track, 2025.
  2. Accelerating Tensor-train Decomposition on Graph Neural Networks, Shenghao Qiu, Chunwei Xia, Zheng Wang, 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2025.
  3. Enhancing Deployment-time Predictive Model Robustness for Code Analysis and Optimization, Huanting Wang, Patrick Lenihan, Zheng Wang, The 21st ACM/IEEE International Symposium on Code Generation and Optimization (CGO), 2025. Distinguished paper award.
  4. Energy Efficiency Support for Software Defined Networks: a Serverless Computing Approach, Fatemeh Banaie, Karim Djemame, Abdulaziz Alhindi, Vasilios Kelefouras, Future Generation Computer Systems, 2025.
  5. QoS-Aware Placement of Interdependent Services in Energy-Harvesting-Enabled Multi-Access Edge Computing, Shuyi Chen, Panagiotis Oikonomou, Zhengchang Hua, Nikos Tziritas, Karim Djemame, Nan Zhang, Georgios Theodoropoulos, Future Generation Computer Systems, 2025.

Research Case studies

Serverless Computing for Big Data

Serverless computing is an excellent fit for big data processing as it can scale quickly and cheaply to thousands of parallel functions. We leverage virtualisation technologies to allow efficient and on-demand placement of virtual functions on a serverless platform for energy-aware function provisioning and performance in edge computing environments. Our approach provides a virtualisation layer capable of executing a range of serverless big data applications, as well as legacy applications.

Deep Learning Model Training on a Massive Scale

This research, in collaboration with Meta (previously Facebook), aims to democratise billion-scale deep learning model training so that data scientists can train large-scale deep learning models with limited GPU resources. The research is funded by Meta and is developing software systems to make it easier for data scientists to exploit GPU parallelism.

 

On the Exploitation of Multi-GPUs for Data Science

This research brings together computer scientists, epidemiologists and statisticians on a Wellcome Trust-funded project to lower the programming barrier for epidemiologists running large-scale epidemiological models on GPUs. The project is developing software systems to make it easier for epidemiologists and statisticians to exploit GPU parallelism.

 
