
Data Science Infrastructures

The programme covers both Computing/Computer Science Foundations and Data Science Infrastructures. It reflects work in the School of Computing, particularly the Distributed Systems and Services group (Jie Xu, Karim Djemame, Zheng Wang, Evangelos Pournaras), which develops fundamental principles and practical methods for problems that are computationally challenging and require large-scale computing resources, with a focus on HPC, Cloud and Edge-based infrastructures. Research ranges from fundamental advances in data science and the understanding of computation, through to highly applied research into exascale infrastructure resource management and insight into data through visualisation.

The programme is focusing on defining what “Data Science Infrastructures” encompass in the context of LIDA, and who in the University of Leeds has research that falls within this area, e.g. LIFD. We plan to support the building of an advanced engineering platform at the confluence of HPC, Cloud and Big Data, which will leverage large-scale, geographically distributed resources from existing HPC infrastructure, employ Big Data analytic solutions and augment them with federated Cloud services. Of particular interest is how to build on the best data management solutions to offer common data services, supporting LIDA, through a geographically distributed, resilient network connecting general-purpose data centres and community-specific data repositories.

Seminars are held to present aspects of our research to the community, to attract external and internal interest in the research we are doing, and potentially to identify more people in Leeds whose work falls within this programme. Workshops are organised to allow researchers in this programme to make connections, and to showcase what they are doing and which research directions could benefit different communities. We aim to identify up to three use cases focused on the three main challenge areas that require interdisciplinary working within LIDA research communities: Societies, Health and Environment. The aim is to generate ideas that could be shaped into successful grant proposals.

Serverless Computing for Big Data

Serverless computing is an excellent fit for big data processing as it can scale quickly and cheaply to thousands of parallel functions. We leverage virtualisation technologies to allow efficient, on-demand placement of virtual functions on a serverless platform, with the aim of energy-aware function provisioning and good performance in edge computing environments. Our approach provides a virtualisation layer capable of executing a range of serverless big data applications, as well as legacy applications.
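As a rough illustration of what energy-aware function placement can mean, the sketch below assigns each function to the edge node with the lowest assumed power cost that still has capacity. It is not the group's method: the `EdgeNode` type, the `place_functions` helper, the greedy strategy, and all node names and figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    free_cpu: float        # available CPU cores (hypothetical unit)
    watts_per_core: float  # assumed marginal power draw per allocated core


def place_functions(demands, nodes):
    """Greedy placement: each function goes to the feasible node
    with the lowest assumed power cost per core."""
    placement = {}
    # Place the largest functions first so small ones do not crowd them out.
    for fn, cpu in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        feasible = [n for n in nodes if n.free_cpu >= cpu]
        if not feasible:
            # No edge capacity left: in practice the request might be
            # queued or offloaded, which this sketch does not model.
            placement[fn] = None
            continue
        best = min(feasible, key=lambda n: n.watts_per_core)
        best.free_cpu -= cpu  # reserve the capacity on the chosen node
        placement[fn] = best.name
    return placement


# Hypothetical nodes and per-function CPU demands.
nodes = [EdgeNode("edge-a", 2.0, 5.0), EdgeNode("edge-b", 4.0, 8.0)]
demands = {"resize": 1.5, "transcode": 3.0, "filter": 0.5}
print(place_functions(demands, nodes))
```

A greedy rule like this is easy to reason about but ignores latency, data locality and cold-start costs, all of which a real provisioning system would likely need to weigh against energy use.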

Deep Learning Model Training on a Massive Scale

This research, in collaboration with Meta (previously Facebook), aims to democratise billion-scale deep learning model training so that data scientists can train large-scale deep learning models with limited GPU resources. The research is funded by Meta and is developing software systems to make it easier for data scientists to exploit GPU parallelism.

On the Exploitation of Multi-GPUs for Data Science

This research brings together computer scientists, epidemiologists and statisticians on a Wellcome Trust-funded project to lower the programming barrier for epidemiologists running large-scale epidemiological models on GPUs. The project is developing software systems to make it easier for epidemiologists and statisticians to exploit GPU parallelism.

Innovative Integrated Tools and Technologies to Protect and Treat Drinking Water from Disinfection By-products (DBPs)

H2OforAll is an ambitious Horizon Europe-funded project that aims to assess the main sources of disinfection by-products (DBPs) through the development of fast, cost-effective and accurate sensor monitoring devices, and by modelling the spread of DBPs through drinking water distribution systems. DBP toxicity and environmental impact will be studied, and measures will be proposed to protect the drinking water chain. Breakthrough water treatments to remove DBPs, or to avoid their formation during water disinfection processes, will be developed, paying attention to their life cycle analysis, costs and risks. A Central Knowledge Base with reliable data on the occurrence of DBPs in the EU and their effects will be created to increase the awareness and engagement of society and governmental organisations regarding these drinking water contaminants, and to encourage new policy responses and guidance.