Data Science Infrastructures Programme

This is a new programme, created in August 2022, covering Computing/Computer Science Foundations and Data Science Infrastructures. It reflects work in the School of Computing, particularly the Distributed Systems and Services group (Jie Xu, Karim Djemame, Zheng Wang, Evangelos Pournaras), which develops fundamental principles and practical methods for problems that are computationally challenging and require large-scale computing resources, with a focus on HPC, Cloud and Edge-based infrastructures. Research ranges from fundamental advances in data science and the understanding of computation, through highly applied research into exascale infrastructure resource management, to profound insight into data through visualisation.

At this initial stage of the programme, we are focusing on defining what “Data Science Infrastructures” encompasses in the context of LIDA, and on identifying who at the University of Leeds has research that falls within this area, e.g. LIFD. We plan to support the building of an advanced engineering platform at the confluence of HPC, Cloud and Big Data, which will leverage large-scale, geographically distributed resources from existing HPC infrastructure, employ Big Data analytics solutions and augment them with federated Cloud services. Of particular interest is how to build on the best available data management solutions to offer common data services, supporting LIDA, through a geographically distributed, resilient network connecting general-purpose data centres and community-specific data repositories.

Our first activity was a seminar on 7th November 2022, which presented aspects of our research to the community in order to attract external and internal interest in our work, and potentially to identify more people in Leeds whose research falls within this programme. Our second event is planned for January 2023, with the goal of allowing researchers in this programme to make connections and to showcase what they are doing and which research directions could benefit different communities. We aim to identify up to three use cases, focusing on the three main challenge areas that require interdisciplinary working within LIDA research communities: Societies, Health and Environment. The aim is to generate ideas that could be shaped into successful grant proposals.

Algorithmic Support for Massive Scale Distributed Systems

(Natasha Shakhlevich and Jie Xu)

This research addresses the key challenges of the modern digital revolution, characterised by the impressive growth in the capacity of computing resource providers and in customers’ usage. The scale of that growth is unprecedented, and it gives rise to extremely complex problems of efficient resource management and task scheduling. In this project, specialists in applied and theoretical computer science will join forces to carry out a fundamental study of the associated problems and to develop advanced approaches to optimise the performance of resource providers.

The proposed research stems from the theoretical project “Submodular Optimisation Techniques for Scheduling with Controllable Parameters”, from a number of applied projects delivered by Prof. Xu and his team, and from the most recent collaborative work of our combined multidisciplinary team.

It is aimed at developing scientifically grounded methods for solving complex optimisation problems that can be used in massive-scale distributed systems. The methods will address a collection of system models, characterised by different levels of abstraction. The outcomes will include an algorithmic toolkit and a methodology that links the application of formally sound algorithms to practical systems.

TANGO (Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation)

(K. Djemame) GitHub: https://github.com/TANGO-Project

This research paved the way for exploiting the powerful new computing resources offered by customized heterogeneous hardware. It seeks to simplify the way developers approach the development of next-generation applications in the upcoming era of Mobile, Internet of Things (IoT), Cyber Physical Systems (CPS), Wearables, Big Data and High Performance Computing (HPC).

Because the impact of heterogeneity on all computing tasks is rapidly increasing, innovative architectures, algorithms, and specialized programming environments and tools are needed to use these new, mixed and diversified parallel architectures efficiently. The initiative focuses on “Simplification & Optimization of Heterogeneity”: designing more flexible software abstractions and improved system architectures to fully exploit the benefits of these heterogeneous platforms, while addressing energy optimization at the same time.

The research supports controlling and abstracting underlying heterogeneous hardware architectures, configurations and software systems, including heterogeneous clusters, chips and programmable logic devices (such as FPGA, ASIP, MPSoC, heterogeneous CPU+GPU chips and heterogeneous multi-processor clusters), while providing tools to optimize various dimensions of software design and operations (energy efficiency, performance, data movement and location, cost, time-criticality, security and dependability on target architectures).
