Research Technology

From a research technology perspective, LIDA has five strategic aims:
- Provide high end, scalable research computing infrastructure
- Develop processes and tools to improve costing and research recoveries
- Encourage open research practice
- Deliver impactful research and innovation
- Be a trusted collaborator
Significant progress has been made toward them all. The LASER Delivery Group, comprising key staff from LIDA and University IT, was formed in February to provide a forum for regular discussions about computing infrastructure issues and solutions, and currently meets weekly (Aim 1).
Regarding costing and recoveries (Aim 2), months of behind-the-scenes work with the University Research and Innovation Service have come to fruition with the publication of a new process for Faculty Research and Innovation Offices to follow when costing research grants. The good value for money that LASER provides for research with sensitive data is reinforced through regular communications with end users, schools, institutes and faculties.
LIDA has been accepted as an organisation for PyPI, the official third-party Python software repository. That will facilitate open research (Aim 3) by: (a) providing a critical mass that makes it easier to find open source software produced by LIDA researchers, and (b) allowing the effort of maintaining that software to be shared.
The greatest strides forward have been made in Aim 4. The knowledge and skills of LIDA researchers and its core staff, as well as the infrastructure that is available, have been promoted to dozens of external private and public sector organisations at events run by LIDA’s nine application domain and research methods Communities. LIDA’s AI Community is at the heart of new University-wide initiatives in that area. Masterclasses and CPD workshops in visualization, data quality and process mining have attracted large audiences and are bringing leading-edge methods to researchers.
Finally, the operational delivery of the LASER platform for sensitive data research (ISO27001 and NHS DSPT accredited) is now business as usual (Aim 5), underpinned by close technical and information governance collaboration between LIDA and University IT. The LASER platform currently supports 186 LIDA researchers working on 78 different projects.
By Prof Roy Ruddle
LIDA Director of Research Technology
Data Analytics Team
Another successful year for the DAT!
Alongside the continued operation of the University’s Trusted Research Environment (LASER), the team have taken on many additional bespoke research collaborations, co-authored papers, been interviewed by international magazines, had two babies & a wedding and presented at local & national conferences, all while working tirelessly to streamline and automate our internal processes.
This year has seen the Data Analytics Team grow not just from strength to strength, but also in number. Two of us achieved internal promotion and we’ve recruited three new members (not including the newborns).
The DAT are now a team of six, up from the four of us last year, reflecting not just the increased demand for the skills and expertise we bring but also our ability to deliver to a high standard. This high standard has been delivered to academic teams in part by embedding members of the DAT, allowing them to contribute directly to the research as collaborators.

We've also given presentations at various conferences including UoL's 'AI & GPU Frontiers', HDRUK's 'Health Data Science Black Internship Programme' and a number of TRE conferences.
As the University of Leeds’ Trusted Research Environment, LASER has seen continued growth in its use and in its capacity to support world-class research. The DAT have been working ever more closely with IT Services to maintain the high standards necessary for compliance with the external data security frameworks required of research using such sensitive data, and to build a road map for the ongoing development of the platform that ensures its continued improvement and capability.

By Adam Keeley
Data Analytics Team Manager
Recent Team Activity
Automating Large-Scale Pseudonymisation for HASP’s AirDNA Dataset

One of my deliverables focused on improving the handling and pseudonymisation of HASP’s AirDNA dataset, which provides detailed data on Airbnb listings and activity across the UK. This dataset, while publicly available, contains identifiers that can potentially link to individual hosts and properties, requiring careful data management to ensure compliance with privacy and ethical standards.
I reviewed the existing pseudonymisation workflow and developed a refined Python script to automate the process, improving both performance and reliability when handling large-scale monthly data updates. The updated script enhances processing speed, reduces manual intervention, and ensures consistent and secure outputs across all dataset types (Host, Property, Daily, Monthly, and Review).
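As a rough illustration of what automated pseudonymisation of identifier columns can look like, here is a minimal Python sketch. It is not the script described above: it assumes keyed hashing (HMAC-SHA256) as the technique, and the function names, field names and key handling are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Return a stable keyed pseudonym for one identifier (assumed technique: HMAC-SHA256)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    # Truncated to 16 hex characters for readability; keep the full digest if collisions matter.
    return digest.hexdigest()[:16]


def pseudonymise_rows(rows, id_fields, secret_key):
    """Replace the named identifier fields in each row (a dict) and leave other fields untouched."""
    return [
        {
            field: pseudonymise(str(value), secret_key) if field in id_fields else value
            for field, value in row.items()
        }
        for row in rows
    ]


# Hypothetical usage: field names are illustrative, not the real dataset schema.
sample = [{"host_id": "12345", "property_id": "A-77", "nightly_rate": 95}]
cleaned = pseudonymise_rows(sample, {"host_id", "property_id"}, b"replace-with-a-managed-secret")
```

Because the same key always maps an identifier to the same pseudonym, records should remain linkable across the different dataset types and across monthly updates, provided the key is stored securely and reused.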

- Outside of the day job I wrote a blog - Wrangling Chaos: 6 Things I Wish I Knew Before Tackling Messy Data
- Nature followed up on the above LIDA blog and I was interviewed by a US-based journalist for Six things to do before jumping on a spreadsheet.
- I also co-authored the Nature Food publication, as part of the HFSS project - Did high in fat, sugar, and salt (HFSS) product placement legislation in England lead to reduced HFSS purchases? An interrupted time series analysis.
By Obosekokhune Eselebor
Building a Smarter Way to Budget: The LASER Costing Tool
My current project focuses on developing a bespoke costing tool for LASER resources. The tool integrates directly with Azure APIs to retrieve detailed information about each resource used in a LASER project. By leveraging these real-time data, it calculates accurate costs based on project-specific requirements.
The final product will be a user-friendly web application where researchers can input parameters such as virtual machine type, quantity, project duration, and archive period. The application will then generate comprehensive cost estimates, including monthly and annual totals, as well as a detailed breakdown for each resource.
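The calculation behind such an estimate might be sketched as follows. This is an illustrative Python example only: the price table holds made-up placeholder values rather than real Azure rates, the names are hypothetical, and the real tool retrieves its pricing data from Azure rather than from constants.

```python
from dataclasses import dataclass

# Placeholder monthly prices in GBP; illustrative values only, not real Azure rates.
VM_MONTHLY_PRICE = {"small": 100.0, "large": 400.0}
ARCHIVE_MONTHLY_PRICE = 10.0


@dataclass
class ProjectRequest:
    vm_type: str          # key into VM_MONTHLY_PRICE
    vm_quantity: int      # number of virtual machines
    duration_months: int  # active project duration
    archive_months: int   # archive period after the project ends


def estimate_cost(request: ProjectRequest) -> dict:
    """Return monthly and annual compute costs plus a per-resource breakdown."""
    monthly_compute = VM_MONTHLY_PRICE[request.vm_type] * request.vm_quantity
    compute_total = monthly_compute * request.duration_months
    archive_total = ARCHIVE_MONTHLY_PRICE * request.archive_months
    return {
        "monthly_compute": monthly_compute,
        "annual_compute": monthly_compute * 12,
        "breakdown": {"compute": compute_total, "archive": archive_total},
        "total": compute_total + archive_total,
    }


# Hypothetical request: two small VMs for 18 months, then 60 months of archive.
estimate = estimate_cost(ProjectRequest("small", 2, 18, 60))
```

Keeping the price lookup separate from the arithmetic means the constants could later be swapped for values fetched from a pricing service without changing the estimate logic.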

This solution aims to streamline budgeting, improve transparency, and support informed decision-making for the projects hosted on LASER.
By Fojan Ilderem
Using Natural Language Processing to Measure Curriculum Redefined’s Impact
Curriculum Redefined is a project designed to enhance student experience through the evaluation of a comprehensive Curriculum and Portfolio Enhancement project.
My role in the research team has been to facilitate access to relevant data in a way that is accessible, reproducible and compliant with data standards and legislation, as well as to provide methods and bespoke software for evaluating the curriculum using NLP.
This year I have been working on a Streamlit app which examines the lexical evolution of module descriptions and syllabi from 2017 to 2024 to understand whether the enhancement project is reflected in the language used, and how this can be used as a proxy to understand the impact of curriculum enhancement on different demographics and student outcomes.
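One simple way to quantify lexical change between two sets of module descriptions is to compare relative word frequencies. The Python sketch below illustrates that idea only; it is an assumed, deliberately basic technique with hypothetical function names and example texts, not the method used in the app.

```python
import re
from collections import Counter


def tokenise(text: str) -> list:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequencies(texts) -> dict:
    """Map each token to its share of all tokens across the given texts."""
    counts = Counter(token for text in texts for token in tokenise(text))
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}


def frequency_shift(before, after) -> dict:
    """Change in relative frequency of each token from the earlier texts to the later ones."""
    freq_before = relative_frequencies(before)
    freq_after = relative_frequencies(after)
    vocabulary = set(freq_before) | set(freq_after)
    return {t: freq_after.get(t, 0.0) - freq_before.get(t, 0.0) for t in vocabulary}


# Made-up descriptions, purely for illustration.
shift = frequency_shift(["Lectures and exams"], ["Projects and portfolio"])
rising = sorted(t for t, delta in shift.items() if delta > 0)
```

Positive values mark words that became more common in the later descriptions and negative values mark words that faded, which gives a first, coarse view of how the language has moved.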
By Lydia Wharton
Developing Predictive Models for Psoriatic Arthritis Using UK Biobank Data
I am currently embedded in Prof. Ann Morgan’s team, which focuses on conditions including vasculitis and giant cell arteritis, with a special emphasis on psoriatic arthritis through the PreDiCT SpA study. This research uses data from the UK Biobank, a unique biomedical database with over 500,000 participants, accessed and analysed on the UK Biobank Research Analysis Platform. My role involves applying data science and research software engineering expertise to help develop an original model for predicting psoriatic arthritis using novel methods.
In addition, I contribute to a second project with the same team, in collaboration with researchers from the University of Manchester, investigating frailty in vasculitis using the CPRD dataset on the LASER platform.
Prior to these, I was part of Prof. Michelle Morris’s DIO Food project, which analysed large-scale data from four major UK retailers (ASDA, Morrisons, Tesco and Sainsbury’s) covering over 11.6 billion items purchased over 30 months. The study assessed the impact of HFSS (high in fat, sugar, and salt) legislation in England. The findings, which showed 2 million fewer HFSS items sold daily, were presented to Parliament, and I was a co-author on the draft article, which is in line for publication.
By Ifeanyi Chukwu
Deploying a Machine Learning API for Atrial Fibrillation Risk Prediction
Over the past year, I have developed and containerized the FIND-AFDAS prediction model as a fully functional REST API using the R/Plumber framework. The API was designed to make machine learning–based predictions accessible through a simple web interface. It features two primary endpoints: a connection-status endpoint to verify service availability, and a predict endpoint that accepts key patient clinical variables (age, sex, heart failure, hypertension, valvular heart disease, ischaemic heart disease, and chronic kidney disease). The model processes these inputs to generate a final prediction, indicating whether a patient is at elevated risk of atrial fibrillation following ischaemic stroke.
The API has been thoroughly tested using Swagger UI, confirming full functionality and accurate prediction outputs. The results are returned in both binary (0/1) and human-readable (“negative”/“positive”) formats.
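To make the request and response shapes concrete, here is a small Python sketch of how a client might prepare inputs for the predict endpoint and how a binary result maps to its human-readable label. The field names and helper functions are hypothetical and chosen for illustration; the actual API is written in R with Plumber and its parameter names may differ.

```python
import json

# Hypothetical field names for the seven clinical variables listed above.
REQUIRED_FIELDS = (
    "age",
    "sex",
    "heart_failure",
    "hypertension",
    "valvular_heart_disease",
    "ischaemic_heart_disease",
    "chronic_kidney_disease",
)


def build_predict_payload(patient: dict) -> str:
    """Check that all required inputs are present and serialise them as JSON."""
    missing = [field for field in REQUIRED_FIELDS if field not in patient]
    if missing:
        raise ValueError("Missing fields: " + ", ".join(missing))
    return json.dumps({field: patient[field] for field in REQUIRED_FIELDS})


def format_prediction(binary: int) -> dict:
    """Pair a binary prediction (0/1) with its human-readable label."""
    if binary not in (0, 1):
        raise ValueError("Prediction must be 0 or 1")
    return {"prediction": binary, "label": "positive" if binary == 1 else "negative"}


# Made-up patient record, for illustration only.
payload = build_predict_payload({
    "age": 72, "sex": "F", "heart_failure": 0, "hypertension": 1,
    "valvular_heart_disease": 0, "ischaemic_heart_disease": 1,
    "chronic_kidney_disease": 0,
})
```

Validating inputs before sending a request keeps malformed records away from the model, and returning both the binary value and its label mirrors the two output formats described above.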
To enhance portability and ensure consistent performance across environments, the API was containerized using Docker, including a custom Dockerfile defining dependencies and startup processes. This allows seamless deployment to various platforms, laying the foundation for future cloud-based or on-premises integration of the FIND-AFDAS API.
By Ameena Farooq Valiya

