The project has been funded under the EPSRC Making Sense of Data call, which is part of the Towards an Intelligent Information Infrastructure (TI3) cross-ICT priority. The QuantiCode project aims to develop novel data mining and visualization tools and techniques, which will transform people’s ability to analyse quantitative and coded longitudinal data. Such data are common in many sectors. For example, health data are classified using a hierarchy of hundreds of thousands of Read Codes (a thesaurus of clinical terms), with analysts needing to provide business intelligence for clinical commissioning decisions, and researchers tackling challenges such as modelling disease risk stratification. Retailers such as Sainsbury’s sell 50,000+ types of products, and want to combine data from purchasing, demographic and other sources to understand behavioural phenomena such as the convenience culture, to guide investment and reduce waste.
We aim to deliver an infrastructure that provides far more powerful analytical tools than those available today, transforming the ability of public and private sector organizations to analyse quantitative and coded longitudinal data. Our goals include:
- To understand the workflows of, and address the barriers to, knowledge extraction from data in the private and public sectors.
- Thought leadership in data governance.
- Efficient heterogeneous data fusion.
- Robust and scalable data mining/machine learning tools for data analysis.
- Data visualization/mining of abstraction models.
We will deliver a step-change in the ease with which analysts can integrate heterogeneous data.
Frameworks and guidelines for the best trade-off between use of the data and avoidance of ethical issues.
Efficient hierarchical data linkage, and the capability to handle inexact data mappings and longitudinal changes in hierarchies/mappings.
Methods for users to investigate the quality of linked data.
We aim to deliver scalable and adaptive methods for processing/mining heterogeneous longitudinal data, and novel visualization interfaces for effective analysis.
Mining longitudinal data
Novel methods to robustly model longitudinal data by taking account of observational noise and new temporal relations.
Scalability via stochastic control
Using stochastic control techniques to build adaptive models that balance accuracy and computational cost automatically.
Question-posing visual interface
To create low-effort, question-posing visual interfaces that allow users to pose questions in their own terms, and then return optimal results within time constraints.
Abstraction models will be built under ethical and time constraints for visualization and data analytics.
A tool that implements the governance principles for linkage and data granularity, creating a system that is ethical by design and automatically detects suspicious behaviour to reduce the risk of privacy breaches.
Computational and visual techniques for abstraction model building
To build robust abstraction models that condense data, reduce the number of variables and simplify analysis while preserving important details. High-resolution displays are expected to support visualization at multiple levels of granularity.
To automatically build models that are defined abstractly through intuitive examples provided by users.
Dr Roy Ruddle (PI; School of Computing), Prof Mark Birkin (School of Geography), Dr Jan Palczewski & Dr Georgios Aivaliotis (School of Maths), Prof Sir Alex Markham (Leeds Institute of Biomedical and Clinical Sciences), Prof Justin Keen (Leeds Institute of Health Sciences), and Prof Chris Megone and Dr Kevin Macnish (Inter-Disciplinary Ethics Applied Centre).
The QuantiCode project is funded by the EPSRC (EP/N013980/1) and supported by the MRC (ES/L011891/1) and the ESRC (ES/L011891/1).
For any enquiries, please contact: Prof. Roy Ruddle (PI), firstname.lastname@example.org