Consumer data can provide insight into a wide range of human activity, but there is a trade-off between the privacy and the utility of the data.
Consumer data collected by commercial providers have huge potential for a range of research purposes but can be challenging to access because they are often held in secure environments. Secure handling of these datasets is crucial, as consumer data often contain personally sensitive attributes (e.g., addresses) or are commercially sensitive (e.g., they have been purchased or contain licensed information). This project provides a proof of concept for creating enhanced and aggregated versions of consumer datasets for research purposes, together with a dashboard for exploring those data.
Data and methods
Taking securely held consumer datasets within the Consumer Data Research Centre (CDRC), the objective of the project is to produce non-disclosive, aggregated versions of the data whilst maintaining the unique characteristics and value of those data. An R Shiny app visualising the aggregated data has been developed to showcase the utility of non-disclosive datasets for research purposes. Based on a randomised sample of Whenfresh/Zoopla consumer data, key metrics such as median price and affordability are calculated for different property types at the Middle Layer Super Output Area (MSOA) level. Additionally, open data are used to calculate further metrics, for example, the attractiveness of an area based on Census flow data. The next steps include improving the efficiency of the R Shiny app, along with its loading and updating times, so that it can be populated with additional datasets.
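The aggregation step can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's R code. The record layout, the function names `aggregate_median_price` and `net_inflow`, the suppression threshold `MIN_CELL_COUNT`, and the use of net inflow as a stand-in for attractiveness are all assumptions made for illustration; the project's actual disclosure rules and metric definitions are not described here.

```python
from collections import defaultdict
from statistics import median

# Hypothetical disclosure threshold: cells with fewer records are suppressed.
# The value is an assumption, not the rule applied in the project.
MIN_CELL_COUNT = 10


def aggregate_median_price(records, min_count=MIN_CELL_COUNT):
    """Return {(msoa, property_type): median price}, or None for small cells.

    `records` is an iterable of (msoa, property_type, price) tuples.
    """
    groups = defaultdict(list)
    for msoa, property_type, price in records:
        groups[(msoa, property_type)].append(price)
    return {
        key: (median(prices) if len(prices) >= min_count else None)
        for key, prices in groups.items()
    }


def net_inflow(flows):
    """Return {area: inflow - outflow} from (origin, destination, count) rows.

    Net inflow is used here as one possible proxy for attractiveness;
    flows that start and end in the same area are ignored.
    """
    net = defaultdict(int)
    for origin, destination, count in flows:
        if origin == destination:
            continue
        net[origin] -= count
        net[destination] += count
    return dict(net)


# Made-up example records; prices and flow counts are not real data.
sample = [
    ("E02000001", "flat", 250_000),
    ("E02000001", "flat", 300_000),
    ("E02000001", "flat", 275_000),
    ("E02000001", "detached", 900_000),
]
summary = aggregate_median_price(sample, min_count=3)

flows = [("A", "B", 40), ("B", "A", 25), ("A", "A", 100)]
balance = net_inflow(flows)
```

In this sketch the single detached property falls below the threshold, so its cell is returned as `None` rather than exposing an individual price. Suppressing small cells is one common way to keep an aggregated table non-disclosive while retaining most of its analytical value.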
By drawing on existing data, especially anonymised and aggregated consumer data, this research project serves as a proof of concept for an ‘alternative’ or ‘big data’ census. Different data types (time series, static, and origin-destination flow data) have been successfully combined and can be explored by the user in a dashboard (Figure 1).
Figure 1: Screenshot of the GOLIATH dashboard
Value of the research
The prototype R Shiny app forms the basis for further work in providing a dashboard for exploring local area statistics. Moving forward, other consumer data could be included as part of GOLIATH, for example, transport and lifestyle datasets. Utilising consumer data in addition to traditional census counts contributes to efforts to create an ‘alternative’ or ‘big data’ census.
- Devised methods for the aggregation and calculation of metrics for secure consumer data
- Developed a prototype R Shiny app for the visualisation of spatially disaggregated information
Maike Gatzlaff, LIDA Data Scientist Intern
Dr Nik Lomax, Co-Director of the Consumer Data Research Centre
Professor Mark Birkin, Co-Director of the Leeds Institute for Data Analytics
Dr Will James, Research Fellow, University of Leeds
The Consumer Data Research Centre
The data for this research have been provided by the Consumer Data Research Centre, an ESRC Data Investment, under project ID CDRC [Project Number], ES/L011840/1; ES/L011891/1.