Thursday 19th January – 12 noon to 1pm (hybrid)
Speakers: Elliot Karikari and Owen Hibbert
LIDA: Societies have teamed up with the Data Scientist Development Programme to bring you a seminar series full of diverse topics within the field of Societies.
Talk 1: Investigating factors influencing route choice from Connected Car Data
Speaker: Elliot Karikari
Understanding the complexities of human behaviour and movement is essential for advancements in various contexts, ranging from public health (Colizza et al. 2007; Lenormand et al. 2015) to official statistics (Marchetti et al. 2015; Pappalardo et al. 2016b).
Recent advancements in transport technology have supplied us with data of high spatial and temporal granularity (Wang et al., 2021), which potentially enables detailed analysis of the factors that influence individuals' routing behaviour. This allows us to analyse in detail the journeys individuals make, the time they spend on these journeys, their stops and the routes taken. By comparing these routes to optimal alternative routes, we can further analyse the effects of taking one route over another in order to inform urban planning and infrastructure monitoring.
This project seeks to investigate factors influencing route choices within the United Kingdom. Our dataset is made up of over 400 million GPS points (latitudes and longitudes), which translate to approximately 1,800,000 trips made in “connected cars”. Trips are reasonably well represented across the country, and the data span the period from 30/06/2022 to 01/08/2022. Recordings are taken at approximately 3-second intervals, demonstrating the precision offered by current technology. In addition to the GPS recordings, the dataset provides information on vehicle speed, bearing angle and geographic location (geohash). The exact origin and destination of each trip are obfuscated to protect privacy.
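As an illustration of what a sequence of 3-second GPS fixes makes possible, the following is a minimal sketch, not the project's actual method, of deriving a trip's path length, duration and a simple detour ratio (path length divided by straight-line distance). The function names, the `(lat, lon)` tuple layout and the detour ratio itself are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing sqrt(a) fractionally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def trip_summary(points, interval_s=3.0):
    """Summarise one trip from its ordered (lat, lon) fixes.

    interval_s is the assumed constant sampling interval in seconds.
    """
    # Sum the distances between consecutive fixes to get the driven path length.
    path_km = sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))
    direct_km = haversine_km(*points[0], *points[-1])
    return {
        "path_km": path_km,
        "duration_s": interval_s * (len(points) - 1),
        "detour_ratio": path_km / direct_km if direct_km > 0 else float("inf"),
    }


# Hypothetical three-fix trip: east along one street, then north along another.
example = trip_summary([(53.80, -1.55), (53.80, -1.54), (53.81, -1.54)])
print(example)
```

A detour ratio close to 1 indicates a nearly straight journey; larger values indicate a more circuitous route. Real analysis would compare against a routed optimum on the road network rather than the straight line used here.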
Given the level of accuracy within this data, and our objective of understanding route behaviour, our analysis is performed directly on the raw data. We have set out to derive 5 primary statistical indicators to help us analyse the trips made and explore regional variations across the UK. These statistics are:
Due to the significant size of the dataset (around 600 GB), this process will be undertaken in chunks across the full dataset.
The connected car data are expected to provide richer information on route choice than is currently available through traditional sources such as surveys. Relative to these sources, greater detail should be captured on variation in route choice over different time periods, as well as on the external factors that might influence route choice. Such insights can be fed into travel activity models for further analysis and to derive policy-relevant insights.
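Chunked processing of this kind can be sketched as follows. This is a hedged illustration only: the CSV layout, the column names `trip_id` and `speed`, and the choice of mean speed as the statistic are all hypothetical and are not taken from the project. The point is that only small running totals per trip are held in memory, never the full file.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from any iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def mean_speed_by_trip(csv_file, chunk_size=100_000):
    """Stream a CSV file object and return the mean speed for each trip."""
    totals = defaultdict(lambda: [0.0, 0])  # trip_id -> [sum of speeds, row count]
    reader = csv.DictReader(csv_file)
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            acc = totals[row["trip_id"]]  # hypothetical column name
            acc[0] += float(row["speed"])  # hypothetical column name
            acc[1] += 1
    return {trip: total / count for trip, (total, count) in totals.items()}


# Tiny in-memory stand-in for a large file on disk.
sample = io.StringIO("trip_id,speed\na,10\na,20\nb,30\n")
print(mean_speed_by_trip(sample, chunk_size=2))  # {'a': 15.0, 'b': 30.0}
```

Because the running totals are updated row by row, a trip whose records straddle a chunk boundary is still aggregated correctly.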
Talk 2: What can household water consumption data tell us about dwelling occupancy during COVID-19 lockdown periods and ‘staycation summers’?
Speaker: Owen Hibbert
Automated Metering Infrastructure (AMI) in the residential water supply sector has tremendous potential re-use value as a novel, near-real-time source of information on dwelling occupancy patterns. This is because water is typically consumed only when householders are present, and the data are collected non-intrusively as part of routine billing processes. We draw on data at a one-hour temporal resolution for a subset of up to 2,500 properties in Devon and Cornwall, supplied by the regional water supplier, South West Water. Our ultimate goal is to assess the potential value of these data as a proxy for area-based indicators of dwelling type and tourism activity (e.g., presence and occupancy of dwellings used as tourist lets).
In this presentation we report on work in progress to establish the link between recorded water consumption and dwelling occupancy. The COVID-19 lockdown periods in 2020 and the ‘staycation summers’ of 2020 and 2021 provide a unique test case for us to develop and assess analytic techniques which can be used to extract dwelling occupancy patterns from these data. Specifically, we use clustering and data visualisation techniques to identify groups of dwellings that experienced similar occupancy patterns during the COVID-19 period of homeworking and staycations, demonstrating the ability of these data to differentiate between dwellings based on their occupancy status.
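The abstract does not say which clustering method is used. As one possible illustration, the sketch below applies plain k-means to toy consumption profiles, where each dwelling is a short vector of water use per time slot. The algorithm choice, the deterministic farthest-point initialisation, the function names and the numbers are all assumptions for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, n_iter=50):
    """Plain k-means; returns (labels, centroids).

    Initialisation is deterministic: start from the first profile, then
    repeatedly add the profile farthest from the centroids chosen so far.
    """
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in profiles]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: two dwellings with regular use, two that look mostly unoccupied.
profiles = [[5, 6, 5, 7], [6, 5, 6, 6], [0, 0, 1, 0], [0, 1, 0, 0]]
labels, centroids = kmeans(profiles, k=2)
print(labels)  # [0, 0, 1, 1]
```

With real hourly readings, each profile would typically be normalised first (for example by household mean consumption) so that clusters reflect the timing of use rather than its overall volume.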