LIDA Societies x DSDP Seminar Series: Transportation & Environment

Thursday 19 January 2023, 12pm - 1pm (hybrid)

Speakers: Elliot Karikari and Owen Hibbert

LIDA: Societies have teamed up with the Data Scientist Development Programme to bring you a seminar series full of diverse topics within the field of Societies.


Talk 1: Investigating factors influencing route choice from Connected Car Data

Speaker: Elliot Karikari


Understanding the complexities of human behaviour and movement is essential for advancements in various contexts, ranging from public health (Colizza et al. 2007; Lenormand et al. 2015) to official statistics (Marchetti et al. 2015; Pappalardo et al. 2016b).

Recent advancements in transport technology have supplied data with high spatial and temporal granularity (Wang et al., 2021), which potentially enables detailed analysis of the factors that influence individuals' routing behaviour. This allows us to analyse in detail the journeys individuals make, the time they spend on these journeys, their stops and the routes taken. By comparing these routes to other optimal routes, we can further analyse the effects of taking one route over another in order to inform urban planning and infrastructure monitoring.

This project seeks to investigate factors influencing route choices within the United Kingdom. Our dataset is made up of over 400 million GPS points (latitudes and longitudes), which translates to approximately 1,800,000 trips made in “connected cars”. Trips are reasonably well represented across the country, and the data span the period from 30/06/2022 to 01/08/2022. Recordings are taken at approximately 3-second intervals, demonstrating the precision offered by current technology. In addition to the GPS recordings, the dataset provides information on vehicle speed, bearing angle and geographic location (geohash). The exact origin and destination of each trip are obfuscated for privacy protection.

Given the level of accuracy within this data and our objective of understanding route behaviour, our analysis is performed directly on the raw data. We have set out to derive five primary statistical indicators to help us analyse the trips made and explore regional variations across the UK. These statistics are:

  • Trip distance
  • Trip duration
  • Cumulative angular deviation – The angle of incidence between two adjacent road segments, relative to a straight line.
  • Number of turns per trip – This indicates patterns in the types of routes taken, i.e., whether people are more likely to follow routes with few turns.
  • Route sinuosity – The ratio of the route length to the straight-line distance between origin and destination.
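As a rough illustration of how the five indicators above might be derived from a sequence of GPS fixes, here is a minimal Python sketch. It is not the project's implementation: the function names, the (timestamp, lat, lon) input layout, the reading of cumulative angular deviation as the sum of absolute heading changes between consecutive segments, and the 45-degree turn threshold are all assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(p, q):
    """Initial compass bearing from p to q, in degrees in [0, 360)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0


def trip_indicators(points, turn_threshold_deg=45.0):
    """Compute the five indicators for one trip.

    points: list of (timestamp_seconds, lat, lon) tuples in time order
            (hypothetical layout). turn_threshold_deg is an assumed cut-off.
    """
    coords = [(lat, lon) for _, lat, lon in points]
    segments = list(zip(coords, coords[1:]))
    distance = sum(haversine_m(a, b) for a, b in segments)
    duration = points[-1][0] - points[0][0]
    # Skip zero-length segments (stationary fixes) when computing headings.
    bearings = [bearing_deg(a, b) for a, b in segments if a != b]
    # Heading change between consecutive segments, folded into [0, 180].
    changes = [abs((b2 - b1 + 180.0) % 360.0 - 180.0)
               for b1, b2 in zip(bearings, bearings[1:])]
    straight = haversine_m(coords[0], coords[-1])
    return {
        "distance_m": distance,
        "duration_s": duration,
        "cumulative_angular_deviation_deg": sum(changes),
        "n_turns": sum(1 for c in changes if c >= turn_threshold_deg),
        "sinuosity": distance / straight if straight > 0 else math.inf,
    }


# Synthetic L-shaped trip near the equator: two steps east, then one step north.
example_trip = [(0, 0.0, 0.0), (3, 0.0, 0.001), (6, 0.0, 0.002), (9, 0.001, 0.002)]
indicators = trip_indicators(example_trip)
```

For the synthetic trip, the single right-angle corner should register as one turn, and the sinuosity should be greater than 1 because the route is longer than the straight line between its endpoints.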

Due to the significant size of the dataset (around 600 GB), this process is undertaken in chunks over the full dataset.
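The abstract does not say how the chunks are defined or how the data are stored. One possible approach, sketched here in Python, is to stream the rows and hold only one trip in memory at a time. The column names (trip_id, timestamp, lat, lon), the CSV layout and the assumption that each trip's rows are contiguous are all hypothetical.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def iter_trips(rows):
    """Yield (trip_id, rows_for_trip), holding one trip in memory at a time.

    Assumes the rows of each trip are contiguous in the input.
    """
    for trip_id, group in groupby(rows, key=itemgetter("trip_id")):
        yield trip_id, list(group)


def summarise(rows):
    """Build a small per-trip summary without loading the whole file."""
    summary = {}
    for trip_id, trip_rows in iter_trips(rows):
        times = [float(r["timestamp"]) for r in trip_rows]
        summary[trip_id] = {
            "n_points": len(times),
            "duration_s": max(times) - min(times),
        }
    return summary


# Synthetic stand-in for a large file; csv.DictReader reads it lazily, row by row.
sample = io.StringIO(
    "trip_id,timestamp,lat,lon\n"
    "a,0,53.80,-1.55\n"
    "a,3,53.80,-1.54\n"
    "a,6,53.81,-1.54\n"
    "b,100,50.72,-3.53\n"
    "b,103,50.72,-3.52\n"
)
result = summarise(csv.DictReader(sample))
```

With a real file, the StringIO object would be replaced by an open file handle, and the per-trip summary by whichever indicators are needed.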

It is expected that the connected car data will provide richer data on route choice than is currently available through traditional sources such as surveys. Relative to these sources, it is expected that greater detail can be captured on variation in route choice over different time periods, as well as on the external factors that might influence route choice. Such insights can be fed into travel activity models for further analysis and to derive policy-relevant insights.


Talk 2: What can household water consumption data tell us about dwelling occupancy during COVID-19 lockdown periods and ‘staycation summers’?

Speaker: Owen Hibbert


Automated Metering Infrastructure (AMI) in the residential water supply sector has tremendous potential re-use value as a novel, near-real-time source of information on dwelling occupancy patterns. This is because water is typically consumed only when householders are present, and the data are collected non-intrusively as part of routine billing processes. We draw on data at a one-hour temporal resolution for a subset of up to 2,500 properties in Devon and Cornwall, supplied by the regional water supplier South West Water. Our ultimate goal is to assess the potential value of these data as a proxy for area-based indicators of dwelling type and tourism activity (e.g., presence and occupancy of dwellings used as tourist lets).

In this presentation, we report on work in progress to establish the link between recorded water consumption and dwelling occupancy. COVID-19 lockdown periods in 2020 and the ‘staycation summers’ of 2020 and 2021 provide a unique test case for us to develop and assess analytic techniques which can be used to extract dwelling occupancy patterns from these data. Specifically, we use clustering and data visualisation techniques to identify groups of dwellings that experienced similar occupancy patterns during the COVID-19 period of homeworking and staycations, demonstrating the ability of these data to differentiate between dwellings based on their occupancy status.
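The abstract does not name the clustering method used. As one illustration of how consumption profiles could be grouped, here is a minimal k-means sketch in Python. The choice of k-means, the farthest-point initialisation, the four-value daily profiles and all of the numbers are assumptions invented for this example; they are not data or methods from the project.

```python
import math


def kmeans(profiles, k, n_iter=20):
    """Plain k-means on equal-length numeric profiles.

    Returns (labels, centroids). Uses a deterministic farthest-point
    initialisation, chosen here only to keep the example reproducible.
    """
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        # Next seed: the profile farthest from its nearest existing centroid.
        farthest = max(profiles,
                       key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: attach each profile to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in profiles]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic daily profiles (litres per six-hour block) for five made-up dwellings:
# three with regular daytime use and two that are almost unused.
profiles = [
    [5, 20, 15, 25],
    [6, 22, 14, 24],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [4, 18, 16, 26],
]
labels, centroids = kmeans(profiles, k=2)
```

On this synthetic input, the three regularly used dwellings should fall into one cluster and the two near-empty dwellings into the other. Real hourly data would have far longer profiles and would likely need normalisation before clustering.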