Breaking barriers to active travel: Modelling the Impact of Weather and Daylight to Reduce Inequality
This case study explores the relationship between weather conditions and cycling engagement within the Bradford area using Strava data.
Project overview
This case study created novel datasets and analysis examining relationships between active travel, gender and weather in Bradford. Although it sought to model these impacts, evaluate infrastructure improvements and identify potential barriers, challenges in data normalisation persisted, especially in weighting models to combat the inherent bias in Strava data and the sporadic nature of the ground truth data. Despite these challenges, this project has successfully analysed the overall weather and daylight effects on active travel in Bradford, additionally highlighting the need for consistent data collection methods from local and central government. This concise case study concentrates on analysing cycling activity across various Index of Multiple Deprivation (IMD) deciles and weather scenarios, alongside the development and intended application of new datasets.
Data
Strava Metro Data
Strava is a fitness app that tracks the physical activity of an estimated 100 million users across 190 countries. This dataset provides granular and comprehensive insight into travel activities such as cycling and walking. However, it is important to note the inherent biases due to Strava's user base, which predominantly consists of tech-savvy fitness enthusiasts and skews male, limiting the dataset's representativeness of the general population's active travel habits.
MIDAS Open: UK Land Surface Stations Data
The MIDAS (Met Office Integrated Data Archive System) Open provides weather data acquired from land surface stations and is curated by the Met Office. It constitutes a spatially and temporally comprehensive repository of meteorological observations across the United Kingdom. This dataset encompasses detailed records of rainfall, temperature, humidity, wind speed, and atmospheric pressure measurements, collected from a network of stations throughout the UK. The granularity of the data, often recorded hourly, facilitates in-depth analysis of climate patterns, extreme weather events, and long-term climate change trends within the UK.
Department for Transport (DfT) Data
The DfT Data includes diverse datasets curated by transportation authorities to monitor, analyse, and improve the transportation network's efficiency, safety, and sustainability. Cycling data was parsed from a wider data source in order to understand active travel in Bradford.
Methods
Data cleaning included pre-processing the primary Strava dataset and weather data using Python and QGIS. This involved clipping the data to Bradford's geographic boundaries and merging the split data sources into a unified data frame. Integrating weather conditions, precipitation levels and cycle path availability, and classifying records by daylight into 'night' and 'day' categories, enriched the final dataset. A similar approach was applied to the DfT datasets.
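The merging and day/night labelling described above can be sketched in plain Python. This is a minimal illustration rather than the project's pipeline: the field names (`hour`, `count`, `air_temp`, `precip_mm`), the sample records and the fixed sunrise and sunset times are all assumptions made for the example. A real workflow would derive sunrise and sunset for each date and location.

```python
from datetime import datetime, time

# Hypothetical hourly records; field names and values are illustrative only.
cycling = [
    {"hour": datetime(2019, 1, 15, 7), "count": 3},
    {"hour": datetime(2019, 1, 15, 12), "count": 9},
    {"hour": datetime(2019, 1, 15, 18), "count": 5},
]
weather = [
    {"hour": datetime(2019, 1, 15, 7), "air_temp": 1.5, "precip_mm": 0.0},
    {"hour": datetime(2019, 1, 15, 12), "air_temp": 5.0, "precip_mm": 0.2},
    {"hour": datetime(2019, 1, 15, 18), "air_temp": 3.0, "precip_mm": 1.1},
]

# Assumed fixed sunrise and sunset for the example day (not real almanac values).
SUNRISE, SUNSET = time(8, 15), time(16, 15)


def label_daylight(ts, sunrise=SUNRISE, sunset=SUNSET):
    """Return 'day' when the timestamp falls between sunrise and sunset."""
    return "day" if sunrise <= ts.time() < sunset else "night"


def merge_hourly(cycling_rows, weather_rows):
    """Left-join cycling counts to weather observations on the hour."""
    weather_by_hour = {row["hour"]: row for row in weather_rows}
    merged = []
    for row in cycling_rows:
        obs = weather_by_hour.get(row["hour"], {})  # empty if no weather record
        merged.append({**obs, **row, "daylight": label_daylight(row["hour"])})
    return merged


unified = merge_hourly(cycling, weather)
```

Each row of `unified` then carries the cycling count, the matching weather observation and a `daylight` label, which is the shape of record the later correlation steps need.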
After cleaning, several attempts were made to ground-truth the Strava data for accurate representation, enabling the creation of accurate, toggleable maps. Several machine learning methods were used, including random forest, multiple linear regression, seasonal decomposition and Prophet models, with preprocessing steps such as one-hot encoding for categorical variables and standardisation for numerical variables.
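The two preprocessing steps named above can be illustrated with a short standard-library sketch. The helper names `one_hot` and `standardise` and the sample values are hypothetical. The source does not say which library the project used, so this shows only what each transformation does.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Encode a list of category labels as 0/1 indicator columns."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def standardise(values):
    """Rescale numbers to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical inputs for illustration only.
daylight = ["day", "night", "day", "day"]
air_temp = [2.0, 4.0, 6.0, 8.0]

encoded = one_hot(daylight)     # one indicator column per category
scaled = standardise(air_temp)  # mean 0, standard deviation 1
```

One-hot encoding stops a model from treating category labels as ordered numbers. Standardisation puts variables such as temperature and humidity on a comparable scale.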
The final datasets, derived from Strava and DfT data, cover the years 2018–2022, with sample sizes of n = 4902778 and n = 21360 respectively. The analysis of these final datasets evaluated the relationship between specific weather variables (air temperature, precipitation, dew point and relative humidity) and active travel counts by gender, segmented by dedicated cycle paths and Index of Multiple Deprivation (IMD) deciles across the Bradford City Council (BCC) area. The correlations were computed separately for data derived from Strava users and the DfT, aiming to identify potential patterns or disparities in active travel behaviour in response to weather conditions.
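A per-decile correlation of the kind described above could be computed along the following lines. This sketch assumes Pearson's correlation coefficient, which the source does not specify, and the rows are invented purely to exercise the code. They are not project data.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences (undefined if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_by_decile(records):
    """Correlate temperature with cycle counts separately within each IMD decile."""
    grouped = defaultdict(lambda: ([], []))
    for decile, temp, count in records:
        temps, counts = grouped[decile]
        temps.append(temp)
        counts.append(count)
    return {d: pearson_r(t, c) for d, (t, c) in sorted(grouped.items())}


# Hypothetical rows: (imd_decile, air_temp_c, cycle_count).
rows = [
    (1, 2.0, 4), (1, 8.0, 6), (1, 14.0, 9), (1, 20.0, 10),
    (6, 2.0, 12), (6, 8.0, 11), (6, 14.0, 7), (6, 20.0, 5),
]

r_by_decile = correlation_by_decile(rows)
```

Running the same function once on Strava-derived rows and once on DfT-derived rows would give the two sets of coefficients that the heatmaps compare.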
Key findings
In the data analysis, air temperature shows a slight positive correlation with active travel among Strava users (Figure 1), suggesting an inclination towards outdoor activity in warmer conditions. Conversely, the DfT dataset reveals varied responses to air temperature (Figure 2), with extreme temperatures deterring active travel among the general population, although only slightly. This disparity highlights a slight demographic sensitivity to weather, reflecting differences between Strava's user base and the broader population. The strongest (albeit weak) correlation with air temperature in the DfT dataset is at the 6th IMD decile (0.25), although further geographical investigation would be beneficial to see whether season or rurality influences these behaviours.

The heatmap is produced from Strava cycling counts, Met Office Integrated Data Archive System weather data (2019-2022) and the Index of Multiple Deprivation deciles in Bradford. The heatmap shows weak correlations overall, both positive and negative, across all weather variables.
Figure 1. Strava data (n = 4902778) correlation with air temperature, precipitation, dew point, relative humidity, IMD decile and cycling count (2018-2022).

The heatmap is produced from Department for Transport cycling counts, Met Office Integrated Data Archive System weather data (2019-2022) and the Index of Multiple Deprivation deciles in Bradford. The heatmap indicates that decile 8 has the strongest positive correlation between relative humidity and cycling counts (0.36), while decile 9 has the strongest negative correlation with dew point (-0.80).
Figure 2. DfT data (n = 21360) correlation with air temperature, precipitation, dew point, relative humidity, IMD decile and cycling counts (2018-2022).
Precipitation effects differed between the datasets: Strava users exhibit negligible to slightly negative correlations, indicating continued engagement with active travel despite adverse weather, whereas the DfT data suggests a more pronounced deterrent effect of rainfall on the general population's active travel patterns.
Complementing the heatmap analysis, the subsequent linear regression of the Strava and DfT datasets reveals consistent patterns (Figures 3 and 4). Air temperature maintains a minor positive correlation with active travel among Strava users, underscoring a slight preference for cycling during warmer weather, which mirrors the heatmap's indications. This suggests consistency in the influence of temperature on active travel behaviour across multiple analytical methods. Meanwhile, the DfT data, serving as a ground truth but with less demographic detail, shows a varied but generally slight deterring effect of extreme temperatures on active travel. Precipitation displays a negligible impact on Strava users' cycling habits, with a slight negative correlation in the DfT data, implying a greater influence of rainfall on the broader population's travel patterns.

This image shows five scatter plots, each illustrating the correlation between Department for Transport (DfT) cycle counts and key weather variables: air temperature, precipitation amount, dew point, wet bulb temperature, and relative humidity. These plots visualise the influence of various weather conditions on cycling activity.
Figure 3. Scatter plot showing the correlation between DfT cycle counts and key weather variables: air temperature (0.0471), precipitation (-0.0415), dew-point (-0.0593), wet bulb temperature (0.0072), and relative humidity (-0.1076), illustrating how weather conditions influence cycling activity.
Figure 4. Scatter plots of the correlations between Strava cycling counts and weather variables, showing mild positive correlations with air temperature (0.0516), dew point (0.0740) and wet bulb temperature (0.0646), and very slight ones with precipitation amount (0.0012) and relative humidity (0.0166), suggesting varying degrees of weather influence on cycling activity.
Seasonal decomposition analysis (Figure 5) demonstrates the utility of machine learning on the finalised Strava dataset and the possibility of inferring active travel counts where data is incomplete or missing, as in the DfT ground truth data. Further exploration of the Strava data with linear regression models revealed strong predictive performance (R² = 0.912), forecasting cycling activity from environmental and demographic variables (MSE = 4.824, RMSE = 2.196). With further refinement, these models could enable researchers to infer cycling behaviour without direct counts of cycling activity. Possible applications include live digital twins of active travel that use live weather data to predict active travel flows in real time.
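For readers less familiar with the metrics quoted above, the sketch below shows how MSE, RMSE and R² are conventionally computed from observed and predicted values. The function name and the sample numbers are hypothetical. The last line simply confirms that the reported RMSE is the square root of the reported MSE.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (mse, rmse, r2) for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total variation
    mse = ss_res / n
    return mse, math.sqrt(mse), 1 - ss_res / ss_tot


# Hypothetical observed and predicted hourly counts, for illustration only.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
mse, rmse, r2 = regression_metrics(observed, predicted)

# Consistency check on the figures in the text: RMSE = sqrt(MSE).
reported_rmse = round(math.sqrt(4.824), 3)
```

Because RMSE is expressed in the same units as the cycling counts, it is usually the easier of the two error figures to interpret.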
Figure 5. Results from a seasonal decomposition analysis of the hourly cycling counts data. The model decomposes the time series into three components: trend, seasonal, and residual. The trend component reveals the overall long-term pattern in cycling activity, the seasonal component identifies any weekly or monthly cycles, and the residual component shows the remaining variability not explained by the trend or seasonal factors. This predictive model was used to complete a yearly dataset that can inform infrastructure planning and targeted interventions to promote active travel.
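The three components named in the caption can be made concrete with a hand-rolled classical additive decomposition. This is a sketch of the general technique only. The source does not state which implementation or settings the project used, and the function name, period and synthetic series below are assumptions for illustration.

```python
def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Needs at least two full cycles of data. Trend and residual are None at
    the edges, where the centred moving average is undefined.
    """
    n, half = len(series), period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: give the two end points half weight so the
            # average stays centred on position i.
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[i] = total / period

    # Average the detrended values at each position within the cycle.
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(series[i] - t)
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period  # force the seasonal effects to sum to zero
    seasonal = [means[i % period] - centre for i in range(n)]

    residual = [
        None if t is None else series[i] - t - seasonal[i]
        for i, t in enumerate(trend)
    ]
    return trend, seasonal, residual


# Synthetic series: a linear trend plus a repeating four-step pattern.
pattern = [2, -1, -2, 1]
series = [10 + i + pattern[i % 4] for i in range(12)]
trend, seasonal, residual = decompose_additive(series, period=4)
```

On this synthetic input the function recovers the linear trend and the repeating pattern and leaves residuals of zero. On real hourly counts the residual holds whatever the trend and seasonal terms fail to explain.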
Figure 6 illustrates the cycling preferences of men and women on dedicated cycle highways using Strava data, revealing a stronger preference among women for using these cycle lanes. Figure 7 explores how daylight hours affected cycling activity among men and women in January 2019, highlighting temporal influences on behaviour. The normalised data shows more women than men cycling in daylight hours. Both results are consistent with existing literature on women's consideration of safety in active travel.
Figure 6. Cycling preference of men and women in dedicated cycle highways derived from Strava data.
Figure 7. Impact of daylight hours on cycling activity among men and women in January 2019.
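Because Strava's user base skews male, raw counts by gender are not directly comparable, which is presumably why the daylight comparison uses normalised data. The source does not describe the normalisation. One plausible approach, shown here strictly as an assumption, is to express daylight trips as a share of each gender's own total. The counts are invented for illustration.

```python
def daylight_share(trips):
    """Return, per gender, the fraction of that gender's trips made in daylight."""
    totals, in_daylight = {}, {}
    for gender, daylight, count in trips:
        totals[gender] = totals.get(gender, 0) + count
        if daylight == "day":
            in_daylight[gender] = in_daylight.get(gender, 0) + count
    return {g: in_daylight.get(g, 0) / totals[g] for g in totals}


# Hypothetical (gender, daylight label, trip count) rows; not project data.
trips = [
    ("female", "day", 30), ("female", "night", 10),
    ("male", "day", 120), ("male", "night", 80),
]

shares = daylight_share(trips)
```

In this invented example men record more daylight trips in absolute terms, yet a larger share of women's trips falls in daylight. A raw count would hide that contrast.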
Value of the research
The results indicate that adverse weather conditions have little impact on cycling activity within Bradford as a whole; however, this impact may be context-dependent. The individuals who continue to cycle, as reflected in both the Strava and Department for Transport datasets, display highly consistent cycling behaviour despite variations in weather, hinting at a subset of cyclists who may be undeterred by factors such as rain or temperature, a trait commonly associated with cycling enthusiasts or hobbyists. This consistency in cycling patterns suggests a level of commitment to cycling as a mode of transportation or recreation regardless of weather conditions. It is therefore important to consider non-cyclists who, deterred by the prospect of cycling in adverse weather, may opt not to cycle at all. This may be due to the upfront costs of a bicycle and weatherproof clothing, coupled with other socio-economic factors such as safety and crime. This understanding can inform targeted interventions aimed at encouraging cycling across a broader segment of the population. Overall, the data shows the importance of creating and promoting a cycling environment that is not only weather-resilient but also inviting to a diverse range of potential cyclists.
The newly developed datasets, encompassing weather and precipitation data with hourly temporal resolution and geographic detail down to OpenStreetMap ways and Lower Layer Super Output Areas (LSOAs), will allow further research into cycling behaviour. This aggregation of both Strava and DfT data will allow for the analysis of additional variables and infrastructure, going beyond the already incorporated Index of Multiple Deprivation (IMD) and cycling infrastructure, and offers the potential for more nuanced insights into age, ethnicity and gender, creating greater understanding of how to break barriers to active travel.
Finally, the exploration of machine learning to predict cycling behaviours has established a baseline that can be refined, for example by including geographical features, which could significantly enhance the predictive accuracy of cycling behaviour models.
Quote from project partner
“Having a LIDA data scientist spending time working with us and on our data to help us understand our citizens and communities use of active travel and the barriers that can affect their participation was fantastic. The use of data science to analyse such large datasets is new to our organisation and Lydia has demonstrated the insights that can be generated through such innovation.”
Dr Caroline Tait, Consultant in Public Health (Research), City of Bradford Metropolitan District Council.
Insights
- Despite weather variability, cyclists display consistent commitment, indicating that adverse conditions do not deter existing cyclists and suggesting that those in the region who do cycle may be enthusiasts or hobbyists. This finding suggests that there are barriers to active travel in Bradford and that existing cyclists have already overcome them.
- The integration of comprehensive weather data and geographical detail will enable more in-depth analysis of cycling patterns, allowing the identification of other barriers to active travel.
- Machine learning models such as random forest, multiple linear regression and Prophet show the potential of predictive analytics for cycling behaviour, marking another step towards data-driven urban planning and the encouragement of active travel.
Research theme
- Health
- Societies
- Environment
Programme theme
- Statistical Data Science
- Data Science Infrastructures
- Environment
People
Supervised by Dr. Francesca Pontin, Senior Research Data Scientist at Consumer Data Research Centre and Dr Callum Smith, PDRA in the School of Earth and Environment, University of Leeds
Dr. Caroline Tait - Consultant in Public Health - Bradford City Council
Emma Young - Cycle and Active Travel Champion – Bradford City Council
Nicholas Metcalfe - Traffic Census and Survey Co-ordinator – Bradford City Council
Consumer Data Research Centre
Partners
Bradford City Council
Funders
Funded by Consumer Data Research Centre