The CoSMoNorth project was set up to build and calibrate a baseline urban simulation (or ‘digital twin’) of mobility and activity in the Northern Powerhouse region. The baseline is intended to support further model development (e.g., new transport services), the exploration and application of social appraisal measures, and the assessment of policy and infrastructure changes.

Project Overview

The CDRC-funded Comprehensive Spatial Modelling for the North (‘CoSMoNorth’) is an ongoing research project in collaboration with external partner Arup. The first stage of the research, discussed in this case study, ran from 4 October 2021 to 4 April 2022. The expected outcome of this first phase is a consolidated baseline agent-based model framework of the Northern Powerhouse area (more precisely, the Yorkshire and the Humber, North West and North East regions, or 11 Local Enterprise Partnership areas, including the cities of Hull, Manchester, Liverpool, Leeds, Sheffield and Newcastle). The model is built using the MATSim framework, with guidance, tooling, and access to training and resources provided by the research team at City Modelling Lab, Arup.

Collaboration with Newcastle University, and the application of their work on synthetic population development, also supported the overall model build.

The second phase is expected to complete any remaining work from the first phase, and to develop and apply the model further by implementing new mobility services and assessing these interventions through standard experimental procedures. The aim is to address interactions across multiple urban systems, going beyond conventional transport appraisal measures (e.g., travel times, generalised costs). This includes studying the impacts on socioeconomic inequalities, emissions, use of active modes, distances travelled, etc.

Data and methods

Data: The data for network generation are obtained from underlying work by Arup on OSM and GTFS, applied to the whole of England and Northern Wales. The population is synthesised using:

  1. Regional projected population (persons and households only) from SPENSER (Synthetic Population Estimation and Scenario Projection Model) for the year 2019 from the RAMP
  2. Datasets for the SPENSER population attribute refinement scripts developed by Newcastle University. These include, for example, NTS statistics, ONS, Census and Annual Population Survey data for the years 2011 and 2019.
  3. Datasets from National Travel Survey (NTS) at Local Authority Level from 2002 – 2020 (Special Licence Access). These are the main data source for the population synthesis of the model. A special focus is given to the years between 2011 and 2019 and the whole of England area with the exception of London for the extraction of trips for the SPENSER population.


Methods: The urban simulation is developed using agent-based modelling, with the MATSim framework supporting implementation. Due to the complexity of MATSim, the model is constructed in stages, and each stage uses other tools and scripts in place of MATSim. A simplified structure of the stages and their relevant tools is as follows:

  1. Network Generation – osmium, PUMA (Arup), GENET (Arup), BitSim (Arup)
  2. Population Synthesis – osmox (Arup), PAM (Arup), synthetic_population_dev (Newcastle University)
  3. Simulation – Elara (Arup)


  1. Each of these stages includes validation.
  2. In the above list of tools, the developer is given in brackets.

More details on the tools, such as their general applications and how they apply to the construction of the model, are given below.

  • osmium – A multipurpose command line tool based on the Osmium Library. Some osmium functionalities such as filtering OSM tags, extracting focused geographical boundary areas, merging OSM files etc., are applied for the network generation pre-processing in this project.
  • PUMA – A closed-source tool consisting of Python scripts that use OSM and GTFS data to generate MATSim input files (network.xml, schedule.xml, vehicles.xml) representing a multimodal network.
  • GENET – A public tool used to manipulate MATSim networks via a Python API. GENET is applied at the post-processing stage of network generation for network simplification, and for reading the GTFS data needed for network generation with PUMA.
  • BitSim – An internal package for submitting AWS Batch Jobs/Environments. BitSim is applied in this project to run the large, computationally intensive PUMA job in the cloud.
  • osmox – An open-source tool used here to extract locations from OpenStreetMap (OSM) data needed for facility sampling in the Population Synthesis stage.
  • PAM – A publicly available tool used to generate and modify transport demand scenarios/activity plans via a Python API. This is the main tool used to create MATSim-supported population outputs from the NTS and refined SPENSER datasets.
  • synthetic_population_dev – A set of twelve Python scripts, complete with a guide, developed by Newcastle University for the attribute refinement of the SPENSER population per region.
  • Elara – An internal command line utility for processing MATSim events output files. This tool is to be used for the simulation stage, which is yet to be run.
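The osmium pre-processing steps mentioned above (filtering OSM tags, extracting a bounding area, merging OSM files) might be scripted along the following lines. This is a minimal sketch only, not the project's actual pipeline: the file names, the `w/highway` filter expression and the bounding box are all hypothetical placeholders, and the functions only build the command lines for the `osmium tags-filter`, `osmium extract` and `osmium merge` subcommands rather than running them.

```python
import shlex
from typing import List, Sequence, Tuple

# Bounding box as (left, bottom, right, top) in degrees.
BBox = Tuple[float, float, float, float]


def tags_filter_cmd(src: str, dst: str, expressions: Sequence[str]) -> List[str]:
    """Build an `osmium tags-filter` command keeping objects that match the expressions."""
    return ["osmium", "tags-filter", "-o", dst, src, *expressions]


def extract_bbox_cmd(src: str, dst: str, bbox: BBox) -> List[str]:
    """Build an `osmium extract` command that cuts out a bounding box."""
    left, bottom, right, top = bbox
    return ["osmium", "extract", "-b", f"{left},{bottom},{right},{top}", "-o", dst, src]


def merge_cmd(sources: Sequence[str], dst: str) -> List[str]:
    """Build an `osmium merge` command combining several OSM files into one."""
    return ["osmium", "merge", "-o", dst, *sources]


if __name__ == "__main__":
    # All paths and the bounding box are illustrative assumptions,
    # not the boundaries or files used in CoSMoNorth.
    commands = [
        merge_cmd(["england.osm.pbf", "wales.osm.pbf"], "merged.osm.pbf"),
        extract_bbox_cmd("merged.osm.pbf", "north.osm.pbf", (-3.7, 52.9, 0.2, 55.9)),
        tags_filter_cmd("north.osm.pbf", "north_roads.osm.pbf", ["w/highway"]),
    ]
    for cmd in commands:
        # Print rather than execute; pass each list to subprocess.run(cmd, check=True)
        # on a machine where osmium is installed.
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps the sketch testable without osmium installed, and avoids shell quoting problems if the lists are later passed to `subprocess.run`.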

Key findings

There are several results. Network generation is complete, with validation, and networks are available for different areas in and around the Northern Powerhouse region. The versions cover: the entire Northern Powerhouse region; the North of England (Northern Powerhouse without Northern Wales); and the North of England (Northern Powerhouse without Northern Wales) with UK urban centres/railway stations.

For the population synthesis, separate SPENSER populations for the regions of the North of England have been partly extracted and refined. A Python script has been written to convert the NTS population to XML inputs for MATSim via PAM for the Northern Powerhouse region, and has been modified to take in the SPENSER population. A mini guide has also been written detailing the process workflow of the project. A by-product of the population synthesis stage was the discovery of a specific way of applying functions that improves the speed of the data-cleaning sections of the code.
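To illustrate the kind of XML population input that MATSim expects, the sketch below writes a tiny activity plan using only the Python standard library. It does not use PAM and is not the project's conversion script; the element and attribute names follow the MATSim population format as commonly documented and should be checked against the DTD for the MATSim version in use. The person, coordinates and times are invented placeholders, and the DOCTYPE header that real MATSim files carry is omitted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Activity:
    type: str                       # e.g. "home", "work"
    x: float                        # projected easting (placeholder values below)
    y: float                        # projected northing
    end_time: Optional[str] = None  # "HH:MM:SS"; the final activity has none


@dataclass
class Person:
    pid: str
    activities: List[Activity]
    modes: List[str]                # one leg mode between each pair of activities


def build_population(persons: List[Person]) -> ET.Element:
    """Build a <population> element with one selected plan per person."""
    root = ET.Element("population")
    for person in persons:
        if len(person.modes) != len(person.activities) - 1:
            raise ValueError(f"person {person.pid}: need one leg between each activity pair")
        person_el = ET.SubElement(root, "person", id=person.pid)
        plan_el = ET.SubElement(person_el, "plan", selected="yes")
        for i, act in enumerate(person.activities):
            attrs = {"type": act.type, "x": str(act.x), "y": str(act.y)}
            if act.end_time is not None:
                attrs["end_time"] = act.end_time
            ET.SubElement(plan_el, "activity", attrs)
            # Insert the leg that follows this activity, if there is one.
            if i < len(person.modes):
                ET.SubElement(plan_el, "leg", mode=person.modes[i])
    return root


if __name__ == "__main__":
    # Hypothetical agent: home -> work -> home by car.
    demo = Person(
        pid="agent_1",
        activities=[
            Activity("home", 430000.0, 433000.0, "07:30:00"),
            Activity("work", 436000.0, 434500.0, "17:00:00"),
            Activity("home", 430000.0, 433000.0),
        ],
        modes=["car", "car"],
    )
    print(ET.tostring(build_population([demo]), encoding="unicode"))
```

In practice a tool such as PAM handles this serialisation, together with sampling and validation; the sketch only shows the alternating activity/leg structure of a plan.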

Value of the research

Looking ahead to impact, a consolidated modelling framework for spatial simulation is a necessity for the active urban analytics work carried out at LIDA. Many researchers and PhD students must start from the ground up, constructing a baseline travel model as part of addressing their funded research. An established spatial simulation model can be used:

  1. As a teaching tool for MSc programmes,
  2. To speed up the development of new models under tested assumptions and limitations,
  3. To provide a calibrated platform for model validation and docking, and
  4. To improve routes to impact.


Simulation data and scenarios generated from this baseline platform can furthermore aid in responding to emerging policy scenarios and questions (e.g., the impact of proposed policies and infrastructure, including the ‘levelling up’ agenda), and in testing new protocols in case of future emergencies (e.g., public health, environmental catastrophe). CoSMoNorth would give LIDA an opportunity to extend its contributions to the current policy debate, both in public discourse and in direct collaboration with policymakers, and thereby help to influence local government decisions.

Reviewing the research at the six-month mark, it can be concluded that, while model development is ongoing, excellent progress has been made in a short period of time. The work has also led to collaboration with Newcastle University and to a revision of the model development process.

Research theme

Urban analytics


Professor Ed Manley, Professor of Urban Analytics, University of Leeds

Dr Nik Lomax, Associate Professor, University of Leeds

Dr Charisma Choudhury, Associate Professor, University of Leeds

Dr Gerard Casey, Senior Consultant, City Modelling Lab, Arup

David Alvarez Castro, PhD Researcher, Newcastle University

Theodore Chatziioannou, Senior Data Scientist, City Modelling Lab, Arup

Fred Shone, Senior Data Scientist, City Modelling Lab, Arup

Kasia Kozlowska, Data Scientist, City Modelling Lab, Arup

Anastasia Kopytina, Data Scientist, City Modelling Lab, Arup

Indumini Ranatunga, Data Scientist, LIDA


Arup, Consumer Data Research Centre, The Alan Turing Institute


This work was supported by The Alan Turing Institute with EPSRC funding on behalf of the UKRI Strategic Priorities Fund Wave 1, grant reference EP/T001569/1.