Data Assimilation on probabilistic Agent-Based Models

Date: Monday 21 October 2019

Benjamin Isaac Wilson, Prof Nick Malleson, Dr Jon Ward and Dr Minh Le Kieu - University of Leeds

Exploring the application of modern probabilistic programming languages on agent tracking, prediction and calibration.

Civil emergencies such as flooding, terrorist attacks, fire, etc., can have devastating impacts on people, infrastructure, and economies. Knowing how to best respond to an emergency can be extremely difficult because building a clear picture of the emerging situation is challenging with the limited data and modelling capabilities that are available. Agent-based modelling (ABM) is a field that excels in its ability to simulate human systems and has, therefore, become a popular tool for simulating disasters and for modelling strategies that are aimed at mitigating developing problems. However, the field suffers from a serious drawback: models are not able to incorporate up-to-date data (e.g. social media; mobile telephone use; public transport records). Instead, they are initialised with historical data and therefore their forecasts diverge rapidly from reality. To address this major shortcoming, this project will aim to develop dynamic data assimilation methods for use in ABMs. These techniques have already revolutionised weather forecasts and could offer the same advantages for ABMs of social systems. There are serious technical and methodological barriers that must be overcome, but this research has the potential to produce a step-change in the ability of models to create accurate short-term forecasts of social systems.

Project Aims:

Understand the utility of modern probabilistic programming languages (PPLs).
Apply PPLs to model a simple, single agent.
Perform data assimilation to calibrate and improve inference on the agent’s behaviour.

Using Probabilistic Programming
The specific PPL used in this project was Pyro, which is built on PyTorch and was developed by Uber. Probabilistic programming is a field of programming that offers a way to embed randomness and uncertainty into programs. This is done using primitives in the form of probability distributions (random variables) provided by the PPL. These effectively build stochastic functions, which can be sampled using a variety of inference algorithms that PPLs typically provide, e.g. Monte Carlo methods, importance sampling, etc. The program is run repeatedly to collect samples of output from the stochastic program, building a distribution of possible states: in this case, a single agent’s location in some environment. This uncertainty in the agent’s location can then be calibrated by assimilating some (noisy) observation, i.e. sensor data such as footfall counts, cameras, GPS, etc. This helps prevent the ABM from diverging from reality.
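
As a rough illustration of what "sampling a stochastic function" means, the sketch below uses plain Python rather than Pyro. The room layout, step size, jitter and the names run_agent and sample_states are all hypothetical choices made for this example and are not taken from the project code. A Bernoulli draw picks the exit and Gaussian noise perturbs each step, so repeated runs build up an empirical distribution of where the agent might be.

```python
import math
import random

# Hypothetical layout: the agent starts on the left and the two exits sit
# at the top-right and bottom-right corners of a 10 x 10 room.
START = (0.0, 5.0)
EXITS = {"top": (10.0, 10.0), "bottom": (10.0, 0.0)}


def run_agent(rng, n_steps, p_top=0.5, speed=1.0, jitter=0.1):
    """One run of the stochastic program: returns (chosen exit, final position)."""
    # Bernoulli primitive: which exit is the agent heading for?
    choice = "top" if rng.random() < p_top else "bottom"
    target = EXITS[choice]
    x, y = START
    for _ in range(n_steps):
        dx, dy = target[0] - x, target[1] - y
        dist = math.hypot(dx, dy)
        if dist <= speed:  # close enough to reach the exit on this step
            x, y = target
            break
        # Gaussian primitive: each step carries a little random jitter.
        x += speed * dx / dist + rng.gauss(0.0, jitter)
        y += speed * dy / dist + rng.gauss(0.0, jitter)
    return choice, (x, y)


def sample_states(n_samples, n_steps, seed=0):
    """Run the program many times to build an empirical distribution of states."""
    rng = random.Random(seed)
    return [run_agent(rng, n_steps) for _ in range(n_samples)]


samples = sample_states(n_samples=2000, n_steps=5)
share_top = sum(choice == "top" for choice, _ in samples) / len(samples)
print(f"share of samples heading for the top exit: {share_top:.2f}")
```

In a PPL such as Pyro, the random draws would be declared as named sample sites so that the library's inference algorithms can reason about them. Here they are ordinary calls to Python's random module.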

In this project, the observation data is synthesised and applied to a simple agent in a simple environment. The goal for the agent is to step towards one of two exits, but we don’t know which one. As we run the model, we can observe some pseudo-truth data and revise the model’s hypothesis about the agent’s behaviour, i.e. if the observation suggests the agent is moving to exit 1, then we update a Bernoulli random variable to represent this. See Figures 1 and 2.
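
One way to picture this update is a simple importance-weighting scheme, sketched here in plain Python rather than Pyro. It is a minimal illustration under assumed values, not the project's implementation: the room layout, the Gaussian sensor-noise model (OBS_NOISE) and the names simulate, likelihood and posterior_top are all hypothetical. Each simulated run is weighted by how well it explains the noisy observation, and the weighted share of runs heading for the top exit becomes the updated Bernoulli probability.

```python
import math
import random

# Hypothetical layout and noise level, chosen only for this illustration.
START = (0.0, 5.0)
TOP_EXIT, BOTTOM_EXIT = (10.0, 10.0), (10.0, 0.0)
OBS_NOISE = 1.0  # assumed standard deviation of the position sensor


def simulate(goes_top, n_steps, rng, speed=1.0, jitter=0.1):
    """Walk the agent towards its exit for n_steps noisy steps."""
    target = TOP_EXIT if goes_top else BOTTOM_EXIT
    x, y = START
    for _ in range(n_steps):
        dx, dy = target[0] - x, target[1] - y
        dist = math.hypot(dx, dy)
        if dist <= speed:
            x, y = target
            break
        x += speed * dx / dist + rng.gauss(0.0, jitter)
        y += speed * dy / dist + rng.gauss(0.0, jitter)
    return x, y


def likelihood(obs, pos, sd=OBS_NOISE):
    """Unnormalised Gaussian likelihood of the observation given a position."""
    sq_dist = (obs[0] - pos[0]) ** 2 + (obs[1] - pos[1]) ** 2
    return math.exp(-0.5 * sq_dist / sd ** 2)


def posterior_top(obs, n_steps, n_particles=2000, prior_top=0.5, seed=0):
    """Importance-weighted estimate of P(agent heads for the top exit | obs)."""
    rng = random.Random(seed)
    weight_top = weight_total = 0.0
    for _ in range(n_particles):
        goes_top = rng.random() < prior_top      # draw from the Bernoulli prior
        pos = simulate(goes_top, n_steps, rng)   # run the agent forward
        w = likelihood(obs, pos)                 # how well does this run fit?
        weight_total += w
        if goes_top:
            weight_top += w
    return weight_top / weight_total


# A synthetic observation above the start point should favour the top exit.
p_top = posterior_top(obs=(4.5, 7.2), n_steps=5)
print(f"updated probability of the top exit: {p_top:.3f}")
```

The updated probability can then replace the prior in later forecasts, which is what pulls the model back towards the observed behaviour.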

Sample of Outputs
Figure 1: The agent’s potential locations in a simple environment with two exits.

Figure 2: The agent recalibrated using an observation suggesting that it is moving to the top exit.

Insights
This project has revealed the potential for applying PPLs to ABMs. Once a developer or researcher has overcome the theoretical learning curve, PPLs prove to be powerful tools. They provide a convenient way to write programs with uncertainty and randomness in mind. This attempt has successfully assimilated data and provides a base from which researchers and developers can continue to innovate.

Further Work
These models should eventually evolve into digital twins of more complex urban environments; however, this will pose computational challenges. The resource requirements for these models need to be investigated further in order to make the correct decisions for scaling up. Currently, we perform calculations on only a single agent. This is already a taxing computation, and ultimately the ABM would be running with thousands of agents. Likely next steps include a shift towards low-level languages and experiments with high-performance computing and alternative programming paradigms (most likely Data-Oriented Programming).

Research theme
Forecasting social systems

This project was undertaken as part of the LIDA Data Scientist Internship Programme.