Causal inference in real-world data is the pinnacle aim of applied data science; unfortunately, it is also the most challenging. New graphical methods, such as causal directed acyclic graphs (DAGs), promise a revolution in the estimation of causal effects but remain relatively unrecognised and untested in applied health and social science research.

Graphical approaches are particularly innovative for their transparency and their ability to reveal various common yet poorly understood pitfalls of observational data analytics.

This research programme focusses on understanding, translating, and applying these new methods to improve prediction and causal effect estimation in real-world data, and to extend insight into the pitfalls of observational data analytics in applied health and social science research.

Project aims

  • To gain insight and understanding of common but complex data analytical problems in real-world data
  • To increase use and understanding of causal inference methods in health and social science research
  • To build smarter and more transparent predictive models by blending causal inference methods with data-intensive methods, such as artificial intelligence and individual-based models
  • To bring causal inference methods to real-world health and social science research for improved insight
  • To train the next generation of data scientists in causal inference methods

Explaining the science

Causal inference concerns the identification, estimation, and interpretation of causal effects.

Outside of randomised experimental studies, distinguishing correlation from causation is notoriously difficult because correlations can be generated by several non-causal processes that cannot be told apart from data patterns alone.

Causal inference requires the blending of external theory about underlying data-generating mechanisms with empirical observations of the real world. Causal inference methods, such as Pearl’s Structural Causal Framework, provide a formal means to identify and declare these theoretical assumptions. They offer a substantial advance in the transparency of real-world data analytics and in the understanding of a variety of common but poorly recognised analytical pitfalls.
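As an illustration of one such non-causal process, the following minimal sketch simulates confounding: a common cause Z drives both an exposure X and an outcome Y, while X has no effect on Y at all. The variable names, effect sizes, sample size, and the helper functions `simulate_confounding` and `ols_slope_of_x` are all assumptions made for this example and are not taken from the programme's own work.

```python
import numpy as np


def simulate_confounding(n=50_000, seed=0):
    """Simulate data from the assumed DAG: Z -> X and Z -> Y, with no X -> Y arrow."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)             # common cause (confounder)
    x = 0.8 * z + rng.normal(size=n)   # exposure depends on Z only
    y = 0.8 * z + rng.normal(size=n)   # outcome depends on Z only, not on X
    return x, y, z


def ols_slope_of_x(y, x, *covariates):
    """Least-squares coefficient of x in a linear model of y on x and any covariates."""
    design = np.column_stack([np.ones_like(x), x, *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # column 0 is the intercept, column 1 is x


x, y, z = simulate_confounding()
unadjusted = ols_slope_of_x(y, x)     # ignores Z: picks up the non-causal association
adjusted = ols_slope_of_x(y, x, z)    # conditions on Z, as the assumed DAG indicates

print(f"Unadjusted coefficient of X: {unadjusted:.3f}")
print(f"Coefficient of X adjusted for Z: {adjusted:.3f}")
```

In this simulated setting the unadjusted coefficient is clearly non-zero even though the true effect of X on Y is zero, and it moves close to zero once Z is included. The data alone do not say that Z should be adjusted for; that choice comes from the assumed causal structure, which is the kind of external theory a DAG makes explicit.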


Causal inference methods are relevant to all quantitative social science research where the researcher seeks to understand the processes involved and/or seeks to estimate change.


Mark S Gilthorpe (Group Lead), Peter WG Tennant (Co-Lead), Kellyn F Arnold, Laurie Berrie, Mark De Kamps, George TH Ellison, Wendy Harrison, John Mbotwa, Johannes Textor

Funders / Partners

The Alan Turing Institute, ESRC, MRC, NHS Digital