September School - Causal inference with observational data: the challenges and pitfalls

Training Event
Monday 9 - Friday 13 September, 2019, 9am - 5pm
Leeds Institute for Data Analytics, Level 11, Worsley Building, University of Leeds, Clarendon Way, Leeds, LS2 9NL
LIDA, 0113 343 9680

This five-day School, run in collaboration with The Alan Turing Institute, offers state-of-the-art training in the analysis of observational data for causal inference. By exploring the philosophy and utility of directed acyclic graphs (DAGs), participants will learn to recognise and avoid a range of common pitfalls in the analysis of complex causal relationships, including longitudinal analyses of change, mediation, nonlinearity and statistical interaction.

The school is run by Prof Mark S Gilthorpe (Leeds Institute for Data Analytics, LIDA, & School of Medicine) and Dr Peter WG Tennant (LIDA, & School of Medicine) - both Fellows of the Alan Turing Institute for Data Science and Artificial Intelligence - with input from Dr George TH Ellison (LIDA, School of Medicine), and drawing on tools and materials prepared with Dr Johannes Textor (Radboud University Medical Center, Nijmegen).

Through a mix of lectures, discussions, and interactive workshops - blending theory with real-world examples - the School aims to provide an essential introduction to the analysis of 'big data'. Although the examples are primarily taken from health and medical literature, the topics are relevant to any discipline where non-experimental data is routinely analysed for causal inference. We therefore welcome researchers from fields across the quantitative social sciences. Please get in touch if you have any questions about the suitability of the course.


The School will cover the following subjects:

  • Distinguishing prediction and causal inference
  • Counterfactuals and potential outcomes
  • Natural experiment approaches (including instrumental variables)
  • Causal directed acyclic graphs (DAGs): theory and practice (including our guide to drawing DAGs in applied social science research)
  • The role and relevance of covariates in multiple regression
  • Collider bias and differential selection bias (including 'reversal paradox' and the Table 2 Fallacy)
  • Conditioning-on-the-outcome and regression-to-the-mean
  • Deterministic variable bias (including mathematical coupling, composite variable bias, and compositional data)
  • A causal interpretation of statistical interaction and joint effects
  • Time-varying exposures and mediation analysis
  • Time-varying confounding and G-methods


Learning Objectives:

By the end of the School, participants will be able:

  • To adopt a 'causal perspective' for the analysis of observational data, with the aid of directed acyclic graphs (DAGs);
  • To adopt a systematic approach to specifying, using, and interpreting DAGs for planning, conducting, and appraising observational research;
  • To recognise common, yet poorly recognised, pitfalls and challenges in modelling observational data;
  • To understand how various routine analytical approaches can introduce bias, leading to spurious research findings;
  • To appreciate the importance of data generation in the building and selection of appropriate statistical models;
  • To understand how alternative and emerging methods (such as mediation analysis, g-methods, and latent variable methods) can be used to conduct more robust analyses;
  • To critically appraise the modelling strategies of other researchers; and
  • To recognise the importance of, and begin the practice of, THINKING before DOING any statistical modelling of observational data!



"The best course I have been on. Excellent quality, passion and fascinating topic."

"The course is an absolute must for anyone who is serious about improving their own knowledge of data analysis."

"The course tutors and organisers have done a great job clearly explaining and contextualising these important and complex ideas in a challenging yet accessible way. This was a friendly, good value, well organised and fun course, and I would recommend it to others."

"The course was a paradigm shift for me in terms of thinking around causal inference and gave me the tools to think about some important pitfalls in analysis that I would otherwise have missed."

"Best course I've been on. Good balance of maths and practical stuff. Enthusiasm of tutors was key."



Mark S Gilthorpe, Professor of Statistical Epidemiology

Mark Gilthorpe is Professor of Statistical Epidemiology in the School of Medicine and the Leeds Institute for Data Analytics (LIDA), and a Fellow of the Alan Turing Institute for Data Science and Artificial Intelligence. Trained as a mathematical physicist, Mark's driving interest centres on improving our understanding of the observable world through modelling. Mark has since fashioned a programme of interdisciplinary research that spans the gap between theoretical and applied data analytics, focussing particularly on modelling complexity and highlighting and solving common analytical problems in observational research. Mark's research and teaching interests have converged around the insights and utility of causal inference methods, and how these might be integrated with machine learning and AI; he is a recognised expert in latent variable modelling and analysis of longitudinal data. Mark is interested in ‘algorithmic explainability’ and the development of ‘smart AI’, i.e. the use of causal inference methodology to understand the workings, operations and consequences of machine learning and artificial intelligence.

Peter WG Tennant, University Academic Fellow in Applied Health Data Analytics

Peter is a University Academic Fellow in Health Data Science at the Leeds Institute for Data Analytics and a Fellow of the Alan Turing Institute for Data Science and Artificial Intelligence. He has a PhD in Epidemiology from Newcastle University, where he worked for many years as an applied population health scientist. Since moving to Leeds in 2015, Peter's research has become increasingly focused on translating modern and emerging data scientific methods into applied research, particularly causal inference. He leads the Causal Inference interest group within the Alan Turing Institute and was invited onto the CRUK Epidemiology Expert Review Panel to provide knowledge in causal inference. He is also increasingly recognised as a skilled teacher, particularly in causal inference methods, and he is Deputy Programme Leader of the University of Leeds MSc in Health Data Analytics.



Please note there is a maximum of 30 places available; places are competitive and are allocated on a first-come, first-served basis. Once the places have been filled, we will close the application process.

Fees include tuition, refreshments and lunches throughout the 5-day school. Please note that travel, breakfast and accommodation are not included in this fee.

£395 (postgraduate student rate)
£695 (researchers, academic staff, and public and charitable sector employees)

If a cancellation is made within one month of the School taking place, we regret that we may not be able to refund your fee.

Accommodation on campus (including breakfast) has been reserved for the School from the Sunday evening (8th Sept) to Friday 13th Sept - we will provide a link to book this accommodation when you are notified of the success of your application (cost is £49 per night). Individuals are free to arrange alternative accommodation if preferred. Please do not book accommodation until you have received confirmation of your place on the course.


If you have any queries about this School or would like to be added to the waiting list, please get in touch with Kylie Norman.