Can machine learning accurately predict the likelihood that children require paediatric treatment in hospital?    

Project overview 

Bradford is home to an innovative hospital-at-home service for children and young people, called the Ambulatory Care Experience (ACE). ACE offer an alternative to hospital admission for patients who require urgent care. They treat and monitor patients in their own homes in a virtual ward, under the care of a consultant paediatrician. Key to this approach to urgent care provision is determining which patients are of sufficiently low concern to be suitable for home treatment. This project aimed to use machine learning to model the probability that patients referred to the ACE service will require in-person hospital treatment, and to use insights from this modelling to improve ACE referral decisions.

Data and methods  

The data consisted of the anonymised referral records of 502 patients treated by ACE, labelled with one of two treatment outcomes – successful treatment at home, or requiring subsequent in-person hospital treatment. The records detailed general information, basic clinical observations, and related metadata for each patient referred to the ACE service.

Initially, popular machine learning techniques were used to classify the patients according to the likelihood that they would be hospitalised. The structured data – numeric observations, or categorical features fitting into discrete classes – were used to train models ranging from simple logistic regression and k-nearest neighbours classifiers, to random forest and gradient boosted decision tree ensemble methods. Particular attention was paid to the scarcity of positive labels (patients requiring hospital treatment), so various techniques to balance the dataset were tested: label weighting, random undersampling and synthetic minority oversampling (SMOTE).
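Two of the balancing techniques named above, label weighting and random undersampling, can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the project's code: the function names, the 10/90 toy label split and the "balanced" weighting formula (n_samples / (n_classes × class_count)) are all assumptions made for the example, and SMOTE is not covered.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    sampled = []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            sampled.append((row, label))
    rng.shuffle(sampled)  # avoid leaving the classes in contiguous blocks
    return [r for r, _ in sampled], [l for _, l in sampled]


# Hypothetical toy labels: 1 = required hospital treatment (rare), 0 = treated at home.
labels = [1] * 10 + [0] * 90
rows = list(range(len(labels)))  # stand-ins for patient feature rows

weights = balanced_class_weights(labels)
under_rows, under_labels = random_undersample(rows, labels)
print(weights)
print(Counter(under_labels))
```

Weighting keeps every record but makes errors on the rare class more costly, whereas undersampling discards majority-class records, which matters in a dataset of only a few hundred patients.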

The analysis then focused on the unstructured records – free-text notes detailing patient examinations and medical histories. Common text processing techniques were used to establish a reduced vocabulary of the most common words in each of the text features, and the individual notes were vectorised into word counts from this vocabulary. Term frequency-inverse document frequency (TF-IDF) figures were calculated for each of the vectorised notes, and these figures were analysed using simple visualisation and lasso logistic regression to establish which words / concepts were most associated with hospitalisation or successful discharge.
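The vocabulary, word-count and TF-IDF steps described above can be illustrated with a small plain-Python sketch. It is written under stated assumptions: the three notes are invented for the example (they are not ACE data), tokenisation is a simple lowercase split, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 is one common variant rather than necessarily the one used in the study. The lasso logistic regression step is not shown.

```python
import math
from collections import Counter


def build_vocabulary(notes, max_words):
    """Keep the max_words most common tokens across all notes."""
    counts = Counter(word for note in notes for word in note.lower().split())
    return [word for word, _ in counts.most_common(max_words)]


def tfidf(notes, vocabulary):
    """Return one TF-IDF vector per note (raw counts times smoothed IDF)."""
    n_docs = len(notes)
    count_vectors = []
    for note in notes:
        counts = Counter(note.lower().split())
        count_vectors.append([counts[word] for word in vocabulary])
    # Document frequency: the number of notes in which each word appears.
    doc_freq = [
        sum(1 for vec in count_vectors if vec[j] > 0)
        for j in range(len(vocabulary))
    ]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    return [[c * w for c, w in zip(vec, idf)] for vec in count_vectors]


# Invented example notes, for illustration only.
notes = [
    "wheeze on examination",
    "wheeze salbutamol given",
    "fever settled",
]
vocabulary = build_vocabulary(notes, max_words=5)
vectors = tfidf(notes, vocabulary)
for note, vec in zip(notes, vectors):
    print(note, [round(v, 3) for v in vec])
```

Words that appear in many notes receive a lower IDF weight, so the resulting figures emphasise terms that distinguish one note from another.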

Finally, to expand the initial modelling, Bayesian analyses of the treatment outcomes were conducted. Bayesian approaches allowed quantification of the uncertainty or variance in outcomes that wasn't captured by the frequentist techniques used in the initial predictive modelling. Treatment outcomes were modelled using a Markov Chain Monte Carlo (MCMC) sampler, with a simple logistic regression likelihood, using diffuse priors for each of the coefficients. Input features for the logistic regression model were chosen iteratively, selecting for the highest expected log pointwise predictive density (ELPD) across the possible models.
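To make the idea of a Bayesian logistic regression with diffuse priors concrete, the sketch below samples the posterior with a random-walk Metropolis algorithm. This is a deliberately simple stand-in: the source does not say which MCMC sampler was used, and the synthetic data, the Normal(0, 10) priors, the step size and all function names are assumptions made for the example. ELPD-based feature selection is not shown.

```python
import math
import random


def log_posterior(beta, X, y, prior_sd=10.0):
    """Log posterior up to a constant: Bernoulli-logit likelihood, Normal(0, prior_sd) priors."""
    lp = -sum(b * b for b in beta) / (2 * prior_sd ** 2)
    for row, label in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        # log p(y | z) = y*z - log(1 + exp(z)), in a numerically stable form
        lp += label * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return lp


def metropolis(X, y, n_samples=4000, burn_in=1000, step=0.3, seed=0):
    """Random-walk Metropolis sampler; returns the post-burn-in coefficient draws."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    current = log_posterior(beta, X, y)
    draws = []
    for i in range(n_samples + burn_in):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if i >= burn_in:
            draws.append(list(beta))
    return draws


# Synthetic data (not ACE data): intercept -1 and slope 2 on one standardised feature.
data_rng = random.Random(1)
X, y = [], []
for _ in range(200):
    x = data_rng.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-(-1.0 + 2.0 * x)))
    X.append([1.0, x])
    y.append(1 if data_rng.random() < p else 0)

draws = metropolis(X, y)
slopes = sorted(d[1] for d in draws)
slope_mean = sum(slopes) / len(slopes)
lower = slopes[int(0.025 * len(slopes))]
upper = slopes[int(0.975 * len(slopes))]
print(f"slope: mean {slope_mean:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

The width of the interval around each coefficient is the kind of uncertainty estimate that a single frequentist point prediction does not provide.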

Key findings  

Results from the initial data analysis, frequentist modelling, and the Bayesian approach all strongly suggested that it is not possible to accurately predict treatment outcomes using the ACE referral data. Despite testing numerous data engineering and classification modelling approaches, the predictions and accuracy scores of the frequentist models were hardly better than a chance assignment of outcomes (accounting for the proportions of the different labels in the dataset). The Bayesian analyses do highlight a handful of referral criteria that are predictors of higher or lower risk of hospitalisation; the predictive distributions from the Bayesian models, on the other hand, are either very confident of successful treatment, or are highly diffuse. This highlights that the majority of patients exhibiting high-risk features are still successfully discharged without referral to hospital. 
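The comparison against chance, accounting for label proportions, can be made concrete with a short calculation. The sketch below is illustrative only: the 20% hospitalisation rate is a hypothetical figure, not the rate in the ACE data, and the function names are made up for the example. It computes the expected accuracy of guessing labels at random in proportion to their prevalence, alongside the accuracy of always predicting the majority outcome.

```python
def chance_accuracy(positive_rate):
    """Expected accuracy when labels are guessed at random in proportion to prevalence."""
    return positive_rate ** 2 + (1 - positive_rate) ** 2


def majority_accuracy(positive_rate):
    """Accuracy of always predicting the more common outcome."""
    return max(positive_rate, 1 - positive_rate)


# Hypothetical prevalence: 20% of patients later require hospital treatment.
rate = 0.2
print(f"proportional guessing: {chance_accuracy(rate):.2f}")
print(f"always predict home treatment: {majority_accuracy(rate):.2f}")
```

With imbalanced outcomes, both baselines are well above 50%, so a model's accuracy needs to be read against them rather than against a coin flip.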




From these findings, it is tempting to conclude that the collected data are not fit for purpose. It should be highlighted, however, that the dataset included only patients who were accepted for ACE treatment. Given that patients treated by ACE were already deemed to be of sufficiently low hospitalisation risk, based largely on the observations in the referral data, it is unsurprising that there were few strong predictors of hospitalisation within those same observations. Indeed, it is the relative success of ACE decision making that results in the lack of any clear division between the patients who later required hospital treatment and those who were discharged successfully. These findings serve to endorse the decisions that ACE are making based on the current referral observations – they aren't missing any obvious indicators of hospitalisation risk.

Analyses of the referral notes show interesting relationships between specific keywords and hospitalisation risk. Patients whose medical histories mention asthma, or whose examination notes mention the drug Salbutamol, were found to have a greater risk of requiring hospital treatment. The Bayesian analysis showed that these features had a much stronger relationship with hospitalisation than almost every other observation recorded in the referral data. As such, it is likely there are other clinical features that have a significant impact on hospitalisation risk but are not part of the current ACE referral criteria. Thus, the primary recommendation of this study is that the ACE records should be aggregated with NHS primary care data, so that these additional predictors can be studied further.  


Value of the research  

This study demonstrates, first and foremost, that the ACE referral process makes effective use of the observations currently collected during referral. The results and analysis indicate the features that are most predictive of an increased or decreased risk of hospitalisation once a patient has been deemed suitable for home treatment, and that can be used to further inform referral and treatment decisions. 

The results make a strong case that the addition of primary care data – GP and hospital records – would be extremely valuable within this study. The analyses of the referral notes demonstrated that features of patients' medical history or examination that aren't part of the ACE referral criteria are predictive of hospitalisation risk. Permission to use primary care records has since been granted, and these records will form the basis of the second phase of this project.


The NHS Long Term Plan (2020) and the RCPCH's Paediatrics 2040 report (2021) highlight the importance of moving to a more streamlined NHS. Machine learning has the potential to help clinicians make clinical decisions from routinely collected data – providing the right care, at the right time, and in the right place.

Dr Mathew Mathai – Consultant Paediatrician, Bradford Teaching Hospitals NHS Foundation Trust 



  • ACE clinicians make effective use of the basic referral data they collect when deciding which patients are suitable for home treatment – machine learning approaches are unable to use these data to accurately predict hospitalisation outcomes 
  • Analyses of the ACE referral notes strongly suggest that there are other clinical features that aren't part of the ACE referral criteria but which are predictive of hospitalisation risk – these should be studied further


Research theme 

  • Health informatics 



Sam Relins – Data Scientist Intern, Leeds Institute for Data Analytics 

Dr Mathew Mathai – Consultant Paediatrician, Bradford Teaching Hospitals NHS Foundation Trust 

Professor Mark Mon-Williams – Professor of Psychology, University of Leeds

Ruaridh Mon-Williams – Informatics Scientist, University of Edinburgh



Bradford Institute for Health Research (BIHR)