When I applied to the Leeds Institute for Data Analytics (LIDA), I had recently graduated from my MSc in Bioinformatics and Systems Biology. During my MSc I was introduced to a range of techniques in computational biology and data science, and really dove head first into the computational side of things. I left with a real passion and enthusiasm for the work. When I saw the LIDA internship advertised with the focus on training and professional development, allowing interns to take charge of two real-world projects, I immediately applied and have never looked back!
My Internship Project
The LIDA internship is meant to be a challenge and to take you out of your comfort zone. As such, projects are not necessarily allocated based on matching experience, but on how much you could potentially learn and improve. My project is in the field of Human Geography and Public Health, with the goal of ascertaining how we can better support ageing populations. It is the result of a collaboration between Dr. Nik Lomax and Professor Stephane Hess at the University of Leeds, and Bryan Tysinger at the University of Southern California (USC).
Population ageing is now seen across the world in virtually all developed and developing nations, so the applications of this work are potentially widespread and critical for how we support the healthcare needs of ageing populations. Our project uses a model originally developed by a team at the Schaeffer Center for Health Policy & Economics, of which Bryan is a member. The model, named the Future Elderly Model (FEM), is a data-driven dynamic microsimulation model (and a bit of a tongue-twister!). It was originally developed to assess the health and healthcare costs among the elderly Medicare population (aged 65+), but was later extended to all Americans aged 51 and older.
Each individual unit (i.e. person) is treated as an autonomous entity in the microsimulation, and transition probabilities can be estimated for the many variables associated with that person. Unlike in some other fields, microsimulation in the health sciences does not necessarily involve interaction between individuals, and in this model individuals do not interact with one another.
The US-FEM uses data from longitudinal surveys, specifically the Health and Retirement Study (HRS). In longitudinal surveys, the same questions are asked of the same group of people at regular intervals (called waves), allowing us to investigate how a variable of interest changes over time. For example, we could have a variable that relates to a disease state, such as cancer. Our model would allow us to estimate the probability that a given person transitions from not having cancer to having cancer between waves. The HRS collects a wealth of data on health outcomes in the elderly population, as well as economic variables that could affect or depend on these health outcomes.
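To make the idea of a between-wave transition probability concrete, here is a minimal sketch in Python. It assumes a tiny, made-up panel (the person IDs and statuses are hypothetical, not HRS data) and estimates the probability of moving from "no cancer" to "cancer" as a simple frequency: among all consecutive pairs of waves where a person starts without cancer, what share have cancer at the next wave? This is a deliberately simplified, unconditional estimate and is not the FEM's own estimation method; a real model would condition transitions on each person's characteristics, such as age and prior health. The second function then advances each simulated person one wave independently, mirroring the point that individuals do not interact. Treating cancer as a state people never leave is also an assumption made for this sketch.

```python
import random

# Hypothetical panel (illustrative only): person id -> cancer status at each wave
# (0 = no cancer, 1 = cancer).
panel = {
    "p1": [0, 0, 1],
    "p2": [0, 0, 0],
    "p3": [0, 1, 1],
    "p4": [1, 1, 1],
    "p5": [0, 0, 0],
}


def transition_probability(panel, from_state=0, to_state=1):
    """Share of consecutive wave pairs starting in from_state that end in to_state."""
    at_risk = 0
    moved = 0
    for waves in panel.values():
        # Pair each wave with the one that follows it.
        for current, nxt in zip(waves, waves[1:]):
            if current == from_state:
                at_risk += 1
                if nxt == to_state:
                    moved += 1
    return moved / at_risk if at_risk else 0.0


def simulate_wave(states, p_onset, rng):
    """Advance every person one wave independently (no interaction between people).

    Assumption for this sketch: cancer is absorbing, so anyone with status 1 keeps it.
    """
    return {
        pid: 1 if status == 1 or rng.random() < p_onset else 0
        for pid, status in states.items()
    }


if __name__ == "__main__":
    p = transition_probability(panel)
    print(f"{p:.3f}")  # 2 onsets out of 7 at-risk wave pairs -> 0.286
    last_wave = {pid: waves[-1] for pid, waves in panel.items()}
    print(simulate_wave(last_wave, p, random.Random(42)))
```

Because each person's next state depends only on their own current state and a random draw, the whole population can be simulated by looping over individuals, which is what makes this kind of microsimulation straightforward to scale.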
During my internship I adapted the original US-FEM to an English context by replacing the US data sources with their English equivalent, the English Longitudinal Study of Ageing (ELSA), and attempting to resolve the inconsistencies between the data from the two nations.
Time at USC
As part of the project, I was lucky enough to go on a secondment to the USC, to work directly with Bryan Tysinger and his team.
Two days before I was due to fly to LA, California was hit by its largest earthquake in almost five years. After spending the next two days packing and calming my mum down, I was boarding a plane in Manchester and on my way to LA. While I was flying over Nevada, California was hit by another earthquake, this time the largest in almost 20 years! After I arrived I got a few lectures on earthquake safety, but luckily that was the last I heard about them.
I arrived at my hotel late on a Friday evening, and so had all weekend to acclimatise and see some of the many places recommended to me before starting work. I took the obligatory selfie in front of the Beverly Hills sign, cycled underneath Santa Monica Pier down to Venice Beach, and ate masses of Mexican food at almost every opportunity.
Working with Bryan and other members of the team at USC was certainly the highlight of the trip. The campus at USC is lush and green, filled with grandiose buildings and water fountains around almost every corner. Bryan was very generous with his time, dropping the majority of his own responsibilities to focus on our collaboration for the week. This very quickly translated into progress, and within the first two days we had produced a working version of the model (albeit with some issues). The remaining time was spent identifying these issues and discussing potential solutions, as well as learning as much as possible about the inner workings of the model and how to run new types of experiments.
Coming back to LIDA
There were many benefits to my time at USC (aside from the Mexican food). I gained a huge amount of knowledge about how the model itself works: identifying and debugging errors, planning interventions, and accessing the right parts of the complicated output it produces. I also greatly improved my skills in Stata (statistical software) and became comfortable debugging C++. The secondment has given me the confidence to be far more independent in my work, relying on Nik or Bryan only for guidance and for help with the occasional cryptic error.
Update on the Project
The work has moved on in leaps and bounds since the secondment. We have identified the major steps needed to produce publishable output, and planned our follow-on work with the goal of publishing a paper on the wider impact of preventing chronic disease in the elderly. The benefits to the project as a whole are massive; I do not think we would have made the progress we have without the secondment.
Would you recommend the LIDA internship?
The LIDA internship has provided me with opportunities that simply would not have been afforded anywhere else. As well as the secondment, I have attended many training courses (internal and external) on topics including data science, project management and presentation skills. These courses are taught by friendly and knowledgeable people, more often than not experts in their field. It also helps that the LIDA offices are large open-plan spaces, which encourage discussion and collaboration. We have often found that the best resource to help us in our work is sat only a couple of rows away! In a similar vein, our projects have not usually focussed on delivering a tangible tool or product, but on investigating a complex problem and finding a solution, learning as much as possible along the way. If you have a passion for data science and solving complex problems, I would highly recommend the LIDA internship.
Having completed his Data Scientist Internship in September 2019, Luke is continuing his work on this project as a Research Data Scientist.