Before moving to Leeds, I lived and studied in Lisbon, Portugal – yes, I have seen more grey skies this autumn than ever before. I completed a BSc in Computer Science after my initial attempt at Medical School. I mostly enjoyed learning about databases and exploring machine learning algorithms with Python, during which I realised how those tools could help solve complex problems. LIDA’s internship programme seemed like a unique opportunity to understand how my skills could be developed to help tackle real issues.
Induction Week / Arriving at LIDA
Our induction week started with a warm welcome from the previous cohort of interns, our future supervisors and the amazing LIDA staff, in both formal and informal settings. We had the opportunity to see what last year’s projects were all about, which gave me a better sense of what to expect. I was pleased to see the topics were quite diverse: from improving healthcare support for an ageing population to designing better public transport infrastructure. All these projects aimed to have a ‘real world’ impact, which motivated me even more. We also had the opportunity to get to know a few of LIDA’s corporate partners and attend training sessions on essential data science skills and terminology.
As I got to know my fellow interns better, I was pleased to notice that we all had different experiences and academic backgrounds. This makes for a very enriching environment for cooperation between peers. Since then, we have had several outings and have regularly taken part in social activities, from Thursday pub evenings, when we get together with PhD students for a crawl of Leeds pubs, to the small social events and clubs that we organise ourselves. This is always a great way to let off some steam and share some of our greatest existential frustrations. It’s also a great way to see what PhD students are up to – a potential inspiration for one’s academic path.
It is estimated that over a million UK residents live in ‘food deserts’, a term for geographical areas where people may face barriers to accessing healthy and affordable food. I will be trying to assess the presence of a new form of ‘e-food’ desert – remote neighbourhoods where residents experience limited availability or affordability of online grocery home deliveries. I’m working alongside Dr. Andy Newing, a Leeds University retail geography expert and a known tea enthusiast. In our first meeting, I was introduced to these topics and we discussed the best approaches for getting a sense of how big this problem is. I was particularly interested in the fact that we would also try to simulate the presence of e-food deserts by tweaking the availability and cost of delivery services, so that we could identify neighbourhoods vulnerable to these changes.
At this point, I have started looking at the data that might help identify food deserts. Using web-scraping techniques, we were able to gather information about where retailers deliver, and I am now analysing it to turn it into meaningful results.
I’ve used Python to build a Spatial Interaction Model (SIM) to capture aspects of geographical accessibility to retail services. This is one of the pillars of our project, as it will be a way of assessing where certain groups of people are most likely to shop.
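To give a flavour of what a SIM does, here is a minimal sketch of a production-constrained model, one common textbook form. It is not the model from our project: the function name, the toy numbers and the distance-decay parameter `beta` are all assumptions made up for illustration. Each neighbourhood's spend is shared out between stores, with bigger stores pulling more and distant stores pulling less.

```python
import math


def production_constrained_sim(demand, attractiveness, distance, beta):
    """Share each zone's demand between stores.

    flows[i][j] = demand[i] * W_j * exp(-beta * d_ij)
                  / sum_k (W_k * exp(-beta * d_ik))
    """
    flows = []
    for i, zone_demand in enumerate(demand):
        # Pull of each store on zone i: attractive stores pull more,
        # distant stores pull less (exponential distance decay).
        pulls = [
            w_j * math.exp(-beta * distance[i][j])
            for j, w_j in enumerate(attractiveness)
        ]
        total_pull = sum(pulls)
        # Balancing step: flows out of zone i must add up to its demand.
        flows.append([zone_demand * p / total_pull for p in pulls])
    return flows


# Hypothetical toy data: 2 neighbourhoods, 2 stores.
demand = [100.0, 50.0]        # grocery spend available in each neighbourhood
attractiveness = [2.0, 1.0]   # e.g. relative store floorspace
distance = [
    [1.0, 3.0],               # distance[i][j]: neighbourhood i to store j
    [4.0, 1.0],
]

flows = production_constrained_sim(demand, attractiveness, distance, beta=0.5)
for i, row in enumerate(flows):
    print(f"neighbourhood {i}: {[round(f, 1) for f in row]}")
```

In this toy example the second neighbourhood sends most of its spend to the smaller store simply because it is much closer, which is the kind of trade-off the model is meant to capture.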
Later on, we will start building a composite indicator – a way of gathering individual indicators (like the distance to the nearest supermarket) and compiling them into a single index to measure multidimensional concepts. This could range, for example, from 1 to 10 and these scores would be assigned to each geographical area we are studying. I will be trying to create one of these to measure the prevalence of food deserts, where e-food deserts will act as one of the dimensions.
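As a rough idea of how such an index could be put together, the sketch below rescales each indicator to the 0–1 range (min-max normalisation), takes a weighted average and stretches the result onto a 1 to 10 scale. The indicator names, values and equal weights are hypothetical, and normalisation and weighting are only one of several possible choices, so this is an illustration and not the method we will end up using.

```python
def min_max(values):
    """Rescale a list of numbers to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All areas share the same value, so the indicator carries no signal.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(indicators, weights):
    """Combine indicators into one score per area, on a 1-10 scale.

    `indicators` maps an indicator name to one value per area;
    `weights` maps the same names to their relative importance.
    """
    normalised = {name: min_max(vals) for name, vals in indicators.items()}
    total_weight = sum(weights.values())
    n_areas = len(next(iter(indicators.values())))
    scores = []
    for area in range(n_areas):
        weighted = sum(weights[name] * normalised[name][area] for name in indicators)
        # weighted / total_weight lies in [0, 1]; stretch it onto 1-10.
        scores.append(1 + 9 * weighted / total_weight)
    return scores


# Hypothetical indicators for three areas; in both, higher means worse access.
indicators = {
    "distance_to_supermarket_km": [0.5, 2.0, 8.0],
    "delivery_charge_gbp": [1.0, 3.0, 7.0],
}
weights = {"distance_to_supermarket_km": 0.5, "delivery_charge_gbp": 0.5}

scores = composite_index(indicators, weights)
print([round(s, 1) for s in scores])
```

Because both toy indicators point the same way (higher is worse), a higher score here means an area looks more like a food desert. Indicators that point the other way would need to be flipped before being combined.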
Working with extremely large data files has been challenging – and it is becoming more of an industry-wide issue as Big Data becomes increasingly available – but at the same time it has been interesting to deal with, and it has brought to mind a lot of what I previously studied about computer memory.
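One general way of keeping memory under control, shown here purely as an illustration and not as a description of how our project handles it, is to read a file one row at a time and keep only a running summary. The column names and the tiny in-memory sample are hypothetical stand-ins for a real file.

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column):
    """Count rows per value of `column`, holding one row in memory at a time."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


# Hypothetical stand-in for a large file. With a real file you would pass
# an open file object, e.g. open("deliveries.csv", newline=""), instead.
sample = io.StringIO("postcode,retailer\nLS1,A\nLS2,A\nLS1,B\n")

counts = count_by_column(sample, "postcode")
print(counts.most_common())
```

The file itself is never loaded in full; only the counter grows, and it grows with the number of distinct values, not with the number of rows.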
LIDA takes an active approach towards ensuring that interns have the necessary tools to tackle their projects. We had specific training from the Office for National Statistics on safe research, so that we could better understand how to handle and use controlled datasets. We also had comprehensive training on setting up the essential software for data scientists (Anaconda, Python and Git). There was also the opportunity to learn Geographic Information Systems (GIS) and to run machine learning algorithms in Weka, which some interns will be taking forward in their projects. Training opportunities are available throughout the year and we are encouraged to attend them for our professional and personal development – this is really useful for anyone who doesn’t have wide experience in data science or computer programming.
As part of my training as a LIDA intern, I had the opportunity to attend a conference on Composite Indicators in Ispra (northern Italy), led by the Competence Centre on Composite Indicators and Scoreboards (COIN) of the European Commission’s Joint Research Centre (JRC). This was a unique learning and networking opportunity, as I got to meet people from around the world and understand the work they are conducting to tackle social issues such as climate change, gender inequality, sustainability and poverty.
My hotel (unintentionally!) overlooked Lake Maggiore, where in the morning you could witness the golden sun gently glowing on the snowy mountains – or sometimes grim skies that almost make you miss the British weather. There were many hiking trails by the lake and I really enjoyed walking the Passeggiata dell’Amore. On the first day, I met Grayson, one of the conference attendees, who (inexplicably) spoke English, Portuguese, Italian, Russian and French. We roamed around the small town of Ispra a couple of times, only to find ourselves in a German restaurant eating pretzels and something we suspected was not gnocchi (it turned out to be German gnocchi after all). I am a little disappointed to say I didn’t have any pizza, but I think all the pasta made up for it.
The first three days of the conference consisted of delving into the best practices and state-of-the-art methods used to ensure composite indicators are sensibly built. Amongst other topics, we discussed the different procedures social scientists use to address missing data and outliers; the best statistical techniques for weighting the indicators; and innovative means of visually presenting the data. We also had the chance to learn how to use an Excel-based tool developed by the COIN team for constructing composite indicators. These are lessons I will take forward in my project.
The last two days of the course were about bringing together international organisations, academia and policy analysts to share their challenges and experiences in building multidimensional frameworks that help to shape policy and monitor progress. It was particularly interesting to gain more insight into the gap between the products of science and their acceptance by society. One has to be really mindful of the trade-off between the advanced and complex techniques used to build an index and the ease of communicating them to an audience.
In summary, I’m very keen to continue my experience as a Data Scientist Intern at LIDA. There’s enough room to grow and improve my skills in a supportive academic environment where people are enthusiastic about sharing their knowledge. I am certain this will be a valuable stepping stone in my career.