AI as a Developing Data Scientist

Date: Wednesday 31 May 2023

By Jian Feng/Franks, Data Scientist, LIDA

Introduction

One of the great perks included in our training on the Data Scientist Development Programme (DSDP) is that we get a budget to attend national and international conferences. So I was delighted to attend The Alan Turing Institute’s AI UK conference in March.

It was the first time the conference had been held in person since the pandemic, with panel discussions on healthcare, biology and digital twins, as well as workshops and networking.

My colleague Alex and me. He is interested in medical topics.

I was excited to have a chance to attend the conference in London with my fellow DSDP colleague Alex. Having another perspective enhanced my experience, as Alex studied medical science and his project topic is related to crime, while my research topic is related to public transport. We discussed what we had learned from different workshops and how we could apply it to our current projects. He also introduced me to students from the Centres for Doctoral Training.

AI – Pros vs. Cons

As I come from a social science background, I was particularly interested in the panel discussion about the regulation of ChatGPT and how AI can contribute to financial services and support the government. The panel was hosted by Helen Margetts, Programme Director for Public Policy at The Alan Turing Institute, who invited journalists from MIT Technology Review and Gary Marcus, a leading voice in artificial intelligence. Gary is a scientist, best-selling author and serial entrepreneur (founder of Robust.AI and Geometric.AI, acquired by Uber).

During the discussion about how AI can support the government, the panel talked about how people recreate racism in AI systems, something researchers have shown is already happening in our daily lives. I was shocked because I hadn’t realised that we can reproduce our own preferences about candidates’ backgrounds by coding them into an AI system. For example, many companies use AI algorithms to select candidates for recruitment. However, research shows that if a recruiter is racist, the AI system will be unlikely to select candidates from a minority background.

This could worsen gender and race inequality and the unemployment issues faced by disadvantaged groups. According to professors from the University of Cambridge’s Centre for Gender Studies, these tools may actually reproduce cultural biases about the “ideal candidate,” who has historically been a white or European male. The panel also discussed strategies for implementing AI in policy decision-making, as this is one of the key challenges they are looking to solve. I look forward to seeing how the Alan Turing Institute’s next steps develop.

Scholars from the Alan Turing Institute AI policy department talked about their strategy

This was one of the most popular panel discussions of the conference. I'm glad that many people can’t wait to save our planet.

Another talk that caught my attention was about ChatGPT. It first came out at the end of November 2022 and, while most people were pleased about it, concern about the risks of AI is definitely growing at the same time. Dr. Gary Marcus talked about the potential risks and suggested that everyone slow down and think about regulation before investing in this area. He stated that ChatGPT still makes mistakes at the moment, so it can’t tell people what to do. This is because it generates answers based on information it has received before, instead of creating new knowledge. So not all of the information it gives is correct.

Besides emphasising the importance of teaching the public to understand how ChatGPT works, he also gave an example of how young students can learn with it. For example, a teacher can ask ChatGPT to write something, then ask their students to critique its structure and discuss how they would improve it. He also reminded us that it’s only one of the applications of AI. There are many more, and we can think about how to use them in different industries.

The opening talk of AI UK, about ChatGPT.

The Data Scientist Development Programme

As I said, I was only able to attend this conference due to being on the DSDP. But there’s a lot more to the programme than this.

LIDA treat diversity as a key part of their values, and it’s true! Besides working on an amazing project that has a positive impact on the community, I feel supported in the office. Everyone has a diverse mindset and respects your multicultural background. They want you to be successful! I enjoy talking to my supervisor and mentor not only about my current project, but also about British culture. I have learnt what people usually do during holidays such as Easter and Christmas. I enjoyed the Christmas event in the LIDA office, where I got a chance to see the hidden talents of my colleagues! For example, Dustin is a puzzle expert!

I organised a Christmas meal with some of my colleagues. I like the colourful hat!

I know my coding skills are not the best, so this programme is a good fit for developing them!

The programme has developed my coding skills a lot. More importantly, I have learnt some great work habits, mainly time management and task prioritisation, from my supervisors Peter and Ed, as well as from other colleagues. My colleagues come from different countries and hold degrees at different levels. This means I am always learning from them.

I find this opportunity unique. If you’re keen to be a data scientist, or still unsure whether you want to be a data scientist or a data engineer, it can be a good step to build up your career.

I only got into data science in the final year of my undergraduate degree, because I realised that data science is a good asset for helping to build a better society. There are many training sessions, and funding to cover your expenses for national or international conferences. You can also meet academics and colleagues from multiple faculties, who may be data scientists, data engineers or researchers. They are friendly and happy to talk through any questions, from research topics and coding to your future career! I’m glad there’s always some event going on in the office, so I’m not just sat behind the screen the whole day.

My first project focused on public travel flow in the UK during the pandemic. I was glad that I could work with data from mobile app users (with their consent). It took me some time to build up my problem-solving skills and to work independently. I felt proud that I got used to using Python and learnt about QGIS and Tableau as part of my training sessions. I eventually produced a choropleth map visualisation, which showed a clear changing trend in travel flow during the lockdown. I couldn’t finish all of the objectives because of the learning process and timings. I mainly learned data organising and visualisation skills. I look forward to improving them during my second project and learning more about machine learning.

Data is unpredictable!

I want to share a lesson I’ve learned from the programme on enjoying the journey, adapting and embracing unpredictable changes.

I never know what result I will get after running the code. At the beginning of my project, I always struggled to get the code running and rushed to get the result, so I could move on to the next task. When the code finally worked and I brought the result to the weekly meeting with my supervisor Peter, he said we could look in another direction based on the result we had got. Maybe because I am new to the data science world, I always thought the result was predictable and that we could foresee the progress, as set out in the project objectives. However, a new idea can always come up during a project, or you may realise the outcome you predicted was totally wrong. That is fine!

This is the magic of data science, and it is how I built up my problem-solving skills. I started to change my mindset: instead of focusing on getting the ‘perfect’ result, solving issues and reflecting on the results I got are the key parts of developing my skills.

Working towards the unpredictable is the best way to build up my skills.

Find out more about the Data Scientist Development Programme