The Section of Epidemiology and Biostatistics, LICAP, uses “big data” to understand susceptibility to, and survival from, the form of skin cancer called melanoma.

Melanoma is largely a disease of pale-skinned peoples, and understanding susceptibility within those peoples is important. It matters first in order to direct melanoma prevention advice (essentially avoidance of intense holiday sun exposure) to those at risk, without prejudicing vitamin D levels in those at significantly smaller risk. It also matters for understanding the biology of melanoma development, in the hope of identifying treatments for melanoma in the future. Epidemiological work, funded by Cancer Research UK and the EU (Framework 6), has confirmed sunbathing on sunny holidays as a key risk behavior and regular lower-intensity weekend sun exposure as protective. The research group is a genetic epidemiology research group, so the second aim has been to identify susceptibility genes and how those genes interact with sun exposure. There are two main approaches to identifying genes which influence risk of melanoma: (i) family studies, that is the study of families in which multiple people have a diagnosis of melanoma, to identify rare genetic variation associated with a high risk of melanoma (“high penetrance”); and (ii) studies of unrelated persons, comparing those with a diagnosis of melanoma (“cases”) with persons of the same ethnic background and a similar age and gender distribution without a diagnosis of melanoma (“controls”), to identify genetic variation with more modest differences in risk.

The group has worked to identify rare high penetrance susceptibility genes: important work, again in order to understand the biology of melanoma formation, but also to inform genetic counselling in families at very high risk of melanoma. Typically, in the UK, this means identifying families with three or more cases of melanoma among close relatives; given the frequency of melanoma, this is a priori unlikely to happen by chance. To facilitate international collaboration in this area, the group instigated the consortium known as GenoMEL and has led and funded it since 1997. Funding came from the USA NIH and from the EU under Framework 6. In more recent years, the Leeds group has worked with Dr. David Adams at the Sanger Institute in Cambridge to carry out whole exome sequencing of germline DNA to identify familial genes 1-3. This technique allows much more detailed examination of a person’s DNA, to find rare changes which lead to multiple people in a family developing melanoma. This approach generates huge quantities of data, and bioinformatic examination is crucial to identify causal genes, which sometimes feels like looking for a needle in a haystack.

The research group has a long history of managing large data sets in the form of genome-wide association studies (GWAS) 4-9 designed to identify medium to low penetrance melanoma susceptibility genes, and over time, through collaborations, the study size has increased further. In a GWAS, the genetic variation among cases is compared with the genetic variation among controls. Current studies are based on Single Nucleotide Polymorphisms (SNPs), each of which is a single DNA base which varies between individuals. The differences between the two groups identify genes which increase risk. In our most recent meta-analysis of 15000 cases and 25000 controls, twenty SNPs had a p-value of less than 5 × 10⁻⁸, a significance level conventionally taken as convincing evidence that the SNP differs between cases and controls, taking into account the large number of SNPs examined in the experiment. We have shown that these SNPs relate to patient characteristics which are themselves known to be associated with melanoma risk, such as particularly fair, sun-sensitive skin (as seen among red-haired persons) and large numbers of benign melanocytic nevi 10. Figure 1 shows the association between genetic variant and risk.
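The case/control comparison for a single SNP can be sketched in a few lines of Python. This is an illustration only, not the group's analysis pipeline: it assumes a simple allelic chi-square test on a 2x2 table of allele counts, it assumes the genome-wide threshold arises from a Bonferroni correction of 0.05 for roughly one million independent tests, and the function name and allele counts are hypothetical.

```python
import math

# Assumption: roughly one million independent common variants are tested,
# so a Bonferroni correction of 0.05 gives the usual 5e-8 threshold.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    rows = (case_alt + case_ref, ctrl_alt + ctrl_ref)
    cols = (case_alt + ctrl_alt, case_ref + ctrl_ref)
    observed = ((case_alt, case_ref), (ctrl_alt, ctrl_ref))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 15000 cases and 25000 controls, two alleles each.
chi2, p = allelic_chi2(6600, 23400, 10000, 40000)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}, "
      f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

In this made-up example the variant allele is carried on 22% of case chromosomes and 20% of control chromosomes. A difference that small only reaches the genome-wide threshold because the sample is large, which is why study size matters so much for low penetrance genes.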

Melanoma incidence is continuing to increase in most western countries, and the fastest recent increase has been amongst older people (>60 years), especially men. Age increases the risk of death from melanoma. The reason for this is not known but may relate to reduced immunity. Death is also more likely in men than in women: again the reason is unknown, but we and others are trying to understand it, because the mechanism may tell us something key about cancer formation. As the incidence is rising in older men, it seems likely that death rates from melanoma will increase rather than decrease in the coming years. Much of the work done by the Leeds group is therefore designed to understand why the majority of melanoma patients survive and some do not. Most of the work of this type involves the management of large data sets.

Much as for susceptibility research, the group is carrying out work designed to establish environmental factors which predict survival. In 2009 we reported that higher vitamin D levels in the blood at diagnosis are associated with better survival 11. Since then we have used genetic 12, epidemiological 13 and transcriptomic 14 approaches to determine whether the association between vitamin D and melanoma is causal. Transcriptomic analysis is another genetic approach, but in this case we look at the tumours themselves and measure which genes are being expressed. A gene expression data set from 703 primary melanomas is currently being examined to investigate which biological pathways are perturbed in primary tumours in relation to serum vitamin D levels, in order to explore whether the relationship between vitamin D levels and melanoma is real (i.e. cause and effect) and to understand the effects of vitamin D on melanomas. If we find strong evidence that vitamin D modifies survival, and can understand the biological changes underlying this, then we hope to identify an effective adjuvant therapy. This work is part funded by the MRC and by CR UK.


Figure 1. In this “Manhattan Plot”, chromosomes are shown along the horizontal axis. Each dot represents the result for a single SNP (genetic variant), plotted at its known location on its chromosome (its base-pair number from the end of the chromosome). The y-axis shows –log(p), where p is the p-value testing the association of that SNP with case/control status. We use –log(p) so that the higher the tower, the more significant the result. Towers are colour-coded: blue towers denote SNPs also associated with the number of moles a person has (a known risk factor), violet towers denote SNPs associated with a person’s skin pigmentation and how their skin reacts to sunlight, and green towers denote SNPs also associated with a final risk factor (the length of the telomeres at the end of each chromosome).
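As a rough sketch of how the coordinates in such a plot can be derived, the Python below turns a list of SNP results into (x, y) points. It assumes the logarithm is base 10, which is the usual convention for Manhattan plots but is not stated in the caption, and the function name, chromosome lengths and SNP values are hypothetical.

```python
import math

# Height of the conventional genome-wide significance line on the y-axis.
SIGNIFICANCE_LINE = -math.log10(5e-8)


def manhattan_points(snps, chrom_lengths):
    """Return one (x, y) point per SNP.

    snps: iterable of (chromosome, base_pair_position, p_value).
    chrom_lengths: ordered list of (chromosome, length_in_base_pairs).
    x is the position counted from the start of the first chromosome,
    y is -log10(p), so smaller p-values give taller towers.
    """
    offsets = {}
    running_total = 0
    for chrom, length in chrom_lengths:
        offsets[chrom] = running_total
        running_total += length
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in snps]


# Hypothetical toy genome with two short chromosomes.
points = manhattan_points(
    [("1", 100, 0.01), ("2", 50, 1e-8)],
    [("1", 1000), ("2", 800)],
)
for x, y in points:
    print(x, round(y, 2), "above line" if y > SIGNIFICANCE_LINE else "below line")
```

Laying the chromosomes end to end on the x-axis is what makes neighbouring SNPs with small p-values stack up into the towers described in the caption.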

Genomic Alterations in Melanoma

All cancers are different. Some, although still deadly, arise from relatively well understood changes (or mutations) in a patient’s genetic code. Melanoma, on the other hand, appears very differently in different people, making the task of understanding “how the cancer works” very much harder.

The Section of Epidemiology and Biostatistics is using big data to address this problem. We have collected very large numbers of tissue samples from patients along with detailed information on their disease and lifestyle. We aim to associate the genetic alterations we see in their specific cancers to their lifestyle (for example smoking, or sunny holidays), family history and clinical data (such as vitamin D levels).

For one of these studies, we have collected 356 DNA samples from patients and performed Next-Generation Sequencing (NGS) on each one.  This technique allows us to “read” the genomic sequence of their cancer samples. We have then looked for any regions of their genome that are lost or appear multiple times. These copy number variations have been linked to many cancer types including melanoma, and we hope to use our data to help work out the role they play in cancer formation and spread.  Figure 2 shows the results for a region of chromosome 9 known to be damaged in many melanomas.
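One common way to look for lost or gained regions in sequencing data is to compare read depth in the tumour with read depth in normal DNA, region by region. The Python below is a minimal sketch of that idea and should not be read as the method used in this study: the pseudocount, the normalisation by total depth, the ±0.3 thresholds, the function name and the read counts are all assumptions chosen for illustration.

```python
import math


def call_copy_number(tumour_depth, normal_depth, loss=-0.3, gain=0.3):
    """Label each genomic bin as 'loss', 'gain' or 'neutral'.

    tumour_depth, normal_depth: read counts per bin, in the same bin order.
    Returns a list of (log2_ratio, label) tuples, one per bin.
    """
    # Scale for the difference in total sequencing depth between the samples.
    scale = sum(normal_depth) / sum(tumour_depth)
    calls = []
    for t, n in zip(tumour_depth, normal_depth):
        # A pseudocount of 1 avoids taking the logarithm of zero.
        log2_ratio = math.log2((t + 1) / (n + 1) * scale)
        if log2_ratio <= loss:
            label = "loss"
        elif log2_ratio >= gain:
            label = "gain"
        else:
            label = "neutral"
        calls.append((log2_ratio, label))
    return calls


# Hypothetical read counts for four bins along a chromosome.
calls = call_copy_number([100, 50, 200, 100], [100, 100, 100, 100])
for ratio, label in calls:
    print(f"{ratio:+.2f} {label}")
```

Real copy number pipelines also correct for biases such as GC content and join neighbouring bins into segments, which this sketch leaves out.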

Generating these data sets is computationally intensive. We have generated vast amounts of data for this project (over 16 terabytes and counting), which has taken over 45 CPU weeks of computation time. This has been made possible by the high-performance computing facilities provided by the MRC Center in LIDA.



The 703 primary tumour transcriptomic data set is currently being explored in order to identify evidence of immune responses to melanoma and the genetic variation in the tumours that drives that immunity (or suppresses it). This is a large data set and the work requires prudent use of evolving bioinformatics to understand the biology. The data will be strengthened by additional copy number data generated from approximately half of the samples, funded by AICR. Copy number data is yet another genetic approach to research on cancer: here we look at where the tumour has lost or gained short stretches of DNA. The copy number data were generated from formalin-fixed samples using next-generation sequencing, and the bioinformatic analysis of these data has required considerable input from Dr Alastair Droop in LIDA.

Cancer patients are often prescribed drugs for other medical conditions, e.g. diabetes or arthritis. These drugs benefit them by controlling those conditions, but they may in theory affect the cancer as well: the drugs might help to treat the cancer or might make it worse for the patient. It would be very difficult to prove this one way or the other without large-scale research. The University of Leeds is therefore trying to use national NHS data to investigate this. In this work we use information stored in anonymized NHS records to ask whether cancer survival is greater or less in people given a particular drug (for example, for diabetes) compared with patients who have had a similar cancer and are of a similar age and sex. The theoretical effect of exposure to these drugs is likely to be relatively small overall (although of course very important for individual patients), so studies need to be very large. In order to preserve the confidentiality of patients, elaborate safety mechanisms are put in place to ensure that the University cannot know whose information it is looking at. This research is funded by Melanoma Focus and CR UK. If the research is successful and we show that a given drug is safer than another for cancer patients, this will be very important for cancer patients internationally.



Figure 2. This plot shows chromosome 9, from one end of the chromosome to the other, assessed for damage in terms of the integrity of its DNA content. The top panel compares the DNA content of each region with that of normal cells and shows that the DNA in this melanoma has changed minimally. In the lower panel there are many regions in which there is less DNA (i.e. DNA in that region has been lost) and other regions where DNA has been gained. This lower panel is typical of an advanced melanoma whose DNA is notably damaged compared with the normal cell structure.



1. Robles-Espinoza CD, Harland M, Ramsay AJ, et al. POT1 loss-of-function variants predispose to familial melanoma. Nature genetics 2014;46:478-81.
2. Aoude LG, Pritchard AL, Robles-Espinoza CD, et al. Nonsense mutations in the shelterin complex genes ACD and TERF2IP in familial melanoma. Journal of the National Cancer Institute 2015;107.
3. Harland M, Petljak M, Robles-Espinoza CD, et al. Germline TERT promoter mutations are rare in familial melanoma. Fam Cancer 2016;15:139-44.
4. Bishop DT, Demenais F, Iles MM, et al. Genome-wide association study identifies three loci associated with melanoma risk. Nature genetics 2009;41:920-5.
5. Barrett JH, Iles MM, Harland M, et al. Genome-wide association study identifies three new melanoma susceptibility loci. Nature genetics 2011;43:1108-13.
6. Iles MM, Law MH, Stacey SN, et al. A variant in FTO shows association with melanoma risk not due to BMI. Nature genetics 2013;45:428-32, 32e1.
7. Iles MM, Bishop DT, Taylor JC, et al. The effect on melanoma risk of genes previously associated with telomere length. Journal of the National Cancer Institute 2014;106.
8. Amos CI, Wang LE, Lee JE, et al. Genome-wide association study identifies novel loci predisposing to cutaneous melanoma. Human molecular genetics 2011;20:5012-23.
9. Macgregor S, Montgomery GW, Liu JZ, et al. Genome-wide association study identifies a new melanoma susceptibility locus at 1q21.3. Nature genetics 2011.
10. Law MH, Bishop DT, Lee JE, et al. Genome-wide meta-analysis identifies five new susceptibility loci for cutaneous malignant melanoma. Nature genetics 2015;47:987-95.
11. Newton-Bishop JA, Beswick S, Randerson-Moor J, et al. Serum 25-hydroxyvitamin D3 levels are associated with Breslow thickness at presentation and survival from melanoma. J Clin Oncol 2009;27:5439-44.
12. Davies JR, Field S, Randerson-Moor J, et al. An inherited variant in the gene coding for vitamin D-binding protein and survival from cutaneous melanoma: a BioGenoMEL study. Pigment cell & melanoma research 2013.
13. Newton-Bishop JA, Davies JR, Latheef F, et al. 25-Hydroxyvitamin D2/D3 levels and factors associated with systemic inflammation and melanoma survival in the Leeds Melanoma Cohort. International journal of cancer 2015;136:2890-9.
14. Jewell R, Elliott F, Laye J, et al. The clinicopathological and gene expression patterns associated with ulceration of primary melanoma. Pigment cell & melanoma research 2015;28:94-104.