Stacey Gabriel

Here are all the papers by Stacey Gabriel that you can download and read on OA.mg.

DOI: 10.1101/gr.107524.110
2010
Cited 21,321 times
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data
Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genome pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
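The map/reduce separation described in this abstract can be illustrated with a toy coverage calculator. This is a minimal Python sketch, not GATK code and not the GATK API: the read representation (start position, length), the function names, and the sample reads are all hypothetical, chosen only to show how a per-read map step and a merging reduce step combine into per-position depth.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy reads on one contig: (start position, read length).
reads = [(0, 4), (2, 4), (3, 2)]

def map_read(read):
    """Map step: emit a depth count of 1 for every position one read covers."""
    start, length = read
    return Counter(range(start, start + length))

def reduce_depth(total, partial):
    """Reduce step: add one read's counts into the running per-position total."""
    total.update(partial)
    return total

def coverage(reads):
    """Apply the map step to each read, then fold the results together."""
    return reduce(reduce_depth, map(map_read, reads), Counter())

depth = coverage(reads)
```

Because the map step looks at one read at a time and the reduce step only merges counts, the reads could in principle be split across workers and the partial counters merged afterwards, which is the kind of parallelization the abstract refers to.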
DOI: 10.1038/nature15393
2015
Cited 13,993 times
A global reference for human genetic variation
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics. The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. 
They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.
DOI: 10.1038/ng.806
2011
Cited 9,847 times
A framework for variation discovery and genotyping using next-generation DNA sequencing data
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
DOI: 10.1126/science.1099314
2004
Cited 8,942 times
EGFR Mutations in Lung Cancer: Correlation with Clinical Response to Gefitinib Therapy
Receptor tyrosine kinase genes were sequenced in non-small cell lung cancer (NSCLC) and matched normal tissue. Somatic mutations of the epidermal growth factor receptor gene EGFR were found in 15 of 58 unselected tumors from Japan and 1 of 61 from the United States. Treatment with the EGFR kinase inhibitor gefitinib (Iressa) causes tumor regression in some patients with NSCLC, more frequently in Japan. EGFR mutations were found in additional lung cancer samples from U.S. patients who responded to gefitinib therapy and in a lung adenocarcinoma cell line that was hypersensitive to growth inhibition by gefitinib, but not in gefitinib-insensitive tumors or cell lines. These results suggest that EGFR mutations may predict sensitivity to gefitinib.
DOI: 10.1038/nature19057
2016
Cited 8,860 times
Analysis of protein-coding genetic variation in 60,706 humans
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human ‘knockout’ variants in protein-coding genes. Exome sequencing data from 60,706 people of diverse geographic ancestry is presented, providing insight into genetic variation across populations, and illuminating the relationship between DNA variants and human disease. As part of the Exome Aggregation Consortium (ExAC) project, Daniel MacArthur and colleagues report on the generation and analysis of high-quality exome sequencing data from 60,706 individuals of diverse ancestry. This provides the most comprehensive catalogue of human protein-coding genetic variation to date, yielding unprecedented resolution for the analysis of very rare variants across multiple human populations. The catalogue is freely accessible and provides a critical reference panel for the clinical interpretation of genetic variants and the discovery of disease-related genes.
DOI: 10.1038/nature11003
2012
Cited 6,591 times
The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity
The Cancer Cell Line Encyclopedia presents the first results from a large-scale screen of some 947 cancer cell lines with 24 anticancer drugs, with the aim of identifying specific genomic alterations and gene expression profiles associated with selective sensitivity or resistance to potential therapeutic agents. Cancer cell lines are widely used as preclinical models to gain mechanistic and therapeutic insight. Two manuscripts in this issue describe the large-scale genetic and pharmacological characterization of human cancer cell lines. Each group characterized collections of several-hundred cell lines using different platforms and analytical methods. Their results are complementary, and confirm that many human cell lines capture the genomic diversity of their respective cancers. Initial findings include the identification of a number of potential markers of drug sensitivity and resistance. For example, Garnett et al. report an association between EWS-FLI1 gene translocations, frequently found in Ewing's sarcoma, and sensitivity to PARP inhibitors, a class of drug currently in clinical trials for other cancer types. Barretina et al. report a possible association between SLFN11 expression and sensitivity to topoisomerase inhibitors. The systematic translation of cancer genomic data into knowledge of tumour biology and therapeutic possibilities remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacological annotation is available. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacological profiles for 24 anticancer drugs across 479 of the cell lines, this collection allowed identification of genetic, lineage, and gene-expression-based predictors of drug sensitivity.
In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Together, our results indicate that large, annotated cell-line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of ‘personalized’ therapeutic regimens.
DOI: 10.1038/s41586-020-2308-7
2020
Cited 6,339 times
The mutational constraint spectrum quantified from variation in 141,456 humans
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
DOI: 10.1016/j.ccr.2009.12.020
2010
Cited 6,223 times
Integrated Genomic Analysis Identifies Clinically Relevant Subtypes of Glioblastoma Characterized by Abnormalities in PDGFRA, IDH1, EGFR, and NF1
The Cancer Genome Atlas Network recently cataloged recurrent genomic abnormalities in glioblastoma multiforme (GBM). We describe a robust gene expression-based molecular classification of GBM into Proneural, Neural, Classical, and Mesenchymal subtypes and integrate multidimensional genomic data to establish patterns of somatic mutations and DNA copy number. Aberrations and gene expression of EGFR, NF1, and PDGFRA/IDH1 each define the Classical, Mesenchymal, and Proneural subtypes, respectively. Gene signatures of normal brain cell types show a strong relationship between subtypes and different neural lineages. Additionally, response to aggressive therapy differs by subtype, with the greatest benefit in the Classical subtype and no benefit in the Proneural subtype. We provide a framework that unifies transcriptomic and genomic dimensions for GBM molecular stratification with important implications for future studies.
DOI: 10.1126/science.1069424
2002
Cited 5,406 times
The Structure of Haplotype Blocks in the Human Genome
Haplotype-based methods offer a powerful approach to disease gene mapping, based on the association between causal mutations and the ancestral haplotypes on which they arose. As part of The SNP Consortium Allele Frequency Projects, we characterized haplotype patterns across 51 autosomal regions (spanning 13 megabases of the human genome) in samples from Africa, Europe, and Asia. We show that the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed. The boundaries of blocks and specific haplotypes they contain are highly correlated across populations. We demonstrate that such haplotype frameworks provide substantial statistical power in association studies of common genetic variation across each region. Our results provide a foundation for the construction of a haplotype map of the human genome, facilitating comprehensive genetic association studies of human disease.
DOI: 10.1002/0471250953.bi1110s43
2013
Cited 4,760 times
From FastQ Data to High‐Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline
This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.
DOI: 10.1038/nature12213
2013
Cited 4,749 times
Mutational heterogeneity in cancer and the search for new cancer-associated genes
Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour-normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour-normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.
DOI: 10.1038/nature06258
2007
Cited 4,203 times
A second generation human haplotype map of over 3.1 million SNPs
We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
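The r² statistic quoted in this abstract is the squared correlation between alleles at two SNPs. The sketch below computes it from phased haplotype counts using the standard textbook definition (D = p_AB − p_A·p_B, r² = D² / (p_A(1 − p_A)p_B(1 − p_B))). The function name and the haplotype counts in the usage lines are hypothetical, and the paper's own pipeline for estimating "average maximum r²" is not reproduced here.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Return r^2 between two biallelic SNPs from phased haplotype counts.

    n_AB, n_Ab, n_aB, n_ab are the numbers of haplotypes carrying each
    combination of alleles (A/a at the first SNP, B/b at the second).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B  # linkage disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom

# Hypothetical counts: perfectly correlated SNPs, then independent SNPs.
perfect = ld_r2(50, 0, 0, 50)
independent = ld_r2(25, 25, 25, 25)
```

A tag SNP with r² close to 1 against an untyped variant carries nearly the same information as genotyping that variant directly, which is why the abstract reports coverage in terms of maximum r².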
DOI: 10.1016/j.cell.2013.09.034
2013
Cited 4,010 times
The Somatic Genomic Landscape of Glioblastoma
We describe the landscape of somatic genomic alterations based on multidimensional and comprehensive characterization of more than 500 glioblastoma tumors (GBMs). We identify several novel mutated genes as well as complex rearrangements of signature receptors, including EGFR and PDGFRA. TERT promoter mutations are shown to correlate with elevated mRNA expression, supporting a role in telomerase reactivation. Correlative analyses confirm that the survival advantage of the proneural subtype is conferred by the G-CIMP phenotype, and MGMT DNA methylation may be a predictive biomarker for treatment response only in classical subtype GBM. Integrative analysis of genomic and proteomic profiles challenges the notion of therapeutic inhibition of a pathway as an alternative to inhibition of the target itself. These data will facilitate the discovery of therapeutic and diagnostic target candidates, the validation of research and clinical observations and the generation of unanticipated hypotheses that can advance our molecular understanding of this lethal cancer.
DOI: 10.1038/nbt.2514
2013
Cited 3,958 times
Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples
Detection of somatic point substitutions is a key step in characterizing the cancer genome. However, existing methods typically miss low-allelic-fraction mutations that occur in only a subset of the sequenced cells owing to either tumor heterogeneity or contamination by normal cells. Here we present MuTect, a method that applies a Bayesian classifier to detect somatic mutations with very low allele fractions, requiring only a few supporting reads, followed by carefully tuned filters that ensure high specificity. We also describe benchmarking approaches that use real, rather than simulated, sequencing data to evaluate the sensitivity and specificity as a function of sequencing depth, base quality and allelic fraction. Compared with other methods, MuTect has higher sensitivity with similar specificity, especially for mutations with allelic fractions as low as 0.1 and below, making MuTect particularly useful for studying cancer subclones and their evolution in standard exome and genome sequencing data.
DOI: 10.1056/nejmoa1408617
2014
Cited 3,517 times
Age-Related Clonal Hematopoiesis Associated with Adverse Outcomes
The incidence of hematologic cancers increases with age. These cancers are associated with recurrent somatic mutations in specific genes. We hypothesized that such mutations would be detectable in the blood of some persons who are not known to have hematologic disorders.
DOI: 10.1038/nature08822
2010
Cited 3,344 times
The landscape of somatic copy-number alteration across human cancers
A powerful way to discover key genes with causal roles in oncogenesis is to identify genomic regions that undergo frequent alteration in human cancers. Here we present high-resolution analyses of somatic copy-number alterations (SCNAs) from 3,131 cancer specimens, belonging largely to 26 histological types. We identify 158 regions of focal SCNA that are altered at significant frequency across several cancer types, of which 122 cannot be explained by the presence of a known cancer target gene located within these regions. Several gene families are enriched among these regions of focal SCNA, including the BCL2 family of apoptosis regulators and the NF-κΒ pathway. We show that cancer cells containing amplifications surrounding the MCL1 and BCL2L1 anti-apoptotic genes depend on the expression of these genes for survival. Finally, we demonstrate that a large majority of SCNAs identified in individual cancer types are present in several cancer types. Two Articles in this issue add major data sets to the growing picture of the cancer genome. Bignell et al. analysed a large number of homozygous gene deletions in a collection of 746 publicly available cancer cell lines. Combined with information about hemizygous deletions of the same genes, the data suggest that many deletions found in cancer reflect the position of a gene at a fragile site in the genome, rather than as a recessive cancer gene whose loss confers a selective growth advantage. Beroukhim et al. present the largest data set to date on somatic copy-number variations across more than 3,000 specimens of human primary cancers. Many alterations are shared between multiple tumour types. Functional experiments demonstrate an oncogenic role for the apoptosis genes MCL1 and BCL2L1 that are associated with amplifications found in many cancers. One way of discovering genes with key roles in cancer development is to identify genomic regions that are frequently altered in human cancers. 
Here, high-resolution analyses of somatic copy-number alterations (SCNAs) in numerous cancer specimens provide an overview of regions of focal SCNA that are altered at significant frequency across several cancer types. An oncogenic function is also found for the anti-apoptosis genes MCL1 and BCL2L1, which reside in amplified genome regions in many cancers.
DOI: 10.1016/j.ccr.2005.03.023
2005
Cited 2,806 times
Activating mutation in the tyrosine kinase JAK2 in polycythemia vera, essential thrombocythemia, and myeloid metaplasia with myelofibrosis
Polycythemia vera (PV), essential thrombocythemia (ET), and myeloid metaplasia with myelofibrosis (MMM) are clonal disorders arising from hematopoietic progenitors. An internet-based protocol was used to collect clinical information and biological specimens from patients with these diseases. High-throughput DNA resequencing identified a recurrent somatic missense mutation JAK2V617F in granulocyte DNA samples of 121 of 164 PV patients, of which 41 had homozygous and 80 had heterozygous mutations. Molecular and cytogenetic analyses demonstrated that homozygous mutations were due to duplication of the mutant allele. JAK2V617F was also identified in granulocyte DNA samples from 37 of 115 ET and 16 of 46 MMM patients, but was not observed in 269 normal individuals. In vitro analysis demonstrated that JAK2V617F is a constitutively active tyrosine kinase.
DOI: 10.1038/nature09298
2010
Cited 2,707 times
Integrating common and rare genetic variation in diverse human populations
Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called ‘HapMap 3’, includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation. The International HapMap Consortium, established to develop a haplotype map of the human genome describing the common patterns of DNA sequence variation, has now reached its third incarnation. HapMap1, published in 2005 (go.nature.com/gJisDm), contained more than a million SNP (single nucleotide polymorphism) genotypes generated in 269 individuals from four geographically diverse populations. Two years later, HapMap2 (go.nature.com/WttNWX) added more than 2.1 million SNPs to the original map in the same 269 individuals. With the aim of providing a resource for the latest wave of genome-wide studies focused on disease linkages, HapMap3 casts the net wider. 
About 1.6 million common SNPs were genotyped in 1,184 individuals from 11 global populations, and ten 100-kilobase regions were sequenced in 692 of these individuals. Here, the analysis of 'HapMap 3' is reported — a public data set of genomic variants in human populations. The resource integrates common and rare single nucleotide polymorphisms (SNPs) and copy number polymorphisms (CNPs) from 11 global populations, providing insights into population-specific differences among variants. It also demonstrates the feasibility of imputing newly discovered rare SNPs and CNPs.
DOI: 10.1056/nejmoa1409405
2014
Cited 2,693 times
Clonal Hematopoiesis and Blood-Cancer Risk Inferred from Blood DNA Sequence
Cancers arise from multiple acquired mutations, which presumably occur over many years. Early stages in cancer development might be present years before cancers become clinically apparent. We analyzed data from whole-exome sequencing of DNA in peripheral-blood cells from 12,380 persons, unselected for cancer or hematologic phenotypes. We identified somatic mutations on the basis of unusual allelic fractions. We used data from Swedish national patient registers to follow health outcomes for 2 to 7 years after DNA sampling. Clonal hematopoiesis with somatic mutations was observed in 10% of persons older than 65 years of age but in only 1% of those younger than 50 years of age. Detectable clonal expansions most frequently involved somatic mutations in three genes (DNMT3A, ASXL1, and TET2) that have previously been implicated in hematologic cancers. Clonal hematopoiesis was a strong risk factor for subsequent hematologic cancer (hazard ratio, 12.9; 95% confidence interval, 5.8 to 28.7). Approximately 42% of hematologic cancers in this cohort arose in persons who had clonality at the time of DNA sampling, more than 6 months before a first diagnosis of cancer. Analysis of bone marrow-biopsy specimens obtained from two patients at the time of diagnosis of acute myeloid leukemia revealed that their cancers arose from the earlier clones. Clonal hematopoiesis with somatic mutations is readily detected by means of DNA sequencing, is increasingly common as people age, and is associated with increased risks of hematologic cancer and death. A subset of the genes that are mutated in patients with myeloid cancers is frequently mutated in apparently healthy persons; these mutations may represent characteristic early events in the development of hematologic cancers. (Funded by the National Human Genome Research Institute and others.).
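One way to read "unusual allelic fractions" is that an inherited heterozygous variant should appear in roughly half of the reads at a site, so a variant seen in far fewer reads is a candidate somatic mutation present in only a subset of cells. The sketch below illustrates that idea with an exact two-sided binomial test. It is an assumption-laden illustration rather than the study's actual method: the function name, the expected fraction of 0.5, and the read counts are all hypothetical.

```python
from math import comb

def binom_two_sided_p(alt_reads, depth, expected=0.5):
    """Exact two-sided binomial p-value for seeing alt_reads out of depth.

    Sums the probability of every outcome that is no more likely than the
    observed one under the expected allelic fraction.
    """
    probs = [
        comb(depth, k) * expected**k * (1 - expected) ** (depth - k)
        for k in range(depth + 1)
    ]
    observed = probs[alt_reads]
    # Small tolerance so outcomes tied with the observed one are included.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))

# Hypothetical sites, each with 100 reads of coverage.
p_germline_like = binom_two_sided_p(50, 100)  # fraction 0.50
p_somatic_like = binom_two_sided_p(10, 100)   # fraction 0.10
```

A very small p-value only says the fraction is inconsistent with a clean heterozygous site; a real analysis would also have to account for sequencing error, mapping bias, and copy-number changes before calling a variant somatic.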
DOI: 10.1126/science.1142358
2007
Cited 2,660 times
Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels
New strategies for prevention and treatment of type 2 diabetes (T2D) require improved insight into disease etiology. We analyzed 386,731 common single-nucleotide polymorphisms (SNPs) in 1464 patients with T2D and 1467 matched controls, each characterized for measures of glucose metabolism, lipids, obesity, and blood pressure. With collaborators (FUSION and WTCCC/UKT2D), we identified and confirmed three loci associated with T2D (in a noncoding region near CDKN2A and CDKN2B, in an intron of IGF2BP2, and an intron of CDKAL1) and replicated associations near HHEX and in SLC30A8 found by a recent whole-genome association study. We identified and confirmed association of a SNP in an intron of glucokinase regulatory protein (GCKR) with serum triglycerides. The discovery of associated variants in unsuspected genes and outside coding regions illustrates the ability of genome-wide association studies to provide potentially important clues to the pathogenesis of common diseases.
DOI: 10.1038/nature12912
2014
Cited 2,612 times
Discovery and saturation analysis of cancer genes across 21 tumour types
Although a few cancer genes are mutated in a high proportion of tumours of a given type (>20%), most are mutated at intermediate frequencies (2–20%). To explore the feasibility of creating a comprehensive catalogue of cancer genes, we analysed somatic point mutations in exome sequences from 4,742 human cancers and their matched normal-tissue samples across 21 cancer types. We found that large-scale genomic analysis can identify nearly all known cancer genes in these tumour types. Our analysis also identified 33 genes that were not previously known to be significantly mutated in cancer, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Down-sampling analysis indicates that larger sample sizes will reveal many more genes mutated at clinically important frequencies. We estimate that near-saturation may be achieved with 600–5,000 samples per tumour type, depending on background mutation frequency. The results may help to guide the next stage of cancer genomics. Large-scale genomic analysis of somatic point mutations in exomes from tumour–normal pairs across 21 cancer types identifies most known cancer genes in these tumour types as well as 33 genes not known to be significantly mutated, and down-sampling analysis indicates that larger sample sizes will reveal many more genes mutated at clinically important frequencies. Most cancer genes are mutated at intermediate frequencies, appearing in less than one in five samples of a particular tumour type, so the accurate identification of cancer genes needs to be based on large-scale sampling in order to take account of this mutation-rate heterogeneity. This study presents a statistical analysis of 21 tumour types from more than 4,700 tumour–normal pairs. 
The authors identify 33 previously unknown genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Further analyses suggest that near-saturation may be achieved with between 600 and 5,000 samples for a given tumour type, depending on background mutation rate.
DOI: 10.1126/science.aaz1776
2020
Cited 2,566 times
The GTEx Consortium atlas of genetic regulatory effects across human tissues
The Genotype-Tissue Expression (GTEx) project dissects how genetic variation affects gene expression and splicing.
DOI: 10.1016/j.cell.2015.05.044
2015
Cited 2,545 times
Genomic Classification of Cutaneous Melanoma
We describe the landscape of genomic alterations in cutaneous melanomas through DNA, RNA, and protein-based analysis of 333 primary and/or metastatic melanomas from 331 patients. We establish a framework for genomic classification into one of four subtypes based on the pattern of the most prevalent significantly mutated genes: mutant BRAF, mutant RAS, mutant NF1, and Triple-WT (wild-type). Integrative analysis reveals enrichment of KIT mutations and focal amplifications and complex structural rearrangements as a feature of the Triple-WT subtype. We found no significant outcome correlation with genomic classification, but samples assigned a transcriptomic subclass enriched for immune gene expression associated with lymphocyte infiltrate on pathology review and high LCK protein expression, a T cell marker, were associated with improved patient survival. This clinicopathological and multi-dimensional analysis suggests that the prognosis of melanoma patients with regional metastases is influenced by tumor stroma immunobiology, offering insights to further personalize therapeutic decision-making.
DOI: 10.1038/nature07423
2008
Cited 2,479 times
Somatic mutations affect key pathways in lung adenocarcinoma
Determining the genetic basis of cancer requires comprehensive analyses of large collections of histopathologically well-classified primary tumours. Here we report the results of a collaborative study to discover somatic mutations in 188 human lung adenocarcinomas. DNA sequencing of 623 genes with known or potential relationships to cancer revealed more than 1,000 somatic mutations across the samples. Our analysis identified 26 genes that are mutated at significantly high frequencies and thus are probably involved in carcinogenesis. The frequently mutated genes include tyrosine kinases, among them the EGFR homologue ERBB4; multiple ephrin receptor genes, notably EPHA3; vascular endothelial growth factor receptor KDR; and NTRK genes. These data provide evidence of somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers--including NF1, APC, RB1 and ATM--and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B. The observed mutational profiles correlate with clinical features, smoking status and DNA repair defects. These results are reinforced by data integration including single nucleotide polymorphism array and gene expression array. Our findings shed further light on several important signalling pathways involved in lung adenocarcinoma, and suggest new molecular targets for treatment.
DOI: 10.1038/ng.3643
2016
Cited 2,459 times
A reference panel of 64,976 haplotypes for genotype imputation
We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
DOI: 10.1016/j.cell.2015.10.025
2015
Cited 2,434 times
The Molecular Taxonomy of Primary Prostate Cancer
There is substantial heterogeneity among primary prostate cancers, evident in the spectrum of molecular abnormalities and its variable clinical course. As part of The Cancer Genome Atlas (TCGA), we present a comprehensive molecular analysis of 333 primary prostate carcinomas. Our results revealed a molecular taxonomy in which 74% of these tumors fell into one of seven subtypes defined by specific gene fusions (ERG, ETV1/4, and FLI1) or mutations (SPOP, FOXA1, and IDH1). Epigenetic profiles showed substantial heterogeneity, including an IDH1 mutant subset with a methylator phenotype. Androgen receptor (AR) activity varied widely and in a subtype-specific manner, with SPOP and FOXA1 mutant tumors having the highest levels of AR-induced transcripts. 25% of the prostate cancers had a presumed actionable lesion in the PI3K or MAPK signaling pathways, and DNA repair genes were inactivated in 19%. Our analysis reveals molecular heterogeneity among primary prostate cancers, as well as potentially actionable molecular defects.
DOI: 10.1016/j.cell.2012.06.024
2012
Cited 2,282 times
A Landscape of Driver Mutations in Melanoma
Despite recent insights into melanoma genetics, systematic surveys for driver mutations are challenged by an abundance of passenger mutations caused by carcinogenic UV light exposure. We developed a permutation-based framework to address this challenge, employing mutation data from intronic sequences to control for passenger mutational load on a per gene basis. Analysis of large-scale melanoma exome data by this approach discovered six novel melanoma genes (PPP6C, RAC1, SNX31, TACC1, STK19, and ARID2), three of which (RAC1, PPP6C, and STK19) harbored recurrent and potentially targetable mutations. Integration with chromosomal copy number data contextualized the landscape of driver mutations, providing oncogenic insights in BRAF- and NRAS-driven melanoma as well as those without known NRAS/BRAF mutations. The landscape also clarified a mutational basis for RB and p53 pathway deregulation in this malignancy. Finally, the spectrum of driver mutations provided unequivocal genomic evidence for a direct mutagenic role of UV light in melanoma pathogenesis.
DOI: 10.1126/science.aad0095
2015
Cited 2,247 times
Genomic correlates of response to CTLA-4 blockade in metastatic melanoma
Monoclonal antibodies directed against cytotoxic T lymphocyte-associated antigen-4 (CTLA-4), such as ipilimumab, yield considerable clinical benefit for patients with metastatic melanoma by inhibiting immune checkpoint activity, but clinical predictors of response to these therapies remain incompletely characterized. To investigate the roles of tumor-specific neoantigens and alterations in the tumor microenvironment in the response to ipilimumab, we analyzed whole exomes from pretreatment melanoma tumor biopsies and matching germline tissue samples from 110 patients. For 40 of these patients, we also obtained and analyzed transcriptome data from the pretreatment tumor samples. Overall mutational load, neoantigen load, and expression of cytolytic markers in the immune microenvironment were significantly associated with clinical benefit. However, no recurrent neoantigen peptide sequences predicted responder patient populations. Thus, detailed integrated molecular characterization of large patient cohorts may be needed to identify robust determinants of response and resistance to immune checkpoint inhibitors.
DOI: 10.1126/science.1208130
2011
Cited 2,211 times
The Mutational Landscape of Head and Neck Squamous Cell Carcinoma
The mutational profile of head and neck cancer is complex and may pose challenges to the development of targeted therapies.
DOI: 10.1038/nature22991
2017
Cited 2,094 times
An immunogenic personal neoantigen vaccine for patients with melanoma
The results of a phase I trial assessing a personal neoantigen multi-peptide vaccine in patients with melanoma, showing feasibility, safety, and immunogenicity. Neoantigens have long been considered optimal targets for anti-tumour vaccines, and recent mutation coding and prediction techniques have aimed to streamline their identification and selection. Two papers in this issue report results from personalized neoantigen vaccine trials in patients with cancer. Catherine Wu and colleagues report the results of a phase I trial of a personalized cancer vaccine that targets up to 20 patient neoantigens. The vaccine was safe and induced tumour-antigen-specific immune responses. Four out of six patients treated showed no recurrence at 25 months, and progressing patients responded to further therapy with checkpoint inhibitor. Ugur Sahin and colleagues report the first-in-human application of a personalized neoantigen vaccine in patients with melanoma. Their vaccination strategy includes sequencing and computational identification of neoantigens from patients, and design and manufacture of a poly-antigen RNA vaccine for treatment. In 13 patients, the vaccine boosted immunity against some of the selected tumour antigens from the individual patients, and two patients showed infiltration of tumour-reactive T cells. These results suggest that personalized vaccines could be refined and tailored to provide clinical benefit as cancer immunotherapies. Effective anti-tumour immunity in humans has been associated with the presence of T cells directed at cancer neoantigens, a class of HLA-bound peptides that arise from tumour-specific mutations. They are highly immunogenic because they are not present in normal tissues and hence bypass central thymic tolerance.
Although neoantigens were long-envisioned as optimal targets for an anti-tumour immune response, their systematic discovery and evaluation only became feasible with the recent availability of massively parallel sequencing for detection of all coding mutations within tumours, and of machine learning approaches to reliably predict those mutated peptides with high-affinity binding of autologous human leukocyte antigen (HLA) molecules. We hypothesized that vaccination with neoantigens can both expand pre-existing neoantigen-specific T-cell populations and induce a broader repertoire of new T-cell specificities in cancer patients, tipping the intra-tumoural balance in favour of enhanced tumour control. Here we demonstrate the feasibility, safety, and immunogenicity of a vaccine that targets up to 20 predicted personal tumour neoantigens. Vaccine-induced polyfunctional CD4+ and CD8+ T cells targeted 58 (60%) and 15 (16%) of the 97 unique neoantigens used across patients, respectively. These T cells discriminated mutated from wild-type antigens, and in some cases directly recognized autologous tumour. Of six vaccinated patients, four had no recurrence at 25 months after vaccination, while two with recurrent disease were subsequently treated with anti-PD-1 (anti-programmed cell death-1) therapy and experienced complete tumour regression, with expansion of the repertoire of neoantigen-specific T cells. These data provide a strong rationale for further development of this approach, alone and in combination with checkpoint blockade or other immunotherapies.
DOI: 10.1016/s0140-6736(12)60312-2
2012
Cited 1,965 times
Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study
High plasma HDL cholesterol is associated with reduced risk of myocardial infarction, but whether this association is causal is unclear. Exploiting the fact that genotypes are randomly assigned at meiosis, are independent of non-genetic confounding, and are unmodified by disease processes, mendelian randomisation can be used to test the hypothesis that the association of a plasma biomarker with disease is causal. We performed two mendelian randomisation analyses. First, we used as an instrument a single nucleotide polymorphism (SNP) in the endothelial lipase gene (LIPG Asn396Ser) and tested this SNP in 20 studies (20,913 myocardial infarction cases, 95,407 controls). Second, we used as an instrument a genetic score consisting of 14 common SNPs that exclusively associate with HDL cholesterol and tested this score in up to 12,482 cases of myocardial infarction and 41,331 controls. As a positive control, we also tested a genetic score of 13 common SNPs exclusively associated with LDL cholesterol. Carriers of the LIPG 396Ser allele (2·6% frequency) had higher HDL cholesterol (0·14 mmol/L higher, p=8×10⁻¹³) but similar levels of other lipid and non-lipid risk factors for myocardial infarction compared with non-carriers. This difference in HDL cholesterol is expected to decrease risk of myocardial infarction by 13% (odds ratio [OR] 0·87, 95% CI 0·84-0·91). However, we noted that the 396Ser allele was not associated with risk of myocardial infarction (OR 0·99, 95% CI 0·88-1·11, p=0·85). From observational epidemiology, an increase of 1 SD in HDL cholesterol was associated with reduced risk of myocardial infarction (OR 0·62, 95% CI 0·58-0·66). However, a 1 SD increase in HDL cholesterol due to genetic score was not associated with risk of myocardial infarction (OR 0·93, 95% CI 0·68-1·26, p=0·63).
For LDL cholesterol, the estimate from observational epidemiology (a 1 SD increase in LDL cholesterol associated with OR 1·54, 95% CI 1·45-1·63) was concordant with that from genetic score (OR 2·13, 95% CI 1·69-2·69, p=2×10⁻¹⁰). Some genetic mechanisms that raise plasma HDL cholesterol do not seem to lower risk of myocardial infarction. These data challenge the concept that raising of plasma HDL cholesterol will uniformly translate into reductions in risk of myocardial infarction. Funding: US National Institutes of Health, The Wellcome Trust, European Union, British Heart Foundation, and the German Federal Ministry of Education and Research.
DOI: 10.1038/nature01140
2002
Cited 1,884 times
Detecting recent positive selection in the human genome from haplotype structure
DOI: 10.1056/nejmoa1701719
2017
Cited 1,772 times
Clonal Hematopoiesis and Risk of Atherosclerotic Cardiovascular Disease
Clonal hematopoiesis of indeterminate potential (CHIP), which is defined as the presence of an expanded somatic blood-cell clone in persons without other hematologic abnormalities, is common among older persons and is associated with an increased risk of hematologic cancer. We previously found preliminary evidence for an association between CHIP and atherosclerotic cardiovascular disease, but the nature of this association was unclear. We used whole-exome sequencing to detect the presence of CHIP in peripheral-blood cells and associated such presence with coronary heart disease using samples from four case-control studies that together enrolled 4726 participants with coronary heart disease and 3529 controls. To assess causality, we perturbed the function of Tet2, the second most commonly mutated gene linked to clonal hematopoiesis, in the hematopoietic cells of atherosclerosis-prone mice. In nested case-control analyses from two prospective cohorts, carriers of CHIP had a risk of coronary heart disease that was 1.9 times as great as in noncarriers (95% confidence interval [CI], 1.4 to 2.7). In two retrospective case-control cohorts for the evaluation of early-onset myocardial infarction, participants with CHIP had a risk of myocardial infarction that was 4.0 times as great as in noncarriers (95% CI, 2.4 to 6.7). Mutations in DNMT3A, TET2, ASXL1, and JAK2 were each individually associated with coronary heart disease. CHIP carriers with these mutations also had increased coronary-artery calcification, a marker of coronary atherosclerosis burden. Hypercholesterolemia-prone mice that were engrafted with bone marrow obtained from homozygous or heterozygous Tet2 knockout mice had larger atherosclerotic lesions in the aortic root and aorta than did mice that had received control bone marrow.
Analyses of macrophages from Tet2 knockout mice showed elevated expression of several chemokine and cytokine genes that contribute to atherosclerosis. The presence of CHIP in peripheral-blood cells was associated with nearly a doubling in the risk of coronary heart disease in humans and with accelerated atherosclerosis in mice. (Funded by the National Institutes of Health and others.)
DOI: 10.1016/j.cell.2017.05.046
2017
Cited 1,766 times
Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma
Liver cancer has the second highest worldwide cancer mortality rate and has limited therapeutic options. We analyzed 363 hepatocellular carcinoma (HCC) cases by whole-exome sequencing and DNA copy number analyses, and we analyzed 196 HCC cases by DNA methylation, RNA, miRNA, and proteomic expression also. DNA sequencing and mutation analysis identified significantly mutated genes, including LZTR1, EEF1A1, SF3B1, and SMARCA4. Significant alterations by mutation or downregulation by hypermethylation in genes likely to result in HCC metabolic reprogramming (ALB, APOB, and CPS1) were observed. Integrative molecular HCC subtyping incorporating unsupervised clustering of five data platforms identified three subtypes, one of which was associated with poorer prognosis in three HCC cohorts. Integrated analyses enabled development of a p53 target gene expression signature correlating with poor survival. Potential therapeutic targets for which inhibitors exist include WNT signaling, MDM4, MET, VEGFA, MCL1, IDH1, TERT, and immune checkpoint proteins CTLA-4, PD-1, and PD-L1.
DOI: 10.1038/ng1669
2005
Cited 1,647 times
Efficiency and power in genetic association studies
DOI: 10.1038/ng.2760
2013
Cited 1,613 times
Pan-cancer patterns of somatic copy number alteration
Rameen Beroukhim and colleagues analyzed somatic structural alterations in 12 tumor types. Whole-genome doubling was found in over a third of all cancers, associated with TP53 mutation. Fifteen new significantly mutated candidate driver genes were found associated with recurrently amplified or deleted regions. Determining how somatic copy number alterations (SCNAs) promote cancer is an important goal. We characterized SCNA patterns in 4,934 cancers from The Cancer Genome Atlas Pan-Cancer data set. Whole-genome doubling, observed in 37% of cancers, was associated with higher rates of every other type of SCNA, TP53 mutations, CCNE1 amplifications and alterations of the PPP2R complex. SCNAs that were internal to chromosomes tended to be shorter than telomere-bounded SCNAs, suggesting different mechanisms underlying their generation. Significantly recurrent focal SCNAs were observed in 140 regions, including 102 without known oncogene or tumor suppressor gene targets and 50 with significantly mutated genes. Amplified regions without known oncogenes were enriched for genes involved in epigenetic regulation. When levels of genomic disruption were accounted for, 7% of region pairs were anticorrelated, and these regions tended to encompass genes whose proteins physically interact, suggesting related functions. These results provide insights into mechanisms of generation and functional consequences of cancer-related SCNAs.
DOI: 10.1016/j.cell.2012.08.029
2012
Cited 1,613 times
Mapping the Hallmarks of Lung Adenocarcinoma with Massively Parallel Sequencing
Lung adenocarcinoma, the most common subtype of non-small cell lung cancer, is responsible for more than 500,000 deaths per year worldwide. Here, we report exome and genome sequences of 183 lung adenocarcinoma tumor/normal DNA pairs. These analyses revealed a mean exonic somatic mutation rate of 12.0 events/megabase and identified the majority of genes previously reported as significantly mutated in lung adenocarcinoma. In addition, we identified statistically recurrent somatic mutations in the splicing factor gene U2AF1 and truncating mutations affecting RBM10 and ARID1A. Analysis of nucleotide context-specific mutation signatures grouped the sample set into distinct clusters that correlated with smoking history and alterations of reported lung adenocarcinoma genes. Whole-genome sequence analysis revealed frequent structural rearrangements, including in-frame exonic alterations within EGFR and SIK2 kinases. The candidate genes identified in this study are attractive targets for biological characterization and therapeutic targeting of lung adenocarcinoma.
DOI: 10.1038/nature11011
2012
Cited 1,611 times
Patterns and rates of exonic de novo mutations in autism spectrum disorders
Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, are consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increases risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.
DOI: 10.1056/nejmoa073493
2007
Cited 1,590 times
Risk Alleles for Multiple Sclerosis Identified by a Genomewide Study
Multiple sclerosis has a clinically significant heritable component. We conducted a genomewide association study to identify alleles associated with the risk of multiple sclerosis. We used DNA microarray technology to identify common DNA sequence variants in 931 family trios (consisting of an affected child and both parents) and tested them for association. For replication, we genotyped another 609 family trios, 2322 case subjects, and 789 control subjects and used genotyping data from two external control data sets. A joint analysis of data from 12,360 subjects was performed to estimate the overall significance and effect size of associations between alleles and the risk of multiple sclerosis. A transmission disequilibrium test of 334,923 single-nucleotide polymorphisms (SNPs) in 931 family trios revealed 49 SNPs having an association with multiple sclerosis (P<1×10⁻⁴); of these SNPs, 38 were selected for the second-stage analysis. A comparison between the 931 case subjects from the family trios and 2431 control subjects identified an additional nonoverlapping 32 SNPs (P<0.001). An additional 40 SNPs with less stringent P values (<0.01) were also selected, for a total of 110 SNPs for the second-stage analysis. Of these SNPs, two within the interleukin-2 receptor alpha gene (IL2RA) were strongly associated with multiple sclerosis (P=2.96×10⁻⁸), as were a nonsynonymous SNP in the interleukin-7 receptor alpha gene (IL7RA) (P=2.94×10⁻⁷) and multiple SNPs in the HLA-DRA locus (P=8.94×10⁻⁸¹). Alleles of IL2RA and IL7RA and those in the HLA locus are identified as heritable risk factors for multiple sclerosis.
DOI: 10.1126/science.1219240
2012
Cited 1,570 times
Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes
A Deep Look Into Our Genes. Recent debates have focused on the degree of genetic variation and its impact upon health at the genomic level in humans (see the Perspective by Casals and Bertranpetit). Tennessen et al. (p. 64, published online 17 May), looking at all of the protein-coding genes in the human genome, and Nelson et al. (p. 100, published online 17 May), looking at genes that encode drug targets, address this question through deep sequencing efforts on samples from multiple individuals. The findings suggest that most human variation is rare, not shared between populations, and that rare variants are likely to play a role in human health.
DOI: 10.1016/j.cell.2015.09.033
2015
Cited 1,467 times
Comprehensive Molecular Portraits of Invasive Lobular Breast Cancer
Invasive lobular carcinoma (ILC) is the second most prevalent histologic subtype of invasive breast cancer. Here, we comprehensively profiled 817 breast tumors, including 127 ILC, 490 ductal (IDC), and 88 mixed IDC/ILC. Besides E-cadherin loss, the best known ILC genetic hallmark, we identified mutations targeting PTEN, TBX3, and FOXA1 as ILC enriched features. PTEN loss associated with increased AKT phosphorylation, which was highest in ILC among all breast cancer subtypes. Spatially clustered FOXA1 mutations correlated with increased FOXA1 expression and activity. Conversely, GATA3 mutations and high expression characterized luminal A IDC, suggesting differential modulation of ER activity in ILC and IDC. Proliferation and immune-related signatures determined three ILC transcriptional subtypes associated with survival differences. Mixed IDC/ILC cases were molecularly classified as ILC-like and IDC-like revealing no true hybrid features. This multidimensional molecular atlas sheds new light on the genetic bases of ILC and provides potential clinical options.
DOI: 10.1016/j.ccell.2017.07.007
2017
Cited 1,375 times
Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma
We performed integrated genomic, transcriptomic, and proteomic profiling of 150 pancreatic ductal adenocarcinoma (PDAC) specimens, including samples with characteristic low neoplastic cellularity. Deep whole-exome sequencing revealed recurrent somatic mutations in KRAS, TP53, CDKN2A, SMAD4, RNF43, ARID1A, TGFβR2, GNAS, RREB1, and PBRM1. KRAS wild-type tumors harbored alterations in other oncogenic drivers, including GNAS, BRAF, CTNNB1, and additional RAS pathway genes. A subset of tumors harbored multiple KRAS mutations, with some showing evidence of biallelic mutations. Protein profiling identified a favorable prognosis subset with low epithelial-mesenchymal transition and high MTOR pathway scores. Associations of non-coding RNAs with tumor-specific mRNA subtypes were also identified. Our integrated multi-platform analysis reveals a complex molecular landscape of PDAC and provides a roadmap for precision medicine.
DOI: 10.1038/ng.2279
2012
Cited 1,327 times
Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer
Prostate cancer is the second most common cancer in men worldwide and causes over 250,000 deaths each year. Overtreatment of indolent disease also results in significant morbidity. Common genetic alterations in prostate cancer include losses of NKX3.1 (8p21) and PTEN (10q23), gains of AR (the androgen receptor gene) and fusion of ETS family transcription factor genes with androgen-responsive promoters. Recurrent somatic base-pair substitutions are believed to be less contributory in prostate tumorigenesis but have not been systematically analyzed in large cohorts. Here, we sequenced the exomes of 112 prostate tumor and normal tissue pairs. New recurrent mutations were identified in multiple genes, including MED12 and FOXA1. SPOP was the most frequently mutated gene, with mutations involving the SPOP substrate-binding cleft in 6-15% of tumors across multiple independent cohorts. Prostate cancers with mutant SPOP lacked ETS family gene rearrangements and showed a distinct pattern of genomic alterations. Thus, SPOP mutations may define a new molecular subtype of prostate cancer.
DOI: 10.1038/nature12975
2014
Cited 1,313 times
A polygenic burden of rare disruptive mutations in schizophrenia
Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.
DOI: 10.1038/nature09837
2011
Cited 1,309 times
Initial genome sequencing and analysis of multiple myeloma
Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumour genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the data set. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-κB signalling was indicated by mutations in 11 members of the NF-κB pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge.
DOI: 10.1038/nbt.1523
2009
Cited 1,278 times
Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing
Targeting genomic loci by massively parallel sequencing requires new methods to enrich templates to be sequenced. We developed a capture method that uses biotinylated RNA 'baits' to fish targets out of a 'pond' of DNA fragments. The RNA is transcribed from PCR-amplified oligodeoxynucleotides originally synthesized on a microarray, generating sufficient bait for multiple captures at concentrations high enough to drive the hybridization. We tested this method with 170-mer baits that target >15,000 coding exons (2.5 Mb) and four regions (1.7 Mb total) using Illumina sequencing as read-out. About 90% of uniquely aligning bases fell on or near bait sequence; up to 50% lay on exons proper. The uniformity was such that approximately 60% of target bases in the exonic 'catch', and approximately 80% in the regional catch, had at least half the mean coverage. One lane of Illumina sequence was sufficient to call high-confidence genotypes for 89% of the targeted exon space.
DOI: 10.1016/j.cell.2013.01.019
2013
Cited 1,231 times
Evolution and Impact of Subclonal Mutations in Chronic Lymphocytic Leukemia
Clonal evolution is a key feature of cancer progression and relapse. We studied intratumoral heterogeneity in 149 chronic lymphocytic leukemia (CLL) cases by integrating whole-exome sequence and copy number to measure the fraction of cancer cells harboring each somatic mutation. We identified driver mutations as predominantly clonal (e.g., MYD88, trisomy 12, and del(13q)) or subclonal (e.g., SF3B1 and TP53), corresponding to earlier and later events in CLL evolution. We sampled leukemia cells from 18 patients at two time points. Ten of twelve CLL cases treated with chemotherapy (but only one of six without treatment) underwent clonal evolution, predominantly involving subclones with driver mutations (e.g., SF3B1 and TP53) that expanded over time. Furthermore, presence of a subclonal driver mutation was an independent risk factor for rapid disease progression. Our study thus uncovers patterns of clonal evolution in CLL, providing insights into its stepwise transformation, and links the presence of subclones with adverse clinical outcomes.
DOI: 10.1101/201178
2017
Cited 1,202 times
Scaling accurate genetic variant discovery to tens of thousands of samples
Comprehensive disease gene discovery in both common and rare diseases will require the efficient and accurate detection of all classes of genetic variation across tens to hundreds of thousands of human samples. We describe here a novel assembly-based approach to variant calling, the GATK HaplotypeCaller (HC) and Reference Confidence Model (RCM), that determines genotype likelihoods independently per-sample but performs joint calling across all samples within a project simultaneously. We show by calling over 90,000 samples from the Exome Aggregation Consortium (ExAC) that, in contrast to other algorithms, the HC-RCM scales efficiently to very large sample sizes without loss in accuracy; and that the accuracy of indel variant calling is superior in comparison to other algorithms. More importantly, the HC-RCM produces a fully squared-off matrix of genotypes across all samples at every genomic position being investigated. The HC-RCM is a novel, scalable, assembly-based algorithm with abundant applications for population genetics and clinical studies.
DOI: 10.1016/j.cell.2013.03.021
2013
Cited 1,141 times
Punctuated Evolution of Prostate Cancer Genomes
The analysis of exonic DNA from prostate cancers has identified recurrently mutated genes, but the spectrum of genome-wide alterations has not been profiled extensively in this disease. We sequenced the genomes of 57 prostate tumors and matched normal tissues to characterize somatic alterations and to study how they accumulate during oncogenesis and progression. By modeling the genesis of genomic rearrangements, we identified abundant DNA translocations and deletions that arise in a highly interdependent manner. This phenomenon, which we term "chromoplexy," frequently accounts for the dysregulation of prostate cancer genes and appears to disrupt multiple cancer genes coordinately. Our modeling suggests that chromoplexy may induce considerable genomic derangement over relatively few events in prostate cancer and other neoplasms, supporting a model of punctuated cancer evolution. By characterizing the clonal hierarchy of genomic lesions in prostate tumors, we charted a path of oncogenic events along which chromoplexy may drive prostate carcinogenesis.
DOI: 10.1056/nejmoa0804525
2008
Cited 1,135 times
Gene Expression in Fixed Tissues and Outcome in Hepatocellular Carcinoma
It is a challenge to identify patients who, after undergoing potentially curative treatment for hepatocellular carcinoma, are at greatest risk for recurrence. Such high-risk patients could receive novel interventional measures. An obstacle to the development of genome-based predictors of outcome in patients with hepatocellular carcinoma has been the lack of a means to carry out genomewide expression profiling of fixed, as opposed to frozen, tissue. We aimed to demonstrate the feasibility of gene-expression profiling of more than 6000 human genes in formalin-fixed, paraffin-embedded tissues. We applied the method to tissues from 307 patients with hepatocellular carcinoma, from four series of patients, to discover and validate a gene-expression signature associated with survival. The expression-profiling method for formalin-fixed, paraffin-embedded tissue was highly effective: samples from 90% of the patients yielded data of high quality, including samples that had been archived for more than 24 years. Gene-expression profiles of tumor tissue failed to yield a significant association with survival. In contrast, profiles of the surrounding nontumoral liver tissue were highly correlated with survival in a training set of tissue samples from 82 Japanese patients, and the signature was validated in tissues from an independent group of 225 patients from the United States and Europe (P=0.04). We have demonstrated the feasibility of genomewide expression profiling of formalin-fixed, paraffin-embedded tissues and have shown that a reproducible gene-expression signature correlated with survival is present in liver tissue adjacent to the tumor in patients with hepatocellular carcinoma.
DOI: 10.1038/ng.209
2008
Cited 1,135 times
Collaborative genome-wide association analysis supports a role for ANK3 and CACNA1C in bipolar disorder
Pamela Sklar and colleagues report a genome-wide association study of bipolar disorder and identify variants in the genes encoding ankyrin-3 and the alpha-1C subunit of the L-type voltage-gated calcium channel as increasing risk. To identify susceptibility loci for bipolar disorder, we tested 1.8 million variants in 4,387 cases and 6,209 controls and identified a region of strong association (rs10994336, P = 9.1 × 10⁻⁹) in ANK3 (ankyrin G). We also found further support for the previously reported CACNA1C (alpha 1C subunit of the L-type voltage-gated calcium channel; combined P = 7.0 × 10⁻⁸, rs1006737). Our results suggest that ion channelopathies may be involved in the pathogenesis of bipolar disorder.
DOI: 10.1038/nature09744
2011
Cited 1,131 times
The genomic complexity of primary human prostate cancer
Prostate cancer is the second most common cause of male cancer deaths in the United States. However, the full range of prostate cancer genomic alterations is incompletely characterized. Here we present the complete sequence of seven primary human prostate cancers and their paired normal counterparts. Several tumours contained complex chains of balanced (that is, 'copy-neutral') rearrangements that occurred within or adjacent to known cancer genes. Rearrangement breakpoints were enriched near open chromatin, androgen receptor and ERG DNA binding sites in the setting of the ETS gene fusion TMPRSS2-ERG, but inversely correlated with these regions in tumours lacking ETS fusions. This observation suggests a link between chromatin or transcriptional regulation and the genesis of genomic aberrations. Three tumours contained rearrangements that disrupted CADM2, and four harboured events disrupting either PTEN (unbalanced events), a prostate tumour suppressor, or MAGI2 (balanced events), a PTEN interacting protein not previously implicated in prostate tumorigenesis. Thus, genomic rearrangements may arise from transcriptional or chromatin aberrancies and engage prostate tumorigenic mechanisms.
DOI: 10.1038/nature11154
2012
Cited 1,087 times
Sequence analysis of mutations and translocations across breast cancer subtypes
Breast carcinoma is the leading cause of cancer-related mortality in women worldwide, with an estimated 1.38 million new cases and 458,000 deaths in 2008 alone. This malignancy represents a heterogeneous group of tumours with characteristic molecular features, prognosis and responses to available therapy. Recurrent somatic alterations in breast cancer have been described, including mutations and copy number alterations, notably ERBB2 amplifications, the first successful therapy target defined by a genomic aberration. Previous DNA sequencing studies of breast cancer genomes have revealed additional candidate mutations and gene rearrangements. Here we report the whole-exome sequences of DNA from 103 human breast cancers of diverse subtypes from patients in Mexico and Vietnam compared to matched-normal DNA, together with whole-genome sequences of 22 breast cancer/normal pairs. Beyond confirming recurrent somatic mutations in PIK3CA, TP53, AKT1, GATA3 and MAP3K1, we discovered recurrent mutations in the CBFB transcription factor gene and deletions of its partner RUNX1. Furthermore, we have identified a recurrent MAGI3-AKT3 fusion enriched in triple-negative breast cancer lacking oestrogen and progesterone receptors and ERBB2 expression. The MAGI3-AKT3 fusion leads to constitutive activation of AKT kinase, which is abolished by treatment with an ATP-competitive AKT small-molecule inhibitor.
DOI: 10.1038/nature06358
2007
Cited 1,034 times
Characterizing the cancer genome in lung adenocarcinoma
Somatic alterations in cellular DNA underlie almost all human cancers. The prospect of targeted therapies and the development of high-resolution, genome-wide approaches are now spurring systematic efforts to characterize cancer genomes. Here we report a large-scale project to characterize copy-number alterations in primary lung adenocarcinomas. By analysis of a large collection of tumours (n = 371) using dense single nucleotide polymorphism arrays, we identify a total of 57 significantly recurrent events. We find that 26 of 39 autosomal chromosome arms show consistent large-scale copy-number gain or loss, of which only a handful have been linked to a specific gene. We also identify 31 recurrent focal events, including 24 amplifications and 7 homozygous deletions. Only six of these focal events are currently associated with known mutations in lung carcinomas. The most common event, amplification of chromosome 14q13.3, is found in approximately 12% of samples. On the basis of genomic and functional analyses, we identify NKX2-1 (NK2 homeobox 1, also called TITF1), which lies in the minimal 14q13.3 amplification interval and encodes a lineage-specific transcription factor, as a novel candidate proto-oncogene involved in a significant fraction of lung adenocarcinomas. More generally, our results indicate that many of the genes that are involved in lung adenocarcinoma remain to be discovered.
DOI: 10.1038/nrg2841
2010
Cited 1,033 times
Advances in understanding cancer genomes through second-generation sequencing
DOI: 10.1056/nejmoa1109016
2011
Cited 1,010 times
SF3B1 and Other Novel Cancer Genes in Chronic Lymphocytic Leukemia
The somatic genetic basis of chronic lymphocytic leukemia, a common and clinically heterogeneous leukemia occurring in adults, remains poorly understood. We obtained DNA samples from leukemia cells in 91 patients with chronic lymphocytic leukemia and performed massively parallel sequencing of 88 whole exomes and whole genomes, together with sequencing of matched germline DNA, to characterize the spectrum of somatic mutations in this disease. Nine genes that are mutated at significant frequencies were identified, including four with established roles in chronic lymphocytic leukemia (TP53 in 15% of patients, ATM in 9%, MYD88 in 10%, and NOTCH1 in 4%) and five with unestablished roles (SF3B1, ZMYM3, MAPK1, FBXW7, and DDX3X). SF3B1, which functions at the catalytic core of the spliceosome, was the second most frequently mutated gene (with mutations occurring in 15% of patients). SF3B1 mutations occurred primarily in tumors with deletions in chromosome 11q, which are associated with a poor prognosis in patients with chronic lymphocytic leukemia. We further discovered that tumor samples with mutations in SF3B1 had alterations in pre-messenger RNA (mRNA) splicing. Our study defines the landscape of somatic mutations in chronic lymphocytic leukemia and highlights pre-mRNA splicing as a critical cellular process contributing to chronic lymphocytic leukemia.
DOI: 10.1038/ng.327
2009
Cited 979 times
Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants
We conducted a genome-wide association study testing single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) for association with early-onset myocardial infarction in 2,967 cases and 3,075 controls. We carried out replication in an independent sample with an effective sample size of up to 19,492. SNPs at nine loci reached genome-wide significance: three are newly identified (21q22 near MRPS6-SLC5A3-KCNE2, 6p24 in PHACTR1 and 2q33 in WDR12) and six replicated prior observations (9p21, 1p13 near CELSR2-PSRC1-SORT1, 10q11 near CXCL12, 1q41 in MIA3, 19p13 near LDLR and 1p32 near PCSK9). We tested 554 common copy number polymorphisms (>1% allele frequency) and none met the pre-specified threshold for replication (P < 10⁻³). We identified 8,065 rare CNVs but did not detect a greater CNV burden in cases compared to controls, in genes compared to the genome as a whole, or at any individual locus. SNPs at nine loci were reproducibly associated with myocardial infarction, but tests of common and rare CNVs failed to identify additional associations with myocardial infarction risk.
DOI: 10.1038/ng.2529
2013
Cited 968 times
The genetic landscape of high-risk neuroblastoma
Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50%. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 affected individuals (cases) using a combination of whole-exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per Mb (0.48 nonsilent) and notably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, and an additional 7.1% had focal deletions), MYCN (1.7%, causing a recurrent p.Pro44Leu alteration) and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1 and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies that rely on frequently altered oncogenic drivers.
DOI: 10.1158/0008-5472.can-09-1089
2009
Cited 965 times
Integrative Transcriptome Analysis Reveals Common Molecular Subclasses of Human Hepatocellular Carcinoma
Hepatocellular carcinoma (HCC) is a highly heterogeneous disease, and prior attempts to develop genomic-based classification for HCC have yielded highly divergent results, indicating difficulty in identifying unified molecular anatomy. We performed a meta-analysis of gene expression profiles in data sets from eight independent patient cohorts across the world. In addition, aiming to establish the real world applicability of a classification system, we profiled 118 formalin-fixed, paraffin-embedded tissues from an additional patient cohort. A total of 603 patients were analyzed, representing the major etiologies of HCC (hepatitis B and C) collected from Western and Eastern countries. We observed three robust HCC subclasses (termed S1, S2, and S3), each correlated with clinical parameters such as tumor size, extent of cellular differentiation, and serum α-fetoprotein levels. An analysis of the components of the signatures indicated that S1 reflected aberrant activation of the WNT signaling pathway, S2 was characterized by proliferation as well as MYC and AKT activation, and S3 was associated with hepatocyte differentiation. Functional studies indicated that the WNT pathway activation signature characteristic of S1 tumors was not simply the result of β-catenin mutation but rather was the result of transforming growth factor-β activation, thus representing a new mechanism of WNT pathway activation in HCC. These experiments establish the first consensus classification framework for HCC based on gene expression profiles and highlight the power of integrating multiple data sets to define a robust molecular taxonomy of the disease. [Cancer Res 2009;69(18):7385–92]
DOI: 10.1038/s41586-018-0792-9
2018
Cited 948 times
Neoantigen vaccine generates intratumoral T cell responses in phase Ib glioblastoma trial
Neoantigens, which are derived from tumour-specific protein-coding mutations, are exempt from central tolerance, can generate robust immune responses [1,2] and can function as bona fide antigens that facilitate tumour rejection [3]. Here we demonstrate that a strategy that uses multi-epitope, personalized neoantigen vaccination, which has previously been tested in patients with high-risk melanoma [4-6], is feasible for tumours such as glioblastoma, which typically have a relatively low mutation load [1,7] and an immunologically 'cold' tumour microenvironment [8]. We used personalized neoantigen-targeting vaccines to immunize patients newly diagnosed with glioblastoma following surgical resection and conventional radiotherapy in a phase I/Ib study. Patients who did not receive dexamethasone (a highly potent corticosteroid that is frequently prescribed to treat cerebral oedema in patients with glioblastoma) generated circulating polyfunctional neoantigen-specific CD4+ and CD8+ T cell responses that were enriched in a memory phenotype and showed an increase in the number of tumour-infiltrating T cells. Using single-cell T cell receptor analysis, we provide evidence that neoantigen-specific T cells from the peripheral blood can migrate into an intracranial glioblastoma tumour. Neoantigen-targeting vaccines thus have the potential to favourably alter the immune milieu of glioblastoma.
DOI: 10.1038/ng1975
2007
Cited 942 times
High-throughput oncogene mutation profiling in human cancer
DOI: 10.1038/ng.3050
2014
Cited 935 times
A framework for the interpretation of de novo mutation in human disease
Spontaneously arising (de novo) mutations have an important role in medical genetics. For diseases with extensive locus heterogeneity, such as autism spectrum disorders (ASDs), the signal from de novo mutations is distributed across many genes, making it difficult to distinguish disease-relevant mutations from background variation. Here we provide a statistical framework for the analysis of excesses in de novo mutation per gene and gene set by calibrating a model of de novo mutation. We applied this framework to de novo mutations collected from 1,078 ASD family trios, and, whereas we affirmed a significant role for loss-of-function mutations, we found no excess of de novo loss-of-function mutations in cases with IQ above 100, suggesting that the role of de novo mutations in ASDs might reside in fundamental neurodevelopmental processes. We also used our model to identify ∼1,000 genes that are significantly lacking in functional coding variation in non-ASD samples and are enriched for de novo loss-of-function mutations identified in ASD cases.
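The idea of testing for an excess of de novo mutations in a gene can be illustrated with a minimal sketch. It assumes, purely for illustration, that the count of de novo mutations in a gene follows a Poisson distribution whose mean is the expected count from a calibrated mutation model. The abstract does not state the test used, so this is not a description of the paper's method, and the function name and example numbers are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed < 0 or expected <= 0:
        raise ValueError("observed must be >= 0 and expected must be > 0")
    # P(X >= k) = 1 - P(X <= k - 1); the sum is empty (0) when k == 0.
    cdf_below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - cdf_below


# Hypothetical gene: 0.5 mutations expected across the cohort, 4 observed.
print(poisson_upper_tail(4, 0.5))
```

Because the tail is computed as one minus a sum, very small p-values lose precision; a dedicated statistics library would be preferable for real analyses.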
DOI: 10.1038/ng.238
2008
Cited 885 times
Integrated detection and population-genetic analysis of SNPs and copy number variation
DOI: 10.1038/nature15395
2015
Cited 857 times
Mutations driving CLL and their evolution in progression and relapse
Which genetic alterations drive tumorigenesis and how they evolve over the course of disease and therapy are central questions in cancer biology. Here we identify 44 recurrently mutated genes and 11 recurrent somatic copy number variations through whole-exome sequencing of 538 chronic lymphocytic leukaemia (CLL) and matched germline DNA samples, 278 of which were collected in a prospective clinical trial. These include previously unrecognized putative cancer drivers (RPS15, IKZF3), and collectively identify RNA processing and export, MYC activity, and MAPK signalling as central pathways involved in CLL. Clonality analysis of this large data set further enabled reconstruction of temporal relationships between driver events. Direct comparison between matched pre-treatment and relapse samples from 59 patients demonstrated highly frequent clonal evolution. Thus, large sequencing data sets of clinically informative samples enable the discovery of novel genes associated with cancer, the network of relationships between the driver events, and their impact on disease relapse and clinical outcome.
DOI: 10.1038/nature11690
2012
Cited 843 times
Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants
Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history and will help to facilitate the development of new approaches for disease-gene discovery. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth, notable for an excess of rare genetic variants, suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European American and African American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that approximately 73% of all protein-coding SNVs and approximately 86% of SNVs predicted to be deleterious arose in the past 5,000-10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs than other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the Out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, show the profound effect of recent human history on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease-gene discovery.
DOI: 10.1158/2159-8290.cd-15-0369
2015
Cited 835 times
Genomic Characterization of Brain Metastases Reveals Branched Evolution and Potential Therapeutic Targets
Brain metastases are associated with a dismal prognosis. Whether brain metastases harbor distinct genetic alterations beyond those observed in primary tumors is unknown. We performed whole-exome sequencing of 86 matched brain metastases, primary tumors, and normal tissue. In all clonally related cancer samples, we observed branched evolution, where all metastatic and primary sites shared a common ancestor yet continued to evolve independently. In 53% of cases, we found potentially clinically informative alterations in the brain metastases not detected in the matched primary-tumor sample. In contrast, spatially and temporally separated brain metastasis sites were genetically homogenous. Distal extracranial and regional lymph node metastases were highly divergent from brain metastases. We detected alterations associated with sensitivity to PI3K/AKT/mTOR, CDK, and HER2/EGFR inhibitors in the brain metastases. Genomic analysis of brain metastases provides an opportunity to identify potentially clinically informative alterations not detected in clinically sampled primary tumors, regional lymph nodes, or extracranial metastases. Decisions for individualized therapies in patients with brain metastasis are often made from primary-tumor biopsies. We demonstrate that clinically actionable alterations present in brain metastases are frequently not detected in primary biopsies, suggesting that sequencing of primary biopsies alone may miss a substantial number of opportunities for targeted therapy.
DOI: 10.1158/2159-8290.cd-13-0617
2014
Cited 795 times
The Genetic Landscape of Clinical Resistance to RAF Inhibition in Metastatic Melanoma
Most patients with BRAFV600-mutant metastatic melanoma develop resistance to selective RAF kinase inhibitors. The spectrum of clinical genetic resistance mechanisms to RAF inhibitors and options for salvage therapy are incompletely understood. We performed whole-exome sequencing on formalin-fixed, paraffin-embedded tumors from 45 patients with BRAFV600-mutant metastatic melanoma who received vemurafenib or dabrafenib monotherapy. Genetic alterations in known or putative RAF inhibitor resistance genes were observed in 23 of 45 patients (51%). Besides previously characterized alterations, we discovered a “long tail” of new mitogen-activated protein kinase (MAPK) pathway alterations (MAP2K2, MITF) that confer RAF inhibitor resistance. In three cases, multiple resistance gene alterations were observed within the same tumor biopsy. Overall, RAF inhibitor therapy leads to diverse clinical genetic resistance mechanisms, mostly involving MAPK pathway reactivation. Novel therapeutic combinations may be needed to achieve durable clinical control of BRAFV600-mutant melanoma. Integrating clinical genomics with preclinical screens may model subsequent resistance studies. Significance: The use of RAF inhibitors for BRAFV600-mutant metastatic melanoma improves patient outcomes, but most patients demonstrate early or acquired resistance to this targeted therapy. We reveal the genetic landscape of clinical resistance mechanisms to RAF inhibitors from patients using whole-exome sequencing, and experimentally assess new observed mechanisms to define potential subsequent treatment strategies. Cancer Discov; 4(1); 94–109. ©2013 AACR. See related commentary by Solit and Rosen, p. 27. This article is highlighted in the In This Issue feature, p. 1.
DOI: 10.1038/nn.3786
2014
Cited 784 times
Alzheimer's disease: early alterations in brain DNA methylation at ANK1, BIN1, RHBDF2 and other loci
We used a collection of 708 prospectively collected autopsied brains to assess the methylation state of the brain's DNA in relation to Alzheimer's disease (AD). We found that the level of methylation at 71 of the 415,848 interrogated CpGs was significantly associated with the burden of AD pathology, including CpGs in the ABCA7 and BIN1 regions, which harbor known AD susceptibility variants. We validated 11 of the differentially methylated regions in an independent set of 117 subjects. Furthermore, we functionally validated these CpG associations and identified the nearby genes whose RNA expression was altered in AD: ANK1, CDH23, DIP2A, RHBDF2, RPL13, SERPINF1 and SERPINF2. Our analyses suggest that these DNA methylation changes may have a role in the onset of AD given that we observed them in presymptomatic subjects and that six of the validated genes connect to a known AD susceptibility gene network.
DOI: 10.1002/0471142905.hg0212s60
2009
Cited 764 times
SNP Genotyping Using the Sequenom MassARRAY iPLEX Platform
The method for SNP genotyping described in this unit is based on the commercially available Sequenom MassARRAY platform. The assay consists of an initial locus-specific PCR reaction, followed by single base extension using mass-modified dideoxynucleotide terminators of an oligonucleotide primer which anneals immediately upstream of the polymorphic site of interest. Using MALDI-TOF mass spectrometry, the distinct mass of the extended primer identifies the SNP allele.
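The final step described above, identifying the allele from the mass of the extended primer, can be sketched as a nearest-mass lookup. This is a minimal illustration and not the Sequenom software: the function name, the tolerance, and the expected masses are all hypothetical values chosen for the example.

```python
def call_allele(observed_mass, expected_masses, tolerance=1.0):
    """Return the allele whose expected extended-primer mass is closest to the
    observed mass, or None if no expected mass lies within the tolerance (Da)."""
    best_allele, best_mass = min(
        expected_masses.items(),
        key=lambda item: abs(item[1] - observed_mass),
    )
    if abs(best_mass - observed_mass) > tolerance:
        return None  # No confident call for this peak.
    return best_allele


# Hypothetical expected masses (in Da) for the two alleles of one SNP.
expected = {"C": 5500.0, "T": 5580.0}
print(call_allele(5500.3, expected))
print(call_allele(5540.0, expected))
```

A heterozygous sample would produce two peaks, so in practice each detected peak would be passed through a lookup of this kind.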
DOI: 10.1038/ng1333
2004
Cited 762 times
Assessing the impact of population stratification on genetic association studies
Population stratification refers to differences in allele frequencies between cases and controls due to systematic differences in ancestry rather than association of genes with disease. It has been proposed that false positive associations due to stratification can be controlled by genotyping a few dozen unlinked genetic markers. To assess stratification empirically, we analyzed data from 11 case-control and case-cohort association studies. We did not detect statistically significant evidence for stratification but did observe that assessments based on a few dozen markers lack power to rule out moderate levels of stratification that could cause false positive associations in studies designed to detect modest genetic risk factors. After increasing the number of markers and samples in a case-cohort study (the design most immune to stratification), we found that stratification was in fact present. Our results suggest that modest amounts of stratification can exist even in well designed studies.
DOI: 10.1016/j.cell.2017.10.014
2017
Cited 754 times
Comprehensive and Integrated Genomic Characterization of Adult Soft Tissue Sarcomas
Sarcomas are a broad family of mesenchymal malignancies exhibiting remarkable histologic diversity. We describe the multi-platform molecular landscape of 206 adult soft tissue sarcomas representing 6 major types. Along with novel insights into the biology of individual sarcoma types, we report three overarching findings: (1) unlike most epithelial malignancies, these sarcomas (excepting synovial sarcoma) are characterized predominantly by copy-number changes, with low mutational loads and only a few genes (TP53, ATRX, RB1) highly recurrently mutated across sarcoma types; (2) within sarcoma types, genomic and regulomic diversity of driver pathways defines molecular subtypes associated with patient outcome; and (3) the immune microenvironment, inferred from DNA methylation and mRNA profiles, associates with outcome and may inform clinical trials of immune checkpoint inhibitors. Overall, this large-scale analysis reveals previously unappreciated sarcoma-type-specific changes in copy number, methylation, RNA, and protein, providing insights into refining sarcoma therapy and relationships to other cancer types.
DOI: 10.1038/ng.237
2008
Cited 746 times
Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs
Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype.
DOI: 10.1016/j.jacc.2016.03.520
2016
Cited 733 times
Diagnostic Yield and Clinical Utility of Sequencing Familial Hypercholesterolemia Genes in Patients With Severe Hypercholesterolemia
Approximately 7% of American adults have severe hypercholesterolemia (untreated low-density lipoprotein [LDL] cholesterol ≥190 mg/dl), which may be due to familial hypercholesterolemia (FH). Lifelong LDL cholesterol elevations in FH mutation carriers may confer coronary artery disease (CAD) risk beyond that captured by a single LDL cholesterol measurement.This study assessed the prevalence of an FH mutation among those with severe hypercholesterolemia and determined whether CAD risk varies according to mutation status beyond the observed LDL cholesterol level.Three genes causative for FH (LDLR, APOB, and PCSK9) were sequenced in 26,025 participants from 7 case-control studies (5,540 CAD case subjects, 8,577 CAD-free control subjects) and 5 prospective cohort studies (11,908 participants). FH mutations included loss-of-function variants in LDLR, missense mutations in LDLR predicted to be damaging, and variants linked to FH in ClinVar, a clinical genetics database.Among 20,485 CAD-free control and prospective cohort participants, 1,386 (6.7%) had LDL cholesterol ≥190 mg/dl; of these, only 24 (1.7%) carried an FH mutation. Within any stratum of observed LDL cholesterol, risk of CAD was higher among FH mutation carriers than noncarriers. Compared with a reference group with LDL cholesterol <130 mg/dl and no mutation, participants with LDL cholesterol ≥190 mg/dl and no FH mutation had a 6-fold higher risk for CAD (odds ratio: 6.0; 95% confidence interval: 5.2 to 6.9), whereas those with both LDL cholesterol ≥190 mg/dl and an FH mutation demonstrated a 22-fold increased risk (odds ratio: 22.3; 95% confidence interval: 10.7 to 53.2). In an analysis of participants with serial lipid measurements over many years, FH mutation carriers had higher cumulative exposure to LDL cholesterol than noncarriers.Among participants with LDL cholesterol ≥190 mg/dl, gene sequencing identified an FH mutation in <2%. 
However, for any observed LDL cholesterol, FH mutation carriers had substantially increased risk for CAD.
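The participant counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable names are illustrative and not taken from the study.

```python
# Counts quoted in the abstract; variable names are illustrative only.
cad_cases = 5_540
cad_free_controls = 8_577
cohort_participants = 11_908

# All sequenced participants, and the subset used for the prevalence estimate.
total_sequenced = cad_cases + cad_free_controls + cohort_participants
cad_free_or_cohort = cad_free_controls + cohort_participants

severe_hypercholesterolemia = 1_386  # LDL cholesterol >= 190 mg/dl
fh_mutation_carriers = 24            # FH mutation carriers within that group

carrier_fraction = fh_mutation_carriers / severe_hypercholesterolemia

print(total_sequenced)             # 26025
print(cad_free_or_cohort)          # 20485
print(f"{carrier_fraction:.1%}")   # 1.7%
```

The totals match the 26,025 sequenced participants and the 20,485 CAD-free control and cohort participants reported above, and the carrier fraction matches the reported 1.7%.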
DOI: 10.1038/ng.271
2008
Cited 720 times
Genome-wide association analysis of metabolic traits in a birth cohort from a founder population
Genome-wide association studies (GWAS) of longitudinal birth cohorts enable joint investigation of environmental and genetic influences on complex traits. We report GWAS results for nine quantitative metabolic traits (triglycerides, high-density lipoprotein, low-density lipoprotein, glucose, insulin, C-reactive protein, body mass index, and systolic and diastolic blood pressure) in the Northern Finland Birth Cohort 1966 (NFBC1966), drawn from the most genetically isolated Finnish regions. We replicate most previously reported associations for these traits and identify nine new associations, several of which highlight genes with metabolic functions: high-density lipoprotein with NR1H3 (LXRA), low-density lipoprotein with AR and FADS1-FADS2, glucose with MTNR1B, and insulin with PANK1. Two of these new associations emerged after adjustment of results for body mass index. Gene-environment interaction analyses suggested additional associations, which will require validation in larger samples. The currently identified loci, together with quantified environmental exposures, explain little of the trait variation in NFBC1966. The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.
DOI: 10.1038/nature12881
2013
Cited 715 times
Landscape of genomic alterations in cervical carcinomas
Whole-exome sequencing and analysis of 115 cervical carcinoma–normal paired samples, in addition to transcriptome and whole-genome sequencing for a subset of these tumours, reveal novel genes mutated at significant levels within this cohort and provide evidence that HPV integration is a common mechanism for target gene overexpression; results also compare mutational landscapes between squamous cell carcinomas and adenocarcinomas. To provide an overview of the genomic aberrations that contribute to cervical cancer these authors performed whole-exome sequencing and analysis of 115 cervical cancer–normal pairs, transcriptome sequences of 79 cervical carcinomas and whole-genomes from 14 cervical cancer–normal pairs. Analyses identify MAPK1, HLA-B and ELF3 as novel significantly mutated genes and provide evidence that human papilloma virus integration is a common mechanism for target gene overexpression in cervical cancer. The results also provide a comparison of the mutational landscapes of squamous cell carcinomas and adenocarcinomas. Cervical cancer is responsible for 10–15% of cancer-related deaths in women worldwide. The aetiological role of infection with high-risk human papilloma viruses (HPVs) in cervical carcinomas is well established. Previous studies have also implicated somatic mutations in PIK3CA, PTEN, TP53, STK11 and KRAS as well as several copy-number alterations in the pathogenesis of cervical carcinomas. Here we report whole-exome sequencing analysis of 115 cervical carcinoma–normal paired samples, transcriptome sequencing of 79 cases and whole-genome sequencing of 14 tumour–normal pairs. Previously unknown somatic mutations in 79 primary squamous cell carcinomas include recurrent E322K substitutions in the MAPK1 gene (8%), inactivating mutations in the HLA-B gene (9%), and mutations in EP300 (16%), FBXW7 (15%), NFE2L2 (4%), TP53 (5%) and ERBB2 (6%). We also observe somatic ELF3 (13%) and CBFB (8%) mutations in 24 adenocarcinomas.
Squamous cell carcinomas have higher frequencies of somatic nucleotide substitutions occurring at cytosines preceded by thymines (Tp*C sites) than adenocarcinomas. Gene expression levels at HPV integration sites were statistically significantly higher in tumours with HPV integration compared with expression of the same genes in tumours without viral integration at the same site. These data demonstrate several recurrent genomic alterations in cervical carcinomas that suggest new strategies to combat this disease.
DOI: 10.1038/ng.952
2011
Cited 706 times
Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease
More than 1,000 susceptibility loci have been identified through genome-wide association studies (GWAS) of common variants; however, the specific genes and full allelic spectrum of causal variants underlying these findings have not yet been defined. Here we used pooled next-generation sequencing to study 56 genes from regions associated with Crohn's disease in 350 cases and 350 controls. Through follow-up genotyping of 70 rare and low-frequency protein-altering variants in nine independent case-control series (16,054 Crohn's disease cases, 12,153 ulcerative colitis cases and 17,575 healthy controls), we identified four additional independent risk factors in NOD2, two additional protective variants in IL23R, a highly significant association with a protective splice variant in CARD9 (P < 1 × 10^-16, odds ratio ≈ 0.29) and additional associations with coding variants in IL18RAP, CUL2, C1orf106, PTPN22 and MUC19. We extend the results of successful GWAS by identifying new, rare and probably functional variants that could aid functional experiments and predictive models.
DOI: 10.1016/j.celrep.2016.03.075
2016
Cited 693 times
Genomic Correlates of Immune-Cell Infiltrates in Colorectal Carcinoma
Large-scale genomic characterization of tumors from prospective cohort studies may yield new insights into cancer pathogenesis. We performed whole-exome sequencing of 619 incident colorectal cancers (CRCs) and integrated the results with tumor immunity, pathology, and survival data. We identified recurrently mutated genes in CRC, such as BCL9L, RBM10, CTCF, and KLF5, that were not previously appreciated in this disease. Furthermore, we investigated the genomic correlates of immune-cell infiltration and found that higher neoantigen load was positively associated with overall lymphocytic infiltration, tumor-infiltrating lymphocytes (TILs), memory T cells, and CRC-specific survival. The association with TILs was evident even within microsatellite-stable tumors. We also found positive selection of mutations in HLA genes and other components of the antigen-processing machinery in TIL-rich tumors. These results may inform immunotherapeutic approaches in CRC. More generally, this study demonstrates a framework for future integrative molecular epidemiology research in colorectal and other malignancies.
DOI: 10.1038/nature11329
2012
Cited 676 times
Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations
Medulloblastoma is the most common brain tumour in children; using exome sequencing of tumour samples the authors show that these cancers have low mutation rates and identify 12 significantly mutated genes, among them the gene encoding RNA helicase DDX3X. Medulloblastoma is the most common malignant brain tumour in children. Four papers published in the 2 August 2012 issue of Nature use whole-genome and other sequencing techniques to produce a detailed picture of the genetics and genomics of this condition. Notable findings include the identification of recurrent mutations in genes not previously implicated in medulloblastoma, with significant genetic differences associated with the four biologically distinct subgroups and clinical outcomes in each. Potential avenues for therapy are suggested by the identification of targetable somatic copy-number alterations, including recurrent events targeting TGFβ signalling in Group 3, and NF-κB signalling in Group 4 medulloblastomas. Medulloblastomas are the most common malignant brain tumours in children. Identifying and understanding the genetic events that drive these tumours is critical for the development of more effective diagnostic, prognostic and therapeutic strategies. Recently, our group and others described distinct molecular subtypes of medulloblastoma on the basis of transcriptional and copy number profiles. Here we use whole-exome hybrid capture and deep sequencing to identify somatic mutations across the coding regions of 92 primary medulloblastoma/normal pairs. Overall, medulloblastomas have low mutation rates consistent with other paediatric tumours, with a median of 0.35 non-silent mutations per megabase. We identified twelve genes mutated at statistically significant frequencies, including previously known mutated genes in medulloblastoma such as CTNNB1, PTCH1, MLL2, SMARCA4 and TP53.
Recurrent somatic mutations were newly identified in an RNA helicase gene, DDX3X, often concurrent with CTNNB1 mutations, and in the nuclear co-repressor (N-CoR) complex genes GPS2, BCOR and LDB1. We show that mutant DDX3X potentiates transactivation of a TCF promoter and enhances cell viability in combination with mutant, but not wild-type, β-catenin. Together, our study reveals the alteration of WNT, hedgehog, histone methyltransferase and now N-CoR pathways across medulloblastomas and within specific subtypes of this disease, and nominates the RNA helicase DDX3X as a component of pathogenic β-catenin signalling in medulloblastoma.
DOI: 10.1038/nature11071
2012
Cited 676 times
Melanoma genome sequencing reveals frequent PREX2 mutations
Melanoma is notable for its metastatic propensity, lethality in the advanced setting and association with ultraviolet exposure early in life. To obtain a comprehensive genomic view of melanoma in humans, we sequenced the genomes of 25 metastatic melanomas and matched germline DNA. A wide range of point mutation rates was observed: lowest in melanomas whose primaries arose on non-ultraviolet-exposed hairless skin of the extremities (3 and 14 per megabase (Mb) of genome), intermediate in those originating from hair-bearing skin of the trunk (5-55 per Mb), and highest in a patient with a documented history of chronic sun exposure (111 per Mb). Analysis of whole-genome sequence data identified PREX2 (phosphatidylinositol-3,4,5-trisphosphate-dependent Rac exchange factor 2)--a PTEN-interacting protein and negative regulator of PTEN in breast cancer--as a significantly mutated gene with a mutation frequency of approximately 14% in an independent extension cohort of 107 human melanomas. PREX2 mutations are biologically relevant, as ectopic expression of mutant PREX2 accelerated tumour formation of immortalized human melanocytes in vivo. Thus, whole-genome sequencing of human melanoma tumours revealed genomic evidence of ultraviolet pathogenesis and discovered a new recurrently mutated gene in melanoma.
DOI: 10.1038/ng1696
2005
Cited 671 times
Common deletion polymorphisms in the human genome
DOI: 10.1038/ng.2591
2013
Cited 664 times
Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity
Adam Bass, Gad Getz and colleagues report whole-exome sequencing of 149 esophageal adenocarcinomas (EACs) and whole-genome sequencing of 15 EACs. They identify a mutational signature defined by a high prevalence of A>C transversions, as well as 26 genes mutated at high frequency in EACs. The incidence of esophageal adenocarcinoma (EAC) has risen 600% over the last 30 years. With a 5-year survival rate of ∼15%, the identification of new therapeutic targets for EAC is greatly important. We analyze the mutation spectra from whole-exome sequencing of 149 EAC tumor-normal pairs, 15 of which have also been subjected to whole-genome sequencing. We identify a mutational signature defined by a high prevalence of A>C transversions at AA dinucleotides. Statistical analysis of exome data identified 26 significantly mutated genes. Of these genes, five (TP53, CDKN2A, SMAD4, ARID1A and PIK3CA) have previously been implicated in EAC. The new significantly mutated genes include chromatin-modifying factors and candidate contributors SPG20, TLR4, ELMO1 and DOCK2. Functional analyses of EAC-derived mutations in ELMO1 identify increased cellular invasion. Therefore, we suggest the potential activation of the RAC1 pathway as a contributor to EAC tumorigenesis.
DOI: 10.1038/sj.mp.4002151
2008
Cited 663 times
Whole-genome association study of bipolar disorder
We performed a genome-wide association scan in 1461 patients with bipolar (BP) 1 disorder and 2008 controls drawn from the Systematic Treatment Enhancement Program for Bipolar Disorder and the University College London sample collections with successful genotyping for 372 193 single nucleotide polymorphisms (SNPs). Our strongest single SNP results are found in myosin5B (MYO5B; P=1.66 × 10^-7) and tetraspanin-8 (TSPAN8; P=6.11 × 10^-7). Haplotype analysis further supported single SNP results highlighting MYO5B, TSPAN8 and the epidermal growth factor receptor (MYO5B; P=2.04 × 10^-8, TSPAN8; P=7.57 × 10^-7 and EGFR; P=8.36 × 10^-8). For replication, we genotyped 304 SNPs in family-based NIMH samples (n=409 trios) and University of Edinburgh case–control samples (n=365 cases, 351 controls) that did not provide independent replication after correction for multiple testing. A comparison of our strongest associations with the genome-wide scan of 1868 patients with BP disorder and 2938 controls who completed the scan as part of the Wellcome Trust Case–Control Consortium indicates concordant signals for SNPs within the voltage-dependent calcium channel, L-type, alpha 1C subunit (CACNA1C) gene. Given the heritability of BP disorder, the lack of agreement between studies emphasizes that susceptibility alleles are likely to be modest in effect size and require even larger samples for detection.
DOI: 10.1016/j.ccell.2017.07.003
2017
Cited 651 times
Integrative Analysis Identifies Four Molecular and Clinical Subsets in Uveal Melanoma
Comprehensive multiplatform analysis of 80 uveal melanomas (UM) identifies four molecularly distinct, clinically relevant subtypes: two associated with poor-prognosis monosomy 3 (M3) and two with better-prognosis disomy 3 (D3). We show that BAP1 loss follows M3 occurrence and correlates with a global DNA methylation state that is distinct from D3-UM. Poor-prognosis M3-UM divide into subsets with divergent genomic aberrations, transcriptional features, and clinical outcomes. We report change-of-function SRSF2 mutations. Within D3-UM, EIF1AX- and SRSF2/SF3B1-mutant tumors have distinct somatic copy number alterations and DNA methylation profiles, providing insight into the biology of these low- versus intermediate-risk clinical mutation subtypes.
DOI: 10.1056/nejmoa1002926
2010
Cited 644 times
Exome Sequencing, ANGPTL3 Mutations, and Familial Combined Hypolipidemia
We sequenced all protein-coding regions of the genome (the "exome") in two family members with combined hypolipidemia, marked by extremely low plasma levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides. These two participants were compound heterozygotes for two distinct nonsense mutations in ANGPTL3 (encoding the angiopoietin-like 3 protein). ANGPTL3 has been reported to inhibit lipoprotein lipase and endothelial lipase, thereby increasing plasma triglyceride and HDL cholesterol levels in rodents. Our finding of ANGPTL3 mutations highlights a role for the gene in LDL cholesterol metabolism in humans and shows the usefulness of exome sequencing for identification of novel genetic causes of inherited disorders. (Funded by the National Human Genome Research Institute and others.).
DOI: 10.1038/s41586-020-2287-8
2020
Cited 642 times
A structural variation reference for medical and population genetics
Structural variants (SVs) rearrange large segments of DNA and can have profound consequences in evolution and human disease. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) have become integral in the interpretation of single-nucleotide variants (SNVs). However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings. This SV resource is freely distributed via the gnomAD browser and will have broad utility in population genetics, disease-association studies, and diagnostic screening.
DOI: 10.1101/gr.3709305
2005
Cited 632 times
Calibrating a coalescent simulation of human genome sequence variation
Population genetic models play an important role in human genetic research, connecting empirical observations about sequence variation with hypotheses about underlying historical and biological causes. More specifically, models are used to compare empirical measures of sequence variation, linkage disequilibrium (LD), and selection to expectations under a “null” distribution. In the absence of detailed information about human demographic history, and about variation in mutation and recombination rates, simulations have of necessity used arbitrary models, usually simple ones. With the advent of large empirical data sets, it is now possible to calibrate population genetic models with genome-wide data, permitting for the first time the generation of data that are consistent with empirical data across a wide range of characteristics. We present here the first such calibrated model and show that, while still arbitrary, it successfully generates simulated data (for three populations) that closely resemble empirical data in allele frequency, linkage disequilibrium, and population differentiation. No assertion is made about the accuracy of the proposed historical and recombination model, but its ability to generate realistic data meets a long-standing need among geneticists. We anticipate that this model, for which software is publicly available, and others like it will have numerous applications in empirical studies of human genetics.
DOI: 10.1073/pnas.1019276108
2011
Cited 620 times
Demographic history and rare allele sharing among human populations
High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.
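A site frequency spectrum, as used in this abstract, counts how many variable sites carry the derived allele on exactly k of the sampled chromosomes. A minimal sketch follows, assuming a toy list of per-site derived-allele counts (invented here for illustration, not data from the study) and a hypothetical helper named `site_frequency_spectrum`:

```python
from collections import Counter

def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Return a list sfs where sfs[k - 1] is the number of segregating
    sites whose derived allele appears on exactly k chromosomes."""
    tally = Counter(derived_counts)
    # Only segregating sites (1 <= k <= n - 1) enter the spectrum;
    # counts of 0 or n mean the site is not variable in the sample.
    return [tally.get(k, 0) for k in range(1, n_chromosomes)]

# Toy input: derived-allele count at each of ten sites, six chromosomes sampled.
toy_counts = [1, 1, 1, 2, 1, 3, 0, 6, 2, 1]
sfs = site_frequency_spectrum(toy_counts, n_chromosomes=6)
print(sfs)  # [5, 2, 1, 0, 0]
```

The toy input was deliberately chosen to be singleton-heavy, echoing the abstract's observation that most variable sites are rare. Inferring demographic parameters from such a spectrum is a separate modelling step that this sketch does not attempt.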
DOI: 10.1038/ng.2329
2012
Cited 615 times
De novo somatic mutations in components of the PI3K-AKT3-mTOR pathway cause hemimegalencephaly
De novo somatic mutations in focal areas are well documented in diseases such as neoplasia but are rarely reported in malformation of the developing brain. Hemimegalencephaly (HME) is characterized by overgrowth of either one of the two cerebral hemispheres. The molecular etiology of HME remains a mystery. The intractable epilepsy that is associated with HME can be relieved by the surgical treatment hemispherectomy, allowing sampling of diseased tissue. Exome sequencing and mass spectrometry analysis in paired brain-blood samples from individuals with HME (n = 20 cases) identified de novo somatic mutations in 30% of affected individuals in the PIK3CA, AKT3 and MTOR genes. A recurrent PIK3CA c.1633G>A mutation was found in four separate cases. Identified mutations were present in 8-40% of sequenced alleles in various brain regions and were associated with increased neuronal S6 protein phosphorylation in the brains of affected individuals, indicating aberrant activation of mammalian target of rapamycin (mTOR) signaling. Thus HME is probably a genetically mosaic disease caused by gain of function in phosphatidylinositol 3-kinase (PI3K)-AKT3-mTOR signaling.
DOI: 10.1101/531210
2019
Cited 586 times
The mutational constraint spectrum quantified from variation in 141,456 humans
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved human mutation rate model, we classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
DOI: 10.1038/nature13917
2014
Cited 576 times
Exome sequencing identifies rare LDLR and APOA5 alleles conferring risk for myocardial infarction
Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component to risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol. Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg/dl. At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.
DOI: 10.1186/gb-2011-12-1-r1
2011
Cited 557 times
A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries
Genome targeting methods enable cost-effective capture of specific subsets of the genome for sequencing. We present here an automated, highly scalable method for carrying out the Solution Hybrid Selection capture approach that provides a dramatic increase in scale and throughput of sequence-ready libraries produced. Significant process improvements and a series of in-process quality control checkpoints are also added. These process improvements can also be used in a manual version of the protocol.
DOI: 10.1038/sj.mp.4001058
2002
Cited 551 times
Family-based association study of 76 candidate genes in bipolar disorder: BDNF is a potential risk locus
Identification of the genetic bases for bipolar disorder remains a challenge for the understanding of this disease. Association between 76 candidate genes and bipolar disorder was tested by genotyping 90 single-nucleotide polymorphisms (SNPs) in these genes in 136 parent-proband trios. In this preliminary analysis, SNPs in two genes, brain-derived neurotrophic factor (BDNF) and the alpha subunit of the voltage-dependent calcium channel were associated with bipolar disorder at the P<0.05 level. In view of the large number of hypotheses tested, the two nominally positive associations were then tested in independent populations of bipolar patients and only BDNF remains a potential risk gene. In the replication samples, excess transmission of the valine allele of amino acid 66 of BDNF was observed in the direction of the original result in an additional sample of 334 parent-proband trios (T/U=108/87, P=0.066). Resequencing of 29 kb surrounding the BDNF gene identified 44 additional SNPs. Genotyping eight common SNPs identified three additional markers transmitted to bipolar probands at the P < 0.05 level. Strong LD was observed across this region and all adjacent pairwise haplotypes showed excess transmission to the bipolar proband. Analysis of these haplotypes using TRANSMIT revealed a global P value of 0.03. A single haplotype was identified that is shared by both the original dataset and the replication sample that is uniquely marked by both the rare A allele of the original SNP and a novel allele 11.5 kb 3'. Therefore, this study of 76 candidate genes has identified BDNF as a potential risk allele that will require additional study to confirm.
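The transmission counts in the replication sample (T/U = 108/87) can be related to the reported P value with a transmission disequilibrium test (TDT). The sketch below assumes the standard McNemar-type TDT statistic and halves the two-sided P value to obtain a one-sided one, on the assumption that the replication test was directional ("in the direction of the original result"). The abstract does not say which test produced P = 0.066, and the function name is hypothetical.

```python
import math

def tdt_p_values(transmitted, untransmitted):
    """McNemar-type TDT: chi-square statistic with 1 degree of freedom."""
    chi2 = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    # Survival function of a 1-df chi-square, via the complementary error function.
    two_sided = math.erfc(math.sqrt(chi2 / 2))
    # Assumption: a directional test, so the two-sided P value is halved.
    one_sided = two_sided / 2
    return chi2, two_sided, one_sided

chi2, two_sided, one_sided = tdt_p_values(108, 87)
print(f"chi2 = {chi2:.2f}, two-sided P = {two_sided:.3f}, one-sided P = {one_sided:.3f}")
```

Under these assumptions the one-sided value comes out at roughly 0.066, consistent with the figure quoted in the abstract, while the two-sided value is about twice that.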
DOI: 10.1016/j.ccell.2017.01.001
2017
Cited 531 times
Comprehensive Molecular Characterization of Pheochromocytoma and Paraganglioma
We report a comprehensive molecular characterization of pheochromocytomas and paragangliomas (PCCs/PGLs), a rare tumor type. Multi-platform integration revealed that PCCs/PGLs are driven by diverse alterations affecting multiple genes and pathways. Pathogenic germline mutations occurred in eight PCC/PGL susceptibility genes. We identified CSDE1 as a somatically mutated driver gene, complementing four known drivers (HRAS, RET, EPAS1, and NF1). We also discovered fusion genes in PCCs/PGLs, involving MAML3, BRAF, NGFR, and NF1. Integrated analysis classified PCCs/PGLs into four molecularly defined groups: a kinase signaling subtype, a pseudohypoxia subtype, a Wnt-altered subtype, driven by MAML3 and CSDE1, and a cortical admixture subtype. Correlates of metastatic PCCs/PGLs included the MAML3 fusion gene. This integrated molecular characterization provides a comprehensive foundation for developing PCC/PGL precision medicine.
DOI: 10.1038/ng.2007.27
2007
Cited 529 times
Two independent alleles at 6q23 associated with risk of rheumatoid arthritis
To identify susceptibility alleles associated with rheumatoid arthritis, we genotyped 397 individuals with rheumatoid arthritis for 116,204 SNPs and carried out an association analysis in comparison to publicly available genotype data for 1,211 related individuals from the Framingham Heart Study. After evaluating and adjusting for technical and population biases, we identified a SNP at 6q23 (rs10499194, approximately 150 kb from TNFAIP3 and OLIG3) that was reproducibly associated with rheumatoid arthritis both in the genome-wide association (GWA) scan and in 5,541 additional case-control samples (P = 10^-3, GWA scan; P < 10^-6, replication; P = 10^-9, combined). In a concurrent study, the Wellcome Trust Case Control Consortium (WTCCC) has reported strong association of rheumatoid arthritis susceptibility to a different SNP located 3.8 kb from rs10499194 (rs6920220; P = 5 × 10^-6 in WTCCC). We show that these two SNP associations are statistically independent, are each reproducible in the comparison of our data and WTCCC data, and define risk and protective haplotypes for rheumatoid arthritis at 6q23.
DOI: 10.1158/2159-8290.cd-14-0623
2014
Cited 507 times
Somatic ERCC2 Mutations Correlate with Cisplatin Sensitivity in Muscle-Invasive Urothelial Carcinoma
Cisplatin-based chemotherapy is the standard of care for patients with muscle-invasive urothelial carcinoma. Pathologic downstaging to pT0/pTis after neoadjuvant cisplatin-based chemotherapy is associated with improved survival, although molecular determinants of cisplatin response are incompletely understood. We performed whole-exome sequencing on pretreatment tumor and germline DNA from 50 patients with muscle-invasive urothelial carcinoma who received neoadjuvant cisplatin-based chemotherapy followed by cystectomy (25 pT0/pTis “responders,” 25 pT2+ “nonresponders”) to identify somatic mutations that occurred preferentially in responders. ERCC2, a nucleotide excision repair gene, was the only significantly mutated gene enriched in the cisplatin responders compared with nonresponders (q < 0.01). Expression of representative ERCC2 mutants in an ERCC2-deficient cell line failed to rescue cisplatin and UV sensitivity compared with wild-type ERCC2. The lack of normal ERCC2 function may contribute to cisplatin sensitivity in urothelial cancer, and somatic ERCC2 mutation status may inform cisplatin-containing regimen usage in muscle-invasive urothelial carcinoma. Significance: Somatic ERCC2 mutations correlate with complete response to cisplatin-based chemosensitivity in muscle-invasive urothelial carcinoma, and clinically identified mutations lead to cisplatin sensitivity in vitro. Nucleotide excision repair pathway defects may drive exceptional response to conventional chemotherapy. Cancer Discov; 4(10); 1140–53. ©2014 AACR.
DOI: 10.1126/science.1258522
2015
Cited 503 times
Highly evolvable malaria vectors: The genomes of 16 Anopheles mosquitoes
Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.
DOI: 10.1038/nm.3559
2014
Cited 489 times
Whole-exome sequencing and clinical interpretation of formalin-fixed, paraffin-embedded tumor samples to guide precision cancer medicine
Whole-exome sequencing (WES) has emerged as a transformative technology for biological discovery, but technical difficulties have so far prevented its widespread clinical use. Here, Eliezer Van Allen and colleagues are able to perform production-scale WES on small amounts of clinically acquired formalin-fixed, paraffin-embedded tumor tissues. Using a newly created WES clinical interpretation algorithm, they apply the complete clinical WES framework prospectively to patients and demonstrate how it can be used to directly affect patient care. Translating whole-exome sequencing (WES) for prospective clinical use may have an impact on the care of patients with cancer; however, multiple innovations are necessary for clinical implementation. These include rapid and robust WES of DNA derived from formalin-fixed, paraffin-embedded tumor tissue, analytical output similar to data from frozen samples and clinical interpretation of WES data for prospective use. Here, we describe a prospective clinical WES platform for archival formalin-fixed, paraffin-embedded tumor samples. The platform employs computational methods for effective clinical analysis and interpretation of WES data. When applied retrospectively to 511 exomes, the interpretative framework revealed a 'long tail' of somatic alterations in clinically important genes. Prospective application of this approach identified clinically relevant alterations in 15 out of 16 patients. In one patient, previously undetected findings guided clinical trial enrollment, leading to an objective clinical response. Overall, this methodology may inform the widespread implementation of precision cancer medicine.
DOI: 10.15585/mmwr.mm7031e2
2021
Cited 483 times
Outbreak of SARS-CoV-2 Infections, Including COVID-19 Vaccine Breakthrough Infections, Associated with Large Public Gatherings — Barnstable County, Massachusetts, July 2021
During July 2021, 469 cases of COVID-19 associated with multiple summer events and large public gatherings in a town in Barnstable County, Massachusetts, were identified among Massachusetts residents; vaccination coverage among eligible Massachusetts residents was 69%. Approximately three quarters (346; 74%) of cases occurred in fully vaccinated persons (those who had completed a 2-dose course of mRNA vaccine [Pfizer-BioNTech or Moderna] or had received a single dose of Janssen [Johnson & Johnson] vaccine ≥14 days before exposure). Genomic sequencing of specimens from 133 patients identified the B.1.617.2 (Delta) variant of SARS-CoV-2, the virus that causes COVID-19, in 119 (89%) and the Delta AY.3 sublineage in one (1%). Overall, 274 (79%) vaccinated patients with breakthrough infection were symptomatic. Among five COVID-19 patients who were hospitalized, four were fully vaccinated; no deaths were reported. Real-time reverse transcription-polymerase chain reaction (RT-PCR) cycle threshold (Ct) values in specimens from 127 vaccinated persons with breakthrough cases were similar to those from 84 persons who were unvaccinated, not fully vaccinated, or whose vaccination status was unknown (median = 22.77 and 21.54, respectively). The Delta variant of SARS-CoV-2 is highly transmissible (1); vaccination is the most important strategy to prevent severe illness and death. On July 27, CDC recommended that all persons, including those who are fully vaccinated, should wear masks in indoor public settings in areas where COVID-19 transmission is high or substantial. Findings from this investigation suggest that even jurisdictions without substantial or high COVID-19 transmission might consider expanding prevention strategies, including masking in indoor public settings regardless of vaccination status, given the potential risk of infection during attendance at large public gatherings that include travelers from many areas with differing levels of transmission.
DOI: 10.1158/2159-8290.cd-11-0184
2012
Cited 479 times
High-Throughput Detection of Actionable Genomic Alterations in Clinical Tumor Samples by Targeted, Massively Parallel Sequencing
Knowledge of “actionable” somatic genomic alterations present in each tumor (e.g., point mutations, small insertions/deletions, and copy-number alterations that direct therapeutic options) should facilitate individualized approaches to cancer treatment. However, clinical implementation of systematic genomic profiling has rarely been achieved beyond limited numbers of oncogene point mutations. To address this challenge, we utilized a targeted, massively parallel sequencing approach to detect tumor genomic alterations in formalin-fixed, paraffin-embedded (FFPE) tumor samples. Nearly 400-fold mean sequence coverage was achieved, and single-nucleotide sequence variants, small insertions/deletions, and chromosomal copy-number alterations were detected simultaneously with high accuracy compared with other methods in clinical use. Putatively actionable genomic alterations, including those that predict sensitivity or resistance to established and experimental therapies, were detected in each tumor sample tested. Thus, targeted deep sequencing of clinical tumor material may enable mutation-driven clinical trials and, ultimately, “personalized” cancer treatment. Significance: Despite the rapid proliferation of targeted therapeutic agents, systematic methods to profile clinically relevant tumor genomic alterations remain underdeveloped. We describe a sequencing-based approach to identifying genomic alterations in FFPE tumor samples. These studies affirm the feasibility and clinical utility of targeted sequencing in the oncology arena and provide a foundation for genomics-based stratification of cancer patients. Cancer Discovery; 2(1); 82–93. ©2011 AACR. Read the Commentary on this article by Corless and Spellman, p. 23. This article is highlighted in the In This Issue feature, p. 1.
DOI: 10.1158/2159-8290.cd-13-0631
2014
Cited 435 times
MAP Kinase Pathway Alterations in BRAF-Mutant Melanoma Patients with Acquired Resistance to Combined RAF/MEK Inhibition
Treatment of BRAF-mutant melanoma with combined dabrafenib and trametinib, which target RAF and the downstream MAP–ERK kinase (MEK)1 and MEK2 kinases, respectively, improves progression-free survival and response rates compared with dabrafenib monotherapy. Mechanisms of clinical resistance to combined RAF/MEK inhibition are unknown. We performed whole-exome sequencing (WES) and whole-transcriptome sequencing (RNA-seq) on pretreatment and drug-resistant tumors from five patients with acquired resistance to dabrafenib/trametinib. In three of these patients, we identified additional mitogen-activated protein kinase (MAPK) pathway alterations in the resistant tumor that were not detected in the pretreatment tumor, including a novel activating mutation in MEK2 (MEK2Q60P). MEK2Q60P conferred resistance to combined RAF/MEK inhibition in vitro, but remained sensitive to inhibition of the downstream kinase extracellular signal–regulated kinase (ERK). The continued MAPK signaling-based resistance identified in these patients suggests that alternative dosing of current agents, more potent RAF/MEK inhibitors, and/or inhibition of the downstream kinase ERK may be needed for durable control of BRAF-mutant melanoma. Significance: This study represents an initial clinical genomic study of acquired resistance to combined RAF/MEK inhibition in BRAF-mutant melanoma, using WES and RNA-seq. The presence of diverse resistance mechanisms suggests that serial biopsies and genomic/molecular profiling at the time of resistance may ultimately improve the care of patients with resistant BRAF-mutant melanoma by specifying tailored targeted combinations to overcome specific resistance mechanisms. Cancer Discov; 4(1); 61–8. ©2013 AACR. See related commentary by Solit and Rosen, p. 27. This article is highlighted in the In This Issue feature, p. 1.