
Eric S. Lander

Here are all the papers by Eric S. Lander that you can download and read on OA.mg.

DOI: 10.1073/pnas.0506580102
2005
Cited 39,052 times
Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles
Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
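The idea of scoring a whole gene set rather than single genes can be illustrated with a minimal sketch. It assumes a simplified, unweighted Kolmogorov-Smirnov-style running sum over a ranked gene list; the function name `enrichment_score` and the toy gene names are hypothetical, and this is not the published GSEA implementation, which also weights genes by their correlation with the phenotype and assesses significance by permutation.

```python
def enrichment_score(ranked_genes, gene_set):
    """Return the signed maximum deviation of an unweighted running sum.

    ranked_genes: genes ordered from most to least associated with a phenotype.
    gene_set: the genes whose collective position in the ranking is tested.
    Illustrative simplification only (assumption), not the GSEA software.
    """
    members = set(gene_set)
    n_hit = sum(1 for gene in ranked_genes if gene in members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise; both walks sum to 1.
        if gene in members:
            running += 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        # Keep the deviation from zero with the largest magnitude.
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]  # hypothetical gene names
    print(enrichment_score(ranking, {"g1", "g2"}))  # set clustered at the top
    print(enrichment_score(ranking, {"g5", "g6"}))  # set clustered at the bottom
```

A set concentrated at the top of the ranking gives a score near +1, and one concentrated at the bottom gives a score near -1, even when no single member stands out on its own.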
DOI: 10.1038/35057062
2001
Cited 21,735 times
Initial sequencing and analysis of the human genome
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
DOI: 10.1038/nature15393
2015
Cited 13,993 times
A global reference for human genetic variation
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics. The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.
DOI: 10.1038/nbt.1754
2011
Cited 11,601 times
Integrative genomics viewer
Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.
DOI: 10.1126/science.286.5439.531
1999
Cited 10,930 times
Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring
Although cancer classification has improved over the past 30 years, there has been no general approach for identifying new cancer classes (class discovery) or for assigning tumors to known classes (class prediction). Here, a generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case. A class discovery procedure automatically discovered the distinction between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) without previous knowledge of these classes. An automatically derived class predictor was able to determine the class of new leukemia cases. The results demonstrate the feasibility of cancer classification based solely on gene expression monitoring and suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
DOI: 10.1038/ng1180
2003
Cited 8,289 times
PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes
DOI: 10.1126/science.1181369
2009
Cited 7,342 times
Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome
We describe Hi-C, a method that probes the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing. We constructed spatial proximity maps of the human genome with Hi-C at a resolution of 1 megabase. These maps confirm the presence of chromosome territories and the spatial proximity of small, gene-rich chromosomes. We identified an additional level of genome organization that is characterized by the spatial segregation of open and closed chromatin to form two genome-wide compartments. At the megabase scale, the chromatin conformation is consistent with a fractal globule, a knot-free, polymer conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus. The fractal globule is distinct from the more commonly used globular equilibrium model. Our results demonstrate the power of Hi-C to map the dynamic conformations of whole genomes.
DOI: 10.1016/j.cell.2014.11.021
2014
Cited 6,587 times
A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping
We use in situ Hi-C to probe the 3D architecture of genomes, constructing haploid and diploid maps of nine cell types. The densest, in human lymphoblastoid cells, contains 4.9 billion contacts, achieving 1 kb resolution. We find that genomes are partitioned into contact domains (median length, 185 kb), which are associated with distinct patterns of histone marks and segregate into six subcompartments. We identify ∼10,000 loops. These loops frequently link promoters and enhancers, correlate with gene activation, and show conservation across cell types and species. Loop anchors typically occur at domain boundaries and bind CTCF. CTCF sites at loop anchors occur predominantly (>90%) in a convergent orientation, with the asymmetric motifs "facing" one another. The inactive X chromosome splits into two massive domains and contains large loops anchored at CTCF-binding repeats.
DOI: 10.1016/0888-7543(87)90010-3
1987
Cited 6,481 times
MAPMAKER: An interactive computer package for constructing primary genetic linkage maps of experimental and natural populations
With the advent of RFLPs, genetic linkage maps are now being assembled for a number of organisms including both inbred experimental populations such as maize and outbred natural populations such as humans. Accurate construction of such genetic maps requires multipoint linkage analysis of particular types of pedigrees. We describe here a computer package, called MAPMAKER, designed specifically for this purpose. The program uses an efficient algorithm that allows simultaneous multipoint analysis of any number of loci. MAPMAKER also includes an interactive command language that makes it easy for a geneticist to explore linkage data. MAPMAKER has been applied to the construction of linkage maps in a number of organisms, including the human and several plants, and we outline the mapping strategies that have been used.
DOI: 10.1126/science.1069424
2002
Cited 5,406 times
The Structure of Haplotype Blocks in the Human Genome
Haplotype-based methods offer a powerful approach to disease gene mapping, based on the association between causal mutations and the ancestral haplotypes on which they arose. As part of The SNP Consortium Allele Frequency Projects, we characterized haplotype patterns across 51 autosomal regions (spanning 13 megabases of the human genome) in samples from Africa, Europe, and Asia. We show that the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed. The boundaries of blocks and specific haplotypes they contain are highly correlated across populations. We demonstrate that such haplotype frameworks provide substantial statistical power in association studies of common genetic variation across each region. Our results provide a foundation for the construction of a haplotype map of the human genome, facilitating comprehensive genetic association studies of human disease.
DOI: 10.1038/ng1195-241
1995
Cited 5,086 times
Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results
DOI: 10.1093/genetics/121.1.185
1989
Cited 4,878 times
Mapping mendelian factors underlying quantitative traits using RFLP linkage maps.
The advent of complete genetic linkage maps consisting of codominant DNA markers [typically restriction fragment length polymorphisms (RFLPs)] has stimulated interest in the systematic genetic dissection of discrete Mendelian factors underlying quantitative traits in experimental organisms. We describe here a set of analytical methods that modify and extend the classical theory for mapping such quantitative trait loci (QTLs). These include: (i) a method of identifying promising crosses for QTL mapping by exploiting a classical formula of Sewall Wright; (ii) a method (interval mapping) for exploiting the full power of RFLP linkage maps by adapting the approach of LOD score analysis used in human genetics, to obtain accurate estimates of the genetic location and phenotypic effect of QTLs; and (iii) a method (selective genotyping) that allows a substantial reduction in the number of progeny that need to be scored with the DNA markers. In addition to the exposition of the methods, explicit graphs are provided that allow experimental geneticists to estimate, in any particular case, the number of progeny required to map QTLs underlying a quantitative trait.
DOI: 10.1016/j.cell.2006.02.041
2006
Cited 4,855 times
A Bivalent Chromatin Structure Marks Key Developmental Genes in Embryonic Stem Cells
The most highly conserved noncoding elements (HCNEs) in mammalian genomes cluster within regions enriched for genes encoding developmentally important transcription factors (TFs). This suggests that HCNE-rich regions may contain key regulatory controls involved in development. We explored this by examining histone methylation in mouse embryonic stem (ES) cells across 56 large HCNE-rich loci. We identified a specific modification pattern, termed "bivalent domains," consisting of large regions of H3 lysine 27 methylation harboring smaller regions of H3 lysine 4 methylation. Bivalent domains tend to coincide with TF genes expressed at low levels. We propose that bivalent domains silence developmental genes in ES cells while keeping them poised for activation. We also found striking correspondences between genome sequence and histone methylation in ES cells, which become notably weaker in differentiated cells. These results highlight the importance of DNA sequence in defining the initial epigenetic landscape and suggest a novel chromatin-based mechanism for maintaining pluripotency.
DOI: 10.1038/nature12213
2013
Cited 4,749 times
Mutational heterogeneity in cancer and the search for new cancer-associated genes
Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour-normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour-normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.
DOI: 10.1016/j.cell.2014.05.010
2014
Cited 4,607 times
Development and Applications of CRISPR-Cas9 for Genome Engineering
Recent advances in genome engineering technologies based on the CRISPR-associated RNA-guided endonuclease Cas9 are enabling the systematic interrogation of mammalian genome function. Analogous to the search function in modern word processors, Cas9 can be guided to specific locations within complex genomes by a short RNA search string. Using this system, DNA sequences within the endogenous genome and their functional outputs are now easily edited or modulated in virtually any organism of choice. Cas9-mediated genetic perturbation is simple and scalable, empowering researchers to elucidate the functional organization of the genome at the systems level and establish causal linkages between genetic variations and biological phenotypes. In this Review, we describe the development and applications of Cas9 for a variety of research or translational applications while highlighting challenges as well as future directions. Derived from a remarkable microbial defense system, Cas9 is driving innovative applications from basic biology to biotechnology and medicine.
DOI: 10.1126/science.1132939
2006
Cited 4,463 times
The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease
To pursue a systematic approach to the discovery of functional connections among diseases, genetic perturbation, and drug action, we have created the first installment of a reference collection of gene-expression profiles from cultured human cells treated with bioactive small molecules, together with pattern-matching software to mine these data. We demonstrate that this “Connectivity Map” resource can be used to find connections among small molecules sharing a mechanism of action, chemicals and physiological processes, and diseases and drugs. These results indicate the feasibility of the approach and suggest the value of a large-scale community Connectivity Map project.
DOI: 10.1016/j.cell.2013.09.034
2013
Cited 4,010 times
The Somatic Genomic Landscape of Glioblastoma
We describe the landscape of somatic genomic alterations based on multidimensional and comprehensive characterization of more than 500 glioblastoma tumors (GBMs). We identify several novel mutated genes as well as complex rearrangements of signature receptors, including EGFR and PDGFRA. TERT promoter mutations are shown to correlate with elevated mRNA expression, supporting a role in telomerase reactivation. Correlative analyses confirm that the survival advantage of the proneural subtype is conferred by the G-CIMP phenotype, and MGMT DNA methylation may be a predictive biomarker for treatment response only in classical subtype GBM. Integrative analysis of genomic and proteomic profiles challenges the notion of therapeutic inhibition of a pathway as an alternative to inhibition of the target itself. These data will facilitate the discovery of therapeutic and diagnostic target candidates, the validation of research and clinical observations and the generation of unanticipated hypotheses that can advance our molecular understanding of this lethal cancer.
DOI: 10.1038/nbt.2514
2013
Cited 3,958 times
Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples
Detection of somatic point substitutions is a key step in characterizing the cancer genome. However, existing methods typically miss low-allelic-fraction mutations that occur in only a subset of the sequenced cells owing to either tumor heterogeneity or contamination by normal cells. Here we present MuTect, a method that applies a Bayesian classifier to detect somatic mutations with very low allele fractions, requiring only a few supporting reads, followed by carefully tuned filters that ensure high specificity. We also describe benchmarking approaches that use real, rather than simulated, sequencing data to evaluate the sensitivity and specificity as a function of sequencing depth, base quality and allelic fraction. Compared with other methods, MuTect has higher sensitivity with similar specificity, especially for mutations with allelic fractions as low as 0.1 and below, making MuTect particularly useful for studying cancer subclones and their evolution in standard exome and genome sequencing data.
DOI: 10.1038/nature06008
2007
Cited 3,771 times
Genome-wide maps of chromatin state in pluripotent and lineage-committed cells
We report the application of single-molecule-based sequencing technology for high-throughput profiling of histone modifications in mammalian cells. By obtaining over four billion bases of sequence from chromatin immunoprecipitated DNA, we generated genome-wide chromatin-state maps of mouse embryonic stem cells, neural progenitor cells and embryonic fibroblasts. We find that lysine 4 and lysine 27 trimethylation effectively discriminates genes that are expressed, poised for expression, or stably repressed, and therefore reflect cell state and lineage potential. Lysine 36 trimethylation marks primary coding and non-coding transcripts, facilitating gene annotation. Trimethylation of lysine 9 and lysine 20 is detected at satellite, telomeric and active long-terminal repeats, and can spread into proximal unique sequences. Lysine 4 and lysine 9 trimethylation marks imprinting control regions. Finally, we show that chromatin state can be read in an allele-specific manner by using single nucleotide polymorphisms. This study provides a framework for the application of comprehensive chromatin profiling towards characterization of diverse mammalian cell populations.
DOI: 10.1038/nature07672
2009
Cited 3,731 times
Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals
There is growing recognition that mammalian cells produce many thousands of large intergenic transcripts. However, the functional significance of these transcripts has been particularly controversial. Although there are some well-characterized examples, most (>95%) show little evidence of evolutionary conservation and have been suggested to represent transcriptional noise. Here we report a new approach to identifying large non-coding RNAs using chromatin-state maps to discover discrete transcriptional units intervening known protein-coding loci. Our approach identified approximately 1,600 large multi-exonic RNAs across four mouse cell types. In sharp contrast to previous collections, these large intervening non-coding RNAs (lincRNAs) show strong purifying selection in their genomic loci, exonic sequences and promoter regions, with greater than 95% showing clear evolutionary conservation. We also developed a functional genomics approach that assigns putative functions to each lincRNA, demonstrating a diverse range of roles for lincRNAs in processes from embryonic stem cell pluripotency to cell proliferation. We obtained independent functional validation for the predictions for over 100 lincRNAs, using cell-based assays. In particular, we demonstrate that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes such as p53, NFkappaB, Sox2, Oct4 (also known as Pou5f1) and Nanog. Together, these results define a unique collection of functional lincRNAs that are highly conserved and implicated in diverse biological processes.
DOI: 10.1126/science.1188021
2010
Cited 3,694 times
A Draft Sequence of the Neandertal Genome
Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other.
DOI: 10.1038/nature08822
2010
Cited 3,347 times
The landscape of somatic copy-number alteration across human cancers
A powerful way to discover key genes with causal roles in oncogenesis is to identify genomic regions that undergo frequent alteration in human cancers. Here we present high-resolution analyses of somatic copy-number alterations (SCNAs) from 3,131 cancer specimens, belonging largely to 26 histological types. We identify 158 regions of focal SCNA that are altered at significant frequency across several cancer types, of which 122 cannot be explained by the presence of a known cancer target gene located within these regions. Several gene families are enriched among these regions of focal SCNA, including the BCL2 family of apoptosis regulators and the NF-κB pathway. We show that cancer cells containing amplifications surrounding the MCL1 and BCL2L1 anti-apoptotic genes depend on the expression of these genes for survival. Finally, we demonstrate that a large majority of SCNAs identified in individual cancer types are present in several cancer types. Two Articles in this issue add major data sets to the growing picture of the cancer genome. Bignell et al. analysed a large number of homozygous gene deletions in a collection of 746 publicly available cancer cell lines. Combined with information about hemizygous deletions of the same genes, the data suggest that many deletions found in cancer reflect the position of a gene at a fragile site in the genome, rather than as a recessive cancer gene whose loss confers a selective growth advantage. Beroukhim et al. present the largest data set to date on somatic copy-number variations across more than 3,000 specimens of human primary cancers. Many alterations are shared between multiple tumour types. Functional experiments demonstrate an oncogenic role for the apoptosis genes MCL1 and BCL2L1 that are associated with amplifications found in many cancers. One way of discovering genes with key roles in cancer development is to identify genomic regions that are frequently altered in human cancers. Here, high-resolution analyses of somatic copy-number alterations (SCNAs) in numerous cancer specimens provide an overview of regions of focal SCNA that are altered at significant frequency across several cancer types. An oncogenic function is also found for the anti-apoptosis genes MCL1 and BCL2L1, which reside in amplified genome regions in many cancers.
DOI: 10.1126/science.8091226
1994
Cited 3,017 times
Genetic Dissection of Complex Traits
Medical genetics was revolutionized during the 1980s by the application of genetic mapping to locate the genes responsible for simple Mendelian diseases. Most diseases and traits, however, do not follow simple inheritance patterns. Geneticists have thus begun taking up the even greater challenge of the genetic dissection of complex traits. Four major approaches have been developed: linkage analysis, allele-sharing methods, association studies, and polygenic analysis of experimental crosses. This article synthesizes the current state of the genetic dissection of complex traits, describing the methods, limitations, and recent applications to biological problems.
DOI: 10.1073/pnas.96.6.2907
1999
Cited 2,840 times
Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation
Array technologies have made it straightforward to monitor simultaneously the expression pattern of thousands of genes. The challenge now is to interpret such massive data sets. The first step is to extract the fundamental patterns of gene expression inherent in the data. This paper describes the application of self-organizing maps, a type of mathematical cluster analysis that is particularly well suited for recognizing and classifying features in complex, multidimensional data. The method has been implemented in a publicly available computer package, GENECLUSTER, that performs the analytical calculations and provides easy data visualization. To illustrate the value of such analysis, the approach is applied to hematopoietic differentiation in four well studied models (HL-60, U937, Jurkat, and NB4 cells). Expression patterns of some 6,000 human genes were assayed, and an online database was created. GENECLUSTER was used to organize the genes into biologically relevant clusters that suggest novel hypotheses about hematopoietic differentiation, for example, highlighting certain genes and pathways involved in "differentiation therapy" used in the treatment of acute promyelocytic leukemia.
DOI: 10.1038/35057149
2001
Cited 2,744 times
A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms
We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exons (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.
DOI: 10.1038/nature08460
2009
Cited 2,716 times
Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1
The proto-oncogene KRAS is mutated in a wide array of human cancers, most of which are aggressive and respond poorly to standard therapies. Although the identification of specific oncogenes has led to the development of clinically effective, molecularly targeted therapies in some cases, KRAS has remained refractory to this approach. A complementary strategy for targeting KRAS is to identify gene products that, when inhibited, result in cell death only in the presence of an oncogenic allele. Here we have used systematic RNA interference to detect synthetic lethal partners of oncogenic KRAS and found that the non-canonical IkappaB kinase TBK1 was selectively essential in cells that contain mutant KRAS. Suppression of TBK1 induced apoptosis specifically in human cancer cell lines that depend on oncogenic KRAS expression. In these cells, TBK1 activated NF-kappaB anti-apoptotic signals involving c-Rel and BCL-XL (also known as BCL2L1) that were essential for survival, providing mechanistic insights into this synthetic lethal interaction. These observations indicate that TBK1 and NF-kappaB signalling are essential in KRAS mutant tumours, and establish a general approach for the rational identification of co-dependent pathways in cancer.
DOI: 10.1056/nejmoa1409405
2014
Cited 2,697 times
Clonal Hematopoiesis and Blood-Cancer Risk Inferred from Blood DNA Sequence
Cancers arise from multiple acquired mutations, which presumably occur over many years. Early stages in cancer development might be present years before cancers become clinically apparent. We analyzed data from whole-exome sequencing of DNA in peripheral-blood cells from 12,380 persons, unselected for cancer or hematologic phenotypes. We identified somatic mutations on the basis of unusual allelic fractions. We used data from Swedish national patient registers to follow health outcomes for 2 to 7 years after DNA sampling. Clonal hematopoiesis with somatic mutations was observed in 10% of persons older than 65 years of age but in only 1% of those younger than 50 years of age. Detectable clonal expansions most frequently involved somatic mutations in three genes (DNMT3A, ASXL1, and TET2) that have previously been implicated in hematologic cancers. Clonal hematopoiesis was a strong risk factor for subsequent hematologic cancer (hazard ratio, 12.9; 95% confidence interval, 5.8 to 28.7). Approximately 42% of hematologic cancers in this cohort arose in persons who had clonality at the time of DNA sampling, more than 6 months before a first diagnosis of cancer. Analysis of bone marrow-biopsy specimens obtained from two patients at the time of diagnosis of acute myeloid leukemia revealed that their cancers arose from the earlier clones. Clonal hematopoiesis with somatic mutations is readily detected by means of DNA sequencing, is increasingly common as people age, and is associated with increased risks of hematologic cancer and death. A subset of the genes that are mutated in patients with myeloid cancers is frequently mutated in apparently healthy persons; these mutations may represent characteristic early events in the development of hematologic cancers. (Funded by the National Human Genome Research Institute and others.)
DOI: 10.1073/pnas.0904715106
2009
Cited 2,624 times
Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression
We recently showed that the mammalian genome encodes >1,000 large intergenic noncoding (linc)RNAs that are clearly conserved across mammals and, thus, functional. Gene expression patterns have implicated these lincRNAs in diverse biological processes, including cell-cycle regulation, immune surveillance, and embryonic stem cell pluripotency. However, the mechanism by which these lincRNAs function is unknown. Here, we expand the catalog of human lincRNAs to ≈3,300 by analyzing chromatin-state maps of various human cell types. Inspired by the observation that the well-characterized lincRNA HOTAIR binds the polycomb repressive complex (PRC)2, we tested whether many lincRNAs are physically associated with PRC2. Remarkably, we observe that ≈20% of lincRNAs expressed in various cell types are bound by PRC2, and that additional lincRNAs are bound by other chromatin-modifying complexes. Also, we show that siRNA-mediated depletion of certain lincRNAs associated with PRC2 leads to changes in gene expression, and that the up-regulated genes are enriched for those normally silenced by PRC2. We propose a model in which some lincRNAs guide chromatin-modifying complexes to specific genomic loci to regulate gene expression.
DOI: 10.1038/nature12912
2014
Cited 2,612 times
Discovery and saturation analysis of cancer genes across 21 tumour types
Although a few cancer genes are mutated in a high proportion of tumours of a given type (>20%), most are mutated at intermediate frequencies (2–20%). To explore the feasibility of creating a comprehensive catalogue of cancer genes, we analysed somatic point mutations in exome sequences from 4,742 human cancers and their matched normal-tissue samples across 21 cancer types. We found that large-scale genomic analysis can identify nearly all known cancer genes in these tumour types. Our analysis also identified 33 genes that were not previously known to be significantly mutated in cancer, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Down-sampling analysis indicates that larger sample sizes will reveal many more genes mutated at clinically important frequencies. We estimate that near-saturation may be achieved with 600–5,000 samples per tumour type, depending on background mutation frequency. The results may help to guide the next stage of cancer genomics. Large-scale genomic analysis of somatic point mutations in exomes from tumour–normal pairs across 21 cancer types identifies most known cancer genes in these tumour types as well as 33 genes not known to be significantly mutated, and down-sampling analysis indicates that larger sample sizes will reveal many more genes mutated at clinically important frequencies. Most cancer genes are mutated at intermediate frequencies, appearing in less than one in five samples of a particular tumour type, so the accurate identification of cancer genes needs to be based on large-scale sampling in order to take account of this mutation-rate heterogeneity. This study presents a statistical analysis of 21 tumour types from more than 4,700 tumour–normal pairs. 
The authors identify 33 previously unknown genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Further analyses suggest that near-saturation may be achieved with between 600 and 5,000 samples for a given tumour type, depending on background mutation rate.
DOI: 10.1038/nature07423
2008
Cited 2,479 times
Somatic mutations affect key pathways in lung adenocarcinoma
Determining the genetic basis of cancer requires comprehensive analyses of large collections of histopathologically well-classified primary tumours. Here we report the results of a collaborative study to discover somatic mutations in 188 human lung adenocarcinomas. DNA sequencing of 623 genes with known or potential relationships to cancer revealed more than 1,000 somatic mutations across the samples. Our analysis identified 26 genes that are mutated at significantly high frequencies and thus are probably involved in carcinogenesis. The frequently mutated genes include tyrosine kinases, among them the EGFR homologue ERBB4; multiple ephrin receptor genes, notably EPHA3; vascular endothelial growth factor receptor KDR; and NTRK genes. These data provide evidence of somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers--including NF1, APC, RB1 and ATM--and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B. The observed mutational profiles correlate with clinical features, smoking status and DNA repair defects. These results are reinforced by data integration including single nucleotide polymorphism array and gene expression array. Our findings shed further light on several important signalling pathways involved in lung adenocarcinoma, and suggest new molecular targets for treatment.
DOI: 10.1126/science.1246981
2014
Cited 2,475 times
Genetic Screens in Human Cells Using the CRISPR-Cas9 System
Improving Whole-Genome Screens: Improved methods are needed for the knockout of individual genes in genome-scale functional screens. Wang et al. (p. 80, published online 12 December) and Shalem et al. (p. 84, published online 12 December) used the bacterial CRISPR/Cas9 system to power-screen protocols that avoid several of the pitfalls associated with small interfering RNA (siRNA) screens. Genome editing by these methods completely disrupts target genes, thus avoiding weak signals that can occur when transcript abundance is partially decreased by siRNA. Furthermore, gene targeting by the CRISPR system is more precise and appears to produce substantially fewer off-target effects than existing methods.
DOI: 10.1126/science.1205438
2011
Cited 2,412 times
Detecting Novel Associations in Large Data Sets
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
DOI: 10.1073/pnas.191502998
2001
Cited 2,376 times
Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses
We have generated a molecular taxonomy of lung carcinoma, the leading cause of cancer death in the United States and worldwide. Using oligonucleotide microarrays, we analyzed mRNA expression levels corresponding to 12,600 transcript sequences in 186 lung tumor samples, including 139 adenocarcinomas resected from the lung. Hierarchical and probabilistic clustering of expression data defined distinct subclasses of lung adenocarcinoma. Among these were tumors with high relative expression of neuroendocrine genes and of type II pneumocyte genes, respectively. Retrospective analysis revealed a less favorable outcome for the adenocarcinomas with neuroendocrine gene expression. The diagnostic potential of expression profiling is emphasized by its ability to discriminate primary lung adenocarcinomas from metastases of extra-pulmonary origin. These results suggest that integration of expression profile data with clinical parameters could aid in diagnosis of lung cancer patients.
DOI: 10.1016/s1535-6108(02)00030-2
2002
Cited 2,323 times
Gene expression correlates of clinical prostate cancer behavior
Prostate tumors are among the most heterogeneous of cancers, both histologically and clinically. Microarray expression analysis was used to determine whether global biological differences underlie common pathological features of prostate cancer and to identify genes that might anticipate the clinical behavior of this disease. While no expression correlates of age, serum prostate specific antigen (PSA), and measures of local invasion were found, a set of genes was identified that strongly correlated with the state of tumor differentiation as measured by Gleason score. Moreover, a model using gene expression data alone accurately predicted patient outcome following prostatectomy. These results support the notion that the clinical behavior of prostate cancer is linked to underlying gene expression differences that are detectable at the time of diagnosis.
DOI: 10.1038/nm0102-68
2002
Cited 2,297 times
Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning
DOI: 10.1016/j.cell.2012.06.024
2012
Cited 2,282 times
A Landscape of Driver Mutations in Melanoma
Despite recent insights into melanoma genetics, systematic surveys for driver mutations are challenged by an abundance of passenger mutations caused by carcinogenic UV light exposure. We developed a permutation-based framework to address this challenge, employing mutation data from intronic sequences to control for passenger mutational load on a per gene basis. Analysis of large-scale melanoma exome data by this approach discovered six novel melanoma genes (PPP6C, RAC1, SNX31, TACC1, STK19, and ARID2), three of which (RAC1, PPP6C, and STK19) harbored recurrent and potentially targetable mutations. Integration with chromosomal copy number data contextualized the landscape of driver mutations, providing oncogenic insights in BRAF- and NRAS-driven melanoma as well as those without known NRAS/BRAF mutations. The landscape also clarified a mutational basis for RB and p53 pathway deregulation in this malignancy. Finally, the spectrum of driver mutations provided unequivocal genomic evidence for a direct mutagenic role of UV light in melanoma pathogenesis.
DOI: 10.1038/nature07107
2008
Cited 2,275 times
Genome-scale DNA methylation maps of pluripotent and differentiated cells
DNA methylation is essential for normal development and has been implicated in many pathologies including cancer. Our knowledge about the genome-wide distribution of DNA methylation, how it changes during cellular differentiation and how it relates to histone methylation and other chromatin modifications in mammals remains limited. Here we report the generation and analysis of genome-scale DNA methylation profiles at nucleotide resolution in mammalian cells. Using high-throughput reduced representation bisulphite sequencing and single-molecule-based sequencing, we generated DNA methylation maps covering most CpG islands, and a representative sampling of conserved non-coding elements, transposons and other genomic features, for mouse embryonic stem cells, embryonic-stem-cell-derived and primary neural cells, and eight other primary tissues. Several key findings emerge from the data. First, DNA methylation patterns are better correlated with histone methylation patterns than with the underlying genome sequence context. Second, methylation of CpGs are dynamic epigenetic marks that undergo extensive changes during cellular differentiation, particularly in regulatory regions outside of core promoters. Third, analysis of embryonic-stem-cell-derived and primary cells reveals that 'weak' CpG islands associated with a specific set of developmentally regulated genes undergo aberrant hypermethylation during extended proliferation in vitro, in a pattern reminiscent of that reported in some primary tumours. More generally, the results establish reduced representation bisulphite sequencing as a powerful technology for epigenetic profiling of cell populations relevant to developmental biology, cancer and regenerative medicine.
DOI: 10.1038/ng1060
2002
Cited 2,261 times
A molecular signature of metastasis in primary solid tumors
DOI: 10.1038/nature04634
2006
Cited 2,242 times
Reactive oxygen species have a causal role in multiple forms of insulin resistance
DOI: 10.1038/415436a
2002
Cited 2,242 times
Prediction of central nervous system embryonal tumour outcome based on gene expression
DOI: 10.1126/science.1208130
2011
Cited 2,211 times
The Mutational Landscape of Head and Neck Squamous Cell Carcinoma
The mutational profile of head and neck cancer is complex and may pose challenges to the development of targeted therapies.
DOI: 10.1016/j.cell.2009.06.034
2009
Cited 2,175 times
Identification of Selective Inhibitors of Cancer Stem Cells by High-Throughput Screening
Screens for agents that specifically kill epithelial cancer stem cells (CSCs) have not been possible due to the rarity of these cells within tumor cell populations and their relative instability in culture. We describe here an approach to screening for agents with epithelial CSC-specific toxicity. We implemented this method in a chemical screen and discovered compounds showing selective toxicity for breast CSCs. One compound, salinomycin, reduces the proportion of CSCs by >100-fold relative to paclitaxel, a commonly used breast cancer chemotherapeutic drug. Treatment of mice with salinomycin inhibits mammary tumor growth in vivo and induces increased epithelial differentiation of tumor cells. In addition, global gene expression analyses show that salinomycin treatment results in the loss of expression of breast CSC genes previously identified by analyses of breast tissues isolated directly from patients. This study demonstrates the ability to identify agents with specific toxicity for epithelial CSCs.
DOI: 10.1016/j.cels.2016.07.002
2016
Cited 2,121 times
Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments
Hi-C experiments explore the 3D structure of the genome, generating terabases of data to create high-resolution contact maps. Here, we introduce Juicer, an open-source tool for analyzing terabase-scale Hi-C datasets. Juicer allows users without a computational background to transform raw sequence data into normalized contact maps with one click. Juicer produces a hic file containing compressed contact matrices at many resolutions, facilitating visualization and analysis at multiple scales. Structural features, such as loops and domains, are automatically annotated. Juicer is available as open source software at http://aidenlab.org/juicer/.
DOI: 10.1038/nature22991
2017
Cited 2,094 times
An immunogenic personal neoantigen vaccine for patients with melanoma
The results of a phase I trial assessing a personal neoantigen multi-peptide vaccine in patients with melanoma, showing feasibility, safety, and immunogenicity. Neoantigens have long been considered optimal targets for anti-tumour vaccines, and recent mutation coding and prediction techniques have aimed to streamline their identification and selection. Two papers in this issue report results from personalized neoantigen vaccine trials in patients with cancer. Catherine Wu and colleagues report the results of a phase I trial of a personalized cancer vaccine that targets up to 20 patient neoantigens. The vaccine was safe and induced tumour-antigen-specific immune responses. Four out of six patients treated showed no recurrence at 25 months, and progressing patients responded to further therapy with checkpoint inhibitor. Ugur Sahin and colleagues report the first-in-human application of a personalized neoantigen vaccine in patients with melanoma. Their vaccination strategy includes sequencing and computational identification of neoantigens from patients, and design and manufacture of a poly-antigen RNA vaccine for treatment. In 13 patients, the vaccine boosted immunity against some of the selected tumour antigens from the individual patients, and two patients showed infiltration of tumour-reactive T cells. These results suggest that personalized vaccines could be refined and tailored to provide clinical benefit as cancer immunotherapies. Effective anti-tumour immunity in humans has been associated with the presence of T cells directed at cancer neoantigens [1], a class of HLA-bound peptides that arise from tumour-specific mutations. They are highly immunogenic because they are not present in normal tissues and hence bypass central thymic tolerance.
Although neoantigens were long-envisioned as optimal targets for an anti-tumour immune response [2], their systematic discovery and evaluation only became feasible with the recent availability of massively parallel sequencing for detection of all coding mutations within tumours, and of machine learning approaches to reliably predict those mutated peptides with high-affinity binding of autologous human leukocyte antigen (HLA) molecules. We hypothesized that vaccination with neoantigens can both expand pre-existing neoantigen-specific T-cell populations and induce a broader repertoire of new T-cell specificities in cancer patients, tipping the intra-tumoural balance in favour of enhanced tumour control. Here we demonstrate the feasibility, safety, and immunogenicity of a vaccine that targets up to 20 predicted personal tumour neoantigens. Vaccine-induced polyfunctional CD4+ and CD8+ T cells targeted 58 (60%) and 15 (16%) of the 97 unique neoantigens used across patients, respectively. These T cells discriminated mutated from wild-type antigens, and in some cases directly recognized autologous tumour. Of six vaccinated patients, four had no recurrence at 25 months after vaccination, while two with recurrent disease were subsequently treated with anti-PD-1 (anti-programmed cell death-1) therapy and experienced complete tumour regression, with expansion of the repertoire of neoantigen-specific T cells. These data provide a strong rationale for further development of this approach, alone and in combination with checkpoint blockade or other immunotherapies.
DOI: 10.1038/nature02800
2004
Cited 2,082 times
Transcriptional regulatory code of a eukaryotic genome
DNA-binding transcriptional regulators interpret the genome's regulatory code by binding to specific sequences to induce or repress gene expression [1]. Comparative genomics has recently been used to identify potential cis-regulatory sequences within the yeast genome on the basis of phylogenetic conservation [2-6], but this information alone does not reveal if or when transcriptional regulators occupy these binding sites. We have constructed an initial map of yeast's transcriptional regulatory code by identifying the sequence elements that are bound by regulators under various conditions and that are conserved among Saccharomyces species. The organization of regulatory elements in promoters and the environment-dependent use of these elements by regulators are discussed. We find that environment-specific use of regulatory elements predicts mechanistic models for the function of a large population of yeast's transcriptional regulators.
DOI: 10.1126/science.280.5366.1077
1998
Cited 2,008 times
Large-Scale Identification, Mapping, and Genotyping of Single-Nucleotide Polymorphisms in the Human Genome
Single-nucleotide polymorphisms (SNPs) are the most frequent type of variation in the human genome, and they provide powerful tools for a variety of medical genetic studies. In a large-scale survey for SNPs, 2.3 megabases of human genomic DNA was examined by a combination of gel-based sequencing and high-density variation-detection DNA chips. A total of 3241 candidate SNPs were identified. A genetic map was constructed showing the location of 2227 of these SNPs. Prototype genotyping chips were developed that allow simultaneous genotyping of 500 SNPs. The results provide a characterization of human diversity at the nucleotide level and demonstrate the feasibility of large-scale identification of human SNPs.
DOI: 10.1016/j.cell.2007.01.033
2007
Cited 1,927 times
The Mammalian Epigenome
Chemical modifications to DNA and histone proteins form a complex regulatory network that modulates chromatin structure and genome function. The epigenome refers to the complete description of these potentially heritable changes across the genome. The composition of the epigenome within a given cell is a function of genetic determinants, lineage, and environment. With the sequencing of the human genome completed, investigators now seek a comprehensive view of the epigenetic changes that determine how genetic information is made manifest across an incredibly varied background of developmental stages, tissue types, and disease states. Here we review current research efforts, with an emphasis on large-scale studies, emerging technologies, and challenges ahead.
DOI: 10.1038/s41588-018-0183-z
2018
Cited 1,923 times
Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations
A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation [1]. Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature [2-5], it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk [6]. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.
DOI: 10.1038/nature01140
2002
Cited 1,884 times
Detecting recent positive selection in the human genome from haplotype structure
DOI: 10.1073/pnas.211566398
2001
Cited 1,882 times
Multiclass cancer diagnosis using tumor gene expression signatures
The optimal treatment of patients with cancer depends on establishing accurate diagnoses by using a complex combination of clinical and histopathological data. In some instances, this task is difficult or impossible because of atypical clinical presentation or histopathology. To determine whether the diagnosis of multiple common adult malignancies could be achieved purely by molecular classification, we subjected 218 tumor samples, spanning 14 common tumor types, and 90 normal tissue samples to oligonucleotide microarray gene expression analysis. The expression levels of 16,063 genes and expressed sequence tags were used to evaluate the accuracy of a multiclass classifier based on a support vector machine algorithm. Overall classification accuracy was 78%, far exceeding the accuracy of random classification (9%). Poorly differentiated cancers resulted in low-confidence predictions and could not be accurately classified according to their tissue of origin, indicating that they are molecularly distinct entities with dramatically different gene expression patterns compared with their well differentiated counterparts. Taken together, these results demonstrate the feasibility of accurate, multiclass molecular cancer classification and suggest a strategy for future clinical implementation of molecular cancer diagnostics.
DOI: 10.1016/j.cell.2010.06.040
2010
Cited 1,873 times
A Large Intergenic Noncoding RNA Induced by p53 Mediates Global Gene Repression in the p53 Response
Recently, more than 1000 large intergenic noncoding RNAs (lincRNAs) have been reported. These RNAs are evolutionarily conserved in mammalian genomes and thus presumably function in diverse biological processes. Here, we report the identification of lincRNAs that are regulated by p53. One of these lincRNAs (lincRNA-p21) serves as a repressor in p53-dependent transcriptional responses. Inhibition of lincRNA-p21 affects the expression of hundreds of gene targets enriched for genes normally repressed by p53. The observed transcriptional repression by lincRNA-p21 is mediated through the physical association with hnRNP-K. This interaction is required for proper genomic localization of hnRNP-K at repressed genes and regulation of p53 mediates apoptosis. We propose a model whereby transcription factors activate lincRNAs that serve as key repressors by physically associating with repressive complexes and modulate their localization to sets of previously active genes.
DOI: 10.1038/ng1071
2003
Cited 1,843 times
Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease
DOI: 10.1038/nature06250
2007
Cited 1,827 times
Genome-wide detection and characterization of positive selection in human populations
With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population: LARGE and DMD, both related to infection by the Lassa virus, in West Africa; SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.
DOI: 10.1038/nature03025
2004
Cited 1,824 times
Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype
Tetraodon nigroviridis is a freshwater puffer fish with the smallest known vertebrate genome. Here, we report a draft genome sequence with long-range linkage and substantial anchoring to the 21 Tetraodon chromosomes. Genome analysis provides a greatly improved fish gene catalogue, including identifying key genes previously thought to be absent in fish. Comparison with other vertebrates and a urochordate indicates that fish proteins have diverged markedly faster than their mammalian homologues. Comparison with the human genome suggests approximately 900 previously unannotated human genes. Analysis of the Tetraodon and human genomes shows that whole-genome duplication occurred in the teleost fish lineage, subsequent to its divergence from mammals. The analysis also makes it possible to infer the basic structure of the ancestral bony vertebrate genome, which was composed of 12 chromosomes, and to reconstruct much of the evolutionary history of ancient and recent chromosome rearrangements leading to the modern human karyotype.
DOI: 10.1038/nature03441
2005
Cited 1,798 times
Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals
Comprehensive identification of all functional elements encoded in the human genome is a fundamental need in biomedical research. Here, we present a comparative analysis of the human, mouse, rat and dog genomes to create a systematic catalogue of common regulatory motifs in promoters and 3' untranslated regions (3' UTRs). The promoter analysis yields 174 candidate motifs, including most previously known transcription-factor binding sites and 105 new motifs. The 3'-UTR analysis yields 106 motifs likely to be involved in post-transcriptional regulation. Nearly one-half are associated with microRNAs (miRNAs), leading to the discovery of many new miRNA genes and their likely target genes. Our results suggest that previous estimates of the number of human miRNA genes were low, and that miRNAs regulate at least 20% of human genes. The overall results provide a systematic view of gene regulation in the human, which will be refined as additional mammalian genomes become available.
DOI: 10.1038/nature10398
2011
Cited 1,739 times
lincRNAs act in the circuitry controlling pluripotency and differentiation
Although thousands of large intergenic non-coding RNAs (lincRNAs) have been identified in mammals, few have been functionally characterized, leading to debate about their biological role. To address this, we performed loss-of-function studies on most lincRNAs expressed in mouse embryonic stem (ES) cells and characterized the effects on gene expression. Here we show that knockdown of lincRNAs has major consequences on gene expression patterns, comparable to knockdown of well-known ES cell regulators. Notably, lincRNAs primarily affect gene expression in trans. Knockdown of dozens of lincRNAs causes either exit from the pluripotent state or upregulation of lineage commitment programs. We integrate lincRNAs into the molecular circuitry of ES cells and show that lincRNA genes are regulated by key transcription factors and that lincRNA transcripts bind to multiple chromatin regulatory proteins to affect shared gene expression programs. Together, the results demonstrate that lincRNAs have key roles in the circuitry controlling ES cell state.
DOI: 10.1038/nature01644
2003
Cited 1,715 times
Sequencing and comparison of yeast species to identify genes and regulatory elements
DOI: 10.1038/nbt.2203
2012
Cited 1,708 times
Absolute quantification of somatic DNA alterations in human cancer
Tumors vary in their ratio of normal to cancerous cells and in their genomic copy number. Carter et al. describe an analytic method for inferring the purity and ploidy of a tumor sample, enabling longitudinal studies of subclonal mutations and tumor evolution. We describe a computational method that infers tumor purity and malignant cell ploidy directly from analysis of somatic DNA alterations. The method, named ABSOLUTE, can detect subclonal heterogeneity and somatic homozygosity, and it can calculate statistical sensitivity for detection of specific aberrations. We used ABSOLUTE to analyze exome sequencing data from 214 ovarian carcinoma tumor-normal pairs. This analysis identified both pervasive subclonal somatic point-mutations and a small subset of predominantly clonal and homozygous mutations, which were overrepresented in the tumor suppressor genes TP53 and NF1 and in a candidate tumor suppressor gene CDK12. We also used ABSOLUTE to infer absolute allelic copy-number profiles from 3,155 diverse cancer specimens, revealing that genome-doubling events are common in human cancer, likely occur in cells that are already aneuploid, and influence pathways of tumor progression (for example, with recessive inactivation of NF1 being less common after genome doubling). ABSOLUTE will facilitate the design of clinical sequencing studies and studies of cancer genome evolution and intra-tumor heterogeneity.
DOI: 10.1038/35075590
2001
Cited 1,695 times
Linkage disequilibrium in the human genome
1996
Cited 1,687 times
Parametric and nonparametric linkage analysis: a unified multipoint approach.
In complex disease studies, it is crucial to perform multipoint linkage analysis with many markers and to use robust nonparametric methods that take account of all pedigree information. Currently available methods fall short in both regards. In this paper, we describe how to extract complete multipoint inheritance information from general pedigrees of moderate size. This information is captured in the multipoint inheritance distribution, which provides a framework for a unified approach to both parametric and nonparametric methods of linkage analysis. Specifically, the approach includes the following: (1) Rapid exact computation of multipoint LOD scores involving dozens of highly polymorphic markers, even in the presence of loops and missing data. (2) Non-parametric linkage (NPL) analysis, a powerful new approach to pedigree analysis. We show that NPL is robust to uncertainty about mode of inheritance, is much more powerful than commonly used nonparametric methods, and loses little power relative to parametric linkage analysis. NPL thus appears to be the method of choice for pedigree studies of complex traits. (3) Information-content mapping, which measures the fraction of the total inheritance information extracted by the available marker data and points out the regions in which typing additional markers is most useful. (4) Maximum-likelihood reconstruction of many-marker haplotypes, even in pedigrees with missing data. We have implemented NPL analysis, LOD-score computation, information-content mapping, and haplotype reconstruction in a new computer package, GENEHUNTER. The package allows efficient multipoint analysis of pedigree data to be performed rapidly in a single user-friendly environment.
DOI: 10.1126/science.aaf5573
2016
Cited 1,679 times
C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector
The clustered regularly interspaced short palindromic repeat (CRISPR)-CRISPR-associated genes (Cas) adaptive immune system defends microbes against foreign genetic elements via DNA or RNA-DNA interference. We characterize the class 2 type VI CRISPR-Cas effector C2c2 and demonstrate its RNA-guided ribonuclease function. C2c2 from the bacterium Leptotrichia shahii provides interference against RNA phage. In vitro biochemical analysis shows that C2c2 is guided by a single CRISPR RNA and can be programmed to cleave single-stranded RNA targets carrying complementary protospacers. In bacteria, C2c2 can be programmed to knock down specific mRNAs. Cleavage is mediated by catalytic residues in the two conserved Higher Eukaryotes and Prokaryotes Nucleotide-binding (HEPN) domains, mutations of which generate catalytically inactive RNA-binding proteins. These results broaden our understanding of CRISPR-Cas systems and suggest that C2c2 can be used to develop new RNA-targeting tools.
DOI: 10.1038/ng1001-229
2001
Cited 1,671 times
High-resolution haplotype structure in the human genome
DOI: 10.1038/nbt1010-1045
2010
Cited 1,659 times
The NIH Roadmap Epigenomics Mapping Consortium
The NIH Roadmap Epigenomics Mapping Consortium aims to produce a public resource of epigenomic maps for stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease.
DOI: 10.1038/nature10944
2012
Cited 1,652 times
The genomic basis of adaptive evolution in threespine sticklebacks
Marine stickleback fish have colonized and adapted to thousands of streams and lakes formed since the last ice age, providing an exceptional opportunity to characterize genomic mechanisms underlying repeated ecological adaptation in nature. Here we develop a high-quality reference genome assembly for threespine sticklebacks. By sequencing the genomes of twenty additional individuals from a global set of marine and freshwater populations, we identify a genome-wide set of loci that are consistently associated with marine–freshwater divergence. Our results indicate that reuse of globally shared standing genetic variation, including chromosomal inversions, has an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. Both coding and regulatory changes occur in the set of loci underlying marine–freshwater evolution, but regulatory changes appear to predominate in this well known example of repeated adaptive evolution in nature. A reference genome sequence for threespine sticklebacks, and re-sequencing of 20 additional world-wide populations, reveals loci used repeatedly during vertebrate evolution; multiple chromosome inversions contribute to marine-freshwater divergence, and regulatory variants predominate over coding variants in this classic example of adaptive evolution in natural environments. Threespine sticklebacks have become a powerful model for studying the molecular basis of adaptive evolution. This paper presents a high-quality reference genome sequence, along with genomes of 20 further individuals from a global set of marine and freshwater populations. Genomic analysis reveals that reuse of globally shared standing genetic variation plays an important part in repeated evolution of distinct stickleback populations, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. The data are consistent with an important role for regulatory changes during parallel evolution of marine and freshwater sticklebacks.
DOI: 10.1016/j.cell.2006.01.040
2006
Cited 1,634 times
A Lentiviral RNAi Library for Human and Mouse Genes Applied to an Arrayed Viral High-Content Screen
To enable arrayed or pooled loss-of-function screens in a wide range of mammalian cell types, including primary and nondividing cells, we are developing lentiviral short hairpin RNA (shRNA) libraries targeting the human and murine genomes. The libraries currently contain 104,000 vectors, targeting each of 22,000 human and mouse genes with multiple sequence-verified constructs. To test the utility of the library for arrayed screens, we developed a screen based on high-content imaging to identify genes required for mitotic progression in human cancer cells and applied it to an arrayed set of 5,000 unique shRNA-expressing lentiviruses that target 1,028 human genes. The screen identified several known and ∼100 candidate regulators of mitotic progression and proliferation; the availability of multiple shRNAs targeting the same gene facilitated functional validation of putative hits. This work provides a widely applicable resource for loss-of-function screens, as well as a roadmap for its application to biological discovery.
DOI: 10.1016/j.cell.2012.08.029
2012
Cited 1,613 times
Mapping the Hallmarks of Lung Adenocarcinoma with Massively Parallel Sequencing
Lung adenocarcinoma, the most common subtype of non-small cell lung cancer, is responsible for more than 500,000 deaths per year worldwide. Here, we report exome and genome sequences of 183 lung adenocarcinoma tumor/normal DNA pairs. These analyses revealed a mean exonic somatic mutation rate of 12.0 events/megabase and identified the majority of genes previously reported as significantly mutated in lung adenocarcinoma. In addition, we identified statistically recurrent somatic mutations in the splicing factor gene U2AF1 and truncating mutations affecting RBM10 and ARID1A. Analysis of nucleotide context-specific mutation signatures grouped the sample set into distinct clusters that correlated with smoking history and alterations of reported lung adenocarcinoma genes. Whole-genome sequence analysis revealed frequent structural rearrangements, including in-frame exonic alterations within EGFR and SIK2 kinases. The candidate genes identified in this study are attractive targets for biological characterization and therapeutic targeting of lung adenocarcinoma.
DOI: 10.7554/elife.27041
2017
Cited 1,611 times
The Human Cell Atlas
The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.
DOI: 10.1056/nejmoa073493
2007
Cited 1,590 times
Risk Alleles for Multiple Sclerosis Identified by a Genomewide Study
Multiple sclerosis has a clinically significant heritable component. We conducted a genomewide association study to identify alleles associated with the risk of multiple sclerosis. We used DNA microarray technology to identify common DNA sequence variants in 931 family trios (consisting of an affected child and both parents) and tested them for association. For replication, we genotyped another 609 family trios, 2322 case subjects, and 789 control subjects and used genotyping data from two external control data sets. A joint analysis of data from 12,360 subjects was performed to estimate the overall significance and effect size of associations between alleles and the risk of multiple sclerosis. A transmission disequilibrium test of 334,923 single-nucleotide polymorphisms (SNPs) in 931 family trios revealed 49 SNPs having an association with multiple sclerosis (P<1×10⁻⁴); of these SNPs, 38 were selected for the second-stage analysis. A comparison between the 931 case subjects from the family trios and 2431 control subjects identified an additional nonoverlapping 32 SNPs (P<0.001). An additional 40 SNPs with less stringent P values (<0.01) were also selected, for a total of 110 SNPs for the second-stage analysis. Of these SNPs, two within the interleukin-2 receptor alpha gene (IL2RA) were strongly associated with multiple sclerosis (P=2.96×10⁻⁸), as were a nonsynonymous SNP in the interleukin-7 receptor alpha gene (IL7RA) (P=2.94×10⁻⁷) and multiple SNPs in the HLA-DRA locus (P=8.94×10⁻⁸¹). Alleles of IL2RA and IL7RA and those in the HLA locus are identified as heritable risk factors for multiple sclerosis.
DOI: 10.1126/science.aal3327
2017
Cited 1,566 times
De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds
The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes in a rapid and cost-effective way. Here we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome, de novo, from short reads alone (67× coverage). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Aedes aegypti and Culex quinquefasciatus, each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that almost all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, and accurate, and can be applied to many species.
DOI: 10.1016/s0092-8674(00)81641-4
1998
Cited 1,566 times
Dissecting the Regulatory Circuitry of a Eukaryotic Genome
Genome-wide expression analysis was used to identify genes whose expression depends on the functions of key components of the transcription initiation machinery in yeast. Components of the RNA polymerase II holoenzyme, the general transcription factor TFIID, and the SAGA chromatin modification complex were found to have roles in expression of distinct sets of genes. The results reveal an unanticipated level of regulation which is superimposed on that due to gene-specific transcription factors, a novel mechanism for coordinate regulation of specific sets of genes when cells encounter limiting nutrients, and evidence that the ultimate targets of signal transduction pathways can be identified within the initiation apparatus.
DOI: 10.1038/nature01554
2003
Cited 1,560 times
The genome sequence of the filamentous fungus Neurospora crassa
Neurospora crassa is a central organism in the history of twentieth-century genetics, biochemistry and molecular biology. Here, we report a high-quality draft sequence of the N. crassa genome. The approximately 40-megabase genome encodes about 10,000 protein-coding genes--more than twice as many as in the fission yeast Schizosaccharomyces pombe and only about 25% fewer than in the fruitfly Drosophila melanogaster. Analysis of the gene set yields insights into unexpected aspects of Neurospora biology including the identification of genes potentially associated with red light photobiology, genes implicated in secondary metabolism, and important differences in Ca2+ signalling as compared with plants and animals. Neurospora possesses the widest array of genome defence mechanisms known for any eukaryotic organism, including a process unique to fungi called repeat-induced point mutation (RIP). Genome analysis suggests that RIP has had a profound impact on genome evolution, greatly slowing the creation of new genes through genomic duplication and resulting in a genome with an unusually low proportion of closely related genes.
DOI: 10.1038/79216
2000
Cited 1,550 times
The common PPARγ Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes
DOI: 10.1038/ng765
2001
Cited 1,546 times
MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia
DOI: 10.1073/pnas.1017351108
2010
Cited 1,491 times
High-quality draft assemblies of mammalian genomes from massively parallel sequence data
Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.
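The abstract above reports scaffold contiguity as an N50 size. As a reminder of what that statistic measures, here is a minimal sketch, assuming the usual definition (the largest length L such that scaffolds of length at least L together cover at least half of the assembly). The function name `n50` and the example lengths are illustrative only and are not taken from ALLPATHS-LG.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Assumes the common definition: walk the scaffolds from longest to
    shortest and report the length at which the running total first
    reaches half of the summed assembly size.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must contain at least one scaffold")


# Invented scaffold lengths (arbitrary units), purely for illustration.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

Because the statistic depends only on the sorted lengths, a few very long scaffolds can raise N50 sharply, which is why it is usually read alongside accuracy and coverage figures like those quoted in the abstract.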
DOI: 10.1073/pnas.1518552112
2015
Cited 1,489 times
Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes
Significance: When the human genome folds up inside the cell nucleus, it is spatially partitioned into numerous loops and contact domains. How these structures form is unknown. Here, we show that data from high-resolution spatial proximity maps are consistent with a model in which a complex, including the proteins CCCTC-binding factor (CTCF) and cohesin, mediates the formation of loops by a process of extrusion. Contact domains form as a byproduct of this process. The model accurately predicts how the genome will fold, using only information about the locations at which CTCF is bound. We demonstrate the ability to reengineer loops and domains in a predictable manner by creating highly targeted mutations, some as small as a single base pair, at CTCF sites.
DOI: 10.1016/j.cell.2017.09.026
2017
Cited 1,472 times
Cohesin Loss Eliminates All Loop Domains
The human genome folds to create thousands of intervals, called “contact domains,” that exhibit enhanced contact frequency within themselves. “Loop domains” form because of tethering between two loci—almost always bound by CTCF and cohesin—lying on the same chromosome. “Compartment domains” form when genomic intervals with similar histone marks co-segregate. Here, we explore the effects of degrading cohesin. All loop domains are eliminated, but neither compartment domains nor histone marks are affected. Loss of loop domains does not lead to widespread ectopic gene activation but does affect a significant minority of active genes. In particular, cohesin loss causes superenhancers to co-localize, forming hundreds of links within and across chromosomes and affecting the regulation of nearby genes. We then restore cohesin and monitor the re-formation of each loop. Although re-formation rates vary greatly, many megabase-sized loops recovered in under an hour, consistent with a model where loop extrusion is rapid.
DOI: 10.1038/nature24049
2017
Cited 1,463 times
RNA targeting with CRISPR–Cas13
RNA has important and diverse roles in biology, but molecular tools to manipulate and measure it are limited. For example, RNA interference can efficiently knock down RNAs, but it is prone to off-target effects, and visualizing RNAs typically relies on the introduction of exogenous tags. Here we demonstrate that the class 2 type VI RNA-guided RNA-targeting CRISPR-Cas effector Cas13a (previously known as C2c2) can be engineered for mammalian cell RNA knockdown and binding. After initial screening of 15 orthologues, we identified Cas13a from Leptotrichia wadei (LwaCas13a) as the most effective in an interference assay in Escherichia coli. LwaCas13a can be heterologously expressed in mammalian and plant cells for targeted knockdown of either reporter or endogenous transcripts, with levels of knockdown comparable to RNA interference and with improved specificity. Catalytically inactive LwaCas13a maintains targeted RNA binding activity, which we leveraged for programmable tracking of transcripts in live cells. Our results establish CRISPR-Cas13a as a flexible platform for studying RNA in mammalian cells and therapeutic development.
DOI: 10.1038/nature02424
2004
Cited 1,462 times
Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae
DOI: 10.1126/science.aac7041
2015
Cited 1,449 times
Identification and characterization of essential genes in the human genome
Zeroing in on essential human genes. More powerful genetic techniques are helping to define the list of genes required for the life of a human cell. Two papers used the CRISPR genome editing system and a gene trap method in haploid human cells to screen for essential genes (see the Perspective by Boone and Andrews). Wang et al.'s analysis of multiple cell lines indicates that it may be possible to find tumor-specific dependencies on particular genes. Blomen et al. investigate the phenomenon in which nonessential genes are required for fitness in the absence of another gene. Hence, complexity rather than robustness is the human strategy. Science, this issue p. 1096 and p. 1092; see also p. 1028
DOI: 10.1038/335721a0
1988
Cited 1,439 times
Resolution of quantitative traits into Mendelian factors by using a complete linkage map of restriction fragment length polymorphisms
DOI: 10.1073/pnas.1119675109
2012
Cited 1,416 times
The mystery of missing heritability: Genetic interactions create phantom heritability
Human genetics has been haunted by the mystery of "missing heritability" of common traits. Although studies have discovered >1,200 variants associated with common diseases and traits, these variants typically appear to explain only a minority of the heritability. The proportion of heritability explained by a set of variants is the ratio of (i) the heritability due to these variants (numerator), estimated directly from their observed effects, to (ii) the total heritability (denominator), inferred indirectly from population data. The prevailing view has been that the explanation for missing heritability lies in the numerator--that is, in as-yet undiscovered variants. While many variants surely remain to be found, we show here that a substantial portion of missing heritability could arise from overestimation of the denominator, creating "phantom heritability." Specifically, (i) estimates of total heritability implicitly assume the trait involves no genetic interactions (epistasis) among loci; (ii) this assumption is not justified, because models with interactions are also consistent with observable data; and (iii) under such models, the total heritability may be much smaller and thus the proportion of heritability explained much larger. For example, 80% of the currently missing heritability for Crohn's disease could be due to genetic interactions, if the disease involves interaction among three pathways. In short, missing heritability need not directly correspond to missing variants, because current estimates of total heritability may be significantly inflated by genetic interactions. Finally, we describe a method for estimating heritability from isolated populations that is not inflated by genetic interactions.
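The ratio described in the abstract above can be written out explicitly. The following is a sketch in notation chosen here for illustration: the symbols h²_known (heritability due to discovered variants), h²_all (true total heritability) and h²_pop (total heritability as inferred from population data) are assumptions of this note, and the numbers in the worked line are invented rather than taken from the paper.

```latex
\[
\pi_{\text{explained}} = \frac{h^2_{\text{known}}}{h^2_{\text{all}}},
\qquad
\hat{\pi}_{\text{explained}} = \frac{h^2_{\text{known}}}{h^2_{\text{pop}}},
\qquad
h^2_{\text{pop}} > h^2_{\text{all}}
\;\Longrightarrow\;
\hat{\pi}_{\text{explained}} < \pi_{\text{explained}}.
\]

% Invented numbers, for illustration only:
\[
h^2_{\text{known}} = 0.10,\quad
h^2_{\text{pop}} = 0.50,\quad
h^2_{\text{all}} = 0.25
\;\Longrightarrow\;
\hat{\pi}_{\text{explained}} = 0.20,\quad
\pi_{\text{explained}} = 0.40.
\]
```

In this toy case the same set of variants appears to explain 20% of heritability when the inflated denominator is used, but 40% against the true total, which is the sense in which an overestimated denominator creates "phantom" heritability.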
DOI: 10.1016/j.cell.2005.01.001
2005
Cited 1,411 times
Genomic Maps and Comparative Analysis of Histone Modifications in Human and Mouse
We mapped histone H3 lysine 4 di- and trimethylation and lysine 9/14 acetylation across the nonrepetitive portions of human chromosomes 21 and 22 and compared patterns of lysine 4 dimethylation for several orthologous human and mouse loci. Both chromosomes show punctate sites enriched for modified histones. Sites showing trimethylation correlate with transcription starts, while those showing mainly dimethylation occur elsewhere in the vicinity of active genes. Punctate methylation patterns are also evident at the cytokine and IL-4 receptor loci. The Hox clusters present a strikingly different picture, with broad lysine 4-methylated regions that overlay multiple active genes. We suggest these regions represent active chromatin domains required for the maintenance of Hox gene expression. Methylation patterns at orthologous loci are strongly conserved between human and mouse even though many methylated sites do not show sequence conservation notably higher than background. This suggests that the DNA elements that direct the methylation represent only a small fraction of the region or lie at some distance from the site.
DOI: 10.1038/10290
1999
Cited 1,407 times
Characterization of single-nucleotide polymorphisms in coding regions of human genes
DOI: 10.1038/nature07056
2008
Cited 1,378 times
Dissecting direct reprogramming through integrative genomic analysis
Somatic cells can be reprogrammed to a pluripotent state through the ectopic expression of defined transcription factors. Understanding the mechanism and kinetics of this transformation may shed light on the nature of developmental potency and suggest strategies with improved efficiency or safety. Here we report an integrative genomic analysis of reprogramming of mouse fibroblasts and B lymphocytes. Lineage-committed cells show a complex response to the ectopic expression involving induction of genes downstream of individual reprogramming factors. Fully reprogrammed cells show gene expression and epigenetic states that are highly similar to embryonic stem cells. In contrast, stable partially reprogrammed cell lines show reactivation of a distinctive subset of stem-cell-related genes, incomplete repression of lineage-specifying transcription factors, and DNA hypermethylation at pluripotency-related loci. These observations suggest that some cells may become trapped in partially reprogrammed states owing to incomplete repression of transcription factors, and that DNA de-methylation is an inefficient step in the transition to pluripotency. We demonstrate that RNA inhibition of transcription factors can facilitate reprogramming, and that treatment with DNA methyltransferase inhibitors can improve the overall efficiency of the reprogramming process.
DOI: 10.1038/35020106
2000
Cited 1,376 times
Genomic analysis of metastasis reveals an essential role for RhoC
DOI: 10.1016/j.ccell.2017.07.007
2017
Cited 1,375 times
Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma
We performed integrated genomic, transcriptomic, and proteomic profiling of 150 pancreatic ductal adenocarcinoma (PDAC) specimens, including samples with characteristic low neoplastic cellularity. Deep whole-exome sequencing revealed recurrent somatic mutations in KRAS, TP53, CDKN2A, SMAD4, RNF43, ARID1A, TGFβR2, GNAS, RREB1, and PBRM1. KRAS wild-type tumors harbored alterations in other oncogenic drivers, including GNAS, BRAF, CTNNB1, and additional RAS pathway genes. A subset of tumors harbored multiple KRAS mutations, with some showing evidence of biallelic mutations. Protein profiling identified a favorable prognosis subset with low epithelial-mesenchymal transition and high MTOR pathway scores. Associations of non-coding RNAs with tumor-specific mRNA subtypes were also identified. Our integrated multi-platform analysis reveals a complex molecular landscape of PDAC and provides a roadmap for precision medicine.
DOI: 10.1073/pnas.84.8.2363
1987
Cited 1,366 times
Construction of multilocus genetic linkage maps in humans.
Human genetic linkage maps are most accurately constructed by using information from many loci simultaneously. Traditional methods for such multilocus linkage analysis are computationally prohibitive in general, even with supercomputers. The problem has acquired practical importance because of the current international collaboration aimed at constructing a complete human linkage map of DNA markers through the study of three-generation pedigrees. We describe here several alternative algorithms for constructing human linkage maps given a specified gene order. One method allows maximum-likelihood multilocus linkage maps for dozens of DNA markers in such three-generation pedigrees to be constructed in minutes.
DOI: 10.1016/j.cell.2011.07.026
2011
Cited 1,359 times
Stochastic State Transitions Give Rise to Phenotypic Equilibrium in Populations of Cancer Cells
Cancer cells within individual tumors often exist in distinct phenotypic states that differ in functional attributes. While cancer cell populations typically display distinctive equilibria in the proportion of cells in various states, the mechanisms by which this occurs are poorly understood. Here, we study the dynamics of phenotypic proportions in human breast cancer cell lines. We show that subpopulations of cells purified for a given phenotypic state return towards equilibrium proportions over time. These observations can be explained by a Markov model in which cells transition stochastically between states. A prediction of this model is that, given certain conditions, any subpopulation of cells will return to equilibrium phenotypic proportions over time. A second prediction is that breast cancer stem-like cells arise de novo from non-stem-like cells. These findings contribute to our understanding of cancer heterogeneity and reveal how stochasticity in single-cell behaviors promotes phenotypic equilibrium in populations of cancer cells.
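The Markov-model argument in the abstract above can be illustrated with a small simulation. This is a minimal sketch, not the authors' model: the three state names and every transition probability below are invented for illustration, and the helper names `step` and `evolve` are hypothetical.

```python
# Hypothetical phenotypic states and per-step transition probabilities.
# Row i gives the probabilities that a cell in state i is found in each
# state after one step; each row sums to 1. All values are invented.
STATES = ["stem-like", "basal", "luminal"]
TRANSITIONS = [
    [0.6, 0.3, 0.1],  # from stem-like
    [0.1, 0.8, 0.1],  # from basal
    [0.1, 0.3, 0.6],  # from luminal
]


def step(proportions, matrix):
    """Advance the population proportions by one transition step."""
    n = len(matrix)
    return [
        sum(proportions[i] * matrix[i][j] for i in range(n))
        for j in range(n)
    ]


def evolve(proportions, matrix, steps):
    """Apply `step` repeatedly and return the final proportions."""
    for _ in range(steps):
        proportions = step(proportions, matrix)
    return proportions


# Start from three "purified" subpopulations, one per state.
purified = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
final = [evolve(start, TRANSITIONS, 60) for start in purified]

# One step after purifying basal cells, some stem-like cells are present.
basal_after_one_step = step([0.0, 1.0, 0.0], TRANSITIONS)
```

With these invented probabilities every purified starting population drifts to the same proportions (20% stem-like, 60% basal, 20% luminal), and a purified non-stem-like population regains a stem-like fraction after a single step. The convergence relies on the chain being able to reach every state from every other state, which corresponds to the "given certain conditions" caveat in the abstract.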
DOI: 10.1126/science.1156409
2008
Cited 1,343 times
Genetic Mapping in Human Disease
Genetic mapping provides a powerful approach to identify genes and biological processes underlying any trait influenced by inheritance, including human diseases. We discuss the intellectual foundations of genetic mapping of Mendelian and complex traits in humans, examine lessons emerging from linkage analysis of Mendelian diseases and genome-wide association studies of common diseases, and discuss questions and challenges that lie ahead.
DOI: 10.1158/0008-5472.can-07-2938
2008
Cited 1,336 times
Loss of E-Cadherin Promotes Metastasis via Multiple Downstream Transcriptional Pathways
Loss of the epithelial adhesion molecule E-cadherin is thought to enable metastasis by disrupting intercellular contacts, an early step in metastatic dissemination. To further investigate the molecular basis of this notion, we use two methods to inhibit E-cadherin function that distinguish between E-cadherin's cell-cell adhesion and intracellular signaling functions. Whereas the disruption of cell-cell contacts alone does not enable metastasis, the loss of E-cadherin protein does, through induction of an epithelial-to-mesenchymal transition, invasiveness, and anoikis resistance. We find the E-cadherin binding partner beta-catenin to be necessary, but not sufficient, for induction of these phenotypes. In addition, gene expression analysis shows that E-cadherin loss results in the induction of multiple transcription factors, at least one of which, Twist, is necessary for E-cadherin loss-induced metastasis. These findings indicate that E-cadherin loss in tumors contributes to metastatic dissemination by inducing wide-ranging transcriptional and functional changes.
DOI: 10.1038/ng.2279
2012
Cited 1,327 times
Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer
Prostate cancer is the second most common cancer in men worldwide and causes over 250,000 deaths each year. Overtreatment of indolent disease also results in significant morbidity. Common genetic alterations in prostate cancer include losses of NKX3.1 (8p21) and PTEN (10q23), gains of AR (the androgen receptor gene) and fusion of ETS family transcription factor genes with androgen-responsive promoters. Recurrent somatic base-pair substitutions are believed to be less contributory in prostate tumorigenesis but have not been systematically analyzed in large cohorts. Here, we sequenced the exomes of 112 prostate tumor and normal tissue pairs. New recurrent mutations were identified in multiple genes, including MED12 and FOXA1. SPOP was the most frequently mutated gene, with mutations involving the SPOP substrate-binding cleft in 6-15% of tumors across multiple independent cohorts. Prostate cancers with mutant SPOP lacked ETS family gene rearrangements and showed a distinct pattern of genomic alterations. Thus, SPOP mutations may define a new molecular subtype of prostate cancer.
DOI: 10.1038/nature12975
2014
Cited 1,313 times
A polygenic burden of rare disruptive mutations in schizophrenia
Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.
DOI: 10.1038/nature09837
2011
Cited 1,309 times
Initial genome sequencing and analysis of multiple myeloma
Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumour genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the data set. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-κB signalling was indicated by mutations in 11 members of the NF-κB pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge.
DOI: 10.1091/mbc.12.2.323
2001
Cited 1,299 times
Remodeling of Yeast Genome Expression in Response to Environmental Changes
We used genome-wide expression analysis to explore how gene expression in Saccharomyces cerevisiae is remodeled in response to various changes in extracellular environment, including changes in temperature, oxidation, nutrients, pH, and osmolarity. The results demonstrate that more than half of the genome is involved in various responses to environmental change and identify the global set of genes induced and repressed by each condition. These data implicate a substantial number of previously uncharacterized genes in these responses and reveal a signature common to environmental responses that involves ∼10% of yeast genes. The results of expression analysis with MSN2/MSN4 mutants support the model that the Msn2/Msn4 activators induce the common response to environmental change. These results provide a global description of the transcriptional response to environmental change and extend our understanding of the role of activators in effecting this response.
DOI: 10.1038/nbt.1523
2009
Cited 1,278 times
Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing
Targeting genomic loci by massively parallel sequencing requires new methods to enrich templates to be sequenced. We developed a capture method that uses biotinylated RNA 'baits' to fish targets out of a 'pond' of DNA fragments. The RNA is transcribed from PCR-amplified oligodeoxynucleotides originally synthesized on a microarray, generating sufficient bait for multiple captures at concentrations high enough to drive the hybridization. We tested this method with 170-mer baits that target >15,000 coding exons (2.5 Mb) and four regions (1.7 Mb total) using Illumina sequencing as read-out. About 90% of uniquely aligning bases fell on or near bait sequence; up to 50% lay on exons proper. The uniformity was such that approximately 60% of target bases in the exonic 'catch', and approximately 80% in the regional catch, had at least half the mean coverage. One lane of Illumina sequence was sufficient to call high-confidence genotypes for 89% of the targeted exon space.
DOI: 10.1016/j.cels.2015.07.012
2016
Cited 1,237 times
Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom
Hi-C experiments study how genomes fold in 3D, generating contact maps containing features as small as 20 bp and as large as 200 Mb. Here we introduce Juicebox, a tool for exploring Hi-C and other contact map data. Juicebox allows users to zoom in and out of Hi-C maps interactively, just as a user of Google Earth might zoom in and out of a geographic map. Maps can be compared to one another, or to 1D tracks or 2D feature sets.
DOI: 10.1016/j.cell.2013.01.019
2013
Cited 1,231 times
Evolution and Impact of Subclonal Mutations in Chronic Lymphocytic Leukemia
Clonal evolution is a key feature of cancer progression and relapse. We studied intratumoral heterogeneity in 149 chronic lymphocytic leukemia (CLL) cases by integrating whole-exome sequence and copy number to measure the fraction of cancer cells harboring each somatic mutation. We identified driver mutations as predominantly clonal (e.g., MYD88, trisomy 12, and del(13q)) or subclonal (e.g., SF3B1 and TP53), corresponding to earlier and later events in CLL evolution. We sampled leukemia cells from 18 patients at two time points. Ten of twelve CLL cases treated with chemotherapy (but only one of six without treatment) underwent clonal evolution, predominantly involving subclones with driver mutations (e.g., SF3B1 and TP53) that expanded over time. Furthermore, presence of a subclonal driver mutation was an independent risk factor for rapid disease progression. Our study thus uncovers patterns of clonal evolution in CLL, providing insights into its stepwise transformation, and links the presence of subclones with adverse clinical outcomes.
DOI: 10.1016/j.cell.2016.11.038
2016
Cited 1,187 times
Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens
Genetic screens help infer gene function in mammalian cells, but it has remained difficult to assay complex phenotypes, such as transcriptional profiles, at scale. Here, we develop Perturb-seq, combining single-cell RNA sequencing (RNA-seq) and clustered regularly interspaced short palindromic repeats (CRISPR)-based perturbations to perform many such assays in a pool. We demonstrate Perturb-seq by analyzing 200,000 cells in immune cells and cell lines, focusing on transcription factors regulating the response of dendritic cells to lipopolysaccharide (LPS). Perturb-seq accurately identifies individual gene targets, gene signatures, and cell states affected by individual perturbations and their genetic interactions. We posit new functions for regulators of differentiation, the anti-viral response, and mitochondrial function during immune activation. By decomposing many high content measurements into the effects of perturbations, their interactions, and diverse cell metadata, Perturb-seq dramatically increases the scope of pooled genomic assays.
DOI: 10.1038/nbt.1633
2010
Cited 1,186 times
Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs
Massively parallel cDNA sequencing (RNA-Seq) provides an unbiased way to study a transcriptome, including both coding and noncoding genes. Until now, most RNA-Seq studies have depended crucially on existing annotations and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We applied it to mouse embryonic stem cells, neuronal precursor cells and lung fibroblasts to accurately reconstruct the full-length gene structures for most known expressed genes. We identified substantial variation in protein coding genes, including thousands of novel 5' start sites, 3' ends and internal coding exons. We then determined the gene structures of more than a thousand large intergenic noncoding RNA (lincRNA) and antisense loci. Our results open the way to direct experimental manipulation of thousands of noncoding RNAs and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.
DOI: 10.1016/j.cell.2013.03.002
2013
Cited 1,161 times
Lessons from the Cancer Genome
Systematic studies of the cancer genome have exploded in recent years. These studies have revealed scores of new cancer genes, including many in processes not previously known to be causal targets in cancer. The genes affect cell signaling, chromatin, and epigenomic regulation; RNA splicing; protein homeostasis; metabolism; and lineage maturation. Still, cancer genomics is in its infancy. Much work remains to complete the mutational catalog in primary tumors and across the natural history of cancer, to connect recurrent genomic alterations to altered pathways and acquired cellular vulnerabilities, and to use this information to guide the development and application of therapies.