
Marcin Imieliński

Here are all the papers by Marcin Imieliński that you can download and read on OA.mg.

DOI: 10.1038/nature12213
2013
Cited 4,749 times
Mutational heterogeneity in cancer and the search for new cancer-associated genes
Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour-normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour-normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.
DOI: 10.1016/j.cell.2012.06.024
2012
Cited 2,282 times
A Landscape of Driver Mutations in Melanoma
Despite recent insights into melanoma genetics, systematic surveys for driver mutations are challenged by an abundance of passenger mutations caused by carcinogenic UV light exposure. We developed a permutation-based framework to address this challenge, employing mutation data from intronic sequences to control for passenger mutational load on a per-gene basis. Analysis of large-scale melanoma exome data by this approach discovered six novel melanoma genes (PPP6C, RAC1, SNX31, TACC1, STK19, and ARID2), three of which (RAC1, PPP6C, and STK19) harbored recurrent and potentially targetable mutations. Integration with chromosomal copy number data contextualized the landscape of driver mutations, providing oncogenic insights in BRAF- and NRAS-driven melanoma as well as those without known NRAS/BRAF mutations. The landscape also clarified a mutational basis for RB and p53 pathway deregulation in this malignancy. Finally, the spectrum of driver mutations provided unequivocal genomic evidence for a direct mutagenic role of UV light in melanoma pathogenesis.
DOI: 10.1038/s41586-019-1186-3
2019
Cited 2,191 times
Next-generation characterization of the Cancer Cell Line Encyclopedia
Large panels of comprehensively characterized human cancer models, including the Cancer Cell Line Encyclopedia (CCLE), have provided a rigorous framework with which to study genetic variants, candidate targets, and small-molecule and biological therapeutics and to identify new marker-driven cancer dependencies. To improve our understanding of the molecular features that contribute to cancer phenotypes, including drug responses, here we have expanded the characterizations of cancer cell lines to include genetic, RNA splicing, DNA methylation, histone H3 modification, microRNA expression and reverse-phase protein array data for 1,072 cell lines from individuals of various lineages and ethnicities. Integration of these data with functional characterizations such as drug-sensitivity, short hairpin RNA knockdown and CRISPR–Cas9 knockout data reveals potential targets for cancer drugs and associated biomarkers. Together, this dataset and an accompanying public data portal provide a resource for the acceleration of cancer research using model cancer cell lines. The original Cancer Cell Line Encyclopedia (CCLE) is expanded with deeper characterization of over 1,000 cell lines, including genomic, transcriptomic, and proteomic data, and integration with drug-sensitivity and gene-dependency data.
DOI: 10.1038/s41586-020-1969-6
2020
Cited 2,037 times
Pan-cancer analysis of whole genomes
Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1–3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10–18].
DOI: 10.1016/j.cell.2012.08.029
2012
Cited 1,613 times
Mapping the Hallmarks of Lung Adenocarcinoma with Massively Parallel Sequencing
Lung adenocarcinoma, the most common subtype of non-small cell lung cancer, is responsible for more than 500,000 deaths per year worldwide. Here, we report exome and genome sequences of 183 lung adenocarcinoma tumor/normal DNA pairs. These analyses revealed a mean exonic somatic mutation rate of 12.0 events/megabase and identified the majority of genes previously reported as significantly mutated in lung adenocarcinoma. In addition, we identified statistically recurrent somatic mutations in the splicing factor gene U2AF1 and truncating mutations affecting RBM10 and ARID1A. Analysis of nucleotide context-specific mutation signatures grouped the sample set into distinct clusters that correlated with smoking history and alterations of reported lung adenocarcinoma genes. Whole-genome sequence analysis revealed frequent structural rearrangements, including in-frame exonic alterations within EGFR and SIK2 kinases. The candidate genes identified in this study are attractive targets for biological characterization and therapeutic targeting of lung adenocarcinoma.
DOI: 10.1038/nature07953
2009
Cited 1,292 times
Autism genome-wide copy number variation reveals ubiquitin and neuronal genes
Autism spectrum disorders (ASDs) are childhood neurodevelopmental disorders with complex genetic origins. Previous studies focusing on candidate genes or genomic regions have identified several copy number variations (CNVs) that are associated with an increased risk of ASDs. Here we present the results from a whole-genome CNV study on a cohort of 859 ASD cases and 1,409 healthy children of European ancestry who were genotyped with approximately 550,000 single nucleotide polymorphism markers, in an attempt to comprehensively identify CNVs conferring susceptibility to ASDs. Positive findings were evaluated in an independent cohort of 1,336 ASD cases and 1,110 controls of European ancestry. Besides previously reported ASD candidate genes, such as NRXN1 (ref. 10) and CNTN4 (refs 11, 12), several new susceptibility genes encoding neuronal cell-adhesion molecules, including NLGN1 and ASTN2, were enriched with CNVs in ASD cases compared to controls (P = 9.5 × 10⁻³). Furthermore, CNVs within or surrounding genes involved in the ubiquitin pathways, including UBE3A, PARK2, RFWD2 and FBXO40, were affected by CNVs not observed in controls (P = 3.3 × 10⁻³). We also identified duplications 55 kilobases upstream of complementary DNA AK123120 (P = 3.6 × 10⁻⁶). Although these variants may be individually rare, they target genes involved in neuronal cell-adhesion or ubiquitin degradation, indicating that these two important gene networks expressed within the central nervous system may contribute to the genetic susceptibility of ASD.
DOI: 10.1038/ng.764
2011
Cited 1,224 times
Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47
Genome-wide association studies and candidate gene studies in ulcerative colitis have identified 18 susceptibility loci. We conducted a meta-analysis of six ulcerative colitis genome-wide association study datasets, comprising 6,687 cases and 19,718 controls, and followed up the top association signals in 9,628 cases and 12,917 controls. We identified 29 additional risk loci (P < 5 × 10⁻⁸), increasing the number of ulcerative colitis-associated loci to 47. After annotating associated regions using GRAIL, expression quantitative trait loci data and correlations with non-synonymous SNPs, we identified many candidate genes that provide potentially important insights into disease pathogenesis, including IL1R2, IL8RA-IL8RB, IL7R, IL12B, DAP, PRDM1, JAK2, IRF5, GNA12 and LSP1. The total number of confirmed inflammatory bowel disease risk loci is now 99, including a minimum of 28 shared association signals between Crohn's disease and ulcerative colitis.
DOI: 10.1038/ng.3564
2016
Cited 927 times
Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas
To compare lung adenocarcinoma (ADC) and lung squamous cell carcinoma (SqCC) and to identify new drivers of lung carcinogenesis, we examined the exome sequences and copy number profiles of 660 lung ADC and 484 lung SqCC tumor-normal pairs. Recurrent alterations in lung SqCCs were more similar to those of other squamous carcinomas than to alterations in lung ADCs. New significantly mutated genes included PPP3CA, DOT1L, and FTSJD1 in lung ADC, RASA1 in lung SqCC, and KLF5, EP300, and CREBBP in both tumor types. New amplification peaks encompassed MIR21 in lung ADC, MIR205 in lung SqCC, and MAPK1 in both. Lung ADCs lacking receptor tyrosine kinase-Ras-Raf pathway alterations had mutations in SOS1, VAV1, RASA1, and ARHGAP35. Regarding neoantigens, 47% of the lung ADC and 53% of the lung SqCC tumors had at least five predicted neoepitopes. Although targeted therapies for lung ADC and SqCC are largely distinct, immunotherapies may aid in treatment for both subtypes.
DOI: 10.1038/nature07999
2009
Cited 913 times
Common genetic variants on 5p14.1 associate with autism spectrum disorders
Autism spectrum disorders (ASDs) represent a group of childhood neurodevelopmental and neuropsychiatric disorders characterized by deficits in verbal communication, impairment of social interaction, and restricted and repetitive patterns of interests and behaviour. To identify common genetic risk factors underlying ASDs, here we present the results of genome-wide association studies on a cohort of 780 families (3,101 subjects) with affected children, and a second cohort of 1,204 affected subjects and 6,491 control subjects, all of whom were of European ancestry. Six single nucleotide polymorphisms between cadherin 10 (CDH10) and cadherin 9 (CDH9), two genes encoding neuronal cell-adhesion molecules, revealed strong association signals, with the most significant SNP being rs4307059 (P = 3.4 × 10⁻⁸, odds ratio = 1.19). These signals were replicated in two independent cohorts, with combined P values ranging from 7.4 × 10⁻⁸ to 2.1 × 10⁻¹⁰. Our results implicate neuronal cell-adhesion molecules in the pathogenesis of ASDs, and represent, to our knowledge, the first demonstration of genome-wide significant association of common variants with susceptibility to ASDs.
DOI: 10.1126/science.aav1898
2018
Cited 822 times
The chromatin accessibility landscape of primary human cancers
We present the genome-wide chromatin accessibility profiles of 410 tumor samples spanning 23 cancer types from The Cancer Genome Atlas (TCGA). We identify 562,709 transposase-accessible DNA elements that substantially extend the compendium of known cis-regulatory elements. Integration of ATAC-seq (the assay for transposase-accessible chromatin using sequencing) with TCGA multi-omic data identifies a large number of putative distal enhancers that distinguish molecular subtypes of cancers, uncovers specific driving transcription factors via protein-DNA footprints, and nominates long-range gene-regulatory interactions in cancer. These data reveal genetic risk loci of cancer predisposition as active DNA regulatory elements in cancer, identify gene-regulatory interactions underlying cancer immune evasion, and pinpoint noncoding mutations that drive enhancer activation and may affect patient survival. These results suggest a systematic approach to understanding the noncoding genome in cancer to advance diagnosis and therapy.
DOI: 10.1038/s41586-019-1907-7
2020
Cited 712 times
The evolutionary history of 2,658 cancers
Cancer develops through a process of somatic evolution [1,2]. Sequencing data from a single biopsy represent a snapshot of this process that can reveal the timing of specific genomic aberrations and the changing influence of mutational processes [3]. Here, by whole-genome sequencing analysis of 2,658 cancers as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [4], we reconstruct the life history and evolution of mutational processes and driver mutation sequences of 38 types of cancer. Early oncogenesis is characterized by mutations in a constrained set of driver genes, and specific copy number gains, such as trisomy 7 in glioblastoma and isochromosome 17q in medulloblastoma. The mutational spectrum changes significantly throughout tumour evolution in 40% of samples. A nearly fourfold diversification of driver genes and increased genomic instability are features of later stages. Copy number alterations often occur in mitotic crises, and lead to simultaneous gains of chromosomal segments. Timing analyses suggest that driver mutations often precede diagnosis by many years, if not decades. Together, these results determine the evolutionary trajectories of cancer, and highlight opportunities for early cancer detection.
DOI: 10.1038/s41586-019-1913-9
2020
Cited 573 times
Patterns of somatic structural variation in human cancer genomes
A key mutational process in cancer is structural variation, in which rearrangements delete, amplify or reorder genomic segments that range in size from kilobases to whole chromosomes [1–7]. Here we develop methods to group, classify and describe somatic structural variants, using data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumour types [8]. Sixteen signatures of structural variation emerged. Deletions have a multimodal size distribution, assort unevenly across tumour types and patients, are enriched in late-replicating regions and correlate with inversions. Tandem duplications also have a multimodal size distribution, but are enriched in early-replicating regions, as are unbalanced translocations. Replication-based mechanisms of rearrangement generate varied chromosomal structures with low-level copy-number gains and frequent inverted rearrangements. One prominent structure consists of 2–7 templates copied from distinct regions of the genome strung together within one locus. Such cycles of templated insertions correlate with tandem duplications, and, in liver cancer, frequently activate the telomerase gene TERT. A wide variety of rearrangement processes are active in cancer, which generate complex configurations of the genome upon which selection can act.
DOI: 10.1038/ng.489
2009
Cited 461 times
Common variants at five new loci associated with early-onset inflammatory bowel disease
The inflammatory bowel diseases (IBD) Crohn's disease and ulcerative colitis are common causes of morbidity in children and young adults in the western world. Here we report the results of a genome-wide association study in early-onset IBD involving 3,426 affected individuals and 11,963 genetically matched controls recruited through international collaborations in Europe and North America, thereby extending the results from a previous study of 1,011 individuals with early-onset IBD. We have identified five new regions associated with early-onset IBD susceptibility, including 16p11 near the cytokine gene IL27 (rs8049439, P = 2.41 × 10⁻⁹), 22q12 (rs2412973, P = 1.55 × 10⁻⁹), 10q22 (rs1250550, P = 5.63 × 10⁻⁹), 2q37 (rs4676410, P = 3.64 × 10⁻⁸) and 19q13.11 (rs10500264, P = 4.26 × 10⁻¹⁰). Our scan also detected associations at 23 of 32 loci previously implicated in adult-onset Crohn's disease and at 8 of 17 loci implicated in adult-onset ulcerative colitis, highlighting the close pathogenetic relationship between early- and adult-onset IBD.
DOI: 10.1038/s41588-019-0576-7
2020
Cited 451 times
Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing
Chromothripsis is a mutational phenomenon characterized by massive, clustered genomic rearrangements that occurs in cancer and other diseases. Recent studies in selected cancer types have suggested that chromothripsis may be more common than initially inferred from low-resolution copy-number data. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), we analyze patterns of chromothripsis across 2,658 tumors from 38 cancer types using whole-genome sequencing data. We find that chromothripsis events are pervasive across cancers, with a frequency of more than 50% in several cancer types. Whereas canonical chromothripsis profiles display oscillations between two copy-number states, a considerable fraction of events involve multiple chromosomes and additional structural alterations. In addition to non-homologous end joining, we detect signatures of replication-associated processes and templated insertions. Chromothripsis contributes to oncogene amplification and to inactivation of genes such as mismatch-repair-related genes. These findings show that chromothripsis is a major process that drives genome evolution in human cancer.
DOI: 10.1038/s41586-020-1965-x
2020
Cited 432 times
Analyses of non-coding somatic drivers in 2,658 cancer whole genomes
The discovery of drivers of cancer has traditionally focused on protein-coding genes [1–4]. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [5] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers [6,7], raise doubts about others and identify novel candidates, including point mutations in the 5′ region of TP53, in the 3′ untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.
DOI: 10.1371/journal.pgen.1000536
2009
Cited 391 times
Genome-Wide Analyses of Exonic Copy Number Variants in a Family-Based Study Point to Novel Autism Susceptibility Genes
The genetics underlying the autism spectrum disorders (ASDs) is complex and remains poorly understood. Previous work has demonstrated an important role for structural variation in a subset of cases, but has lacked the resolution necessary to move beyond detection of large regions of potential interest to identification of individual genes. To pinpoint genes likely to contribute to ASD etiology, we performed high density genotyping in 912 multiplex families from the Autism Genetics Resource Exchange (AGRE) collection and contrasted results to those obtained for 1,488 healthy controls. Through prioritization of exonic deletions (eDels), exonic duplications (eDups), and whole gene duplication events (gDups), we identified more than 150 loci harboring rare variants in multiple unrelated probands, but no controls. Importantly, 27 of these were confirmed on examination of an independent replication cohort comprised of 859 cases and an additional 1,051 controls. Rare variants at known loci, including exonic deletions at NRXN1 and whole gene duplications encompassing UBE3A and several other genes in the 15q11–q13 region, were observed in the course of these analyses. Strong support was likewise observed for previously unreported genes such as BZRAP1, an adaptor molecule known to regulate synaptic transmission, with eDels or eDups observed in twelve unrelated cases but no controls (p = 2.3×10⁻⁵). Less is known about MDGA2, likewise observed to be case-specific (p = 1.3×10⁻⁴). But, it is notable that the encoded protein shows an unexpectedly high similarity to Contactin 4 (BLAST E-value = 3×10⁻³⁹), which has also been linked to disease. That hundreds of distinct rare variants were each seen only once further highlights complexity in the ASDs and points to the continued need for larger cohorts.
DOI: 10.1038/ng.1013
2011
Cited 347 times
Genome-wide copy number variation study associates metabotropic glutamate receptor gene networks with attention deficit hyperactivity disorder
Hakon Hakonarson and colleagues report a genome-wide copy number variation study in 3,506 cases of attention-deficit hyperactivity disorder. The authors identify a statistically significant enrichment of CNVs impacting metabotropic glutamate receptor genes. Attention deficit hyperactivity disorder (ADHD) is a common, heritable neuropsychiatric disorder of unknown etiology. We performed a whole-genome copy number variation (CNV) study on 1,013 cases with ADHD and 4,105 healthy children of European ancestry using 550,000 SNPs. We evaluated statistically significant findings in multiple independent cohorts, with a total of 2,493 cases with ADHD and 9,222 controls of European ancestry, using matched platforms. CNVs affecting metabotropic glutamate receptor genes were enriched across all cohorts (P = 2.1 × 10⁻⁹). We saw GRM5 (encoding glutamate receptor, metabotropic 5) deletions in ten cases and one control (P = 1.36 × 10⁻⁶). We saw GRM7 deletions in six cases, and we saw GRM8 deletions in eight cases and no controls. GRM1 was duplicated in eight cases. We experimentally validated the observed variants using quantitative RT-PCR. A gene network analysis showed that genes interacting with the genes in the GRM family are enriched for CNVs in ∼10% of the cases (P = 4.38 × 10⁻¹⁰) after correction for occurrence in the controls. We identified rare recurrent CNVs affecting glutamatergic neurotransmission genes that were overrepresented in multiple ADHD cohorts.
DOI: 10.1101/gr.083501.108
2009
Cited 335 times
High-resolution mapping and analysis of copy number variations in the human genome: A data resource for clinical and research applications
We present a database of copy number variations (CNVs) detected in 2026 disease-free individuals, using high-density, SNP-based oligonucleotide microarrays. This large cohort, comprised mainly of Caucasians (65.2%) and African-Americans (34.2%), was analyzed for CNVs in a single study using a uniform array platform and computational process. We have catalogued and characterized 54,462 individual CNVs, 77.8% of which were identified in multiple unrelated individuals. These nonunique CNVs mapped to 3272 distinct regions of genomic variation spanning 5.9% of the genome; 51.5% of these were previously unreported, and >85% are rare. Our annotation and analysis confirmed and extended previously reported correlations between CNVs and several genomic features such as repetitive DNA elements, segmental duplications, and genes. We demonstrate the utility of this data set in distinguishing CNVs with pathologic significance from normal variants. Together, this analysis and annotation provides a useful resource to assist with the assessment of CNVs in the contexts of human variation, disease susceptibility, and clinical molecular diagnostics.
DOI: 10.1056/nejmoa0901867
2010
Cited 317 times
Variants of DENND1B Associated with Asthma in Children
Asthma is a complex disease that has genetic and environmental causes. The genetic factors associated with susceptibility to asthma remain largely unknown. We carried out a genomewide association study involving children with asthma. The sample included 793 North American children of European ancestry with persistent asthma who required daily inhaled glucocorticoid therapy and 1988 matched controls (the discovery set). We also tested for genomewide association in an independent cohort of 917 persons of European ancestry who had asthma and 1546 matched controls (the replication set). Finally, we tested for an association between 20 single-nucleotide polymorphisms (SNPs) at chromosome 1q31 and asthma in 1667 North American children of African ancestry who had asthma and 2045 ancestrally matched controls. In our meta-analysis of all samples from persons of European ancestry, we observed an association, with genomewide significance, between asthma and SNPs at the previously reported locus on 17q21 and an additional eight SNPs at a novel locus on 1q31. The SNP most strongly associated with asthma was rs2786098 (P = 8.55 × 10⁻⁹). We observed replication of the association of asthma with SNP rs2786098 in the independent series of persons of European ancestry (combined P = 9.3 × 10⁻¹¹). The alternative allele of each of the eight SNPs on chromosome 1q31 was strongly associated with asthma in the children of African ancestry (P = 1.6 × 10⁻¹³ for the comparison across all samples). The 1q31 locus contains DENND1B, a gene expressed by natural killer cells and dendritic cells. DENND1B protein is predicted to interact with the tumor necrosis factor α receptor. We have identified a locus containing DENND1B on chromosome 1q31.3 that is associated with susceptibility to asthma.
DOI: 10.1371/journal.pgen.1002293
2011
Cited 312 times
A Genome-Wide Meta-Analysis of Six Type 1 Diabetes Cohorts Identifies Multiple Associated Loci
Diabetes impacts approximately 200 million people worldwide, of whom approximately 10% are affected by type 1 diabetes (T1D). The application of genome-wide association studies (GWAS) has robustly revealed dozens of genetic contributors to the pathogenesis of T1D, with the most recent meta-analysis identifying in excess of 40 loci. To identify additional genetic loci for T1D susceptibility, we examined associations in the largest meta-analysis to date between the disease and ∼2.54 million SNPs in a combined cohort of 9,934 cases and 16,956 controls. Targeted follow-up of 53 SNPs in 1,120 affected trios uncovered three new loci associated with T1D that reached genome-wide significance. The most significantly associated SNP (rs539514, P = 5.66×10⁻¹¹) resides in an intronic region of the LMO7 (LIM domain only 7) gene on 13q22. The second most significantly associated SNP (rs478222, P = 3.50×10⁻⁹) resides in an intronic region of the EFR3B (protein EFR3 homolog B) gene on 2p23; however, the region of linkage disequilibrium is approximately 800 kb and harbors multiple additional genes, including NCOA1, C2orf79, CENPO, ADCY3, DNAJC27, POMC, and DNMT3A. The third most significantly associated SNP (rs924043, P = 8.06×10⁻⁹) lies in an intergenic region on 6q27, where the region of association is approximately 900 kb and harbors multiple genes including WDR27, C6orf120, PHF10, TCTE3, C6orf208, LOC154449, DLL1, FAM120B, PSMB1, TBP, and PCD2. These latest associated regions add to the growing repertoire of gene networks predisposing to T1D.
DOI: 10.1038/ng.203
2008
Cited 297 times
Loci on 20q13 and 21q22 are associated with pediatric-onset inflammatory bowel disease
Inflammatory bowel disease (IBD) is a common inflammatory disorder with complex etiology that involves both genetic and environmental triggers, including but not limited to defects in bacterial clearance, defective mucosal barrier and persistent dysregulation of the immune response to commensal intestinal bacteria. IBD is characterized by two distinct phenotypes: Crohn's disease (CD) and ulcerative colitis (UC). Previously reported GWA studies have identified genetic variation accounting for a small portion of the overall genetic susceptibility to CD and an even smaller contribution to UC pathogenesis. We hypothesized that stratification of IBD by age of onset might identify additional genes associated with IBD. To that end, we carried out a GWA analysis in a cohort of 1,011 individuals with pediatric-onset IBD and 4,250 matched controls. We identified and replicated significantly associated, previously unreported loci on chromosomes 20q13 (rs2315008[T] and rs4809330[A]; P = 6.30 × 10⁻⁸ and 6.95 × 10⁻⁸, respectively; odds ratio (OR) = 0.74 for both) and 21q22 (rs2836878[A]; P = 6.01 × 10⁻⁸; OR = 0.73), located close to the TNFRSF6B and PSMG1 genes, respectively.
DOI: 10.1038/ng.3470
2015
Cited 296 times
Identification of focally amplified lineage-specific super-enhancers in human epithelial cancers
Whole-genome analysis approaches are identifying recurrent cancer-associated somatic alterations in noncoding DNA regions. We combined somatic copy number analysis of 12 tumor types with tissue-specific epigenetic profiling to identify significant regions of focal amplification harboring super-enhancers. Copy number gains of noncoding regions harboring super-enhancers near KLF5, USP12, PARD6B and MYC are associated with overexpression of these cancer-related genes. We show that two distinct focal amplifications of super-enhancers 3' to MYC in lung adenocarcinoma (MYC-LASE) and endometrial carcinoma (MYC-ECSE) are physically associated with the MYC promoter and correlate with MYC overexpression. CRISPR/Cas9-mediated repression or deletion of a constituent enhancer within the MYC-LASE region led to significant reductions in the expression of MYC and its target genes and to the impairment of anchorage-independent and clonogenic growth, consistent with an oncogenic function. Our results suggest that genomic amplification of super-enhancers represents a common mechanism to activate cancer driver genes in multiple cancer types.
DOI: 10.1101/gr.221028.117
2018
Cited 296 times
SvABA: genome-wide detection of structural variants and indels by local assembly
Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20–300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ∼4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50–300 bp) SVs.
DOI: 10.1016/j.cell.2021.03.009
2021
Cited 282 times
Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes
Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution and provide a pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.
DOI: 10.1038/s41588-019-0562-0
2020
Cited 277 times
Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition
About half of all cancers have somatic integrations of retrotransposons. Here, to characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 38 histological cancer subtypes within the framework of the Pan-Cancer Analysis of Whole Genomes (PCAWG) project. We identified 19,166 somatically acquired retrotransposition events, which affected 35% of samples and spanned a range of event types. Long interspersed nuclear element (LINE-1; L1 hereafter) insertions emerged as the most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, which sometimes leads to the removal of tumor-suppressor genes, and can induce complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage-fusion-bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications for the development of human tumors.
DOI: 10.1038/s41588-019-0557-x
2020
Cited 258 times
Comprehensive molecular characterization of mitochondrial genomes in human cancers
Mitochondria are essential cellular organelles that play critical roles in cancer. Here, as part of the International Cancer Genome Consortium/The Cancer Genome Atlas Pan-Cancer Analysis of Whole Genomes Consortium, which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumor types, we performed a multidimensional, integrated characterization of mitochondrial genomes and related RNA sequencing data. Our analysis presents the most definitive mutational landscape of mitochondrial genomes and identifies several hypermutated cases. Truncating mutations are markedly enriched in kidney, colorectal and thyroid cancers, suggesting oncogenic effects with the activation of signaling pathways. We find frequent somatic nuclear transfers of mitochondrial DNA, some of which disrupt therapeutic target genes. Mitochondrial copy number varies greatly within and across cancers and correlates with clinical variables. Co-expression analysis highlights the function of mitochondrial genes in oxidative phosphorylation, DNA repair and the cell cycle, and shows their connections with clinically actionable genes. Our study lays a foundation for translating mitochondrial biology into clinical applications.
DOI: 10.1016/j.ajhg.2009.01.026
2009
Cited 252 times
Diverse Genome-wide Association Studies Associate the IL12/IL23 Pathway with Crohn Disease
Previous genome-wide association (GWA) studies typically focus on single-locus analysis, which may not have the power to detect the majority of genuinely associated loci. Here, we applied pathway analysis using Affymetrix SNP genotype data from the Wellcome Trust Case Control Consortium (WTCCC) and uncovered significant association between Crohn Disease (CD) and the IL12/IL23 pathway, harboring 20 genes (p = 8 × 10⁻⁵). Interestingly, the pathway contains multiple genes (IL12B and JAK2) or homologs of genes (STAT3 and CCR6) that were recently identified as genuine susceptibility genes only through meta-analysis of several GWA studies. In addition, the pathway contains other susceptibility genes for CD, including IL18R1, JUN, IL12RB1, and TYK2, which do not reach genome-wide significance by single-marker association tests. The observed pathway-specific association signal was subsequently replicated in three additional GWA studies of European and African American ancestry generated on the Illumina HumanHap550 platform. Our study suggests that examination beyond individual SNP hits, by focusing on genetic networks and pathways, is important to unleashing the true power of GWA studies.
DOI: 10.1016/j.ebiom.2017.12.026
2018
Cited 251 times
Deep Convolutional Neural Networks Enable Discrimination of Heterogeneous Digital Pathology Images
Pathological evaluation of tumor tissue is pivotal for diagnosis in cancer patients and automated image analysis approaches have great potential to increase precision of diagnosis and help reduce human error. In this study, we utilize several computational methods based on convolutional neural networks (CNN) and build a stand-alone pipeline to effectively classify different histopathology images across different types of cancer. In particular, we demonstrate the utility of our pipeline to discriminate between two subtypes of lung cancer, four biomarkers of bladder cancer, and five biomarkers of breast cancer. In addition, we apply our pipeline to discriminate among four immunohistochemistry (IHC) staining scores of bladder and breast cancers. Our classification pipeline includes a basic CNN architecture, Google's Inceptions with three training strategies, and an ensemble of two state-of-the-art algorithms, Inception and ResNet. Training strategies include training the last layer of Google's Inceptions, training the network from scratch, and fine-tuning the parameters for our data using two pre-trained versions of Google's Inception architectures, Inception-V1 and Inception-V3. We demonstrate the power of deep learning approaches for identifying cancer subtypes, and the robustness of Google's Inceptions even in the presence of extensive tumor heterogeneity. On average, our pipeline achieved accuracies of 100%, 92%, 95%, and 69% for discrimination of various cancer tissues, subtypes, biomarkers, and scores, respectively. Our pipeline and related documentation are freely available at https://github.com/ih-_lab/CNN_Smoothie.
DOI: 10.1073/pnas.1203201109
2012
Cited 247 times
Functional analysis of receptor tyrosine kinase mutations in lung cancer identifies oncogenic extracellular domain mutations of ERBB2
We assessed somatic alleles of six receptor tyrosine kinase genes mutated in lung adenocarcinoma for oncogenic activity. Five of these genes failed to score in transformation assays; however, novel recurring extracellular domain mutations of the receptor tyrosine kinase gene ERBB2 were potently oncogenic. These ERBB2 extracellular domain mutants were activated by two distinct mechanisms, characterized by elevated C-terminal tail phosphorylation or by covalent dimerization mediated by intermolecular disulfide bond formation. These distinct mechanisms of receptor activation converged upon tyrosine phosphorylation of cellular proteins, impacting cell motility. Survival of Ba/F3 cells transformed to IL-3 independence by the ERBB2 extracellular domain mutants was abrogated by treatment with small-molecule inhibitors of ERBB2, raising the possibility that patients harboring such mutations could benefit from ERBB2-directed therapy.
DOI: 10.1016/j.ccell.2016.06.022
2016
Cited 178 times
High-throughput Phenotyping of Lung Cancer Somatic Mutations
Recent genome sequencing efforts have identified millions of somatic mutations in cancer. However, the functional impact of most variants is poorly understood. Here we characterize 194 somatic mutations identified in primary lung adenocarcinomas. We present an expression-based variant-impact phenotyping (eVIP) method that uses gene expression changes to distinguish impactful from neutral somatic mutations. eVIP identified 69% of mutations analyzed as impactful and 31% as functionally neutral. A subset of the impactful mutations induces xenograft tumor formation in mice and/or confers resistance to cellular EGFR inhibition. Among these impactful variants are rare somatic, clinically actionable variants including EGFR S645C, ARAF S214C and S214F, ERBB2 S418T, and multiple BRAF variants, demonstrating that rare mutations can be functionally important in cancer.
DOI: 10.1038/s41586-020-3017-y
2020
Cited 164 times
Histone H1 loss drives lymphoma by disrupting 3D chromatin architecture
Linker histone H1 proteins bind to nucleosomes and facilitate chromatin compaction, although their biological functions are poorly understood. Mutations in the genes that encode H1 isoforms B-E (H1B, H1C, H1D and H1E; also known as H1-5, H1-2, H1-3 and H1-4, respectively) are highly recurrent in B cell lymphomas, but the pathogenic relevance of these mutations to cancer and the mechanisms that are involved are unknown. Here we show that lymphoma-associated H1 alleles are genetic driver mutations in lymphomas. Disruption of H1 function results in a profound architectural remodelling of the genome, which is characterized by large-scale yet focal shifts of chromatin from a compacted to a relaxed state. This decompaction drives distinct changes in epigenetic states, primarily owing to a gain of histone H3 dimethylation at lysine 36 (H3K36me2) and/or loss of repressive H3 trimethylation at lysine 27 (H3K27me3). These changes unlock the expression of stem cell genes that are normally silenced during early development. In mice, loss of H1c and H1e (also known as H1f2 and H1f4, respectively) conferred germinal centre B cells with enhanced fitness and self-renewal properties, ultimately leading to aggressive lymphomas with an increased repopulating potential. Collectively, our data indicate that H1 proteins are normally required to sequester early developmental genes into architecturally inaccessible genomic compartments. We also establish H1 as a bona fide tumour suppressor and show that mutations in H1 drive malignant transformation primarily through three-dimensional genome reorganization, which leads to epigenetic reprogramming and derepression of developmentally silenced genes.
DOI: 10.1038/s41467-018-04365-8
2018
Cited 157 times
IBD risk loci are enriched in multigenic regulatory modules encompassing putative causative genes
GWAS have identified >200 risk loci for Inflammatory Bowel Disease (IBD). The majority of disease associations are known to be driven by regulatory variants. To identify the putative causative genes that are perturbed by these variants, we generate a large transcriptome data set (nine disease-relevant cell types) and identify 23,650 cis-eQTL. We show that these are determined by ∼9720 regulatory modules, of which ∼3000 operate in multiple tissues and ∼970 on multiple genes. We identify regulatory modules that drive the disease association for 63 of the 200 risk loci, and show that these are enriched in multigenic modules. Based on these analyses, we resequence 45 of the corresponding 100 candidate genes in 6600 Crohn disease (CD) cases and 5500 controls, and show with burden tests that they include likely causative genes. Our analyses indicate that ≥10-fold larger sample sizes will be required to demonstrate the causality of individual genes using this approach.
DOI: 10.1038/s41467-019-13825-8
2020
Cited 148 times
A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns
In cancer, the primary tumour's organ of origin and histopathology are the strongest determinants of its clinical behaviour, but in 3% of cases a patient presents with a metastatic tumour and no obvious primary. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we train a deep learning classifier to predict cancer type based on patterns of somatic passenger mutations detected in whole genome sequencing (WGS) of 2606 tumours representing 24 common cancer types produced by the PCAWG Consortium. Our classifier achieves an accuracy of 91% on held-out tumor samples and 88% and 83% respectively on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced accuracy. Our results have clinical applicability, underscore how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of circulating tumour DNA.
DOI: 10.1038/s41467-021-21361-7
2021
Cited 148 times
Shotgun transcriptome, spatial omics, and isothermal profiling of SARS-CoV-2 infection reveals unique host responses, viral diversification, and drug interactions
In less than nine months, the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) killed over a million people, including >25,000 in New York City (NYC) alone. The COVID-19 pandemic caused by SARS-CoV-2 highlights clinical needs to detect infection, track strain evolution, and identify biomarkers of disease course. To address these challenges, we designed a fast (30-minute) colorimetric test (LAMP) for SARS-CoV-2 infection from naso/oropharyngeal swabs and a large-scale shotgun metatranscriptomics platform (total-RNA-seq) for host, viral, and microbial profiling. We applied these methods to clinical specimens gathered from 669 patients in New York City during the first two months of the outbreak, yielding a broad molecular portrait of the emerging COVID-19 disease. We find significant enrichment of a NYC-distinctive clade of the virus (20C), as well as host responses in interferon, ACE, hematological, and olfaction pathways. In addition, we use 50,821 patient records to find that renin-angiotensin-aldosterone system inhibitors have a protective effect for severe COVID-19 outcomes, unlike similar drugs. Finally, spatial transcriptomic data from COVID-19 patient autopsy tissues reveal distinct ACE2 expression loci, with macrophage and neutrophil infiltration in the lungs. These findings can inform public health and may help develop and drive SARS-CoV-2 diagnostic, prevention, and treatment strategies.
DOI: 10.1016/j.cell.2020.08.006
2020
Cited 147 times
Distinct Classes of Complex Structural Variation Uncovered across Thousands of Cancer Genome Graphs
Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.
DOI: 10.1038/s41587-022-01289-z
2022
Cited 52 times
Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing
DOI: 10.1073/pnas.1000274107
2010
Cited 207 times
Strong synaptic transmission impact by copy number variations in schizophrenia
Schizophrenia is a psychiatric disorder with onset in late adolescence and unclear etiology characterized by both positive and negative symptoms, as well as cognitive deficits. To identify copy number variations (CNVs) that increase the risk of schizophrenia, we performed a whole-genome CNV analysis on a cohort of 977 schizophrenia cases and 2,000 healthy adults of European ancestry who were genotyped with 1.7 million probes. Positive findings were evaluated in an independent cohort of 758 schizophrenia cases and 1,485 controls. The Gene Ontology synaptic transmission family of genes was notably enriched for CNVs in the cases (P = 1.5 × 10⁻⁷). Among these, CACNA1B and DOC2A, both calcium-signaling genes responsible for neuronal excitation, were deleted in 16 cases and duplicated in 10 cases, respectively. In addition, RET and RIT2, both ras-related genes important for neural crest development, were significantly affected by CNVs. RET deletion was exclusive to seven cases, and RIT2 deletions were overrepresented common variant CNVs in the schizophrenia cases. Our results suggest that novel variations involving the processes of synaptic transmission contribute to the genetic susceptibility of schizophrenia.
DOI: 10.1371/journal.pone.0001746
2008
Cited 191 times
Association Analysis of the FTO Gene with Obesity in Children of Caucasian and African Ancestry Reveals a Common Tagging SNP
Recently an association was demonstrated between the single nucleotide polymorphism (SNP), rs9939609, within the FTO locus and obesity as a consequence of a genome wide association (GWA) study of type 2 diabetes in adults. We examined the effects of two perfect surrogates for this SNP plus 11 other SNPs at this locus with respect to our childhood obesity cohort, consisting of both Caucasians and African Americans (AA). Utilizing data from our ongoing GWA study in our cohort of 418 Caucasian obese children (BMI ≥ 95th percentile), 2,270 Caucasian controls (BMI < 95th percentile), 578 AA obese children and 1,424 AA controls, we investigated the association of the previously reported variation at the FTO locus with the childhood form of this disease in both ethnicities. The minor allele frequencies (MAF) of rs8050136 and rs3751812 (perfect surrogates for rs9939609, i.e. both r² = 1) in the Caucasian cases were 0.448 and 0.443 respectively while they were 0.391 and 0.386 in Caucasian controls respectively, yielding for both an odds ratio (OR) of 1.27 (95% CI 1.08-1.47; P = 0.0022). Furthermore, the MAFs of rs8050136 and rs3751812 in the AA cases were 0.449 and 0.115 respectively while they were 0.436 and 0.090 in AA controls respectively, yielding an OR of 1.05 (95% CI 0.91-1.21; P = 0.49) and of 1.31 (95% CI 1.050-1.643; P = 0.017) respectively. Investigating all 13 SNPs present on the Illumina HumanHap550 BeadChip in this region of linkage disequilibrium, rs3751812 was the only SNP conferring significant risk in AA. We have therefore replicated and refined the association in an AA cohort and distilled a tag-SNP, rs3751812, which captures the ancestral origin of the actual mutation. As such, variants in the FTO gene confer a similar magnitude of risk of obesity to children as to their adult counterparts and appear to have a global impact.
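The odds ratios quoted in the abstract above can be roughly reproduced from the minor allele frequencies alone. The sketch below assumes a plain allelic odds ratio (odds of the minor allele in cases divided by the odds in controls); the abstract does not state which test the authors used, and the function name `allelic_odds_ratio` is illustrative only.

```python
def allelic_odds_ratio(maf_cases: float, maf_controls: float) -> float:
    """Allelic odds ratio computed from minor allele frequencies.

    Assumption: a simple 2x2 allelic comparison, not necessarily the
    exact model used in the study.
    """
    for freq in (maf_cases, maf_controls):
        if not 0.0 < freq < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    odds_cases = maf_cases / (1.0 - maf_cases)
    odds_controls = maf_controls / (1.0 - maf_controls)
    return odds_cases / odds_controls


# Minor allele frequencies as quoted in the abstract: (cases, controls).
quoted_mafs = {
    ("rs8050136", "Caucasian"): (0.448, 0.391),
    ("rs3751812", "Caucasian"): (0.443, 0.386),
    ("rs8050136", "AA"): (0.449, 0.436),
    ("rs3751812", "AA"): (0.115, 0.090),
}

for (snp, group), (cases, controls) in quoted_mafs.items():
    print(f"{snp} ({group}): OR = {allelic_odds_ratio(cases, controls):.2f}")
```

With the rounded frequencies, the African American values come out at about 1.05 and 1.31, in line with the abstract, and the Caucasian values at about 1.26 to 1.27, close to the reported 1.27. Small differences are expected because the published frequencies are rounded to three decimals.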
DOI: 10.1093/hmg/ddq078
2010
Cited 159 times
Comparative genetic analysis of inflammatory bowel disease and type 1 diabetes implicates multiple loci with opposite effects
Inflammatory bowel disease, including Crohn's disease (CD) and ulcerative colitis (UC), and type 1 diabetes (T1D) are autoimmune diseases that may share common susceptibility pathways. We examined known susceptibility loci for these diseases in a cohort of 1689 CD cases, 777 UC cases, 989 T1D cases and 6197 shared control subjects of European ancestry, who were genotyped by the Illumina HumanHap550 SNP arrays. We identified multiple previously unreported or unconfirmed disease associations, including known CD loci (ICOSLG and TNFSF15) and T1D loci (TNFAIP3) that confer UC risk, known UC loci (HERC2 and IL26) that confer T1D risk and known UC loci (IL10 and CCNY) that confer CD risk. Additionally, we show that T1D risk alleles residing at the PTPN22, IL27, IL18RAP and IL10 loci protect against CD. Furthermore, the strongest risk alleles for T1D within the major histocompatibility complex (MHC) confer strong protection against CD and UC; however, given the multi-allelic nature of the MHC haplotypes, sequencing of the MHC locus will be required to interpret this observation. These results extend our current knowledge on genetic variants that predispose to autoimmunity, and suggest that many loci involved in autoimmunity may be under a balancing selection due to antagonistic pleiotropic effect. Our analysis implies that variants with opposite effects on different diseases may facilitate the maintenance of common susceptibility alleles in human populations, making autoimmune diseases especially amenable to genetic dissection by genome-wide association studies.
DOI: 10.1371/journal.pone.0087361
2014
Cited 154 times
A Pan-Cancer Analysis of Transcriptome Changes Associated with Somatic Mutations in U2AF1 Reveals Commonly Altered Splicing Events
Although recurrent somatic mutations in the splicing factor U2AF1 (also known as U2AF35) have been identified in multiple cancer types, the effects of these mutations on the cancer transcriptome have yet to be fully elucidated. Here, we identified splicing alterations associated with U2AF1 mutations across distinct cancers using DNA and RNA sequencing data from The Cancer Genome Atlas (TCGA). Using RNA-Seq data from 182 lung adenocarcinomas and 167 acute myeloid leukemias (AML), in which U2AF1 is somatically mutated in 3-4% of cases, we identified 131 and 369 splicing alterations, respectively, that were significantly associated with U2AF1 mutation. Of these, 30 splicing alterations were statistically significant in both lung adenocarcinoma and AML, including three genes in the Cancer Gene Census, CTNNB1, CHCHD7, and PICALM. Cell line experiments expressing U2AF1 S34F in HeLa cells and in 293T cells provide further support that these altered splicing events are caused by U2AF1 mutation. Consistent with the function of U2AF1 in 3' splice site recognition, we found that S34F/Y mutations cause preferences for CAG over UAG 3' splice site sequences. This report demonstrates consistent effects of U2AF1 mutation on splicing in distinct cancer cell types.
DOI: 10.2337/db08-1022
2009
Cited 147 times
Follow-Up Analysis of Genome-Wide Association Data Identifies Novel Loci for Type 1 Diabetes
OBJECTIVE: Two recent genome-wide association (GWA) studies have revealed novel loci for type 1 diabetes, a common multifactorial disease with a strong genetic component. To fully utilize the GWA data that we had obtained by genotyping 563 type 1 diabetes probands and 1,146 control subjects, as well as 483 case subject–parent trios, using the Illumina HumanHap550 BeadChip, we designed a full stage 2 study to capture other possible association signals. RESEARCH DESIGN AND METHODS: From our existing datasets, we selected 982 markers with P < 0.05 in both GWA cohorts. Genotyping these in an independent set of 636 nuclear families with 974 affected offspring revealed 75 markers that also had P < 0.05 in this third cohort. Among these, six single nucleotide polymorphisms in five novel loci also had P < 0.05 in the Wellcome Trust Case-Control Consortium dataset and were further tested in 1,303 type 1 diabetes probands from the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC) plus 1,673 control subjects. RESULTS: Two markers (rs9976767 and rs3757247) remained significant after adjusting for the number of tests in this last cohort; they reside in UBASH3A (OR 1.16; combined P = 2.33 × 10⁻⁸) and BACH2 (OR 1.13; combined P = 1.25 × 10⁻⁶). CONCLUSIONS: Evaluation of a large number of statistical GWA candidates in several independent cohorts has revealed additional loci that are associated with type 1 diabetes. The two genes at these respective loci, UBASH3A and BACH2, are both biologically relevant to autoimmunity.
DOI: 10.1016/j.cell.2016.12.025
2017
Cited 105 times
Insertions and Deletions Target Lineage-Defining Genes in Human Cancers
Certain cell types function as factories, secreting large quantities of one or more proteins that are central to the physiology of the respective organ. Examples include surfactant proteins in lung alveoli, albumin in liver parenchyma, and lipase in the stomach lining. Whole-genome sequencing analysis of lung adenocarcinomas revealed noncoding somatic mutational hotspots near VMP1/MIR21 and indel hotspots in surfactant protein genes (SFTPA1, SFTPB, and SFTPC). Extrapolation to other solid cancers demonstrated highly recurrent and tumor-type-specific indel hotspots targeting the noncoding regions of highly expressed genes defining certain secretory cellular lineages: albumin (ALB) in liver carcinoma, gastric lipase (LIPF) in stomach carcinoma, and thyroglobulin (TG) in thyroid carcinoma. The sequence contexts of indels targeting lineage-defining genes were significantly enriched in the AATAATD DNA motif and specific chromatin contexts, including H3K27ac and H3K36me3. Our findings illuminate a prevalent and hitherto unrecognized mutational process linking cellular lineage and cancer.
DOI: 10.1172/jci72763
2014
Cited 102 times
Oncogenic and sorafenib-sensitive ARAF mutations in lung adenocarcinoma
Targeted cancer therapies often induce "outlier" responses in molecularly defined patient subsets. One patient with advanced-stage lung adenocarcinoma, who was treated with oral sorafenib, demonstrated a near-complete clinical and radiographic remission for 5 years. Whole-genome sequencing and RNA sequencing of primary tumor and normal samples from this patient identified a somatic mutation, ARAF S214C, present in the cancer genome and expressed at high levels. Additional mutations affecting this residue of ARAF and a nearby residue in the related kinase RAF1 were demonstrated across 1% of an independent cohort of lung adenocarcinoma cases. The ARAF mutations were shown to transform immortalized human airway epithelial cells in a sorafenib-sensitive manner. These results suggest that mutant ARAF is an oncogenic driver in lung adenocarcinoma and an indicator of sorafenib response.
DOI: 10.1093/jamia/ocw148
2016
Cited 91 times
The cancer precision medicine knowledge base for structured clinical-grade mutations and interpretations
This paper describes the Precision Medicine Knowledge Base (PMKB; https://pmkb.weill.cornell.edu), an interactive online application for collaborative editing, maintenance, and sharing of structured clinical-grade cancer mutation interpretations. PMKB was built using the Ruby on Rails Web application framework. Leveraging existing standards such as the Human Genome Variation Society variant description format, we implemented a data model that links variants to tumor-specific and tissue-specific interpretations. Key features of PMKB include support for all major variant types, standardized authentication, distinct user roles including high-level approvers, and detailed activity history. A REpresentational State Transfer (REST) application-programming interface (API) was implemented to query the PMKB programmatically. At the time of writing, PMKB contains 457 variant descriptions with 281 clinical-grade interpretations. The EGFR, BRAF, KRAS, and KIT genes are associated with the largest numbers of interpretable variants. PMKB's interpretations have been used in over 1500 AmpliSeq tests and 750 whole-exome sequencing tests. The interpretations are accessed either directly via the Web interface or programmatically via the existing API. An accurate and up-to-date knowledge base of genomic alterations of clinical significance is critical to the success of precision medicine programs. The open-access, programmatically accessible PMKB represents an important attempt at creating such a resource in the field of oncology. The PMKB was designed to help collect and maintain clinical-grade mutation interpretations and facilitate reporting for clinical cancer genomic testing. The PMKB was also designed to enable the creation of clinical cancer genomics automated reporting pipelines via an API.
DOI: 10.1038/s41467-019-13824-9
2020
Cited 90 times
Genomic footprints of activated telomere maintenance mechanisms in cancer
Cancers require telomere maintenance mechanisms for unlimited replicative potential. They achieve this through TERT activation or alternative telomere lengthening associated with ATRX or DAXX loss. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we dissect whole-genome sequencing data of over 2500 matched tumor-control samples from 36 different tumor types aggregated within the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium to characterize the genomic footprints of these mechanisms. While the telomere content of tumors with ATRX or DAXX mutations (ATRX/DAXX trunc) is increased, tumors with TERT modifications show a moderate decrease of telomere content. One quarter of all tumor samples contain somatic integrations of telomeric sequences into non-telomeric DNA. This fraction is increased to 80% prevalence in ATRX/DAXX trunc tumors, which carry an aberrant telomere variant repeat (TVR) distribution as another genomic marker. The latter feature includes enrichment or depletion of the previously undescribed singleton TVRs TTCGGG and TTTGGG, respectively. Our systematic analysis provides new insight into the recurrent genomic alterations associated with telomere maintenance mechanisms in cancer.
DOI: 10.1101/2020.04.20.048066
2020
Cited 71 times
Shotgun Transcriptome and Isothermal Profiling of SARS-CoV-2 Infection Reveals Unique Host Responses, Viral Diversification, and Drug Interactions
The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has caused thousands of deaths worldwide, including >18,000 in New York City (NYC) alone. The sudden emergence of this pandemic has highlighted a pressing clinical need for rapid, scalable diagnostics that can detect infection, interrogate strain evolution, and identify novel patient biomarkers. To address these challenges, we designed a fast (30-minute) colorimetric test (LAMP) for SARS-CoV-2 infection from naso/oropharyngeal swabs, plus a large-scale shotgun metatranscriptomics platform (total-RNA-seq) for host, bacterial, and viral profiling. We applied both technologies across 857 SARS-CoV-2 clinical specimens and 86 NYC subway samples, providing a broad molecular portrait of the COVID-19 NYC outbreak. Our results define new features of SARS-CoV-2 evolution, nominate a novel, NYC-enriched viral subclade, reveal specific host responses in interferon, ACE, hematological, and olfaction pathways, and examine risks associated with use of ACE inhibitors and angiotensin receptor blockers. Together, these findings have immediate applications to SARS-CoV-2 diagnostics, public health, and new therapeutic targets.
DOI: 10.1001/jamasurg.2020.5601
2021
Cited 64 times
A Genomic-Pathologic Annotated Risk Model to Predict Recurrence in Early-Stage Lung Adenocarcinoma
Importance: Recommendations for adjuvant therapy after surgical resection of lung adenocarcinoma (LUAD) are based solely on TNM classification but are agnostic to genomic and high-risk clinicopathologic factors. Creation of a prediction model that integrates tumor genomic and clinicopathologic factors may better identify patients at risk for recurrence. Objective: To identify tumor genomic factors independently associated with recurrence, even in the presence of aggressive, high-risk clinicopathologic variables, in patients with completely resected stages I to III LUAD, and to develop a computational machine-learning prediction model (PRecur) to determine whether the integration of genomic and clinicopathologic features could better predict risk of recurrence, compared with the TNM system. Design, Setting, and Participants: This prospective cohort study included 426 patients treated from January 1, 2008, to December 31, 2017, at a single large cancer center and selected in consecutive samples. Eligibility criteria included complete surgical resection of stages I to III LUAD, broad-panel next-generation sequencing data with matched clinicopathologic data, and no neoadjuvant therapy. External validation of the PRecur prediction model was performed using The Cancer Genome Atlas (TCGA). Data were analyzed from 2014 to 2018. Main Outcomes and Measures: The study end point consisted of relapse-free survival (RFS), estimated using the Kaplan-Meier approach. Associations among clinicopathologic factors, genomic alterations, and RFS were established using Cox proportional hazards regression. The PRecur prediction model integrated genomic and clinicopathologic factors using gradient-boosting survival regression for risk group generation and prediction of RFS. A concordance probability estimate (CPE) was used to assess the predictive ability of the PRecur model.
Results: Of the 426 patients included in the analysis (286 women [67%]; median age at surgery, 69 [interquartile range, 62-75] years), 318 (75%) had stage I cancer. Association analysis showed that alterations in SMARCA4 (clinicopathologic-adjusted hazard ratio [HR], 2.44; 95% CI, 1.03-5.77; P = .042) and TP53 (clinicopathologic-adjusted HR, 1.73; 95% CI, 1.09-2.73; P = .02) and the fraction of genome altered (clinicopathologic-adjusted HR, 1.03; 95% CI, 1.10-1.04; P = .005) were independently associated with RFS. The PRecur prediction model outperformed the TNM-based model (CPE, 0.73 vs 0.61; difference, 0.12 [95% CI, 0.05-0.19]; P < .001) for prediction of RFS. To validate the prediction model, PRecur was applied to the TCGA LUAD data set (n = 360), and a clear separation of risk groups was noted (log-rank statistic, 7.5; P = .02), confirming external validation. Conclusions and Relevance: The findings suggest that integration of tumor genomics and clinicopathologic features improves risk stratification and prediction of recurrence after surgical resection of early-stage LUAD. Improved identification of patients at risk for recurrence could enrich and enhance accrual to adjuvant therapy clinical trials.
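The abstract above states that relapse-free survival was estimated with the Kaplan-Meier approach. As a rough illustration of what that estimator computes, here is a minimal sketch in Python. The function name, the follow-up times, and the event flags are all hypothetical and are not taken from the study; the authors' actual software is not described in the abstract.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times  : follow-up duration for each patient (any consistent unit)
    events : 1 if relapse was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each time with an event.
    """
    n_at_risk = len(times)
    relapses = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # everyone who exits the risk set at time t
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = relapses.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients who did not relapse at t.
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        # Relapsed and censored patients both leave the risk set after t.
        n_at_risk -= leaving[t]
    return curve


# Hypothetical toy data (months of follow-up), for illustration only.
toy_times = [6, 12, 12, 20, 30]
toy_events = [1, 0, 1, 0, 1]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is the reason this estimator is preferred over a naive relapse fraction when follow-up lengths differ.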
DOI: 10.1016/j.molcel.2020.10.033
2020
Cited 57 times
Impact of Lineage Plasticity to and from a Neuroendocrine Phenotype on Progression and Response in Prostate and Lung Cancers
Intratumoral heterogeneity can occur via phenotype transitions, often after chronic exposure to targeted anticancer agents. This process, termed lineage plasticity, is associated with acquired independence from an initial oncogenic driver, resulting in treatment failure. In non-small cell lung cancer (NSCLC) and prostate cancers, lineage plasticity manifests when the adenocarcinoma phenotype transforms into neuroendocrine (NE) disease. The exact molecular mechanisms involved in this NE transdifferentiation remain elusive. In small cell lung cancer (SCLC), plasticity from NE to non-NE phenotypes is driven by NOTCH signaling. Herein we review current understanding of NE lineage plasticity dynamics, exemplified by prostate cancer, NSCLC, and SCLC.
DOI: 10.1073/pnas.2025182118
2021
Cited 54 times
Integrated mutational landscape analysis of uterine leiomyosarcomas
Significance: Identification of novel, effective treatment modalities for patients with uterine leiomyosarcomas (uLMS) remains an unmet medical need. Using an integrated whole-genome, whole-exome, and RNA-Seq analysis, we identified recurrently mutated genes and deranged pathways, including homologous-recombination repair (HRR) pathway deficiency (HRD), alternative lengthening of telomeres (ALT), C-MYC/BET, and PI3K-AKT-mTOR pathways, as potential targets. Using two fully sequenced patient-derived xenografts (PDXs) harboring deranged C-MYC/BET and PTEN/PIK3CA pathways and/or an HRD signature (i.e., LEY11 and LEY16), we found olaparib (PARPi), GS-626510 (BETi), and copanlisib (PIK3CAi) monotherapy to significantly inhibit in vivo uLMS PDX growth. Our integrated genetic analysis, combined with in vivo preclinical validation experiments, suggests that a large subset of uLMS may potentially benefit from existing PARPi/BETi/PIK3CAi-targeted drugs.
DOI: 10.1158/2159-8290.cd-20-1334
2021
Cited 43 times
Discovery of Candidate DNA Methylation Cancer Driver Genes
Epigenetic alterations, such as promoter hypermethylation, may drive cancer through tumor suppressor gene inactivation. However, we have limited ability to differentiate driver DNA methylation (DNAme) changes from passenger events. We developed a DNAme driver inference method, MethSig, that accounts for the varying stochastic hypermethylation rate across the genome and between samples. We applied MethSig to bisulfite sequencing data of chronic lymphocytic leukemia (CLL), multiple myeloma, ductal carcinoma in situ, glioblastoma, and to methylation array data across 18 tumor types in TCGA. MethSig resulted in well-calibrated quantile-quantile plots and reproducible inference of likely DNAme drivers with increased sensitivity/specificity compared with benchmarked methods. CRISPR/Cas9 knockout of selected candidate CLL DNAme drivers provided a fitness advantage with and without therapeutic intervention. Notably, DNAme driver risk score was closely associated with adverse outcome in independent CLL cohorts. Collectively, MethSig represents a novel inference framework for DNAme driver discovery to chart the role of aberrant DNAme in cancer. Significance: MethSig provides a novel statistical framework for the analysis of DNA methylation changes in cancer, to specifically identify candidate DNA methylation driver genes of cancer progression and relapse, empowering the discovery of epigenetic mechanisms that enhance cancer cell fitness. This article is highlighted in the In This Issue feature, p. 2113.
DOI: 10.1038/s41586-022-05253-4
2022
Cited 28 times
Genomic signature of Fanconi anaemia DNA repair pathway deficiency in cancer
Fanconi anaemia (FA), a model syndrome of genome instability, is caused by a deficiency in DNA interstrand crosslink repair resulting in chromosome breakage [1-3]. The FA repair pathway protects against endogenous and exogenous carcinogenic aldehydes [4-7]. Individuals with FA are hundreds- to thousands-fold more likely to develop head and neck (HNSCC), oesophageal and anogenital squamous cell carcinomas (SCCs) [8]. Molecular studies of SCCs from individuals with FA (FA SCCs) are limited, and it is unclear how FA SCCs relate to sporadic HNSCCs primarily driven by tobacco and alcohol exposure or infection with human papillomavirus (HPV) [9]. Here, by sequencing genomes and exomes of FA SCCs, we demonstrate that the primary genomic signature of FA repair deficiency is the presence of high numbers of structural variants. Structural variants are enriched for small deletions, unbalanced translocations and fold-back inversions, and are often connected, thereby forming complex rearrangements. They arise in the context of TP53 loss, but not in the context of HPV infection, and lead to somatic copy-number alterations of HNSCC driver genes. We further show that FA pathway deficiency may lead to epithelial-to-mesenchymal transition and enhanced keratinocyte-intrinsic inflammatory signalling, which would contribute to the aggressive nature of FA SCCs. We propose that the genomic instability in sporadic HPV-negative HNSCC may arise as a result of the FA repair pathway being overwhelmed by DNA interstrand crosslink damage caused by alcohol and tobacco-derived aldehydes, making FA SCC a powerful model to study tumorigenesis resulting from DNA-crosslinking damage.
DOI: 10.1016/j.xcrm.2022.100522
2022
Cited 27 times
System-wide transcriptome damage and tissue identity loss in COVID-19 patients
The molecular mechanisms underlying the clinical manifestations of coronavirus disease 2019 (COVID-19), and what distinguishes them from common seasonal influenza virus and other lung injury states such as acute respiratory distress syndrome, remain poorly understood. To address these challenges, we combine transcriptional profiling of 646 clinical nasopharyngeal swabs and 39 patient autopsy tissues to define body-wide transcriptome changes in response to COVID-19. We then match these data with spatial protein and expression profiling across 357 tissue sections from 16 representative patient lung samples and identify tissue-compartment-specific damage wrought by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, evident as a function of varying viral loads during the clinical course of infection and tissue-type-specific expression states. Overall, our findings reveal a systemic disruption of canonical cellular and transcriptional pathways across all tissues, which can inform subsequent studies to combat the mortality of COVID-19 and to better understand the molecular dynamics of lethal SARS-CoV-2 and other respiratory infections.
DOI: 10.1158/2643-3230.bcd-22-0128
2023
Cited 12 times
Molecular Evolution of Classic Hodgkin Lymphoma Revealed Through Whole-Genome Sequencing of Hodgkin and Reed Sternberg Cells
The rarity of malignant Hodgkin and Reed Sternberg (HRS) cells in classic Hodgkin lymphoma (cHL) limits the ability to study the genomics of cHL. To circumvent this, our group has previously optimized fluorescence-activated cell sorting to purify HRS cells. Using this approach, we now report the whole-genome sequencing landscape of HRS cells and reconstruct the chronology and likely etiology of pathogenic events leading to cHL. We identified alterations in driver genes not previously described in cHL, APOBEC mutational activity, and the presence of complex structural variants including chromothripsis. We found that high ploidy in cHL is often acquired through multiple, independent chromosomal gain events including whole-genome duplication. Evolutionary timing analyses revealed that structural variants enriched for RAG motifs, driver mutations in B2M, BCL7A, GNA13, and PTPN1, and the onset of AID-driven mutagenesis usually preceded large chromosomal gains. This study provides a temporal reconstruction of cHL pathogenesis. Significance: Previous studies in cHL were limited to coding sequences and therefore not able to comprehensively decipher the tumor complexity. Here, leveraging cHL whole-genome characterization, we identify driver events and reconstruct the tumor evolution, finding that structural variants, driver mutations, and AID mutagenesis precede chromosomal gains. This article is highlighted in the In This Issue feature, p. 171.
DOI: 10.1210/me.2002-0073
2002
Cited 154 times
Persistent Parity-Induced Changes in Growth Factors, TGF-β3, and Differentiation in the Rodent Mammary Gland
Epidemiological studies have repeatedly demonstrated that women who undergo an early first full-term pregnancy have a significantly reduced lifetime risk of breast cancer. Similarly, rodents that have previously undergone a full-term pregnancy are highly resistant to carcinogen-induced breast cancer compared with age-matched nulliparous controls. Little progress has been made, however, toward understanding the biological basis of this phenomenon. We have used DNA microarrays to identify a panel of 38 differentially expressed genes that reproducibly distinguishes, in a blinded manner, between the nulliparous and parous states of the mammary gland in multiple strains of mice and rats. We find that parity results in the persistent down-regulation of multiple genes encoding growth factors, such as amphiregulin, pleiotrophin, and IGF-1, as well as the persistent up-regulation of the growth-inhibitory molecule, TGF-β3, and several of its transcriptional targets. Our studies further indicate that parity results in a persistent increase in the differentiated state of the mammary gland as well as lifelong changes in the hematopoietic cell types resident within the gland. These findings define a developmental state of the mammary gland that is refractory to carcinogenesis and suggest novel hypotheses for the mechanisms by which parity may modulate breast cancer risk.
DOI: 10.1016/j.jaci.2008.06.041
2008
Cited 89 times
ORMDL3 variants associated with asthma susceptibility in North Americans of European ancestry
To the Editor:Asthma is the most common chronic disease in children across all developed countries. Although the cause of the disease remains unknown, it is recognized as a complex genetic disorder with an environmental component.1Ober C. Hoffjan S. Asthma genetics 2006: the long and winding road to gene discovery.Genes Immun. 2006; 7: 95-100Crossref PubMed Scopus (519) Google Scholar, 2Newman-Taylor A. Environmental determinants of asthma.Lancet. 1995; 345: 296-299Crossref PubMed Scopus (95) Google Scholar As with many other complex diseases, a long list of genes has been associated with asthma through linkage and candidate gene association studies, the majority of which do not replicate.1Ober C. Hoffjan S. Asthma genetics 2006: the long and winding road to gene discovery.Genes Immun. 2006; 7: 95-100Crossref PubMed Scopus (519) Google Scholar The first genome-wide association study of asthma predisposition was recently published.3Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al.Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma.Nature. 2007; 448: 470-473Crossref PubMed Scopus (1239) Google Scholar In that study 317,000 single nucleotide polymorphisms (SNPs) were typed in 994 patients with childhood-onset asthma, resulting in the identification of a novel locus on chromosome 17q12-q21 containing multiple genes and associated markers. Expression analysis in lymphoblastoid cell lines revealed that ORMDL3 expression was strongly correlated with the asthma-associated variants, leading the authors to conclude that it was the most likely candidate gene at this locus.ORMDL3 encodes a 4-transmembrane domain–containing protein that is localized to the endoplasmic reticulum membrane.4Hjelmqvist L. Tuson M. Marfany G. Herrero E. Balcells S. Gonzalez-Duarte R. ORMDL proteins are a conserved new family of endoplasmic reticulum membrane proteins.Genome Biol. 
2002; 3 (RESEARCH0027)Crossref PubMed Google Scholar Although current knowledge of ORMDL3 function is limited, recent studies in yeast suggest the gene product might be involved in protein folding.To determine whether ORMDL3 is a genetic risk factor for the development of asthma in North American white subjects, we sought to replicate the association with the 10 most significantly associated SNPs in the study by Moffatt et al3Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al.Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma.Nature. 2007; 448: 470-473Crossref PubMed Scopus (1239) Google Scholar in 2 large pediatric asthma cohorts, one comprising patients of Northern European decent and another comprising African American patients. Both cohorts were collected at the Children's Hospital of Philadelphia (CHOP).This study was approved by the Institutional Review Board at CHOP. Parental informed consent was obtained from all participants in this study for the purpose of DNA collection and genotyping.All patients and control subjects reported in this study were recruited at the CHOP between 2006 and 2008. All subjects were resident in the Greater Philadelphia area. The study of white subjects included 807 patients with physician-diagnosed asthma and 2583 disease-free control subjects without asthma. The study of African American subjects included 1456 patients with physician-diagnosed asthma and 1973 control subjects without asthma. Both white and African American patients were given diagnoses by CHOP physicians in accordance with the American Thoracic Society criteria5American Thoracic Society Guidelines for the evaluation of impairment/disability in patients with asthma.Am Rev Respir Dis. 1993; 147: 1056-1061Crossref PubMed Scopus (219) Google Scholar and had been prescribed medication to control their asthma. 
All control samples, both white and African American subjects, had no history of asthma or reactive airway disease and had never been prescribed asthma medications. In addition to self-reported ancestry status, all patients and control subjects were screened at ancestry informative markers using Markov Chain Monte Carlo algorithm, as implemented in STRUCTURE,6Falush D. Stephens M. Pritchard J.K. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies.Genetics. 2003; 164: 1567-1587PubMed Google Scholar to reduce the risk of population stratification. Genomic inflation of 1.08 in the white study and 1.1 for the African American study reflected minor background stratification. Mean age of the case cohort was as follows: white subjects, 8.6 years (σ 5.8; 62% male and 38% female); African American subjects, 7.5 years (σ 5.7; 57% male and 43% female). All control subjects were recruited by CHOP clinicians and nursing staff within the CHOP Health Care Network, including 4 primary care clinics and several group practices and outpatient practices that included well child visits. Mean age in the control cohort was as follows: white subjects, 8.7 years (σ 5.2; 51% male and 49% female); African American subjects, 6.7 years (σ 7.8; 49% male and 51% female).High-throughput genome-wide SNP genotyping was carried out at the center for applied genomics on the Illumina Infinium II HumanHap550 BeadChip (San Diego, Calif), as previously described.7Hakonarson H. Grant S.F. Bradfield J.P. Marchand L. Kim C.E. Glessner J.T. et al.A genome-wide association study identifies KIAA0350 as a type 1 diabetes gene.Nature. 2007; 448: 591-594Crossref PubMed Scopus (425) Google ScholarOf the 10 most highly associated SNPs in the Moffatt et al study,3Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al.Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma.Nature. 
2007; 448: 470-473Crossref PubMed Scopus (1239) Google Scholar 9 were present on the Illumina HapMap550 BeadChip. The missing SNP, rs4795408, was therefore not included in this study. The remaining 9 SNPs had a call rate of greater than 99% and were in Hardy-Weinberg equilibrium (P > 10−5). The results of the allelic association and odds ratios are summarized in Table I.Table IAllelic association and odds ratios for the 9 most significantly associated SNPs from the Moffatt et al study3Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al.Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma.Nature. 2007; 448: 470-473Crossref PubMed Scopus (1239) Google ScholarSNPBase pairMinor alleleReference alleleF_AF_UCHISQP valueORrs930327735229995TC0.53860.49797.371.0066281.176rs1155746735282160TG0.55320.51137.833.005131.183rs806737835304874GA0.54380.50128.103.0044181.186rs229040035319766GA0.54660.50527.616.0057861.179rs721638935323475CT0.54910.50897.214.0072341.174rs479540535341943TC0.59690.56075.952.01471.162rs807941635346239TC0.51390.5454.33.037440.883rs389419435375519CT0.52660.552.45.11750.91rs385919235382174CT0.53590.55351.398.2370.928F_A, Allele frequency in patients; F_U, allele frequency in control subjects; OR, odds ratio. Open table in a new tab We detected significant association at 7 of the 9 SNPs tested (Table I). The most strongly associated marker was SNP rs8067378, which was also found to be the most significantly associated in the Moffatt MRC-A familial cohort. The SNP most highly correlated with ORMDL3 expression that was also the most significantly associated in the Moffatt et al study,3Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al.Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma.Nature. 
2007; 448: 470-473Crossref PubMed Scopus (1239) Google Scholar rs7216389, was significant in our cohort, with an odds ratio of 1.17 compared with a reported odds ratio of 1.84. The odds ratios for the significantly associated SNPs in our cohort ranged between 1.13 and 1.18, placing them within a similar range as the German cohort examined by Moffatt et al (International Study of Asthma and Allergies in Childhood [ISAAC] II replication study lower 95% CI range, 0.89-1.2). Additionally, the 2 SNPs in the interval, rs3894194 and rs3859192, that were not significantly associated with asthma in our cohort were also not significant in the ISAAC II cohort. We additionally examined for association of these 7 SNPs to asthma in an independent case-control cohort of subjects of African American decent. We detected no evidence for association between these markers and asthma in the 1456 patients and 1973 control subjects examined. None of the SNPs had a P value of less than .27 (range: P = .27-.96; odds ratio, 0.93-1.1).This study has replicated the reported association between asthma and variants in and around the ORMDL3 gene in a cohort of North American white asthmatic subjects. Seven of the 9 SNPs that were tested showed significant association. The remaining 2 SNPs, rs3894194 and rs3859192, might have been in higher linkage disequilibrium with a causal SNP in the British cohort studied by Moffatt et al3Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al.Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma.Nature. 2007; 448: 470-473Crossref PubMed Scopus (1239) Google Scholar because the German ISAAC II cohort that was reported in the same study shows the same pattern of association as our data, strongly suggesting that the SNPs are not associated with asthma predisposition. 
Furthermore, the odds ratios obtained in our cohort are significantly lower than those reported by Moffatt et al, which might be a case of the winner's curse, in which the strength of association is overestimated in initial reports. In contrast, no association was detected between these markers and asthma in African American subjects. Our results are in agreement with another recent study of African American subjects8Galanter J. Choudhry S. Eng C. Nazario S. Rodriguez-Santana J.R. Casal J. et al.ORMDL3 gene is associated with asthma in three ethnically diverse populations.Am J Respir Crit Care Med. 2008; 177: 1194-1200Crossref PubMed Scopus (206) Google Scholar that did not detect association at rs7216389. Galanter et al,8Galanter J. Choudhry S. Eng C. Nazario S. Rodriguez-Santana J.R. Casal J. et al.ORMDL3 gene is associated with asthma in three ethnically diverse populations.Am J Respir Crit Care Med. 2008; 177: 1194-1200Crossref PubMed Scopus (206) Google Scholar did, however, type additional SNPs within ORMDL3, identifying association at 2 variants. We cannot therefore exclude the possibility that the white subject–associated SNPs do not tag causal variants that might be present in African American subjects.In addition to our replication in North American white subjects and the British and German cohorts reported by Moffatt et al,3Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al.Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma.Nature. 2007; 448: 470-473Crossref PubMed Scopus (1239) Google ScholarORMDL3 has also been replicated in 2 other studies. The first was a study that looked at Mexican, Puerto Rican, and African American subjects,8Galanter J. Choudhry S. Eng C. Nazario S. Rodriguez-Santana J.R. Casal J. et al.ORMDL3 gene is associated with asthma in three ethnically diverse populations.Am J Respir Crit Care Med. 
2008; 177: 1194-1200Crossref PubMed Scopus (206) Google Scholar and the second was a study that looked at a Scottish cohort.9Tavendale R. Macgregor D.F. Mukhopadhyay S. Palmer C.N. A polymorphism controlling ORMDL3 expression is associated with asthma that is poorly controlled by current medications.J Allergy Clin Immunol. 2008; 121: 860-863Abstract Full Text Full Text PDF PubMed Scopus (131) Google Scholar The weight of evidence across 6 different populations therefore supports the association between variants at the ORMDL3 locus and asthma. To the Editor: Asthma is the most common chronic disease in children across all developed countries. Although the cause of the disease remains unknown, it is recognized as a complex genetic disorder with an environmental component.1Ober C. Hoffjan S. Asthma genetics 2006: the long and winding road to gene discovery.Genes Immun. 2006; 7: 95-100Crossref PubMed Scopus (519) Google Scholar, 2Newman-Taylor A. Environmental determinants of asthma.Lancet. 1995; 345: 296-299Crossref PubMed Scopus (95) Google Scholar As with many other complex diseases, a long list of genes has been associated with asthma through linkage and candidate gene association studies, the majority of which do not replicate.1Ober C. Hoffjan S. Asthma genetics 2006: the long and winding road to gene discovery.Genes Immun. 2006; 7: 95-100Crossref PubMed Scopus (519) Google Scholar The first genome-wide association study of asthma predisposition was recently published.3Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al.Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma.Nature. 2007; 448: 470-473Crossref PubMed Scopus (1239) Google Scholar In that study 317,000 single nucleotide polymorphisms (SNPs) were typed in 994 patients with childhood-onset asthma, resulting in the identification of a novel locus on chromosome 17q12-q21 containing multiple genes and associated markers. 
Expression analysis in lymphoblastoid cell lines revealed that ORMDL3 expression was strongly correlated with the asthma-associated variants, leading the authors to conclude that it was the most likely candidate gene at this locus. ORMDL3 encodes a 4-transmembrane domain–containing protein that is localized to the endoplasmic reticulum membrane.4Hjelmqvist L. Tuson M. Marfany G. Herrero E. Balcells S. Gonzalez-Duarte R. ORMDL proteins are a conserved new family of endoplasmic reticulum membrane proteins.Genome Biol. 2002; 3 (RESEARCH0027)Crossref PubMed Google Scholar Although current knowledge of ORMDL3 function is limited, recent studies in yeast suggest the gene product might be involved in protein folding. To determine whether ORMDL3 is a genetic risk factor for the development of asthma in North American white subjects, we sought to replicate the association with the 10 most significantly associated SNPs in the study by Moffatt et al3Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al.Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma.Nature. 2007; 448: 470-473Crossref PubMed Scopus (1239) Google Scholar in 2 large pediatric asthma cohorts, one comprising patients of Northern European decent and another comprising African American patients. Both cohorts were collected at the Children's Hospital of Philadelphia (CHOP). This study was approved by the Institutional Review Board at CHOP. Parental informed consent was obtained from all participants in this study for the purpose of DNA collection and genotyping. All patients and control subjects reported in this study were recruited at the CHOP between 2006 and 2008. All subjects were resident in the Greater Philadelphia area. The study of white subjects included 807 patients with physician-diagnosed asthma and 2583 disease-free control subjects without asthma. 
The study of African American subjects included 1456 patients with physician-diagnosed asthma and 1973 control subjects without asthma. Both white and African American patients were given diagnoses by CHOP physicians in accordance with the American Thoracic Society criteria5American Thoracic Society Guidelines for the evaluation of impairment/disability in patients with asthma.Am Rev Respir Dis. 1993; 147: 1056-1061Crossref PubMed Scopus (219) Google Scholar and had been prescribed medication to control their asthma. All control samples, both white and African American subjects, had no history of asthma or reactive airway disease and had never been prescribed asthma medications. In addition to self-reported ancestry status, all patients and control subjects were screened at ancestry informative markers using Markov Chain Monte Carlo algorithm, as implemented in STRUCTURE,6Falush D. Stephens M. Pritchard J.K. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies.Genetics. 2003; 164: 1567-1587PubMed Google Scholar to reduce the risk of population stratification. Genomic inflation of 1.08 in the white study and 1.1 for the African American study reflected minor background stratification. Mean age of the case cohort was as follows: white subjects, 8.6 years (σ 5.8; 62% male and 38% female); African American subjects, 7.5 years (σ 5.7; 57% male and 43% female). All control subjects were recruited by CHOP clinicians and nursing staff within the CHOP Health Care Network, including 4 primary care clinics and several group practices and outpatient practices that included well child visits. Mean age in the control cohort was as follows: white subjects, 8.7 years (σ 5.2; 51% male and 49% female); African American subjects, 6.7 years (σ 7.8; 49% male and 51% female). 
High-throughput genome-wide SNP genotyping was carried out at the center for applied genomics on the Illumina Infinium II HumanHap550 BeadChip (San Diego, Calif), as previously described.7Hakonarson H. Grant S.F. Bradfield J.P. Marchand L. Kim C.E. Glessner J.T. et al.A genome-wide association study identifies KIAA0350 as a type 1 diabetes gene.Nature. 2007; 448: 591-594Crossref PubMed Scopus (425) Google Scholar Of the 10 most highly associated SNPs in the Moffatt et al study,3Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al.Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma.Nature. 2007; 448: 470-473Crossref PubMed Scopus (1239) Google Scholar 9 were present on the Illumina HapMap550 BeadChip. The missing SNP, rs4795408, was therefore not included in this study. The remaining 9 SNPs had a call rate of greater than 99% and were in Hardy-Weinberg equilibrium (P > 10−5). The results of the allelic association and odds ratios are summarized in Table I. F_A, Allele frequency in patients; F_U, allele frequency in control subjects; OR, odds ratio. We detected significant association at 7 of the 9 SNPs tested (Table I). The most strongly associated marker was SNP rs8067378, which was also found to be the most significantly associated in the Moffatt MRC-A familial cohort. The SNP most highly correlated with ORMDL3 expression that was also the most significantly associated in the Moffatt et al study,3Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al.Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma.Nature. 2007; 448: 470-473Crossref PubMed Scopus (1239) Google Scholar rs7216389, was significant in our cohort, with an odds ratio of 1.17 compared with a reported odds ratio of 1.84. 
The odds ratios for the significantly associated SNPs in our cohort ranged between 1.13 and 1.18, placing them in a range similar to that of the German cohort examined by Moffatt et al (International Study of Asthma and Allergies in Childhood [ISAAC] II replication study lower 95% CI range, 0.89-1.2). Additionally, the 2 SNPs in the interval that were not significantly associated with asthma in our cohort, rs3894194 and rs3859192, were also not significant in the ISAAC II cohort. We additionally tested these 7 SNPs for association with asthma in an independent case-control cohort of subjects of African American descent. We detected no evidence for association between these markers and asthma in the 1456 patients and 1973 control subjects examined. None of the SNPs had a P value of less than .27 (range: P = .27-.96; odds ratio, 0.93-1.1).
This study has replicated the reported association between asthma and variants in and around the ORMDL3 gene in a cohort of North American white asthmatic subjects. Seven of the 9 SNPs that were tested showed significant association. The remaining 2 SNPs, rs3894194 and rs3859192, might have been in higher linkage disequilibrium with a causal SNP in the British cohort studied by Moffatt et al (3. Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al. Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature. 2007; 448: 470-473) because the German ISAAC II cohort that was reported in the same study shows the same pattern of association as our data, strongly suggesting that the SNPs are not associated with asthma predisposition. Furthermore, the odds ratios obtained in our cohort are significantly lower than those reported by Moffatt et al, which might be a case of the winner's curse, in which the strength of association is overestimated in initial reports.
In contrast, no association was detected between these markers and asthma in African American subjects. Our results are in agreement with another recent study of African American subjects (8. Galanter J. Choudhry S. Eng C. Nazario S. Rodriguez-Santana J.R. Casal J. et al. ORMDL3 gene is associated with asthma in three ethnically diverse populations. Am J Respir Crit Care Med. 2008; 177: 1194-1200) that did not detect association at rs7216389. Galanter et al (8) did, however, type additional SNPs within ORMDL3, identifying association at 2 variants. We cannot therefore exclude the possibility that the white subject–associated SNPs do not tag causal variants that might be present in African American subjects.
In addition to our replication in North American white subjects and the British and German cohorts reported by Moffatt et al (3. Moffatt M.F. Kabesch M. Liang L. Dixon A.L. Strachan D. Heath S. et al. Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature. 2007; 448: 470-473), the ORMDL3 association has also been replicated in 2 other studies. The first was a study that looked at Mexican, Puerto Rican, and African American subjects (8), and the second was a study that looked at a Scottish cohort (9. Tavendale R. Macgregor D.F. Mukhopadhyay S. Palmer C.N. A polymorphism controlling ORMDL3 expression is associated with asthma that is poorly controlled by current medications. J Allergy Clin Immunol. 2008; 121: 860-863). The weight of evidence across 6 different populations therefore supports the association between variants at the ORMDL3 locus and asthma.
We thank all of the participating subjects and families. We thank Elvira Dabaghyan, Kenya Fain, Kisha Harden, Andrew Hill, Crystal Johnson-Honesty, Lynn McCleery, Kathleen Lake, Ramona, Alexandria Thomas, and Robert Skraban for their expert assistance with DNA processing, data collection, or study management. We also thank Smari Kristinsson, Larus Arni Hermannsson, and Asbjörn Krisbjörnsson of Raförninn ehf for their extensive software design and informatics contribution.
DOI: 10.1074/mcp.m110.000398
2010
Cited 76 times
In Situ Proteomic Analysis of Human Breast Cancer Epithelial Cells Using Laser Capture Microdissection: Annotation by Protein Set Enrichment Analysis and Gene Ontology
Identification of molecular signatures that allow detection of the transition from normal breast epithelial cells to malignant invasive cells is a critical component in the development of diagnostic, therapeutic, and preventative strategies for human breast cancer. Substantial efforts have been devoted to deciphering breast cancer etiology at the genome level, but only a limited number of studies have appeared at the proteome level. In this work, we compared individual in situ proteome profiles of nine noncancerous, normal breast epithelial (NBE) samples with those of nine estrogen receptor (ER)-positive (luminal subtype), invasive malignant breast epithelial (MBE) samples, which were not patient matched, by combining laser capture microdissection (LCM) and quantitative shotgun proteomics. A total of 12,970 unique peptides were identified from the 18 samples, and 1623 proteins were selected for quantitative analysis using spectral index (SpI) as a measure of protein abundance. A total of 298 proteins were differentially expressed between NBE and MBE at the 95% confidence level, and this differential expression correlated well with immunohistochemistry (IHC) results reported in the Human Protein Atlas (HPA) database. To assess pathway level patterns in the observed expression changes, we developed protein set enrichment analysis (PSEA), a modification of a well-known approach in gene expression analysis, Gene Set Enrichment Analysis (GSEA). Unlike single gene-based functional term enrichment analyses that examine only pathway overrepresentation of proteins above a given significance threshold, PSEA applies a weighted running sum statistic to the entire expression data to discover significantly enriched protein groups.
Application of PSEA to the expression data in this study revealed not only well-known ER-dependent and cellular morphology-dependent protein abundance changes, but also significant alterations of downstream targets for multiple transcription factors (TFs), suggesting a role for specific gene regulatory pathways in breast tumorigenesis. A parallel GOMiner analysis revealed both confirmatory and complementary data to PSEA. The combination of the two annotation approaches yielded extensive biological feature mapping for in depth analysis of the quantitative proteomic data.
Breast cancer is a major health problem that each year affects the lives of millions of women worldwide. In 2008, in the United States alone, ∼180,000 women were diagnosed with invasive breast carcinoma (1. Jemal A. Siegel R. Ward E. Hao Y. Xu J. Murray T. Thun M.J. Cancer statistics, 2008. CA-Cancer J. Clin. 2008; 58: 71-96). The use of high-throughput gene expression technologies applied to the study of human breast cancer has led to the discovery of the “intrinsic gene signatures” that stratify human breast cancers into four subtypes that correlate remarkably well with clinically recognized breast cancer subtypes (2. Sotiriou C. Wirapati P. Loi S. Harris A. Fox S. Smeds J. Nordgren H. Farmer P. Praz V. Haibe-Kains B. Desmedt C. Larsimont D. Cardoso F. Peterse H. Nuyten D. Buyse M. Van de Vijver M.J. Bergh J. Piccart M. Delorenzi M.
Gene expression profiling in breast cancer: Understanding the molecular basis of histologic grade to improve prognosis. J. Natl. Cancer Inst. 2006; 98: 262-272; 3. Sorlie T. Perou C.M. Tibshirani R. Aas T. Geisler S. Johnsen H. Hastie T. Eisen M.B. van de Rijn M. Jeffrey S.S. Thorsen T. Quist H. Matese J.C. Brown P.O. Botstein D. Lønning P.E. Børresen-Dale A.L. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. U.S.A. 2001; 98: 10869-10874; 4. van't Veer L.J. Dai H. van de Vijver M.J. He Y.D. Hart A.A. Mao M. Peterse H.L. van der Kooy K. Marton M.J. Witteveen A.T. Schreiber G.J. Kerkhoven R.M. Roberts C. Linsley P.S. Bernards R. Friend S.H. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002; 415: 530-536; 5. Ma X.J. Wang Z. Ryan P.D. Isakoff S.J. Barmettler A. Fuller A. Muir B. Mohapatra G. Salunga R. Tuggle J.T. Tran Y. Tran D. Tassin A. Amon P. Wang W. Enright E. Stecker K. Estepa-Sabal E. Smith B. Younger J. Balis U. Michaelson J. Bhan A. Habin K. Baer T.M. Brugge J. Haber D.A. Erlander M.G. Sgroi D.C. A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell. 2004; 5: 607-616; 6. Perou C.M. Sørlie T. Eisen M.B. van de Rijn M. Jeffrey S.S. Rees C.A. Pollack J.R. Ross D.T. Johnsen H. Akslen L.A. Fluge O. Pergamenschikov A. Williams C. Zhu S.X. Lønning P.E. Børresen-Dale A.L. Brown P.O. Botstein D. Molecular portraits of human breast tumours. Nature. 2000; 406: 747-752). These subtypes include “HER2+,” “basal,” “luminal A,” and “luminal B” breast cancers.
HER2+ tumors are most frequently estrogen receptor (ER)-negative and express proliferation genes, as well as Her-2 and other genes linked to this latter locus. (The abbreviations used are: ER, estrogen receptor; ERE, estrogen response element; ES, enrichment score; GO, gene ontology; GSEA, gene set enrichment analysis; HPA, Human Protein Atlas; IHC, immunohistochemistry; LCM, laser capture microdissection; MBE, malignant breast epithelium; MSigDB, the molecular signatures database; NBE, normal breast epithelium; PSEA, protein set enrichment analysis; RSS, running sum statistic; SpI, spectral index; TF, transcription factor; FDR, false discovery rate; SRF, serum response factor; AP-1, activator protein 1; SREBP1, sterol regulatory binding protein 1.) The basal tumors are most commonly ER negative, progesterone receptor negative and Her-2 negative. The luminal A and luminal B tumors express luminal cytokeratins, the estrogen receptor (ER), and trans-acting T-cell-specific transcription factor (GATA3).
The luminal breast cancers (both A and B subtypes) constitute ∼70% of all human breast cancers diagnosed worldwide. In general, the luminal breast cancers are associated with favorable prognosis as compared with the HER2+ and basal subtypes. Nevertheless, luminal B tumors have a worse prognosis than luminal A tumors, and recent data suggest that luminal A tumors may be adequately treated with antihormonal therapy alone, whereas luminal B tumors may benefit from chemotherapy added to antihormonal therapy (7. Brenton J.D. Carey L.A. Ahmed A.A.
Caldas C. Molecular classification and molecular forecasting of breast cancer: Ready for clinical application? J. Clin. Oncol. 2005; 23: 7350-7360). Despite advances in the gene expression-based stratification of human breast cancer, the molecular basis of luminal breast tumorigenesis and luminal breast cancer clinical heterogeneity is still poorly understood. This gap in knowledge is due, in part, to the well-known limitations associated with gene expression, for it is the gene products, or the proteome, and not the genes themselves that are the biochemical determinants of cell growth and metabolism. Thus, increased knowledge of the proteomic alterations associated with luminal breast cancer tumorigenesis will help advance understanding of human breast cancer and facilitate tailored interventions in select luminal breast cancer patients.
Over the past decade, research has been conducted to study the breast cancer proteome to increase the molecular understanding of breast cancer tumorigenesis beyond that from existing breast cancer gene expression data (8. Hondermarck H. Breast cancer - When proteomics challenges biological complexity. Mol. Cell. Proteomics. 2003; 2: 281-291; 9. Somiari R.I. Somiari S. Russell S. Shriver C.D. Proteomics of breast carcinoma. J. Chromatogr. B. 2005; 815: 215-225; 10. Bertucci F. Birnbaum D. Goncalves A. Proteomics of breast cancer - Principles and potential clinical applications. Mol. Cell. Proteomics. 2006; 5: 1772-1786). Global proteomic analyses of tumor biopsies, dissected cells, human breast milk and nipple aspirate fluid, cancer cell lines, and sera and plasma provide opportunities for unbiased characterization of protein expression in breast cancer (11. Hondermarck H. Tastet C. El Yazidi-Belkoura I. Toillon R.A. Le Bourhis X.
Proteomics of breast cancer: The quest for markers and therapeutic targets. J. Proteome Res. 2008; 7: 1403-1411; 12. Kulasingam V. Diamandis E.P. Tissue culture-based breast cancer biomarker discovery platform. Int. J. Cancer. 2008; 123: 2007-2012). Tumor tissue is likely to be the most informative sample, since proteomic analysis is conducted directly on the sample where the disease resides. However, tumor analysis is challenging, given the heterogeneity of the breast cancer tissue and the limited number of cells generally available.
Highly enriched cell populations can be obtained from heterogeneous samples by laser capture microdissection (LCM) (13. Emmert-Buck M.R. Bonner R.F. Smith P.D. Chuaqui R.F. Zhuang Z. Goldstein S.R. Weiss R.A. Liotta L.A. Laser Capture Microdissection. Science. 1996; 274: 998-1001; 14. Espina V. Wulfkuhle J.D. Calvert V.S. VanMeter A. Zhou W. Coukos G. Geho D.H. Petricoin 3rd, E.F. Liotta L.A. Laser-capture microdissection. Nat. Protoc. 2006; 1: 586-603). Such sample analysis can lead to detailed proteomic portraits of tumor microenvironments and, importantly, correct for potential confounding effects of stromal contamination (15. Ma X.J. Dahiya S. Richardson E. Erlander M. Sgroi D.C. Gene expression profiling of the tumor microenvironment during breast cancer progression. Breast Cancer Res. 2009; 11: R7). Though such analyses are numerous in the world of gene expression studies, proteomic analyses of LCM breast tissue are limited (16. Wulfkuhle J.D. Sgroi D.C. Krutzsch H. McLean K. McGarvey K. Knowlton M. Chen S. Shu H. Sahin A. Kurek R. Wallwiener D. Merino M.J. Petricoin 3rd, E.F. Zhao Y. Steeg P.S. Proteomics of human breast ductal carcinoma in situ. Cancer Res. 2002; 62: 6740-6749; 17. Zang L. Palmer Toy D.
Hancock W.S. Sgroi D.C. Karger B.L. Proteomic analysis of ductal carcinoma of the breast using laser capture microdissection, LC-MS, and O-16/O-18 isotopic labeling. J. Proteome Res. 2004; 3: 604-612; 18. Neubauer H. Clare S.E. Kurek R. Fehm T. Wallwiener D. Sotlar K. Nordheim A. Wozny W. Schwall G.P. Poznanović S. Sastri C. Hunzinger C. Stegmann W. Schrattenholz A. Cahill M.A. Breast cancer proteomics by laser capture microdissection, sample pooling, 54-cm IPG IEF, and differential iodine radioisotope detection. Electrophoresis. 2006; 27: 1840-1852; 19. Umar A. Luider T.M. Foekens J.A. Pasa-Tolic L. NanoLC-FT-ICR MS improves proteome coverage attainable for similar to 3000 laser-microdissected breast carcinoma cells. Proteomics. 2007; 7: 323-329; 20. Sanders M.E. Dias E.C. Xu B.J. Mobley J.A. Billheimer D. Roder H. Grigorieva J. Dowsett M. Arteaga C.L. Caprioli R.M. Differentiating proteomic biomarkers in breast cancer by laser capture microdissection and MALDI MS. J. Proteome Res. 2008; 7: 1500-1507; 21. Johann D.J. Rodriguez-Canales J. Mukherjee S. Prieto D.A. Hanson J.C. Emmert-Buck M. Blonder J. Approaching Solid Tumor Heterogeneity on a Cellular Basis by Tissue Proteomics Using Laser Capture Microdissection and Biological Mass Spectrometry. J. Proteome Res. 2009; 8: 2310-2318). Recently, solid tumor heterogeneity at the protein level was assessed by comparing LCM-acquired breast tumor proper and stromal cells from a lymph node containing breast carcinoma (21. Johann D.J. Rodriguez-Canales J. Mukherjee S. Prieto D.A. Hanson J.C. Emmert-Buck M. Blonder J. Approaching Solid Tumor Heterogeneity on a Cellular Basis by Tissue Proteomics Using Laser Capture Microdissection and Biological Mass Spectrometry. J. Proteome Res.
2009; 8: 2310-2318), and differential proteome profiles according to tamoxifen therapy responsiveness were analyzed from ER+, primary tumor pooled LCM samples (22. Umar A. Kang H. Timmermans A.M. Look M.P. Meijer-van Gelder M.E. den Bakker M.A. Jaitly N. Martens J.W. Luider T.M. Foekens J.A. Pasa-Tolić L. Identification of a Putative Protein Profile Associated with Tamoxifen Therapy Resistance in Breast Cancer. Mol. Cell. Proteomics. 2009; 8: 1278-1294).
The combination of LCM enrichment and shotgun proteomics has the potential to reveal novel pathogenic pathways and networks by which normal breast epithelial cells progress to invasive malignant epithelial cells. Genomic and gene expression profiling of human breast cancer progression has already shown potential for the identification of molecular alterations that serve as prognostic and predictive biomarkers (2. Sotiriou C. Wirapati P. Loi S. Harris A. Fox S. Smeds J. Nordgren H. Farmer P. Praz V. Haibe-Kains B. Desmedt C. Larsimont D. Cardoso F. Peterse H. Nuyten D. Buyse M. Van de Vijver M.J. Bergh J. Piccart M. Delorenzi M. Gene expression profiling in breast cancer: Understanding the molecular basis of histologic grade to improve prognosis. J. Natl. Cancer Inst. 2006; 98: 262-272; 3. Sorlie T. Perou C.M. Tibshirani R. Aas T. Geisler S. Johnsen H. Hastie T. Eisen M.B. van de Rijn M. Jeffrey S.S. Thorsen T. Quist H. Matese J.C. Brown P.O. Botstein D. Lønning P.E. Børresen-Dale A.L. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. U.S.A. 2001; 98: 10869-10874; 4. van't Veer L.J. Dai H. van de Vijver M.J. He Y.D. Hart A.A. Mao M. Peterse H.L. van der Kooy K. Marton M.J. Witteveen A.T. Schreiber G.J. Kerkhoven R.M. Roberts C. Linsley P.S. Bernards R.
Friend S.H. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002; 415: 530-536; 23. Chin K. DeVries S. Fridlyand J. Spellman P.T. Roydasgupta R. Kuo W.L. Lapuk A. Neve R.M. Qian Z. Ryder T. Chen F. Feiler H. Tokuyasu T. Kingsley C. Dairkee S. Meng Z. Chew K. Pinkel D. Jain A. Ljung B.M. Esserman L. Albertson D.G. Waldman F.M. Gray J.W. Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell. 2006; 10: 529-541). Comprehensive knowledge of proteomic patterns in normal epithelium, as compared with malignant breast epithelium, would be invaluable in providing a more complete picture of the molecular entities associated with breast cancer progression.
In the present study, we performed comparative proteomic profiling of laser capture microdissected normal breast epithelium (NBE) from nine different noncancerous human mammoplasty specimens relative to microdissected malignant breast epithelium (MBE) from nine different human invasive luminal (ER positive) breast cancer specimens. Using only the limited amount of material collected by LCM (60,000 cells), a proteomic profile of each sample was separately acquired by label-free proteomics using our previously developed platform for analysis of LCM samples (24. Gu Y. Wu S.L. Meyer J.L. Hancock W.S. Burg L.J. Linder J. Hanlon D.W. Karger B.L. Proteomic analysis of high-grade dysplastic cervical cells obtained from ThinPrep slides using laser capture microdissection and mass spectrometry. J. Proteome Res. 2007; 6: 4256-4268). To the best of our knowledge, this is the most comprehensive global study that compares proteomic signatures of phenotypically normal and ER+ invasive malignant breast epithelial cells in multiple individual samples. For each protein, the spectral index (SpI) value (25. Fu X. Gharib S.A.
Green P.S. Aitken M.L. Frazer D.A. Park D.R. Vaisar T. Heinecke J.W. Spectral index for assessment of differential protein expression in shotgun proteomics. J. Proteome Res. 2008; 7: 845-854) was used as a measure of relative abundance, with differentially abundant proteins between NBE and MBE being identified by employing permutation analysis. Importantly, a number of proteins highly enriched in MBE, including both well-known and novel proteins in the context of breast cancer, were verified to be up-regulated based on immunohistostaining profiles of normal and disease tissues available in the Human Protein Atlas (HPA) (26. Berglund L. Björling E. Oksvold P. Fagerberg L. Asplund A. Szigyarto C.A. Persson A. Ottosson J. Wernérus H. Nilsson P. Lundberg E. Sivertsson A. Navani S. Wester K. Kampf C. Hober S. Pontén F. Uhlén M. A gene-centric human protein atlas for expression profiles based on antibodies. Mol. Cell. Proteomics. 2008; 7: 2019-2027).
A challenge in all global expression profiling studies is to annotate the large number of observed molecular changes. Functional approaches employing DAVID or GOMiner have been used in the analysis of global proteomic profiling studies (27. Dennis G. Sherman B.T. Hosack D.A. Yang J. Gao W. Lane H.C. Lempicki R.A. DAVID: Database for annotation, visualization, and integrated discovery. Genome Biol. 2003; 4: R60; 28. Zeeberg B.R. Qin H. Narasimhan S. Sunshine M. Cao H. Kane D.W. Reimers M. Stephens R.M. Bryant D. Burt S.K. Elnekave E. Hari D.M. Wynn T.A. Cunningham-Rundles C. Stewart D.M. Nelson D. Weinstein J.N. High-Throughput GoMiner, an ‘industrial-strength’ integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID). BMC Bioinformatics. 2005; 6: 168).
Though effective, these approaches rely on strict quantitative thresholds for selecting proteins to be included and ignore the level of expression differences when scoring enriched functional terms. Gene set enrichment analysis (GSEA) was developed in the gene expression field as a powerful approach for determining functional significance of differential expression results by taking into account the quantitative nature of expression correlation (29. Subramanian A. Tamayo P. Mootha V.K. Mukherjee S. Ebert B.L. Gillette M.A. Paulovich A. Pomeroy S.L. Golub T.R. Lander E.S. Mesirov J.P. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 2005; 102: 15545-15550). Recently, GSEA has been applied, without modification, to proteome datasets to interpret global functional changes in dilated cardiomyopathy (30. Ruth, I., Daniele, M., Rasoul, A.-K., Anthony, G., Gary, D. B., Andrew, E., Pathway analysis of dilated cardiomyopathy using global proteomic profiling and enrichment maps. Proteomics, 10, 1316–1327).
In the present study, we modified GSEA to be suitable for the analysis of label-free quantitative proteomic data in an approach termed Protein Set Enrichment Analysis (PSEA). PSEA takes into account the quantitative level of differential protein expression measured by spectral counts to highlight biologically related proteins with strong and highly concordant expression differences. PSEA of LCM-derived MBE samples revealed significantly decreased expression of cytoskeletal proteins and of proteins that are consistently negatively correlated with ER+ human breast cancers. More importantly, many protein sets representing proteins controlled by common transcription factors (TFs) were found, potentially suggesting a role of estrogen response element (ERE)-independent ER signaling in breast tumorigenesis.
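The weighted running sum statistic that GSEA-style methods rely on can be sketched as follows. This is a minimal illustration of the enrichment score described for GSEA by Subramanian et al, assuming proteins are already ranked by a per-protein score such as SpI; the specific modifications that distinguish PSEA are not given in this text, and the function name, protein identifiers, and scores below are hypothetical.

```python
def enrichment_score(ranked_ids, scores, protein_set, weight=1.0):
    """Return the signed maximum deviation of a weighted running sum.

    ranked_ids:  protein identifiers sorted from most up- to most down-regulated
    scores:      per-protein scores in the same order as ranked_ids
    protein_set: identifiers belonging to the set being tested
    weight:      exponent applied to |score| for set members (0 gives an unweighted walk)
    """
    in_set = [pid in protein_set for pid in ranked_ids]
    n_total = len(ranked_ids)
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("set must contain some, but not all, ranked proteins")

    hit_weights = [abs(s) ** weight if hit else 0.0 for s, hit in zip(scores, in_set)]
    hit_total = sum(hit_weights)
    if hit_total == 0.0:
        raise ValueError("set members must not all have zero score")
    miss_step = 1.0 / (n_total - n_hits)

    running, best = 0.0, 0.0
    for w, hit in zip(hit_weights, in_set):
        # Step up (weighted) at set members, step down (uniform) elsewhere.
        running += w / hit_total if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list: two proteins up in MBE, two down.
ids = ["P1", "P2", "P3", "P4"]
vals = [2.0, 1.0, -1.0, -2.0]
es_up = enrichment_score(ids, vals, {"P1", "P2"})
es_down = enrichment_score(ids, vals, {"P3", "P4"})
```

In practice the significance of such a score is usually judged against a null distribution obtained by permuting sample labels or protein ranks, which this sketch omits.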
In addition, we performed a GOMiner analysis, which requires arbitrary thresholds for differentially abundant proteins, on the proteomic results. For differentially expressed proteins at the 95% confidence level, GOMiner analysis added unique biological features such as up-regulation of Golgi vesicle transport activity in MBE. Complementing PSEA with the results of standard GOMiner analysis, we were able to map extensively potential activators and targets for common TFs.
CONCLUSIONS
Comparative proteomic analysis of breast epithelium was performed using a combination of LCM and shotgun proteomics. To our knowledge, this work is the most comprehensive study comparing direct proteome signatures obtained from homogeneous cell populations of phenotypically normal and ER+ invasive malignant breast epithelium. The key features of this study included separate processing of each sample, thus providing a measure of biological variability. Furthermore, consistent proteomic profiles across 18 samples were obtained using a robust shotgun proteomics protocol. Nine biological replicates for each phenotype proved to be a good balance between the statistical significance and time necessary for analysis.
Highly MBE-enriched proteins were found to correlate well with tissue IHC profiles deposited in the Human Protein Atlas. This agreement suggests that our proteomics results may not be restricted to ER+ breast cancer samples, but may be generalized to all breast cancer subtypes, given the diversity of breast cancer subtype samples in the HPA database.
Global changes in multiple protein signatures, according to breast epithelial malignancy, were functionally assessed by employing PSEA, an annotation method that uses the complete quantitative profiles of all identified proteins with no arbitrary cutoffs. PSEA of ER+ breast epithelium not only found many confirmatory biological findings but also revealed significant changes of downstream targets for a number of common transcription factors.
This result suggests that ERE-independent ER signaling could be a key pathway in breast cancer progression. GOMiner analysis with 298 differentially abundant proteins at or above the 95% confidence level provided complementary results to PSEA, and the combination of the two approaches revealed extensive insights into the molecular participants involved in signaling cascades.
PSEA is applicable to any quantitative proteome dataset, and the flexibility of PSEA allows exploration of proteome datasets in many different biological contexts. For example, PSEA using proteome-specific sets, such as those derived from protein-protein interaction databases, could provide biological insights that are distinct from those provided by genome-based annotations. Furthermore, inclusion of all differentially abundant proteins or genes with PSEA will avoid missing possible important biological relationships behind target candidates (e.g. drug targets and biomarkers).
Future studies will include correlation analysis between a previously reported transcriptome dataset (33. Ma X.J. Salunga R. Tuggle J.T. Gaudet J. Enright E. McQuary P. Payette T. Pistone M. Stecker K. Zhang B.M. Zhou Y.X. Varnholt H. Smith B. Gadd M. Chatfield E. Kessler J. Baer T.M. Erlander M.G. Sgroi D.C. Gene expression profiles of human breast cancer progression. Proc. Natl. Acad. Sci. U.S.A. 2003; 100: 5974-5979) and our proteome dataset, which can lead to significant additional protein networks and provide further insight into the biology of breast cancer progression. In addition, comparing proteome profiles of phenotypically normal epithelium in noncancerous and cancerous breast tissues from the same patient could be interesting, because normal tissue in close proximity to breast carcinoma is known to show loss of heterozygosity (31. Deng G. Lu Y. Zlotnikov G. Thor A.D. Smith H.S. Loss of Heterozygosity in Normal Tissue Adjacent to Breast Carcinomas. Science.
1996; 274: 2057-2059Crossref PubMed Scopus (492) Google Scholar). Such findings should broaden our understanding of the molecular boundary between benign and malignant breast diseases. Breast cancer is a major health problem that each year affects the lives of millions of women worldwide. In 2008, in the United States alone, ∼180,000 women were diagnosed with invasive breast carcinoma (1Jemal A. Siegel R. Ward E. Hao Y. Xu J. Murray T. Thun M.J. Cancer statistics, 2008.CA-Cancer J. Clin. 2008; 58: 71-96Crossref PubMed Scopus (10172) Google Scholar). The use of high-throughput gene expression technologies applied to the study of human breast cancer has lead to the discovery of the “intrinsic gene signatures” that stratify human breast cancers into four subtypes that correlate remarkably well with clinically recognized breast cancer subtypes (2Sotiriou C. Wirapati P. Loi S. Harris A. Fox S. Smeds J. Nordgren H. Farmer P. Praz V. Haibe-Kains B. Desmedt C. Larsimont D. Cardoso F. Peterse H. Nuyten D. Buyse M. Van de Vijver M.J. Bergh J. Piccart M. Delorenzi M. Gene expression profiling in breast cancer: Understanding the molecular basis of histologic grade to improve prognosis.J. Natl. Cancer Inst. 2006; 98: 262-272Crossref PubMed Scopus (1534) Google Scholar, 3Sorlie T. Perou C.M. Tibshirani R. Aas T. Geisler S. Johnsen H. Hastie T. Eisen M.B. van de Rijn M. Jeffrey S.S. Thorsen T. Quist H. Matese J.C. Brown P.O. Botstein D. Lønning P. Eystein Børresen-Dale A.L. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications.Proc. Natl. Acad. Sci. U.S.A. 2001; 98: 10869-10874Crossref PubMed Scopus (8426) Google Scholar, 4van't Veer L.J. Dai H. van de Vijver M.J. He Y.D. Hart A.A. Mao M. Peterse H.L. van der Kooy K. Marton M.J. Witteveen A.T. Schreiber G.J. Kerkhoven R.M. Roberts C. Linsley P.S. Bernards R. Friend S.H. Gene expression profiling predicts clinical outcome of breast cancer.Nature. 
2002; 415: 530-536Crossref PubMed Scopus (7697) Google Scholar, 5Ma X.J. Wang Z. Ryan P.D. Isakoff S.J. Barmettler A. Fuller A. Muir B. Mohapatra G. Salunga R. Tuggle J.T. Tran Y. Tran D. Tassin A. Amon P. Wang W. Enright E. Stecker K. Estepa-Sabal E. Smith B. Younger J. Balis U. Michaelson J. Bhan A. Habin K. Baer T.M. Brugge J. Haber D.A. Erlander M.G. Sgroi D.C. A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen.Cancer Cell. 2004; 5: 607-616Abstract Full Text Full Text PDF PubMed Scopus (789) Google Scholar, 6Perou C.M. Sørlie T. Eisen M.B. van de Rijn M. Jeffrey S.S. Rees C.A. Pollack J.R. Ross D.T. Johnsen H. Akslen L.A. Fluge O. Pergamenschikov A. Williams C. Zhu S.X. Lønning P.E. Børresen-Dale A.L. Brown P.O. Botstein D. Molecular portraits of human breast tumours.Nature
DOI: 10.1038/onc.2013.581
2014
Cited 74 times
Oncogenic RIT1 mutations in lung adenocarcinoma
Lung adenocarcinoma is comprised of distinct mutational subtypes characterized by mutually exclusive oncogenic mutations in RTK/RAS pathway members KRAS, EGFR, BRAF and ERBB2, and translocations involving ALK, RET and ROS1. Identification of these oncogenic events has transformed the treatment of lung adenocarcinoma via application of therapies targeted toward specific genetic lesions in stratified patient populations. However, such mutations have been reported in only ∼55% of lung adenocarcinoma cases in the United States, suggesting other mechanisms of malignancy are involved in the remaining cases. Here we report somatic mutations in the small GTPase gene RIT1 in ∼2% of lung adenocarcinoma cases that cluster in a hotspot near the switch II domain of the protein. RIT1 switch II domain mutations are mutually exclusive with all other known lung adenocarcinoma driver mutations. Ectopic expression of mutated RIT1 induces cellular transformation in vitro and in vivo, which can be reversed by combined PI3K and MEK inhibition. These data identify RIT1 as a driver oncogene in a specific subset of lung adenocarcinomas and suggest PI3K and MEK inhibition as a potential therapeutic strategy in RIT1-mutated tumors.
DOI: 10.1038/oby.2009.53
2009
Cited 74 times
Investigation of the Locus Near MC4R With Childhood Obesity in Americans of European and African Ancestry
Recently, a modest but consistently replicated association was demonstrated between obesity and the single-nucleotide polymorphism (SNP) rs17782313, 3' of the MC4R locus, in a meta-analysis of genome-wide association (GWA) studies of the disease in white populations. We investigated the association in the context of the childhood form of the disease utilizing data from our ongoing GWA study in a cohort of 728 European-American (EA) obese children (BMI ≥95th percentile) and 3,960 EA controls (BMI <95th percentile), as well as 1,008 African-American (AA) obese children and 2,715 AA controls. rs571312, rs10871777, and rs476828 (perfect surrogates for rs17782313) yielded odds ratios in the EA cohort of 1.142 (P = 0.045), 1.137 (P = 0.054), and 1.145 (P = 0.042); however, there was no significant association with these SNPs in the AA cohort. When investigating all 30 SNPs present on the Illumina BeadChip at this locus, again there was no evidence for association in AA cases when correcting for the number of tests employed. As such, variants 3' to the MC4R locus present on the genotyping platform utilized confer a similar magnitude of risk of obesity in white children as in their adult white counterparts, but this observation did not extend to AAs.
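The odds ratios quoted in this abstract are the usual effect-size measure for a case-control SNP association. As a minimal sketch of how such a figure is obtained from a 2×2 table of allele counts, the helper below computes an odds ratio with a Woolf (log-scale) 95% confidence interval. The function name and the counts in the usage note are hypothetical illustrations; they are not the study's data or its analysis pipeline.

```python
import math


def odds_ratio_with_ci(case_alt, case_ref, control_alt, control_ref, z=1.96):
    """Allelic odds ratio and Woolf confidence interval from a 2x2 table.

    Arguments are allele counts (all must be > 0); z=1.96 gives a ~95% interval.
    This is a textbook calculation, assumed here for illustration only.
    """
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(
        1 / case_alt + 1 / case_ref + 1 / control_alt + 1 / control_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

For example, with made-up counts, `odds_ratio_with_ci(30, 70, 20, 80)` returns an odds ratio of 12/7 together with an interval that brackets it. An interval that includes 1 corresponds to an association that is not significant at the chosen level.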
DOI: 10.1016/j.jaci.2009.05.047
2009
Cited 73 times
17q12-21 variants interact with smoke exposure as a risk factor for pediatric asthma but are equally associated with early-onset versus late-onset asthma in North Americans of European ancestry
DOI: 10.1371/journal.pone.0015463
2010
Cited 67 times
Duplication of the SLIT3 Locus on 5q35.1 Predisposes to Major Depressive Disorder
Major depressive disorder (MDD) is a common psychiatric and behavioral disorder. To discover novel variants conferring risk to MDD, we conducted a whole-genome scan of copy number variation (CNV), including 1,693 MDD cases and 4,506 controls genotyped on the Perlegen 600K platform. The most significant locus was observed on 5q35.1, harboring the SLIT3 gene (P = 2 × 10⁻³). Extending the controls with 30,000 subjects typed on the Illumina 550K array, we found the CNV to remain exclusive to MDD cases (P = 3.2 × 10⁻⁹). Duplication was observed in 5 unrelated MDD cases encompassing 646 kb with highly similar breakpoints. SLIT3 is integral to repulsive axon guidance based on binding to Roundabout receptors. Duplication of 5q35.1 is a highly penetrant variation accounting for 0.7% of the subset of 647 cases harboring large CNVs, using a threshold of a minimum of 10 SNPs and 100 kb. This study leverages a large dataset of MDD cases and controls for the analysis of CNVs with matched platform and ethnicity. SLIT3 duplication is a novel association which explains a definitive proportion of the largely unknown etiology of MDD.
DOI: 10.1073/pnas.1412228112
2014
Cited 49 times
Genetic modifiers of EGFR dependence in non-small cell lung cancer
Lung adenocarcinomas harboring activating mutations in the epidermal growth factor receptor (EGFR) represent a common molecular subset of non-small cell lung cancer (NSCLC) cases. EGFR mutations predict sensitivity to EGFR tyrosine kinase inhibitors (TKIs) and thus represent a dependency in NSCLCs harboring these alterations, but the genetic basis of EGFR dependence is not fully understood. Here, we applied an unbiased, ORF-based screen to identify genetic modifiers of EGFR dependence in EGFR-mutant NSCLC cells. This approach identified 18 kinase and kinase-related genes whose overexpression can substitute for EGFR in EGFR-dependent PC9 cells, and these genes include seven of nine Src family kinase genes, FGFR1, FGFR2, ITK, NTRK1, NTRK2, MOS, MST1R, and RAF1. A subset of these genes can complement loss of EGFR activity across multiple EGFR-dependent models. Unbiased gene-expression profiling of cells overexpressing EGFR bypass genes, together with targeted validation studies, reveals EGFR-independent activation of the MEK-ERK and phosphoinositide 3-kinase (PI3K)-AKT pathways. Combined inhibition of PI3K-mTOR and MEK restores EGFR dependence in cells expressing each of the 18 EGFR bypass genes. Together, these data uncover a broad spectrum of kinases capable of overcoming dependence on EGFR and underscore their convergence on the PI3K-AKT and MEK-ERK signaling axes in sustaining EGFR-independent survival.
DOI: 10.1101/161562
2017
Cited 49 times
The evolutionary history of 2,658 cancers
Cancer develops through a process of somatic evolution. Here, we use whole-genome sequencing of 2,778 tumour samples from 2,658 donors to reconstruct the life history, evolution of mutational processes, and driver mutation sequences of 39 cancer types. The early phases of oncogenesis are driven by point mutations in a small set of driver genes, often including biallelic inactivation of tumour suppressors. Early oncogenesis is also characterised by specific copy number gains, such as trisomy 7 in glioblastoma or isochromosome 17q in medulloblastoma. By contrast, increased genomic instability, a nearly four-fold diversification of driver genes, and an acceleration of point mutation processes are features of later stages. Copy-number alterations often occur in mitotic crises leading to simultaneous gains of multiple chromosomal segments. Timing analysis suggests that driver mutations often precede diagnosis by many years, and in some cases decades, providing a window of opportunity for early cancer detection.
DOI: 10.1101/833590
2019
Cited 37 times
Nanopore sequencing of DNA concatemers reveals higher-order features of chromatin structure
Higher-order chromatin structure arises from the combinatorial physical interactions of many genomic loci. To investigate this aspect of genome architecture we developed Pore-C, which couples chromatin conformation capture with Oxford Nanopore Technologies (ONT) long reads to directly sequence multi-way chromatin contacts without amplification. In GM12878, we demonstrate that the pairwise interaction patterns implicit in Pore-C multi-way contacts are consistent with gold standard Hi-C pairwise contact maps at the compartment, TAD, and loop scales. In addition, Pore-C also detects higher-order chromatin structure at 18.5-fold higher efficiency and greater fidelity than SPRITE, a previously published higher-order chromatin profiling technology. We demonstrate Pore-C’s ability to detect and visualize multi-locus hubs associated with histone locus bodies and active/inactive nuclear compartments in GM12878. In the breast cancer cell line HCC1954, Pore-C contacts enable the reconstruction of complex and aneuploid rearranged alleles spanning multiple megabases and chromosomes. Finally, we apply Pore-C to generate a chromosome scale de novo assembly of the HG002 genome. Our results establish Pore-C as the most simple and scalable assay for the genome-wide assessment of combinatorial chromatin interactions, with additional applications for cancer rearrangement reconstruction and de novo genome assembly.
DOI: 10.1038/s41467-020-14351-8
2020
Cited 37 times
Inferring structural variant cancer cell fraction
We present SVclone, a computational method for inferring the cancer cell fraction of structural variant (SV) breakpoints from whole-genome sequencing data. SVclone accurately determines the variant allele frequencies of both SV breakends, then simultaneously estimates the cancer cell fraction and SV copy number. We assess performance using in silico mixtures of real samples, at known proportions, created from two clonal metastases from the same patient. We find that SVclone’s performance is comparable to single-nucleotide variant-based methods, despite having an order of magnitude fewer data points. As part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium, which aggregated whole-genome sequencing data from 2658 cancers across 38 tumour types, we use SVclone to reveal a subset of liver, ovarian and pancreatic cancers with subclonally enriched copy-number neutral rearrangements that show decreased overall survival. SVclone enables improved characterisation of SV intra-tumour heterogeneity.
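To make the link between variant allele frequency and cancer cell fraction concrete, the sketch below applies the commonly used point-estimate relation VAF = purity × multiplicity × CCF / (purity × tumour copy number + (1 − purity) × normal copy number). This is an illustrative simplification under stated assumptions. It is not SVclone's model, which the abstract describes as jointly estimating cancer cell fraction and SV copy number. The function and parameter names are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn, multiplicity=1, normal_cn=2):
    """Point estimate of cancer cell fraction (CCF) from a variant allele frequency.

    Assumed relation (illustrative, not SVclone's clustering model):
        VAF = purity * multiplicity * CCF
              / (purity * tumour_cn + (1 - purity) * normal_cn)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    total_copies = purity * tumour_cn + (1 - purity) * normal_cn
    ccf = vaf * total_copies / (purity * multiplicity)
    # Sampling noise can push the raw estimate above 1; cap it at fully clonal.
    return min(ccf, 1.0)
```

Under these assumptions, a variant with VAF 0.2 in a diploid region of a sample with 80% purity gives a CCF of about 0.5, meaning roughly half of the cancer cells carry it.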
DOI: 10.1038/s41467-020-14352-7
2020
Cited 37 times
Reconstructing evolutionary trajectories of mutation signature activities in cancer using TrackSig
The type and genomic context of cancer mutations depend on their causes. These causes have been characterized using signatures that represent mutation types that co-occur in the same tumours. However, it remains unclear how mutation processes change during cancer evolution due to the lack of reliable methods to reconstruct evolutionary trajectories of mutational signature activity. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole-genome sequencing data from 2658 cancers across 38 tumour types, we present TrackSig, a new method that reconstructs these trajectories using optimal, joint segmentation and deconvolution of mutation type and allele frequencies from a single tumour sample. In simulations, we find TrackSig has a 3–5% activity reconstruction error and a 12% false detection rate. It outperforms an aggressive baseline in situations with branching evolution, CNA gain, and neutral mutations. Applied to data from 2658 tumours and 38 cancer types, TrackSig permits pan-cancer insight into evolutionary changes in mutational processes.
DOI: 10.1038/s41467-022-29767-7
2022
Cited 14 times
Somatic whole genome dynamics of precancer in Barrett’s esophagus reveals features associated with disease progression
While the genomes of normal tissues undergo dynamic changes over time, little is understood about the temporal-spatial dynamics of genomes in premalignant tissues that progress to cancer compared to those that remain cancer-free. Here we use whole genome sequencing to contrast genomic alterations in 427 longitudinal samples from 40 patients with stable Barrett's esophagus compared to 40 Barrett's patients who progressed to esophageal adenocarcinoma (ESAD). We show the same somatic mutational processes are active in Barrett's tissue regardless of outcome, with high levels of mutation, ESAD gene and focal chromosomal alterations, and similar mutational signatures. The critical distinction between stable Barrett's versus those who progress to cancer is acquisition and expansion of TP53-/- cell populations having complex structural variants and high-level amplifications, which are detectable up to six years prior to a cancer diagnosis. These findings reveal the timing of common somatic genome dynamics in stable Barrett's esophagus and define key genomic features specific to progression to esophageal adenocarcinoma, both of which are critical for cancer prevention and early detection strategies.
DOI: 10.1158/2159-8290.cd-21-1514
2022
Cited 14 times
SETD2 Haploinsufficiency Enhances Germinal Center–Associated AICDA Somatic Hypermutation to Drive B-cell Lymphomagenesis
SETD2 is the sole histone methyltransferase responsible for H3K36me3, with roles in splicing, transcription initiation, and DNA damage response. Homozygous disruption of SETD2 yields a tumor suppressor effect in various cancers. However, SETD2 mutation is typically heterozygous in diffuse large B-cell lymphomas. Here we show that heterozygous Setd2 deficiency results in germinal center (GC) hyperplasia and increased competitive fitness, with reduced DNA damage checkpoint activity and apoptosis, resulting in accelerated lymphomagenesis. Impaired DNA damage sensing in Setd2-haploinsufficient germinal center B (GCB) and lymphoma cells was associated with increased AICDA-induced somatic hypermutation, complex structural variants, and increased translocations including those activating MYC. DNA damage was selectively increased on the nontemplate strand, and H3K36me3 loss was associated with greater RNAPII processivity and mutational burden, suggesting that SETD2-mediated H3K36me3 is required for proper sensing of cytosine deamination. Hence, Setd2 haploinsufficiency delineates a novel GCB context-specific oncogenic pathway involving defective epigenetic surveillance of AICDA-mediated effects on transcribed genes.
Our findings define a B cell-specific oncogenic effect of SETD2 heterozygous mutation, which unleashes AICDA mutagenesis of nontemplate strand DNA in the GC reaction, resulting in lymphomas with heavy mutational burden. GC-derived lymphomas did not tolerate SETD2 homozygous deletion, pointing to a novel context-specific therapeutic vulnerability. This article is highlighted in the In This Issue feature, p. 1599.
DOI: 10.1038/s41467-022-31055-3
2022
Cited 14 times
Recurrent somatic mutations as predictors of immunotherapy response
Immune checkpoint blockade (ICB) has transformed the treatment of metastatic cancer but is hindered by variable response rates. A key unmet need is the identification of biomarkers that predict treatment response. To address this, we analyzed six whole exome sequencing cohorts with matched disease outcomes to identify genes and pathways predictive of ICB response. To increase detection power, we focus on genes and pathways that are significantly mutated following correction for epigenetic, replication timing, and sequence-based covariates. Using this technique, we identify several genes (BCLAF1, KRAS, BRAF, and TP53) and pathways (MAPK signaling, p53 associated, and immunomodulatory) as predictors of ICB response and develop the Cancer Immunotherapy Response CLassifiEr (CIRCLE). Compared to tumor mutational burden alone, CIRCLE led to superior prediction of ICB response with a 10.5% increase in sensitivity and an 11% increase in specificity. We envision that CIRCLE and more broadly the analysis of recurrently mutated cancer genes will pave the way for better prognostic tools for cancer immunotherapy.
DOI: 10.1038/s41586-022-05599-9
2023
Cited 6 times
Author Correction: Analyses of non-coding somatic drivers in 2,658 cancer whole genomes
DOI: 10.1038/s41588-023-01540-6
2023
Cited 5 times
Most large structural variants in cancer genomes can be detected without long reads
Short-read sequencing is the workhorse of cancer genomics yet is thought to miss many structural variants (SVs), particularly large chromosomal alterations. To characterize missing SVs in short-read whole genomes, we analyzed 'loose ends', local violations of mass balance between adjacent DNA segments. In the landscape of loose ends across 1,330 high-purity cancer whole genomes, most large (>10-kb) clonal SVs were fully resolved by short reads in the 87% of the human genome where copy number could be reliably measured. Some loose ends represent neotelomeres, which we propose as a hallmark of the alternative lengthening of telomeres phenotype. These pan-cancer findings were confirmed by long-molecule profiles of 38 breast cancer and melanoma cases. Our results indicate that aberrant homologous recombination is unlikely to drive the majority of large cancer SVs. Furthermore, analysis of mass balance in short-read whole genome data provides a surprisingly complete picture of cancer chromosomal structure.
DOI: 10.1074/mcp.m111.014910
2012
Cited 45 times
Integrated Proteomic, Transcriptomic, and Biological Network Analysis of Breast Carcinoma Reveals Molecular Features of Tumorigenesis and Clinical Relapse
Gene and protein expression changes observed with tumorigenesis are often interpreted independently of each other and out of context of biological networks. To address these limitations, this study examined several approaches to integrate transcriptomic and proteomic data with known protein-protein and signaling interactions in estrogen receptor positive (ER+) breast cancer tumors. An approach that built networks from differentially expressed proteins and identified among them networks enriched in differentially expressed genes yielded the greatest success. This method identified a set of genes and proteins linking pathways of cellular stress response, cancer metabolism, and tumor microenvironment. The proposed network underscores several biologically intriguing events not previously studied in the context of ER+ breast cancer, including the overexpression of p38 mitogen-activated protein kinase and the overexpression of poly(ADP-ribose) polymerase 1. A gene-based expression signature biomarker built from this network was significantly predictive of clinical relapse in multiple independent cohorts of ER+ breast cancer patients, even after correcting for standard clinicopathological variables. The results of this study demonstrate the utility and power of an integrated quantitative proteomic, transcriptomic, and network analysis approach to discover robust and clinically meaningful molecular changes in tumors.
DOI: 10.1101/181339
2017
Cited 37 times
Patterns of structural variation in human cancer
A key mutational process in cancer is structural variation, in which rearrangements delete, amplify or reorder genomic segments ranging in size from kilobases to whole chromosomes. We developed methods to group, classify and describe structural variants, applied to >2,500 cancer genomes. Nine signatures of structural variation emerged. Deletions have trimodal size distribution; assort unevenly across tumour types and patients; enrich in late-replicating regions; and correlate with inversions. Tandem duplications also have trimodal size distribution, but enrich in early-replicating regions, as do unbalanced translocations. Replication-based mechanisms of rearrangement generate varied chromosomal structures with low-level copy number gains and frequent inverted rearrangements. One prominent structure consists of 1-7 templates copied from distinct regions of the genome strung together within one locus. Such ‘cycles of templated insertions’ correlate with tandem duplications, frequently activating the telomerase gene, TERT, in liver cancer. Cancers access many rearrangement processes, flexibly sculpting the genome to maximise oncogenic potential.
DOI: 10.1101/2022.01.17.476508
2022
Cited 11 times
Machine learning guided signal enrichment for ultrasensitive plasma tumor burden monitoring
In solid tumor oncology, circulating tumor DNA (ctDNA) is poised to transform care through accurate assessment of minimal residual disease (MRD) and therapeutic response monitoring. To overcome the sparsity of ctDNA fragments in low tumor fraction (TF) settings and increase MRD sensitivity, we previously leveraged genome-wide mutational integration through plasma whole genome sequencing (WGS). We now introduce MRD-EDGE, a composite machine learning-guided WGS ctDNA single nucleotide variant (SNV) and copy number variant (CNV) detection platform designed to increase signal enrichment. MRD-EDGE uses deep learning and a ctDNA-specific feature space to increase SNV signal to noise enrichment in WGS by 300X compared to our previous noise suppression platform MRDetect. MRD-EDGE also reduces the degree of aneuploidy needed for ultrasensitive CNV detection through WGS from 1Gb to 200Mb, thereby expanding its applicability to a wider range of solid tumors. We harness the improved performance to track changes in tumor burden in response to neoadjuvant immunotherapy in non-small cell lung cancer and demonstrate ctDNA shedding in precancerous colorectal adenomas. Finally, the radical signal to noise enrichment in MRD-EDGE enables de novo mutation calling in melanoma without matched tumor, yielding clinically informative TF monitoring for patients on immune checkpoint inhibition.
DOI: 10.1186/1752-0509-2-40
2008
Cited 42 times
Exploiting the pathway structure of metabolism to reveal high-order epistasis
Biological robustness results from redundant pathways that achieve an essential objective, e.g. the production of biomass. As a consequence, the biological roles of many genes can only be revealed through multiple knockouts that identify a set of genes as essential for a given function. The identification of such "epistatic" essential relationships between network components is critical for the understanding and eventual manipulation of robust systems-level phenotypes. We introduce and apply a network-based approach for genome-scale metabolic knockout design. We apply this method to uncover over 11,000 minimal knockouts for biomass production in an in silico genome-scale model of E. coli. A large majority of these "essential sets" contain 5 or more reactions, and thus represent complex epistatic relationships between components of the E. coli metabolic network. The complex minimal biomass knockouts discovered with our approach illuminate robust essential systems-level roles for reactions in the E. coli metabolic network. Unlike previous approaches, our method yields results regarding high-order epistatic relationships and is applicable at the genome-scale.
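The notion of a minimal knockout, or "essential set", can be illustrated on a toy network. The sketch below is a brute-force enumeration over a hypothetical four-reaction network with two redundant routes to biomass. It uses simple reachability rather than a stoichiometric model and is not the network-based algorithm the abstract refers to, which is designed to work at genome scale. All reaction and metabolite names are invented for illustration.

```python
from itertools import combinations

# Hypothetical toy network: reaction name -> (substrates, products).
TOY_REACTIONS = {
    "r1": ({"glc"}, {"a"}),
    "r2": ({"glc"}, {"b"}),
    "r3": ({"a"}, {"biomass"}),
    "r4": ({"b"}, {"biomass"}),
}


def producible(reactions, nutrients, target, knocked_out=frozenset()):
    """True if target is reachable from nutrients using the remaining reactions."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for name, (substrates, products) in reactions.items():
            if name in knocked_out:
                continue
            # Fire the reaction once all its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return target in available


def minimal_knockouts(reactions, nutrients, target):
    """Enumerate knockout sets that block target and contain no smaller such set."""
    names = sorted(reactions)
    found = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            candidate = frozenset(combo)
            if any(prev <= candidate for prev in found):
                continue  # contains a smaller knockout, so it is not minimal
            if not producible(reactions, nutrients, target, candidate):
                found.append(candidate)
    return found
```

In this toy case no single deletion blocks biomass, and `minimal_knockouts(TOY_REACTIONS, {"glc"}, "biomass")` finds four two-reaction sets, each taking one step from either route. Exhaustive enumeration grows exponentially with network size, which is why a dedicated method is needed for sets of five or more reactions in a genome-scale model.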
DOI: 10.1101/187609
2017
Cited 29 times
Selective and mechanistic sources of recurrent rearrangements across the cancer genome
Cancer cells can acquire profound alterations to the structure of their genomes, including rearrangements that fuse distant DNA breakpoints. We analyze the distribution of somatic rearrangements across the cancer genome, using whole-genome sequencing data from 2,693 tumor-normal pairs. We observe substantial variation in the density of rearrangement breakpoints, with enrichment in open chromatin and sites with high densities of repetitive elements. After accounting for these patterns, we identify significantly recurrent breakpoints (SRBs) at 52 loci, including novel SRBs near BRD4 and AKR1C3. Taking into account both loci fused by a rearrangement, we observe different signatures resembling either single breaks followed by strand invasion or two separate breaks that become joined. Accounting for these signatures, we identify 90 pairs of loci that are significantly recurrently juxtaposed (SRJs). SRJs are primarily tumor-type specific and tend to involve genes with tissue-specific expression. SRJs were frequently associated with disruption of topology-associated domains, juxtaposition of enhancer elements, and increased expression of neighboring genes. Lastly, we find that the power to detect SRJs decreases for short rearrangements, and that reliable detection of all driver SRJs will require whole-genome sequencing data from an order of magnitude more cancer samples than currently available.
DOI: 10.1101/312041
2018
Cited 28 times
Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes
Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin and drivers of ITH across cancer types are poorly understood. To address this question, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples, spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions, with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types, and identify cancer type specific subclonal patterns of driver gene mutations, fusions, structural variants and copy-number alterations, as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution, and provide an unprecedented pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.
DOI: 10.1016/j.celrep.2021.108707
2021
Cited 16 times
Whole-genome characterization of lung adenocarcinomas lacking alterations in the RTK/RAS/RAF pathway
RTK/RAS/RAF pathway alterations (RPAs) are a hallmark of lung adenocarcinoma (LUAD). In this study, we use whole-genome sequencing (WGS) of 85 cases found to be RPA(-) by previous studies from The Cancer Genome Atlas (TCGA) to characterize the minority of LUADs lacking apparent alterations in this pathway. We show that WGS analysis uncovers RPA(+) in 28 (33%) of the 85 samples. Among the remaining 57 cases, we observe focal deletions targeting the promoter or transcription start site of STK11 (n = 7) or KEAP1 (n = 3), and promoter mutations associated with the increased expression of ILF2 (n = 6). We also identify complex structural variations associated with high-level copy number amplifications. Moreover, an enrichment of focal deletions is found in TP53 mutant cases. Our results indicate that RPA(-) cases demonstrate tumor suppressor deletions and genome instability, but lack unique or recurrent genetic lesions compensating for the lack of RPAs. Larger WGS studies of RPA(-) cases are required to understand this important LUAD subset.
DOI: 10.1038/s41467-021-21933-7
2021
Cited 16 times
Structural variant evolution after telomere crisis
Telomere crisis contributes to cancer genome evolution, yet only a subset of cancers display breakage-fusion-bridge (BFB) cycles and chromothripsis, hallmarks of experimental telomere crisis identified in previous studies. We examine the spectrum of structural variants (SVs) instigated by natural telomere crisis. Eight spontaneous post-crisis clones did not show prominent patterns of BFB cycles or chromothripsis. Their crisis-induced genome rearrangements varied from infrequent simple SVs to more frequent and complex SVs. In contrast, BFB cycles and chromothripsis occurred in MRC5 fibroblast clones that escaped telomere crisis after CRISPR-controlled telomerase activation. This system revealed convergent evolutionary lineages altering one allele of chromosome 12p, where a short telomere likely predisposed to fusion. Remarkably, the 12p chromothripsis and BFB events were stabilized by independent fusions to chromosome 21. The data establish that telomere crisis can generate a wide spectrum of SVs implying that a lack of BFB patterns and chromothripsis in cancer genomes does not indicate absence of past telomere crisis.
DOI: 10.1038/s41586-023-06461-2
2023
Cited 3 times
Long-molecule scars of backup DNA repair in BRCA1- and BRCA2-deficient cancers
Homologous recombination (HR) deficiency is associated with DNA rearrangements and cytogenetic aberrations. Paradoxically, the types of DNA rearrangements that are specifically associated with HR-deficient cancers only minimally affect chromosomal structure. Here, to address this apparent contradiction, we combined genome-graph analysis of short-read whole-genome sequencing (WGS) profiles across thousands of tumours with deep linked-read WGS of 46 BRCA1- or BRCA2-mutant breast cancers. These data revealed a distinct class of HR-deficiency-enriched rearrangements called reciprocal pairs. Linked-read WGS showed that reciprocal pairs with identical rearrangement orientations gave rise to one of two distinct chromosomal outcomes, distinguishable only with long-molecule data. Whereas one (cis) outcome corresponded to the copying and pasting of a small segment to a distant site, a second (trans) outcome was a quasi-balanced translocation or multi-megabase inversion with substantial (10 kb) duplications at each junction. We propose an HR-independent replication-restart repair mechanism to explain the full spectrum of reciprocal pair outcomes. Linked-read WGS also identified single-strand annealing as a repair pathway that is specific to BRCA2 deficiency in human cancers. Integrating these features in a classifier improved discrimination between BRCA1- and BRCA2-deficient genomes. In conclusion, our data reveal classes of rearrangements that are specific to BRCA1 or BRCA2 deficiency as a source of cytogenetic aberrations in HR-deficient cells.
DOI: 10.1093/bioinformatics/bti245
2005
Cited 42 times
Investigating metabolite essentiality through genome-scale analysis of Escherichia coli production capabilities
A phenotype mechanism is classically derived through the study of a set of mutants and comparison of their biochemical capabilities. One method of comparing mutant capabilities is to characterize producible and knocked-out metabolites. However, such an effect is difficult to assess manually, especially for a large biochemical network and a complex medium. Current algorithmic approaches to analyzing metabolic networks either do not address this specific property or are computationally infeasible on the genome scale. We have developed a novel genome-scale computational approach that identifies the full set of biochemical species that are knocked out from the metabolome following a gene deletion. Results from this approach are combined with data from in vivo mutant screens to examine the essentiality of metabolite production for a phenotype. This approach can also be a useful tool for metabolic network annotation validation and refinement in newly sequenced organisms. Combining an in silico genome-scale model of Escherichia coli metabolism with in vivo survival data, we uncover possible essential roles for several cell membrane, cell wall, and quinone species. We also identify specific biomass components whose production appears to be non-essential for survival, contrary to the assumptions of previous models. Programs are available upon request from the authors in the form of Matlab script files. Supplementary information: http://www.cis.upenn.edu/biocomp/manuscripts/bioinformatics_bti245/supp-info.html.
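The notion of metabolites being "knocked out" by a gene deletion can be illustrated with a minimal Python sketch. It uses plain forward reachability (network expansion) on a made-up four-reaction network, which is a much cruder stand-in for the constraints-based, genome-scale analysis the abstract describes. The network, gene names, and function names are all hypothetical.

```python
def producible(reactions, nutrients):
    """Return the set of metabolites reachable from the nutrients.

    reactions: iterable of (substrates, products) pairs of frozensets.
    A reaction can fire once all of its substrates are available.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def knocked_out_metabolites(network, nutrients, deleted_gene):
    """Metabolites producible in the wild type but not after deleting a gene.

    network: dict mapping reaction name -> (gene, substrates, products).
    """
    wild_type = producible(
        [(s, p) for _, s, p in network.values()], nutrients)
    mutant = producible(
        [(s, p) for g, s, p in network.values() if g != deleted_gene],
        nutrients)
    return wild_type - mutant


# Hypothetical toy network: two routes to C, a single route to D.
TOY_NETWORK = {
    "r1": ("geneA", frozenset({"glc"}), frozenset({"B"})),
    "r2": ("geneB", frozenset({"B"}), frozenset({"C"})),
    "r3": ("geneC", frozenset({"glc"}), frozenset({"C"})),
    "r4": ("geneD", frozenset({"C"}), frozenset({"D"})),
}
```

In this toy network, deleting `geneB` removes nothing from the producible set because `r3` still supplies C, whereas deleting `geneD` knocks out D.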
DOI: 10.1529/biophysj.105.069278
2006
Cited 39 times
Systematic Analysis of Conservation Relations in Escherichia coli Genome-Scale Metabolic Network Reveals Novel Growth Media
A biochemical species is called producible in a constraints-based metabolic model if a feasible steady-state flux configuration exists that sustains its nonzero concentration during growth. Extreme semipositive conservation relations (ESCRs) are the simplest semipositive linear combinations of species concentrations that are invariant to all metabolic flux configurations. In this article, we outline a fundamental relationship between the ESCRs of a metabolic network and the producibility of a biochemical species under a nutrient medium. We exploit this relationship in an algorithm that systematically enumerates all minimal nutrient sets that render an objective species weakly producible (i.e., producible in the absence of thermodynamic constraints) through a simple traversal of ESCRs. We apply our results to a recent genome-scale model of Escherichia coli metabolism, in which we traverse the 51 anhydrous ESCRs of the metabolic network to determine all 928 minimal aqueous nutrient media that render biomass weakly producible. Applying irreversibility constraints, we find 287 of these 928 nutrient sets to be thermodynamically feasible. We also find that an additional 365 of these nutrient sets are thermodynamically feasible in the presence of oxygen. Since biomass producibility is commonly used as a surrogate for growth in genome-scale metabolic models, our results represent testable hypotheses of alternate growth media derived from in silico analysis of the E. coli genome-scale metabolic network.
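To make the definition of a semipositive conservation relation concrete, here is a small Python check. It assumes the standard convention of a stoichiometric matrix with species as rows and reactions as columns, and tests whether a non-negative, nonzero weight vector is left unchanged by every reaction. The two-reaction network is hypothetical, and the sketch does not attempt the extremality test or the enumeration algorithm from the article.

```python
def is_semipositive_conservation_relation(stoich, weights):
    """Check that weights >= 0, weights != 0, and weights^T * stoich == 0.

    stoich: list of rows, one per species; each row has one entry per reaction.
    weights: one coefficient per species.
    """
    if any(w < 0 for w in weights) or not any(w > 0 for w in weights):
        return False
    n_reactions = len(stoich[0])
    return all(
        sum(w * row[j] for w, row in zip(weights, stoich)) == 0
        for j in range(n_reactions)
    )


# Hypothetical toy network with species ordered as A, B, ATP, ADP.
#   r1: A + ATP -> B + ADP
#   r2: ADP -> ATP
STOICH = [
    [-1, 0],   # A
    [1, 0],    # B
    [-1, 1],   # ATP
    [1, -1],   # ADP
]

ADENYLATE_POOL = [0, 0, 1, 1]   # ATP + ADP is invariant
CARBON_POOL = [1, 1, 0, 0]      # A + B is invariant
A_ALONE = [1, 0, 0, 0]          # A is consumed by r1, so not conserved
```

Integer coefficients keep the zero test exact; with floating-point stoichiometry a tolerance would be needed.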
DOI: 10.1049/iet-syb:20060035
2007
Cited 37 times
Analysis of lactose metabolism in E.coli using reachability analysis of hybrid systems
We propose an abstraction method for medium-scale biomolecular networks, based on hybrid dynamical systems with continuous multi-affine dynamics. This abstraction method follows naturally from the notion of approximating nonlinear rate laws with continuous piecewise linear functions and can be easily automated. An efficient reachability algorithm is possible for the resulting class of hybrid systems. An approximation for an ordinary differential equation model of the lac operon is constructed, and it is shown that the abstraction passes the same experimental tests as were used to validate the original model. The well-studied biological system exhibits bistability and switching behaviour, arising from positive feedback in the expression mechanism of the lac operon. The switching property of the lac system is an example of the major qualitative features that are the building blocks of higher-level, more coarse-grained descriptions. The present approach is useful in helping to correctly identify such properties and in connecting them to the underlying molecular dynamical details. Reachability analysis together with knowledge of the steady-state structure is used to identify ranges of parameter values for which the system maintains the bistable switching property.
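The step of approximating a nonlinear rate law with a continuous piecewise linear function can be sketched as follows. The Hill-type rate law, its parameters, and the breakpoints are illustrative assumptions rather than values from the paper, and the interpolant is clamped outside the breakpoint range for simplicity.

```python
def hill(x, k=1.0, n=2):
    """A Hill-type rate law, used here only as an example nonlinearity."""
    return x**n / (k**n + x**n)


def piecewise_linear(f, breakpoints):
    """Return a continuous piecewise-linear interpolant of f.

    The interpolant agrees with f at every breakpoint and is linear in
    between. Breakpoints are assumed to be distinct.
    """
    xs = sorted(breakpoints)
    ys = [f(x) for x in xs]

    def approx(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        for i in range(len(xs) - 1):
            x0, x1 = xs[i], xs[i + 1]
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return ys[i] + t * (ys[i + 1] - ys[i])

    return approx


# Hypothetical breakpoints; finer grids give a tighter approximation.
approx_hill = piecewise_linear(hill, [0.0, 1.0, 2.0, 4.0])
```

Each interval between breakpoints then corresponds to one mode of the hybrid system, within which the approximated rate law is linear.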
DOI: 10.1016/j.ccell.2017.11.008
2017
Cited 20 times
High-throughput Phenotyping of Lung Cancer Somatic Mutations
(Cancer Cell 30, 214–228; August 8, 2016) The authors wish to clarify the specific identities of alleles tested in the multiplexed tumor formation assay (TumorPlex) described in Figure 5. As described in the main text, several known oncogenic alleles were intentionally omitted from the assay to avoid jackpot effects possible from strong alleles in a pooled screening format. These alleles are appropriately omitted from inclusion in Table S6, which accurately documents the complete results from the tumor screen. However, in the original submission, the results summary in Table S7, column F (labeled “TumorPlex hit…”), annotated these intentionally omitted alleles as “NO,” possibly leading some readers to conclude that they were tested and did not score. In addition, several other alleles were not tested due to technical reasons such as infection failure. To provide clarity, the authors are now supplying an updated Table S7 to indicate the alleles that were not assayed. They have now updated these entries as “n.d.” (not determined). This change does not alter the main text or the conclusions of the work. The authors apologize for any confusion this may have caused. High-throughput Phenotyping of Lung Cancer Somatic Mutations. Berger et al. Cancer Cell, July 28, 2016. In Brief: Berger et al. develop an expression-based variant-impact phenotyping method to distinguish impactful from neutral somatic mutations. The method identified rare gain-of-function mutations in oncogenes and widespread inactivation of tumor suppressors by missense variation. Variants of ARAF, BRAF, EGFR, ERBB2, KRAS, and RIT1 are shown to be oncogenic and to induce MEK-dependent resistance to EGFR inhibition.
DOI: 10.1101/2021.03.08.434433
2021
Cited 13 times
Systemic Tissue and Cellular Disruption from SARS-CoV-2 Infection revealed in COVID-19 Autopsies and Spatial Omics Tissue Maps
The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) virus has infected over 115 million people and caused over 2.5 million deaths worldwide. Yet, the molecular mechanisms underlying the clinical manifestations of COVID-19, as well as what distinguishes them from common seasonal influenza virus and other lung injury states such as Acute Respiratory Distress Syndrome (ARDS), remain poorly understood. To address these challenges, we combined transcriptional profiling of 646 clinical nasopharyngeal swabs and 39 patient autopsy tissues, matched with spatial protein and expression profiling (GeoMx) across 357 tissue sections. These results define both body-wide and tissue-specific (heart, liver, lung, kidney, and lymph nodes) damage wrought by the SARS-CoV-2 infection, evident as a function of varying viral load (high vs. low) during the course of infection and specific transcriptional dysregulation in splicing isoforms, T cell receptor expression, and cellular expression states. In particular, cardiac and lung tissues revealed the largest degree of splicing isoform switching and cell expression state loss. Overall, these findings reveal a systemic disruption of cellular and transcriptional pathways from COVID-19 across all tissues, which can inform subsequent studies to combat the mortality of COVID-19, as well as to better understand the molecular dynamics of lethal SARS-CoV-2 infection and other viruses.
DOI: 10.1016/j.coisb.2016.12.005
2017
Cited 17 times
Modeling cancer rearrangement landscapes
Cancer genome sequences contain footprints of somatic mutational processes, whose analysis in large tumor sequencing datasets has revealed novel mutational signatures, correlative features of variant topography, and complex events. Many of these analytic results have yet to be reconciled with decades of mechanistic genome integrity research performed in controlled model systems. However, a new generation of genome-integrity experiments combining computational modeling, data analytics, and high-throughput sequencing is emerging to link mechanisms to patterns. Conversely, analytic studies evaluating quantitative footprints of specific genome integrity hypotheses will be critical in fitting naturally occurring mutational patterns to the predictions of a particular mechanistic model. Such quantitative and mechanistic studies will form the foundation of an emerging systems biology of genome integrity.
DOI: 10.1016/j.celrep.2021.108784
2021
Cited 11 times
Whole-genome characterization of lung adenocarcinomas lacking alterations in the RTK/RAS/RAF pathway
(Cell Reports 34, 108707-1–108707-10.e1–e3; February 2, 2021) In the originally published version of this article, the title was incorrectly listed as “Whole-genome characterization of lung adenocarcinomas lacking the RTK/RAS/RAF pathway.” It now appears correctly here and with the article online. The production team regrets this error. Whole-genome characterization of lung adenocarcinomas lacking alterations in the RTK/RAS/RAF pathway. Carrot-Zhang et al. Cell Reports, February 02, 2021. In Brief: Carrot-Zhang et al. perform whole-genome characterization of lung adenocarcinomas (LUADs) lacking RTK/RAS/RAF pathway alterations (RPAs) and identify mutations or structural variants in both coding and non-coding spaces that define a unique entity of RPA(−) LUADs and potentially explain the underlying biology of this disease.
DOI: 10.1038/s41586-022-05598-w
2023
Author Correction: Pan-cancer analysis of whole genomes
DOI: 10.1038/s41588-023-01315-z
2023
Author Correction: Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing
DOI: 10.1038/s41588-023-01319-9
2023
Author Correction: Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition
DOI: 10.1158/1538-7445.dnarepair24-ia009
2024
Abstract IA009: Scars of faulty DNA repair in cancer whole genomes
Cancer genomes provide a record of the genetic alterations acquired from DNA damage and DNA repair defects during normal cell development and carcinogenesis. Genome-wide somatic alteration patterns in BRCA1-deficient (BRCA1d) and BRCA2-deficient (BRCA2d) cancers are attributed to a deficiency in homologous recombination (HR), a major pathway for the repair of double-strand breaks (DSBs) in human cells. Some of these mutational patterns may reflect specific error-prone mechanisms of DSB repair that cells use in the absence of HR. Such mutational patterns can provide biomarkers of HR deficiency and help identify clinically relevant therapeutic vulnerabilities. HR deficiency has been associated with specific DNA rearrangements and cytogenetic aberrations. Paradoxically, the types of DNA rearrangements specifically associated with HR-deficient (HRD) cancers only minimally impact chromosomal structure. Addressing this, we combined a genome graph analysis of short-read whole genome sequencing (WGS) profiles across thousands of tumors with deep linked-read (LR) WGS of 46 BRCA1- or BRCA2-mutant breast cancers to discover a distinct class of HR deficiency-enriched rearrangements called reciprocal pairs. LR WGS showed that reciprocal pairs with identical rearrangement orientations gave rise to one of two distinct chromosomal outcomes, distinguishable only with long-molecule data. While one (cis) outcome corresponded to the copying and pasting of a small segment to a distant site, a second (trans) outcome was a quasi-balanced translocation or multi-megabase inversion with substantial (10 kb) duplications at each junction. We propose an HR-independent replication restart repair mechanism to explain the full spectrum of reciprocal pair outcomes. LR WGS additionally identified single-strand annealing (SSA) as a BRCA2-deficiency-specific repair pathway in human cancers. Integrating these features in a classifier improved discrimination between BRCA1- vs. BRCA2-deficient genomes. In conclusion, our data reveal classes of BRCA1- and BRCA2-deficiency-specific rearrangements as a source of cytogenetic aberrations in HRD cells. Citation Format: Jeremy Setton, Simon Powell, Marcin Imielinski. Scars of faulty DNA repair in cancer whole genomes [abstract]. In: Proceedings of the AACR Special Conference in Cancer Research: DNA Damage Repair: From Basic Science to Future Clinical Application; 2024 Jan 9-11; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2024;84(1 Suppl):Abstract nr IA009.
DOI: 10.1158/1538-7445.am2024-7369
2024
Abstract 7369: Passenger mutations link cellular origins and transcriptional identity in human lung adenocarcinoma
Lineage plasticity hinders identification of the cellular origin of cancers, as malignant cells cease to resemble the benign cells from which they presumably evolved. Lung adenocarcinoma (LUAD), the most common primary lung cancer in the United States, is thought to arise from alveolar type II (AT2) cells. However, recent mouse models have shown that LUADs may also arise from other cell types such as basal and club cells. Although such mouse models are considered the gold standard for identifying the cell of origin of cancers, they are generally limited to a specific genetic context and do not include all cell types present in the human lung. Thus, to better understand the cellular origins of LUAD, we utilized a publicly available lung scRNA-seq atlas from the Human Cell Atlas, spanning over 584,000 cells from 107 individuals, to investigate cell-type-specific passenger mutational footprints of the transcription-coupled repair process (a sub-pathway of nucleotide excision repair that preferentially corrects mutations in highly expressed genes) across 295 lung cancer WGS profiles. As most mutations in self-renewing tissues are thought to occur prior to the onset of tumorigenesis, genes that are most highly transcribed in the cell of origin would have fewer somatic single nucleotide variants (SNVs). To test this hypothesis, our study correlated somatic mutational density with gene expression profiles of cell types and identified Alveolar Type 0 (AT0, a newly identified cell type) cells as most strongly associated with LUAD mutation density, suggesting that the ancestors of LUAD tumor cells spent most of their non-cancerous time in the AT0 state. Interestingly, studies have shown that following DNA damage in AT2 cells, transition to AT1 occurs to repair the damage. However, if the damage is detrimental, the transition of AT2 to AT1 is stalled and the cells remain in the AT0 (i.e., transitionary) state. We propose that such stalling for a long period of time can give rise to LUAD. Further, our study identified a subset of LUAD patients with non-AT2 (i.e., proximal) origin. In particular, we observed significantly higher KRAS mutations in distal-origin LUAD patients compared to those in proximal-origin LUADs. Finally, a subset of LUAD patients with distal cell of origin adopted a more proximal transcriptional identity. Extending our analysis to lung squamous cell carcinoma (LUSC) patients, we observed basal resting cells to be the cell of origin (COO) for LUSC. Our study provides a novel approach for identifying the cell of origin of cancers and points to a complex interplay between origin and identity in lung adenocarcinoma evolution. Citation Format: Sukanya Panja, Padmaja Mantri, Juan Andrade Martinez, Marcin Imielinski. Passenger mutations link cellular origins and transcriptional identity in human lung adenocarcinoma [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7369.
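The reasoning in this abstract (genes highly expressed in the cell of origin should carry fewer passenger mutations) can be sketched as a rank-correlation scan over candidate cell types. The abstract does not say which correlation statistic was used, so Spearman's rho is an assumption here, and the gene-level numbers and cell-type labels below are invented purely for illustration.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def most_anticorrelated_cell_type(mutation_density, expression_by_cell_type):
    """Pick the cell type whose expression is most negatively correlated
    with per-gene mutation density."""
    return min(
        expression_by_cell_type,
        key=lambda ct: spearman(mutation_density, expression_by_cell_type[ct]),
    )


# Hypothetical per-gene values (five genes, two candidate cell types).
MUTATION_DENSITY = [5.0, 1.0, 3.0, 8.0, 2.0]
EXPRESSION = {
    "typeA": [1.0, 9.0, 4.0, 0.5, 7.0],
    "typeB": [2.0, 3.0, 1.0, 5.0, 4.0],
}
```

With these made-up values, `typeA` expression is perfectly anticorrelated with mutation density, so it is the candidate the scan selects.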
DOI: 10.1007/978-3-540-24743-2_8
2004
Cited 26 times
Understanding the Bacterial Stringent Response Using Reachability Analysis of Hybrid Systems
In this paper we model coupled genetic and metabolic networks as hybrid systems. The vector fields are multi-affine, i.e., have only product-type nonlinearities to accommodate chemical reactions, and are defined in rectangular invariants, whose facets correspond to changes in the behavior of a gene or enzyme. For such systems, we showed that reachability and safety verification problems can be formulated and solved (conservatively) in an elegant and computationally inexpensive way, based on the fact that multi-affine functions on rectangular regions of the space are determined at the vertices. Using these techniques, we study the stringent response system, which is the transition of bacterial organisms from growth phase to a metabolically suppressed phase when subjected to an environment with limited nutrients.
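The fact that a multi-affine function on a rectangle is determined at the vertices has a practical consequence: its minimum and maximum over the rectangle are attained at corners. A short Python sketch of that property follows; the function `production_rate` and the box are arbitrary examples, not taken from the stringent response model.

```python
from itertools import product


def vertex_bounds(f, box):
    """Min and max of a multi-affine f over a box, read off its vertices.

    box: list of (low, high) intervals, one per variable.
    Valid only when f is affine in each variable with the others held fixed.
    """
    values = [f(*vertex) for vertex in product(*box)]
    return min(values), max(values)


def production_rate(x, y):
    # Multi-affine example: the only nonlinearity is the product term x*y.
    return 1.0 + 2.0 * x - 3.0 * y + 0.5 * x * y


BOX = [(0.0, 2.0), (1.0, 3.0)]
LOW, HIGH = vertex_bounds(production_rate, BOX)
```

Checking the sign of a vector-field component at the corners of a facet is therefore enough to decide, conservatively, whether trajectories can cross that facet.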
DOI: 10.1101/836296
2019
Cited 12 times
Novel patterns of complex structural variation revealed across thousands of cancer genome graphs
Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g. deletion, translocation) or complex (e.g. chromothripsis, chromoplexy) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,833 tumor whole genome sequences (WGS), we introduce three complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are “towers” of low-JCN duplications associated with early replicating regions and superenhancers, and are enriched in breast and ovarian cancers. Rigma comprise “chasms” of low-JCN deletions at late-replicating fragile sites in esophageal and other gastrointestinal (GI) adenocarcinomas. Tyfonas are “typhoons” of high-JCN junctions and fold-back inversions that are enriched in acral but not cutaneous melanoma and associated with a previously uncharacterized mutational process of non-APOBEC kataegis. Clustering of tumors according to genome graph-derived features identifies subgroups associated with DNA repair defects and poor prognosis.
DOI: 10.1101/2021.08.14.456365
2021
Cited 9 times
Fanconi Anemia Pathway Deficiency Drives Copy Number Variation in Squamous Cell Carcinomas
Fanconi anemia (FA), a model syndrome of genome instability, is caused by a deficiency in DNA interstrand crosslink (ICL) repair resulting in chromosome breakage [1–3]. The FA repair pathway comprises at least 22 FANC proteins including BRCA1 and BRCA2 [4–6], and protects against carcinogenic endogenous and exogenous aldehydes [7–10]. Individuals with FA are hundreds- to thousands-fold more likely to develop head and neck (HNSCC), esophageal and anogenital squamous cell carcinomas (SCCs) with a median onset age of 31 years [11]. The aggressive nature of these tumors and poor patient tolerance of platinum and radiation-based therapy have been associated with short survival in FA [11–16]. Molecular studies of SCCs from individuals with FA (FA SCCs) have been limited, and it is unclear how they relate to sporadic HNSCCs primarily driven by tobacco and alcohol exposure or human papillomavirus (HPV) infection [17]. Here, by sequencing FA SCCs, we demonstrate that the primary genomic signature of FA-deficiency is the presence of a high number of structural variants (SVs). SVs are enriched for small deletions, unbalanced translocations, and fold-back inversions that arise in the context of TP53 loss. The SV breakpoints preferentially localize to early replicating regions, common fragile sites, tandem repeats, and SINE elements. SVs are often connected, forming complex rearrangements. Resultant genomic instability underlies elevated copy number alteration (CNA) rates of key HNSCC-associated genes, including PIK3CA, MYC, CSMD1, PTPRD, YAP1, MXD4, and EGFR. In contrast to sporadic HNSCC, we find no evidence of HPV infection in FA HNSCC, although positive cases were identified in gynecologic tumors. A murine allograft model of FA pathway-deficient SCC was enriched in SVs, exhibited dramatic tumor growth advantage, more rapid epithelial-to-mesenchymal transition (EMT), and enhanced autonomous inflammatory signaling when compared to an FA pathway-proficient model. In light of the protective role of the FA pathway against SV formation uncovered here, and recent findings of FA pathway insufficiency in the setting of increased formaldehyde load resulting in hematopoietic stem cell failure and carcinogenesis [18–20], we propose that high copy-number instability in sporadic HNSCC may result from functional overload of the FA pathway by endogenous and exogenous DNA crosslinking agents. Our work lays the foundation for improved FA patient treatment and demonstrates that FA SCC is a powerful model to study tumorigenesis resulting from DNA crosslinking damage.
DOI: 10.1063/1.3456056
2010
Cited 14 times
Deep epistasis in human metabolism
We extend and apply a method that we have developed for deriving high-order epistatic relationships in large biochemical networks to a published genome-scale model of human metabolism. In our analysis we compute 33,328 reaction sets whose knockout synergistically disables one or more of 43 important metabolic functions. We also design minimal knockouts that remove flux through fumarase, an enzyme that has previously been shown to play an important role in human cancer. Most of these knockout sets employ more than eight mutually buffering reactions, spanning multiple cellular compartments and metabolic subsystems. These reaction sets suggest that human metabolic pathways possess a striking degree of parallelism, inducing "deep" epistasis between diversely annotated genes. Our results prompt specific chemical and genetic perturbation follow-up experiments that could be used to query in vivo pathway redundancy. They also suggest directions for future statistical studies of epistasis in genetic variation data sets.
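The idea of mutually buffering reactions, where a function is lost only when several parallel routes are removed together, can be illustrated by brute-force enumeration on a toy network. This sketch uses simple reachability rather than the flux-based genome-scale method the abstract refers to, and the network and names (`TOY`, `minimal_knockouts`) are hypothetical.

```python
from itertools import combinations


def can_make(reactions, nutrients, target):
    """True if target is reachable from the nutrients via the given reactions.

    reactions: dict mapping name -> (substrates, products) frozensets.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return target in available


def minimal_knockouts(reactions, nutrients, target, max_size):
    """Enumerate minimal reaction sets (up to max_size) whose joint removal
    makes the target unreachable. Supersets of earlier hits are skipped."""
    found = []
    names = sorted(reactions)
    for size in range(1, max_size + 1):
        for combo in combinations(names, size):
            if any(set(prev) <= set(combo) for prev in found):
                continue
            remaining = {n: r for n, r in reactions.items() if n not in combo}
            if not can_make(remaining, nutrients, target):
                found.append(combo)
    return found


# Hypothetical network with three parallel routes from "nut" to "X".
TOY = {
    "r1": (frozenset({"nut"}), frozenset({"X"})),
    "r2": (frozenset({"nut"}), frozenset({"M"})),
    "r3": (frozenset({"M"}), frozenset({"X"})),
    "r4": (frozenset({"nut"}), frozenset({"N"})),
    "r5": (frozenset({"N"}), frozenset({"X"})),
}
```

In this toy case no single or double knockout disables production of X; only triple knockouts that cut all three routes do, which is the kind of high-order interaction the abstract calls "deep" epistasis. Exhaustive enumeration like this scales combinatorially and is workable only for very small networks.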
DOI: 10.1038/nature12666
2013
Cited 13 times
Erratum: Corrigendum: Signatures of mutational processes in human cancer
Nature 500, 415–421 (2013); doi:10.1038/nature12477 In the author list of this Article, the surname of Marcin Imielinski was misspelled as “Imielinsk”. This has been corrected in the HTML and PDF of the original Article online.
DOI: 10.1038/ng0911-919b
2011
Cited 11 times
Erratum: Corrigendum: Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47
Nat. Genet. 43, 246–252 (2011); published online 6 February 2011; corrected after print 11 August 2011 In the version of this article initially published, an affiliation was missing for two authors, Maria Gazouli and Nicholas P. Anagnou. They are also affiliated with the Foundation for Biomedical Research of the Academy of Athens in Athens, Greece.
DOI: 10.1101/847681
2019
Cited 9 times
Robust foreground detection in somatic copy number data
Sensitive detection of somatic copy number alterations (SCNA) in cancer genomes is confounded by “waviness” in read depth data. We present dryclean, a signal processing algorithm to optimize SCNA detection in whole genome (WGS) and targeted sequencing platforms through foreground detection and background subtraction of read depth data. Application of dryclean to WGS demonstrates that WGS waviness is driven by replication timing. Re-analysis of thousands of tumor profiles reveals that dryclean provides superior detection of biologically relevant SCNAs relative to state-of-the-art algorithms. Applied to in silico tumor dilutions, dryclean improves the sensitivity of relapse detection 10-fold relative to current standards. dryclean is available as an R package in the GitHub repository https://github.com/mskilab/dryclean
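As a rough illustration of what background subtraction of read depth means, the sketch below divides a tumor profile by the per-bin median of a panel of normal samples and reports the log2 ratio. This is not dryclean's algorithm or its R API; it is a deliberately simple stand-in, and the function name and all numbers are made up.

```python
from math import log2
from statistics import median


def subtract_background(tumor_depth, normal_panel):
    """Return per-bin log2(tumor / median normal depth).

    tumor_depth: read depth per genomic bin for one tumor sample.
    normal_panel: list of normal samples, each a list of per-bin depths.
    Recurrent "wavy" background shared by the normals cancels out, leaving
    sample-specific signal near zero except where copy number changes.
    """
    background = [median(bin_values) for bin_values in zip(*normal_panel)]
    return [log2(t / b) for t, b in zip(tumor_depth, background)]


# Hypothetical panel of normals sharing the same wave across four bins.
NORMAL_PANEL = [
    [100.0, 120.0, 100.0, 80.0],
    [102.0, 118.0, 100.0, 80.0],
    [98.0, 122.0, 100.0, 80.0],
]

# Hypothetical tumor: same wave, with the last two bins doubled in depth.
TUMOR = [100.0, 120.0, 200.0, 160.0]

FOREGROUND = subtract_background(TUMOR, NORMAL_PANEL)
```

After subtraction the wave in the first two bins disappears and the gained bins stand out with a log2 ratio of 1.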
DOI: 10.1038/s41588-020-0634-1
2020
Cited 8 times
Publisher Correction: Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
DOI: 10.1038/s41586-022-05597-x
2023
Author Correction: Patterns of somatic structural variation in human cancer genomes