
Halit Ongen

Here are all the papers by Halit Ongen that you can download and read on OA.mg.

DOI: 10.1038/nature12531
2013
Cited 1,827 times
Transcriptome and genome sequencing uncovers functional variation in humans
Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
DOI: 10.1038/s41467-018-03621-1
2018
Cited 763 times
Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics
Scalable, integrative methods to understand mechanisms that link genetic variants with phenotypes are needed. Here we derive a mathematical expression to compute PrediXcan (a gene mapping approach) results using summary data (S-PrediXcan) and show its accuracy and general robustness to misspecified reference sets. We apply this framework to 44 GTEx tissues and 100+ phenotypes from GWAS and meta-analysis studies, creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes. Replication in an independent cohort is shown. Most of the associations are tissue specific, suggesting context specificity of the trait etiology. Colocalized significant associations in unexpected tissues underscore the need for an agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms. Monogenic disease genes are enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes.
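As we understand the published S-PrediXcan derivation, the gene-level association z-score is approximated from GWAS summary statistics as Z_g ≈ Σ_l w_lg (σ_l / σ_g) (β_l / se(β_l)), where w_lg is the prediction-model weight of SNP l for gene g, σ_l is that SNP's genotype standard deviation, and σ_g² = Σ_l Σ_k w_lg w_kg Γ_lk is the variance of predicted expression computed from a reference LD covariance matrix Γ. The Python sketch below implements that formula with plain lists; the function names and toy inputs are hypothetical and are not the interface of the published software.

```python
import math
from typing import Sequence


def predicted_expression_sd(weights: Sequence[float],
                            snp_cov: Sequence[Sequence[float]]) -> float:
    """sigma_g = sqrt(w^T Gamma w), with Gamma the SNP genotype covariance (LD) matrix."""
    n = len(weights)
    variance = sum(weights[i] * weights[j] * snp_cov[i][j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)


def s_predixcan_z(weights: Sequence[float],
                  snp_cov: Sequence[Sequence[float]],
                  gwas_z: Sequence[float]) -> float:
    """Approximate gene-level z-score from per-SNP GWAS z-scores (beta_l / se_l).

    Hypothetical helper: weights come from an expression prediction model and
    snp_cov from a reference panel; all inputs share the same SNP order.
    """
    if not (len(weights) == len(snp_cov) == len(gwas_z)):
        raise ValueError("weights, snp_cov and gwas_z must describe the same SNPs")
    sigma_g = predicted_expression_sd(weights, snp_cov)
    # sigma_l is the square root of the SNP's own variance (diagonal of Gamma).
    return sum(w * math.sqrt(snp_cov[l][l]) / sigma_g * z
               for l, (w, z) in enumerate(zip(weights, gwas_z)))


# Toy usage (made-up numbers): two uncorrelated SNPs with equal weights and
# equal GWAS z-scores of 2 combine into a gene-level z of 2 * sqrt(2).
print(s_predixcan_z([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], [2.0, 2.0]))
```

With a single SNP the gene-level z reduces to that SNP's own z-score (up to the sign of its weight), which is a quick sanity check; correlated SNPs enter through the off-diagonal terms of Γ, which is why a reference LD panel is needed.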
DOI: 10.1093/hmg/ddm352
2007
Cited 479 times
Susceptibility to coronary artery disease and diabetes is encoded by distinct, tightly linked SNPs in the ANRIL locus on chromosome 9p
Genome-wide association studies have identified a region on chromosome 9p that is associated with coronary artery disease (CAD). The region is also associated with type 2 diabetes (T2D), a risk factor for CAD, although different SNPs were reported to be associated with each disease in separate studies. We have undertaken a case-control study in 4251 CAD cases and 4443 controls in four European populations using previously reported ('literature') and tagging SNPs. We replicated the literature SNPs (P = 8 × 10^-13; OR = 1.29; 95% CI: 1.20-1.38) and showed that the strong consistent association detected by these SNPs is a consequence of a 'yin-yang' haplotype pattern spanning 53 kb. There was no evidence of additional CAD susceptibility alleles over the major risk haplotype. CAD patients without myocardial infarction (MI) showed a trend towards stronger association than MI patients. The CAD susceptibility conferred by this locus did not differ by sex, age, smoking, obesity, hypertension or diabetes. A simultaneous test of CAD and diabetes susceptibility with CAD- and T2D-associated SNPs indicated that these associations were independent of each other. Moreover, this region was not associated with differences in plasma levels of low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, fibrinogen, albumin, uric acid, bilirubin or homocysteine, although the CAD high-risk allele was paradoxically associated with lower triglyceride levels. A large antisense non-coding RNA gene (ANRIL) colocalizes with the high-risk haplotype, is expressed in tissues and cell types that are affected by atherosclerosis and is a prime candidate gene for the chromosome 9p CAD locus.
DOI: 10.1038/ng.3714
2016
Cited 455 times
Integrative genomic analysis implicates limited peripheral adipose storage capacity in the pathogenesis of human insulin resistance
Luca Lotta, Robert Scott, Stephen O’Rahilly, Claudia Langenberg, David Savage, Nicholas Wareham, Inês Barroso and colleagues identify 53 genomic regions associated with insulin resistance phenotypes. Insulin resistance is a key mediator of obesity-related cardiometabolic disease, yet the mechanisms underlying this link remain obscure. Using an integrative genomic approach, we identify 53 genomic regions associated with insulin resistance phenotypes (higher fasting insulin levels adjusted for BMI, lower HDL cholesterol levels and higher triglyceride levels) and provide evidence that their link with higher cardiometabolic risk is underpinned by an association with lower adipose mass in peripheral compartments. Using these 53 loci, we show a polygenic contribution to familial partial lipodystrophy type 1, a severe form of insulin resistance, and highlight shared molecular mechanisms in common/mild and rare/severe insulin resistance. Population-level genetic analyses combined with experiments in cellular models implicate CCDC92, DNAH10 and L3MBTL3 as previously unrecognized molecules influencing adipocyte differentiation. Our findings support the notion that limited storage capacity of peripheral adipose tissue is an important etiological component in insulin-resistant cardiometabolic disease and highlight genes and mechanisms underpinning this link.
DOI: 10.1093/bioinformatics/btv722
2015
Cited 422 times
Fast and efficient QTL mapper for thousands of molecular phenotypes
Motivation: In order to discover quantitative trait loci, multi-dimensional genomic datasets combining DNA-seq and ChIP-/RNA-seq require methods that rapidly correlate tens of thousands of molecular phenotypes with millions of genetic variants while appropriately controlling for multiple testing.
Results: We have developed FastQTL, a method that implements a popular cis-QTL mapping strategy in a user- and cluster-friendly tool. FastQTL also proposes an efficient permutation procedure to control for multiple testing. The outcome of permutations is modeled using beta distributions trained from a few permutations, from which adjusted P-values can be estimated at any level of significance with little computational cost. The Geuvadis & GTEx pilot datasets can now be easily analyzed an order of magnitude faster than with previous approaches.
Availability and implementation: Source code, binaries and comprehensive documentation of FastQTL are freely available to download at http://fastqtl.sourceforge.net/
Contact: emmanouil.dermitzakis@unige.ch or olivier.delaneau@unige.ch
Supplementary information: Supplementary data are available at Bioinformatics online.
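The beta approximation described above can be sketched as follows: for one molecular phenotype, record the smallest nominal p-value across all cis variants in each permutation, fit a beta distribution to those minima, and read the adjusted p-value off the fitted CDF at the observed best nominal p-value. This is a minimal, dependency-free Python sketch under our own assumptions, not FastQTL's code: the beta parameters are fitted by the method of moments for brevity (FastQTL itself, as we understand it, uses maximum likelihood), the incomplete beta function uses the standard continued-fraction evaluation, and all function names are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def fit_beta_mom(samples: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimates (a, b) of a beta distribution."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    common = mean * (1.0 - mean) / var - 1.0
    if common <= 0:
        raise ValueError("sample variance too large for a beta fit")
    return mean * common, (1.0 - mean) * common


def _beta_cf(a: float, b: float, x: float,
             max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x: float, a: float, b: float) -> float:
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate the continued fraction where it converges fast; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def adjusted_p(nominal_p: float, permuted_min_ps: Sequence[float]) -> float:
    """Beta-approximated permutation p-value for a phenotype's best nominal p-value."""
    a, b = fit_beta_mom(permuted_min_ps)
    return beta_cdf(nominal_p, a, b)


# Toy null simulation: 10 independent tests per permutation, 1000 permutations.
rng = random.Random(7)
perm_min = [min(rng.random() for _ in range(10)) for _ in range(1000)]
print(adjusted_p(1e-4, perm_min))
```

Under the null, the minimum of K independent uniform p-values follows Beta(1, K), so in this toy simulation the fitted parameters should land near a ≈ 1 and b ≈ 10. In real data, LD between cis variants makes the effective number of tests, and hence b, smaller than the raw variant count, which is what the fitted distribution absorbs.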
DOI: 10.1038/s41588-018-0154-4
2018
Cited 394 times
Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation
We apply integrative approaches to expression quantitative trait loci (eQTLs) from 44 tissues from the Genotype-Tissue Expression project and genome-wide association study data. About 60% of known trait-associated loci are in linkage disequilibrium with a cis-eQTL, over half of which were not found in previous large-scale whole blood studies. Applying polygenic analyses to metabolic, cardiovascular, anthropometric, autoimmune, and neurodegenerative traits, we find that eQTLs are significantly enriched for trait associations in relevant pathogenic tissues and explain a substantial proportion of the heritability (40-80%). For most traits, tissue-shared eQTLs underlie a greater proportion of trait associations, although tissue-specific eQTLs have a greater contribution to some traits, such as blood pressure. By integrating information from biological pathways with eQTL target genes and applying a gene-based approach, we validate previously implicated causal genes and pathways, and propose new variant and gene associations for several complex traits, which we replicate in the UK Biobank and BioVU.
DOI: 10.7554/elife.00523
2013
Cited 393 times
Passive and active DNA methylation and the interplay with genetic variation in gene regulation
DNA methylation is an essential epigenetic mark whose role in gene regulation and whose dependency on genomic sequence and environment are not fully understood. In this study we provide novel insights into the mechanistic relationships between genetic variation, DNA methylation and transcriptome sequencing data in three different cell types of the GenCord human population cohort. We find that the associations between DNA methylation and gene expression variation among individuals are likely due to different mechanisms from those establishing methylation-expression patterns during differentiation. Furthermore, cell-type differential DNA methylation may delineate a platform in which local inter-individual changes may respond to or act in gene regulation. We show that, unlike genetic regulatory variation, DNA methylation alone does not significantly drive allele-specific expression. Finally, inferred mechanistic relationships using genetic variation, as well as correlations with TF abundance, reveal both a passive and an active role of DNA methylation in regulatory interactions influencing gene expression.
DOI: 10.2337/db11-0415
2011
Cited 348 times
Genome-Wide Association Identifies Nine Common Variants Associated With Fasting Proinsulin Levels and Provides New Insights Into the Pathophysiology of Type 2 Diabetes
Proinsulin is a precursor of mature insulin and C-peptide. Higher circulating proinsulin levels are associated with impaired β-cell function, raised glucose levels, insulin resistance, and type 2 diabetes (T2D). Studies of the insulin processing pathway could provide new insights about T2D pathophysiology.
We have conducted a meta-analysis of genome-wide association tests of ∼2.5 million genotyped or imputed single nucleotide polymorphisms (SNPs) and fasting proinsulin levels in 10,701 nondiabetic adults of European ancestry, with follow-up of 23 loci in up to 16,378 individuals, using additive genetic models adjusted for age, sex, fasting insulin, and study-specific covariates.
Nine SNPs at eight loci were associated with proinsulin levels (P < 5 × 10^-8). Two loci (LARP6 and SGSM2) have not been previously related to metabolic traits, one (MADD) has been associated with fasting glucose, one (PCSK1) has been implicated in obesity, and four (TCF7L2, SLC30A8, VPS13C/C2CD4A/B, and ARAP1, formerly CENTD2) increase T2D risk. The proinsulin-raising allele of ARAP1 was associated with a lower fasting glucose (P = 1.7 × 10^-4), improved β-cell function (P = 1.1 × 10^-5), and lower risk of T2D (odds ratio 0.88; P = 7.8 × 10^-6). Notably, PCSK1 encodes the protein prohormone convertase 1/3, the first enzyme in the insulin processing pathway. A genotype score composed of the nine proinsulin-raising alleles was not associated with coronary disease in two large case-control datasets.
We have identified nine genetic variants associated with fasting proinsulin. Our findings illuminate the biology underlying glucose homeostasis and T2D development in humans and argue against a direct role of proinsulin in coronary artery disease pathogenesis.
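The abstract does not say exactly how the nine-allele genotype score was built. A common construction, assumed here, counts each individual's proinsulin-raising alleles (0, 1 or 2 per SNP), optionally weighting each SNP by its estimated effect size. A minimal Python sketch with a hypothetical helper name and made-up dosages:

```python
from typing import Optional, Sequence


def genotype_score(dosages: Sequence[float],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Sum of risk-allele dosages, optionally weighted per SNP (hypothetical helper)."""
    if any(not 0.0 <= d <= 2.0 for d in dosages):
        raise ValueError("allele dosages must lie between 0 and 2")
    if weights is None:
        weights = [1.0] * len(dosages)  # unweighted allele count
    if len(weights) != len(dosages):
        raise ValueError("one weight per SNP is required")
    return sum(w * d for w, d in zip(weights, dosages))


# One individual's proinsulin-raising allele counts at nine SNPs (made-up data).
person = [0, 1, 2, 1, 0, 2, 1, 1, 0]
print(genotype_score(person))  # prints 8.0
```

Imputed dosages between 0 and 2 are accepted as-is; the resulting per-individual score is what would then be tested for association with an outcome such as coronary disease in case-control data.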
DOI: 10.1038/ncomms15452
2017
Cited 255 times
A complete tool set for molecular QTL discovery and analysis
Population-scale studies combining genetic information with molecular phenotypes (for example, gene expression) have become a standard approach to dissecting the effects of genetic variants on organismal phenotypes. These kinds of data sets require powerful, fast and versatile methods able to discover molecular quantitative trait loci (molQTLs). Here we propose such a solution, QTLtools, a modular framework that contains multiple new and well-established methods to prepare the data, to discover proximal and distal molQTLs and, finally, to integrate them with GWAS variants and functional annotations of the genome. We demonstrate its utility by performing a complete expression QTL study in a few easy-to-perform steps. QTLtools is open source and available at https://qtltools.github.io/qtltools/
DOI: 10.1371/journal.pgen.1004958
2015
Cited 189 times
Tissue-Specific Effects of Genetic and Epigenetic Variation on Gene Regulation and Splicing
Understanding how genetic variation affects distinct cellular phenotypes, such as gene expression levels, alternative splicing and DNA methylation levels, is essential for better understanding of complex diseases and traits. Furthermore, how inter-individual variation of DNA methylation is associated with gene expression is just starting to be studied. In this study, we use the GenCord cohort of 204 newborn Europeans' lymphoblastoid cell lines, T-cells and fibroblasts derived from umbilical cords. The samples were previously genotyped for 2.5 million SNPs, mRNA-sequenced, and assayed for methylation levels in 482,421 CpG sites. We observe that methylation sites associated with expression levels are enriched in enhancers, gene bodies and CpG island shores. We show that while the correlation between DNA methylation and gene expression can be positive or negative, it is very consistent across cell types. However, this epigenetic association with gene expression appears more tissue-specific than the genetic effects on gene expression or DNA methylation (observed in both sharing estimations based on P-values and effect size correlations between cell types). This predominance of genetic effects is also reflected in the observation that allele-specific expression differences between individuals dominate over tissue-specific effects. Additionally, we discover genetic effects on alternative splicing and, interestingly, a large amount of DNA methylation correlating with alternative splicing, both in a tissue-specific manner. The locations of the SNPs and methylation sites involved in these associations highlight the participation of promoter-proximal and distant regulatory regions in alternative splicing. Overall, our results provide high-resolution analyses showing how genome sequence variation has a broad effect on cellular phenotypes across cell types, whereas epigenetic factors provide a secondary layer of variation that is more tissue-specific.
Furthermore, the details of how this tissue-specificity may vary across inter-relations of molecular traits, and where these are occurring, can yield further insights into gene regulation and cellular biology as a whole.
DOI: 10.1038/srep15145
2015
Cited 181 times
Synchronized age-related gene expression changes across multiple tissues in human and the link to complex diseases
Aging is one of the most important biological processes and is a known risk factor for many age-related diseases in humans. Studying age-related transcriptomic changes in tissues across the whole body can provide valuable information for a holistic understanding of this fundamental process. In this work, we catalogue age-related gene expression changes in nine tissues from nearly two hundred individuals collected by the Genotype-Tissue Expression (GTEx) project. In general, we find that aging gene expression signatures are very tissue-specific. However, enrichment for some well-known aging components, such as mitochondrial biology, is observed in many tissues. Different levels of cross-tissue synchronization of age-related gene expression changes are observed, and some essential tissues (e.g., heart and lung) show much stronger “co-aging” than other tissues based on a principal component analysis. The aging gene signatures and complex disease genes show a complex overlapping pattern, and only in some cases do they significantly overlap in the tissues affected by the corresponding diseases. In summary, our analyses provide novel insights into the co-regulation of age-related gene expression in multiple tissues and present a tissue-specific view of the link between aging and age-related diseases.
DOI: 10.1038/ng.3981
2017
Cited 176 times
Estimating the causal tissues for complex traits and diseases
How to interpret the biological causes underlying the predisposing markers identified through genome-wide association studies (GWAS) remains an open question. One direct and powerful way to assess the genetic causality behind GWAS is through analysis of expression quantitative trait loci (eQTLs). Here we describe a new approach to estimate the tissues behind the genetic causality of a variety of GWAS traits, using the cis-eQTLs in 44 tissues from the Genotype-Tissue Expression (GTEx) Consortium. We have adapted the regulatory trait concordance (RTC) score to measure the probability of eQTLs being active in multiple tissues and to calculate the probability that a GWAS-associated variant and an eQTL tag the same functional effect. By normalizing the GWAS-eQTL probabilities by the tissue-sharing estimates for eQTLs, we generate relative tissue-causality profiles for GWAS traits. Our approach not only implicates the gene likely mediating individual GWAS signals, but also highlights tissues where the genetic causality for an individual trait is likely manifested.
DOI: 10.1126/science.aat8266
2019
Cited 169 times
Chromatin three-dimensional interactions mediate genetic effects on gene expression
Noncoding variation and gene expression: Natural genetic variation outside of protein-coding regions affects multiple molecular phenotypes that can differ across individuals. To examine how genomic variation affects proximal (cis) or distal (trans) gene regulation, Delaneau et al. analyzed gene expression, chromatin, and the three-dimensional conformation of the genome. Clustering regulatory elements and activity across individuals reveals genomic structures termed cis-regulatory domains and trans-regulatory hubs that affect gene expression. Associations between these structures and genes within and across chromosomes contribute to links between noncoding genetic variation and gene expression. (Science, this issue p. eaat8266)
DOI: 10.1016/j.molonc.2016.06.003
2016
Cited 144 times
SNHG16 is regulated by the Wnt pathway in colorectal cancer and affects genes involved in lipid metabolism
It is well established that lncRNAs are aberrantly expressed in cancer, where they have been shown to act as oncogenes or tumor suppressors. RNA profiling of 314 colorectal adenomas/adenocarcinomas and 292 adjacent normal colon mucosa samples using RNA-sequencing demonstrated that the snoRNA host gene 16 (SNHG16) is significantly up-regulated in adenomas and all stages of CRC. SNHG16 expression was positively correlated with the expression of Wnt-regulated transcription factors, including ASCL2, ETS2, and c-Myc. In vitro abrogation of Wnt signaling in CRC cells reduced the expression of SNHG16, indicating that SNHG16 is regulated by the Wnt pathway. Silencing of SNHG16 resulted in reduced viability, increased apoptotic cell death and impaired cell migration. SNHG16 silencing particularly affected the expression of genes involved in lipid metabolism. A connection between SNHG16 and genes involved in lipid metabolism was also observed in clinical tumors. Argonaute CrossLinking and ImmunoPrecipitation (AGO-CLIP) demonstrated that SNHG16 heavily binds AGO and has 27 AGO/miRNA target sites along its length, indicating that SNHG16 may act as a competing endogenous RNA (ceRNA) "sponging" miRNAs off their cognate targets. Most interestingly, half of the miRNA families with high-confidence targets on SNHG16 also target the 3'UTR of Stearoyl-CoA Desaturase (SCD). SCD is involved in lipid metabolism and is down-regulated upon SNHG16 silencing. In conclusion, up-regulation of SNHG16 is a frequent event in CRC, likely caused by deregulated Wnt signaling. In vitro analyses demonstrate that SNHG16 may play an oncogenic role in CRC and that it affects genes involved in lipid metabolism, possibly through ceRNA-related mechanisms.
DOI: 10.1101/gr.150706.112
2013
Cited 167 times
Cell-type, allelic, and genetic signatures in the human pancreatic beta cell transcriptome
Elucidating the pathophysiology and molecular attributes of common disorders, as well as developing targeted and effective treatments, hinges on the study of the relevant cell types and tissues. Pancreatic beta cells within the islets of Langerhans are centrally involved in the pathogenesis of both type 1 and type 2 diabetes. Describing the differentiated state of the human beta cell has so far been hampered by technical (low-resolution microarrays) and biological limitations (whole islet preparations rather than isolated beta cells). We circumvent these by deep RNA sequencing of purified beta cells from 11 individuals, presenting here the first characterization of the human beta cell transcriptome. We perform the first comparison of gene expression profiles between beta cells, whole islets, and beta-cell-depleted islet preparations, thus revealing beta-cell-specific expression and splicing signatures. Further, we demonstrate that genes with consistently increased expression in beta cells have neuronal-like properties, a signal previously hypothesized. Finally, we find evidence for extensive allelic imbalance in expression and uncover genetic regulatory variants (eQTLs) active in beta cells. This first molecular blueprint of the human beta cell offers biological insight into its differentiated function, including expression of key genes associated with both major types of diabetes.
DOI: 10.1038/nature13602
2014
Cited 141 times
Putative cis-regulatory drivers in colorectal cancer
DOI: 10.1016/j.ajhg.2010.11.007
2011
Cited 122 times
Meta-analysis of Dense Genecentric Association Studies Reveals Common and Uncommon Variants Associated with Height
Height is a classic complex trait with common variants in a growing list of genes known to contribute to the phenotype. Using a genecentric genotyping array targeted toward cardiovascular-related loci, comprising 49,320 SNPs across approximately 2000 loci, we evaluated the association of common and uncommon SNPs with adult height in 114,223 individuals from 47 studies and six ethnicities. A total of 64 loci contained a SNP associated with height at array-wide significance (p < 2.4 × 10^-6), with 42 loci surpassing the conventional genome-wide significance threshold (p < 5 × 10^-8). Common variants with minor allele frequencies greater than 5% were observed to be associated with height in 37 previously reported loci. In individuals of European ancestry, uncommon SNPs in IL11 and SMAD3, which would not be genotyped with the use of standard genome-wide genotyping arrays, were strongly associated with height (p < 3 × 10^-11). Conditional analysis within associated regions revealed five additional variants associated with height independent of lead SNPs within the locus, suggesting allelic heterogeneity. Although underpowered to replicate findings from individuals of European ancestry, the direction of effect of associated variants was largely consistent in African American, South Asian, and Hispanic populations. Overall, we show that dense coverage of genes for uncommon SNPs, coupled with large-scale meta-analysis, can successfully identify additional variants associated with a common complex trait.
DOI: 10.1136/annrheumdis-2018-214379
2019
Cited 84 times
Combined genetic and transcriptome analysis of patients with SLE: distinct, targetable signatures for susceptibility and severity
Systemic lupus erythematosus (SLE) diagnosis and treatment remain empirical and the molecular basis for its heterogeneity elusive. We explored the genomic basis for disease susceptibility and severity.
mRNA sequencing and genotyping were performed in blood from 142 patients with SLE and 58 healthy volunteers. Abundances of cell types were assessed by CIBERSORT and cell-specific effects by interaction terms in linear models. Differentially expressed genes (DEGs) were used to train classifiers (linear discriminant analysis) of SLE versus healthy individuals in 80% of the dataset and were validated in the remaining 20%, running 1000 iterations. Transcriptome and genotype data were integrated by expression quantitative trait loci (eQTL) analysis; tissue-specific genetic causality was assessed by regulatory trait concordance (RTC).
SLE has a 'susceptibility signature' present in patients in clinical remission, an 'activity signature' linked to genes that regulate immune cell metabolism, protein synthesis and proliferation, and a 'severity signature' best illustrated in active nephritis, enriched in druggable granulocyte and plasmablast/plasma-cell pathways. Patients with SLE also have perturbed mRNA splicing enriched in immune system and interferon signalling genes. A novel transcriptome index distinguished active versus inactive disease, but not low disease activity, and correlated with disease severity. DEGs discriminate SLE versus healthy individuals with median sensitivity 86% and specificity 92%, suggesting a potential use in diagnostics. Combined eQTL analysis from the Genotype-Tissue Expression (GTEx) project and SLE-associated genetic polymorphisms demonstrates that susceptibility variants may regulate gene expression in the blood but also in other tissues.
Specific gene networks confer susceptibility to SLE, activity and severity, and may facilitate personalised care.
DOI: 10.1016/j.celrep.2017.04.045
2017
Cited 80 times
Molecular-Subtype-Specific Biomarkers Improve Prediction of Prognosis in Colorectal Cancer
Colorectal cancer (CRC) is characterized by major inter-tumor diversity that complicates the prediction of disease and treatment outcomes. Recent efforts help resolve this by sub-classification of CRC into natural molecular subtypes; however, this strategy is not yet able to provide clinicians with improved tools for decision making. We here present an extended framework for CRC stratification that specifically aims to improve patient prognostication. Using transcriptional profiles from 1,100 CRCs, including >300 previously unpublished samples, we identify cancer cell and tumor archetypes and suggest the tumor microenvironment as a major prognostic determinant that can be influenced by the microbiome. Notably, our subtyping strategy allowed identification of archetype-specific prognostic biomarkers that provided information beyond and independent of UICC-TNM staging, MSI status, and consensus molecular subtyping. The results illustrate that our extended subtyping framework, combining subtyping and subtype-specific biomarkers, could contribute to improved patient prognostication and may form a strong basis for future studies.
DOI: 10.1002/ijc.27921
2012
Cited 70 times
Non-CpG island promoter hypomethylation and miR-149 regulate the expression of SRPX2 in colorectal cancer
Gene silencing by DNA hypermethylation of CpG islands is a well-characterized phenomenon in cancer. The effect of hypomethylation, in particular of non-CpG island genes, is much less well described. By genome-wide screening, we identified 105 genes in microsatellite stable (MSS) colorectal adenocarcinomas with an inverse correlation (Spearman's ρ ≤ −0.40) between methylation and expression. Of these, 35 (33%) were hypomethylated non-CpG island genes, and two of them, APOLD1 (Spearman's ρ = −0.82) and SRPX2 (Spearman's ρ = −0.80), were selected for further analyses. Hypomethylation of both genes was a localized event not shared by adjacent genes. A set of 662 FFPE DNA samples not only confirmed that APOLD1 and SRPX2 are hypomethylated in CRC but also revealed hypomethylation to be significantly (p < 0.01) associated with tumors being localized in the left side, CpG island methylator phenotype negative, MSS, BRAF wt, undifferentiated and of adenocarcinoma histosubtype. Demethylation experiments supported SRPX2 being epigenetically regulated via DNA methylation, whereas other mechanisms in addition to DNA methylation seem to be involved in the regulation of APOLD1. We further identified miR-149 as a potential novel post-transcriptional regulator of SRPX2. In carcinoma tissue, miR-149 was downregulated and inversely correlated with SRPX2 (ρ = −0.77). Furthermore, ectopic expression of miR-149 significantly reduced SRPX2 transcript levels. Our study highlights that in colorectal tumors, hypomethylation of non-CpG island-associated promoters deregulates gene expression nearly as frequently as does CpG island hypermethylation. The hypomethylation of SRPX2 is focal and not part of a large block. Furthermore, it often translates to an increased expression level, which may be modulated by miR-149.
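The genome-wide screen above keeps genes whose methylation and expression show Spearman's ρ ≤ −0.40 across tumours. A small dependency-free Python sketch of that filter, assuming per-gene lists of methylation and expression values measured on the same samples in the same order; the gene names, values and helper names are hypothetical. Spearman's ρ is computed here as the Pearson correlation of ranks, with tied values given their average rank.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def inversely_correlated_genes(methylation: Dict[str, List[float]],
                               expression: Dict[str, List[float]],
                               threshold: float = -0.40) -> List[str]:
    """Genes whose methylation-expression Spearman rho is at or below threshold."""
    return sorted(g for g in methylation
                  if spearman(methylation[g], expression[g]) <= threshold)


# Made-up example: five tumours, two genes.
methylation = {"GENE_A": [0.1, 0.3, 0.5, 0.7, 0.9],
               "GENE_B": [0.1, 0.3, 0.5, 0.7, 0.9]}
expression = {"GENE_A": [9.0, 7.0, 6.0, 3.0, 1.0],
              "GENE_B": [1.0, 3.0, 2.0, 5.0, 4.0]}
print(inversely_correlated_genes(methylation, expression))  # prints ['GENE_A']
```

A gene with constant methylation or expression across samples has undefined ρ and would raise a division-by-zero error here, so such genes would need to be filtered out before screening.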
DOI: 10.1016/j.ajhg.2015.09.004
2015
Cited 59 times
Alternative Splicing QTLs in European and African Populations
With the advent of RNA-sequencing technology, we can detect different types of alternative splicing and determine how DNA variation regulates splicing. However, given the short read lengths used in most population-based RNA-sequencing experiments, quantifying transcripts accurately remains a challenge. Here we present a method, Altrans, for discovery of alternative splicing quantitative trait loci (asQTLs). To assess the performance of Altrans, we compared it to Cufflinks and MISO in simulations and Cufflinks for asQTL discovery. Simulations show that in the presence of unannotated transcripts, Altrans performs better in quantifications than Cufflinks and MISO. We have applied Altrans and Cufflinks to the Geuvadis dataset, which comprises samples from European and African populations, and discovered (FDR = 1%) 1,427 and 166 asQTLs with Altrans and 1,737 and 304 asQTLs with Cufflinks for Europeans and Africans, respectively. We show that, by discovering a set of asQTLs in a smaller subset of European samples and replicating these in the remaining larger subset of Europeans, both methods achieve similar replication levels (95% for both methods). We find many Altrans-specific asQTLs, which replicate to a high degree (93%). This is mainly due to junctions absent from the annotations and hence not tested with Cufflinks. The asQTLs are significantly enriched for biochemically active regions of the genome, functional marks, and variants in splicing regions, highlighting their biological relevance. We present an approach for discovering asQTLs that is a more direct assessment of splicing compared to other methods and is complementary to other transcript quantification methods.
DOI: 10.1038/ncomms14418
2017
Cited 54 times
The non-coding variant rs1800734 enhances DCLK3 expression through long-range interaction and promotes colorectal cancer progression
Genome-wide association studies have identified a great number of non-coding risk variants for colorectal cancer (CRC). To date, the majority of these variants have not been functionally studied. Identification of allele-specific transcription factor (TF) binding is of great importance for understanding the regulatory consequences of such variants. A recently developed proteome-wide analysis of disease-associated SNPs (PWAS) enables identification of TF-DNA interactions in an unbiased manner. Here we perform a large-scale PWAS study to comprehensively characterize the TF-binding landscape associated with CRC, which identifies 731 allele-specific TF binding events at 116 CRC risk loci. This screen identifies the A-allele of rs1800734, within the promoter region of MLH1, as perturbing the binding of TFAP4 and consequently increasing DCLK3 expression through a long-range interaction, which promotes cancer malignancy by enhancing expression of genes related to epithelial-to-mesenchymal transition.
DOI: 10.1038/s41467-023-42405-0
2024
Transposable elements mediate genetic effects altering the expression of nearby genes in colorectal cancer
Abstract Transposable elements (TEs) are prevalent repeats in the human genome that play a significant role in the regulome, and their disruption can contribute to tumorigenesis. However, TE influence on gene expression in cancer remains unclear. Here, we analyze 275 normal colon and 276 colorectal cancer samples from the SYSCOL cohort, discovering 10,231 and 5,199 TE-expression quantitative trait loci (eQTLs) in normal and tumor tissues, respectively, of which 376 are colorectal cancer-specific eQTLs, likely due to methylation changes. Tumor-specific TE-eQTLs show greater enrichment of transcription factors compared to shared TE-eQTLs, suggesting specific regulation of their expression in tumor. Bayesian networks reveal 1,766 TEs as mediators of genetic effects, altering the expression of 1,558 genes, including 55 known cancer driver genes, and show that tumor-specific TE-eQTLs trigger the driver capability of TEs. These insights expand our knowledge of cancer drivers, deepening our understanding of tumorigenesis and presenting potential avenues for therapeutic interventions.
DOI: 10.1101/074450
2016
Cited 38 times
Local genetic effects on gene expression across 44 human tissues
Abstract Expression quantitative trait locus (eQTL) mapping provides a powerful means to identify functional variants influencing gene expression and disease pathogenesis. We report the identification of cis-eQTLs from 7,051 post-mortem samples representing 44 tissues and 449 individuals as part of the Genotype-Tissue Expression (GTEx) project. We find a cis-eQTL for 88% of all annotated protein-coding genes, with one-third having multiple independent effects. We identify numerous tissue-specific cis-eQTLs, highlighting the unique functional impact of regulatory variation in diverse tissues. By integrating large-scale functional genomics data and state-of-the-art fine-mapping algorithms, we identify multiple features predictive of tissue-specific and shared regulatory effects. We improve estimates of cis-eQTL sharing and effect sizes using allele specific expression across tissues. Finally, we demonstrate the utility of this large compendium of cis-eQTLs for understanding the tissue-specific etiology of complex traits, including coronary artery disease. The GTEx project provides an exceptional resource that has improved our understanding of gene regulation across tissues and the role of regulatory variation in human genetic diseases.
DOI: 10.1038/s41467-018-06132-1
2018
Cited 28 times
Contribution of allelic imbalance to colorectal cancer
Point mutations in cancer have been extensively studied, but chromosomal gains and losses have been more challenging to interpret due to their unspecific nature. Here we examine the high-resolution allelic imbalance (AI) landscape in 1699 colorectal cancers, 256 of which have been whole-genome sequenced (WGSed). The imbalances pinpoint 38 genes as plausible AI targets based on previous knowledge. Unbiased CRISPR-Cas9 knockout and activation screens identified a total of 79 genes within AI peaks regulating cell growth. Genetic and functional data implicate loss of TP53 as a sufficient driver of AI. The WGS highlights an influence of copy number aberrations on the rate of detected somatic point mutations. Importantly, the data reveal several associations between AI target genes, suggesting a role for a network of lineage-determining transcription factors in colorectal tumorigenesis. Overall, the results unravel the contribution of AI in colorectal cancer and provide a plausible explanation for why so few genes are commonly affected by point mutations in cancers.
DOI: 10.1136/gutjnl-2016-313146
2017
Cited 26 times
Characterising cis-regulatory variation in the transcriptome of histologically normal and tumour-derived pancreatic tissues
Objective To elucidate the genetic architecture of gene expression in pancreatic tissues. Design We performed expression quantitative trait locus (eQTL) analysis in histologically normal pancreatic tissue samples (n=95) using RNA sequencing and the corresponding 1000 Genomes imputed germline genotypes. Data from pancreatic tumour-derived tissue samples (n=115) from The Cancer Genome Atlas were included for comparison. Results We identified 38,615 cis-eQTLs (in 484 genes) in histologically normal tissues and 39,713 cis-eQTLs (in 237 genes) in tumour-derived tissues (false discovery rate <0.1), with the strongest effects seen near transcriptional start sites. Approximately 23% and 42% of genes with significant cis-eQTLs appeared to be specific for tumour-derived and normal-derived tissues, respectively. Significant enrichment of cis-eQTL variants was noted in non-coding regulatory regions, in particular for pancreatic tissues (1.53-fold to 3.12-fold, p≤0.0001), indicating tissue-specific functional relevance. A common pancreatic cancer risk locus on 9q34.2 (rs687289) was associated with ABO expression in histologically normal (p=5.8×10⁻⁸) and tumour-derived (p=8.3×10⁻⁵) tissues. The high linkage disequilibrium between this variant and the O blood group-generating deletion variant in ABO (exon 6) suggested that nonsense-mediated decay (NMD) of the ‘O’ mRNA might explain this finding. However, knockdown of crucial NMD regulators did not influence decay of the ABO ‘O’ mRNA, indicating that a gene regulatory element influenced by pancreatic cancer risk alleles may underlie the eQTL. Conclusions We have identified cis-eQTLs representing potential functional regulatory variants in the pancreas and generated a rich data set for further studies on gene expression and its regulation in pancreatic tissues.
DOI: 10.1371/journal.pgen.1010212
2022
Cited 9 times
Regulation of HLA class I expression by non-coding gene variations
The Human Leukocyte Antigen (HLA) is a critical genetic system for different outcomes after solid organ and hematopoietic cell transplantation. Its polymorphism is usually determined by molecular technologies at the DNA level. A potential role of HLA allelic expression remains under investigation in the context of the allogeneic immune response between donors and recipients. In this study, we quantified the allelic expression of all three HLA class I loci (HLA-A, B and C) by RNA sequencing and conducted an expression quantitative trait loci (eQTL) analysis to investigate whether HLA expression regulation could be associated with non-coding gene variations. HLA-B alleles exhibited the highest expression levels, followed by HLA-C and HLA-A alleles. The largest fold variation in expression was observed for HLA-C alleles. The expression of HLA class I loci of distinct individuals demonstrated a coordinated and paired expression of both alleles of the same locus. Expression of conserved HLA-A~B~C haplotypes differed in distinct PBMCs, suggesting individually regulated expression of both HLA class I alleles and haplotypes. The cytokines TNFα/IFNβ, which induced a very similar upregulation of HLA class I RNA and cell surface expression across alleles, did not modify the individually coordinated expression at the three HLA class I loci. By identifying cis-eQTLs for the HLA class I genes, we show that the non-coding eQTLs explain 29%, 13%, and 31% of the respective HLA-A, B, C expression variance in unstimulated cells, and 9%, 23%, and 50% of the variance in cytokine-stimulated cells. The eQTLs have significantly higher effect sizes in stimulated cells compared to unstimulated cells for HLA-B and HLA-C gene expression. Our data also suggest that the identified eQTLs are independent of the coding variation that defines HLA alleles and thus may influence intra-allele expression variability, although they might not represent the causal eQTLs.
DOI: 10.1186/s12864-023-09532-w
2023
Identifying novel regulatory effects for clinically relevant genes through the study of the Greek population
Abstract Background Expression quantitative trait loci (eQTL) studies provide insights into regulatory mechanisms underlying disease risk. Expanding studies of gene regulation to underexplored populations and to medically relevant tissues offers potential to reveal yet unknown regulatory variants and to better understand disease mechanisms. Here, we performed eQTL mapping in subcutaneous (S) and visceral (V) adipose tissue from 106 Greek individuals (Greek Metabolic study, GM) and compared our findings to those from the Genotype-Tissue Expression (GTEx) resource. Results We identified 1,930 and 1,515 eGenes in S and V, respectively, over 13% of which are not observed in GTEx adipose tissue and do not arise due to different ancestry. We report additional context-specific regulatory effects in genes of clinical interest (e.g. oncogene ST7) and in genes regulating responses to environmental stimuli (e.g. MIR21, SNX33). We suggest that a fraction of the reported differences across populations is due to environmental effects on gene expression, driving context-specific eQTLs, and that environmental effects can determine the penetrance of disease variants, thus shaping disease risk. We report that over half of GM eQTLs colocalize with GWAS SNPs, and 41% of these colocalizations are not detected in GTEx. We also highlight the clinical relevance of S adipose tissue by revealing that inflammatory processes are upregulated in individuals with obesity, not only in V, but also in S tissue. Conclusions By focusing on an understudied population, our results provide further candidate genes for investigation regarding their role in adipose tissue biology and their contribution to disease risk and pathogenesis.
DOI: 10.7554/elife.01045
2013
Cited 13 times
Correction: Passive and active DNA methylation and the interplay with genetic variation in gene regulation
DOI: 10.1101/074682
2016
Cited 10 times
Estimating the causal tissues for complex traits and diseases
Interpretation of the biological causes of the predisposing markers identified through Genome Wide Association Studies (GWAS) remains an open question [1]. One direct and powerful way to assess the genetic causality behind GWAS is through expression quantitative trait loci (eQTLs) [2]. Here we describe a novel approach to estimate the tissues giving rise to the genetic causality behind a wide variety of GWAS traits, using the cis-eQTLs identified in 44 tissues of the GTEx consortium [3,4]. We have adapted the Regulatory Trait Concordance (RTC) score [5] both to measure the tissue sharing probabilities of eQTLs and to calculate the probability that a GWAS and an eQTL variant tag the same underlying functional effect. We show that our tissue sharing estimates significantly correlate with commonly used estimates of tissue sharing. By normalizing the GWAS-eQTL probabilities with the tissue sharing estimates of the eQTLs, we can estimate the tissues from which GWAS genetic causality arises. Our approach not only indicates the gene mediating individual GWAS signals but can also highlight tissues where the genetic causality for an individual trait is manifested.
DOI: 10.15252/emmm.201708552
2018
Cited 10 times
Comprehensive evaluation of coding region point mutations in microsatellite‐unstable colorectal cancer
Microsatellite instability (MSI) leads to accumulation of an excessive number of mutations in the genome, mostly small insertions and deletions. MSI colorectal cancers (CRCs), however, also contain more point mutations than microsatellite-stable (MSS) tumors, yet these have not been as comprehensively studied. To identify candidate driver genes affected by point mutations in MSI CRC, we ranked genes based on mutation significance while correcting for replication timing and gene expression, utilizing an algorithm, MutSigCV. Somatic point mutation data from the exome kit-targeted area of 24 exome-sequenced sporadic MSI CRCs and respective normals, and 12 whole-genome-sequenced sporadic MSI CRCs and respective normals, were utilized. The top 73 genes were validated in 93 additional MSI CRCs. The MutSigCV ranking identified several well-established MSI CRC driver genes, provided additional evidence for previously proposed CRC candidate genes, and shortlisted genes that have, to our knowledge, not been linked to CRC before. Two genes, SMARCB1 and STK38L, were also functionally scrutinized, providing evidence of a tumorigenic role for SMARCB1 mutations in particular.
DOI: 10.1101/171694
2017
Cited 8 times
Intra- and inter-chromosomal chromatin interactions mediate genetic effects on regulatory networks
Summary Genome-wide studies on the genetic basis of gene expression and the structural properties of chromatin have considerably advanced our understanding of the function of the human genome. However, it remains unclear how structure relates to function and, in this work, we aim to bridge the two by assembling a dataset that combines the activity of regulatory elements (e.g. enhancers and promoters), expression of genes and genetic variation of 317 individuals across two cell types. We show that regulatory activity is structured within 12,583 Cis Regulatory Domains (CRDs) that are cell type-specific and highly reflective of the local (i.e. Topologically Associating Domains) and global (i.e. A/B nuclear compartments) nuclear organization of the chromatin. These CRDs essentially delimit the sets of active regulatory elements involved in the transcription of most genes, thereby capturing complex regulatory networks in which the effects of regulatory variants are propagated and combined to finally mediate expression Quantitative Trait Loci. Overall, our analysis reveals the complexity and specificity of cis and trans regulatory networks and their perturbation by genetic variation.
DOI: 10.1101/068635
2016
Cited 7 times
A complete tool set for molecular QTL discovery and analysis
Abstract Population-scale studies combining genetic information with molecular phenotypes (e.g. gene expression) have become a standard way to dissect the effects of genetic variants on organismal phenotypes. These kinds of datasets require powerful, fast and versatile methods able to discover molecular Quantitative Trait Loci (molQTLs). Here we propose such a solution, QTLtools, a modular framework that contains multiple methods to prepare the data, to discover proximal and distal molQTLs and, finally, to integrate them with GWAS variants and functional annotations of the genome. We demonstrate its utility by performing a complete expression QTL study in a few easy-to-perform steps. QTLtools is open source and available at https://qtltools.github.io/qtltools/
DOI: 10.7554/elife.00523.024
2013
Cited 7 times
Author response: Passive and active DNA methylation and the interplay with genetic variation in gene regulation
Abstract
DNA methylation is an essential epigenetic mark whose role in gene regulation and whose dependency on genomic sequence and environment are not fully understood. In this study we provide novel insights into the mechanistic relationships between genetic variation, DNA methylation and transcriptome sequencing data in three different cell-types of the GenCord human population cohort. We find that the associations between DNA methylation and gene expression variation among individuals are likely due to different mechanisms from those establishing methylation-expression patterns during differentiation. Furthermore, cell-type differential DNA methylation may delineate a platform in which local inter-individual changes may respond to or act in gene regulation. We show that, unlike genetic regulatory variation, DNA methylation alone does not significantly drive allele specific expression. Finally, inferred mechanistic relationships using genetic variation as well as correlations with TF abundance reveal both a passive and an active role of DNA methylation in regulatory interactions influencing gene expression. https://doi.org/10.7554/eLife.00523.001
eLife digest
Variations occur throughout our genome. These variations can cause genes to be expressed (switched on) in slightly different ways among individuals. Moreover, the same gene can also be expressed in different ways in different cells within an individual. A third level of variation is supplied by epigenetic markers: these are molecules that bind to the DNA at specific points and can have profound effects on the expression of nearby genes. One such epigenetic marker is the addition of a methyl group to a cytosine base, a process that is known as DNA methylation.
DNA methylation usually happens when a cytosine base is next to a guanine base, forming a CpG site. In mammals, most CpG sites have methyl groups attached, although regions with a lot of CpG sites (called CpG islands) are mostly unmethylated. Initial studies suggested that methylation prevented particular genes from being expressed, but more recent work has indicated that methylation can be associated with both reduced and increased expression of genes. Moreover, it is not clear if this association is active (i.e., changes in methylation drive changes in gene expression) or passive (DNA methylation is the result of gene regulation). Now, Gutierrez-Arcelus et al. have carried out a large-scale study to clarify the relationships between three different types of gene-related variations among individuals. They extracted fibroblasts, T-cells and lymphoblastoid cells from the umbilical cords of 204 babies, and analysed them for variations in DNA sequence, gene expression and DNA methylation. Their results show that the associations between the three are more complex than was previously thought. Gutierrez-Arcelus et al. show that the mechanisms that control the association between the variations in DNA methylation and gene expression in individuals are likely to be different to those that are responsible for the establishment of methylation patterns during the process of cell differentiation. They also find that the association between DNA methylation and gene expression can be either active or passive, and can depend on the context in which they occur in our genome. Finally, where the two copies or alleles of a gene are not equally expressed in a given cell, the difference in expression is primarily regulated by DNA sequence variation, with DNA methylation having little or no role on its own. Equally complex interactions and effects are expected in further studies of genetic and epigenetic variation. 
https://doi.org/10.7554/eLife.00523.002
Introduction
DNA methylation is an essential (Li et al., 1992) epigenetic mark whose role in gene regulation and whose dependency on genomic sequence and environment are not yet fully understood (Jones, 2012; Schubeler, 2012). DNA methylation in vertebrates occurs most commonly in cytosines that are adjacent to guanines (CpG sites). Mammalian DNA methylation levels are generally high, although CpG-rich regions, called CpG islands (CGI), appear mostly unmethylated (Bird, 2002; Weber et al., 2007; Lister et al., 2009). The mechanisms of de novo methylation and maintenance of methylation patterns are well known (Shoemaker et al., 2011), and alterations of these can cause severe diseases (Robertson, 2005). Even though DNA methylation was originally reported to be involved in gene silencing (Holliday and Pugh, 1975; Riggs, 1975), more recent studies have found that it can also be positively correlated with gene transcription when found in gene bodies (Hellman and Chess, 2007; Lister et al., 2009). Additionally, its participation in gene expression is proving to be highly variable, ranging from marking alternative intra-genic promoters (Maunakea et al., 2010), to being affected by transcription factors (TFs) at enhancers (Stadler et al., 2011), to itself affecting the binding of TFs such as MYC (Prendergast and Ziff, 1991). Hence, whether DNA methylation is a consequence of gene regulation or whether it controls gene expression changes (that is, whether it plays a passive or an active role in gene regulation) still remains a topic of debate (Schubeler, 2012). Additionally, DNA methylation can be affected by environment (Kaminsky et al., 2009), but it has also been shown that regions in the genome can autonomously determine DNA methylation states (Lienert et al., 2010).
Furthermore, studies looking at natural DNA methylation variation in human populations have shown that genetic variation influences DNA methylation levels in different cell-types (Gibbs et al., 2010; Zhang et al., 2010; Bell et al., 2011), but the mechanisms by which this occurs are far from clear. In these and other studies (Kulis et al., 2012), the association between DNA methylation and gene expression in a population context, where the same gene and same methylation site can be compared across multiple individuals, has been reported to be both positive and negative. Overall, the nature of the relationships among genetic variants, DNA methylation and gene expression is still unclear despite some initial efforts (van Eijk et al., 2012). In this study we dissect the mechanistic relationships between inter-individual DNA methylation and gene expression variation using DNA sequence variability and TF abundance measured by RNA-Seq. By assaying these three layers of information at high resolution and genome-wide in three different cell-types originating from the same set of individuals, we are able to study the role of DNA methylation variation in different dimensions. Our results reveal a picture where variable DNA methylation sites are mechanistically associated with gene expression in complex and context-dependent ways that can be of a passive or active nature. We further highlight some of the mechanisms by which passive DNA methylation may occur and how this role can interplay with genetic variation.
Results
Associations of genetic variation, DNA methylation and gene expression
We use the GenCord collection (Dimas et al., 2009) of umbilical cord samples from 204 newborn babies of central European descent, from which we derived three cell-types: fibroblasts (primary cells), T-cells (primary cells) and lymphoblastoid cells (immortalized cell lines, LCLs) (Figure 1).
We genotyped each individual for 2.5 million SNPs and sequenced the poly-A transcriptome of all three cell-types from the 204 individuals, yielding a median of 16 million exonic reads per sample. We subsequently removed 18–21 genetic or expression outlier samples per cell type, yielding a final set of 183–186 individuals per cell type (Figure 1—figure supplements 1 and 2). The assayed SNPs were imputed to the Phase 1 release of the 1000 Genomes Project (Abecasis et al., 2012), yielding a set of 6.9 million SNPs. We obtained normalized expression levels for 70,800–76,870 exons belonging to 12,265–12,863 genes (Figure 1—figure supplements 3 and 4). DNA methylation levels were measured using bisulfite conversion and hybridization to a bead chip, assaying 416,118 CpG sites in 66–118 samples. Normalized methylation levels of CpG sites range from 0 to 1, reflecting the fraction of methylation per site (β-value; Figure 1—figure supplements 5–7). In total, we analyzed 66–186 samples per cell-type and assay, belonging to 195 individuals (Table 1; see ‘Materials and methods’).
Figure 1. GenCord project scheme. We collected umbilical cord and cord blood samples from 204 newborn babies, from which we derived three cell-types: fibroblasts, lymphoblastoid cells and T-cells. Genotyping, RNA-sequencing and DNA methylation levels were assayed. The number of samples without genetic and technical outliers is indicated for each assay and each cell-type. We then correlated and utilized different properties of all datasets in order to assess expression Quantitative Trait Loci (eQTLs), methylation QTLs (mQTLs), and positive (pos) and negative (neg) expression Quantitative Trait Methylation (eQTMs). Green ticks represent Single Nucleotide Polymorphisms (SNPs), purple lollipops represent methylation sites, black boxes represent exons and orange arrows depict associations between two data-types.
Shown are the maximum distances between each pair of variables tested. See Figure 1—figure supplements 1–7 for data processing and quality checks. https://doi.org/10.7554/eLife.00523.003
Table 1. Summary of main association analyses in GenCord. See Table 1—source data 1. https://doi.org/10.7554/eLife.00523.011
Test | Samples | Window size | Unit | FDR (%) | Nominal p value | Significant (fibroblasts) | Significant (LCLs) | Significant (T-cells)
eQTLs (genotypes and expression) | 183 (F); 185 (L); 186 (T) | 1 Mb | Genes | 10 | 2.2 × 10−5 (F); 3.2 × 10−5 (L); 1.8 × 10−5 (T) | 2433 | 3372 | 2115
mQTLs (genotypes and methylation) | 107 (F); 111 (L); 66 (T) | 5 kb | Methylation sites | 10 | 4.4 × 10−4 (F); 7.9 × 10−4 (L); 1.3 × 10−3 (T) | 14,189 | 22,411 | 32,318
eQTMs (methylation and expression) | 110 (F); 118 (L); 66 (T) | 50 kb | Genes | 10 | 7.6 × 10−5 (F); 7 × 10−4 (L); 6.9 × 10−4 (T) | 596 | 3680 | 3838
Table 1—source data 1. Significant eQTL, mQTL and eQTM associations found in each cell-type. https://doi.org/10.7554/eLife.00523.012
At 10% FDR, we identify 2115–3372 expression quantitative trait loci (eQTLs) using a 1 Mb window to either side of the TSS. We discover 14,189–32,318 methylation QTLs (mQTLs) using a 5-kb window to either side of the CpG site. We find 1541–17,267 significant expression-methylation associations (eQTMs) using a 50-kb window around the TSS; these pertain to 596–3838 genes and 970–6846 CpG sites (Table 1, Figure 1).
Orthogonal roles of developmental and inter-individual DNA methylation variation
DNA methylation at promoter regions is widely known to correlate negatively with gene expression levels when comparisons are made across genes (Jones, 2012). We observe the same pattern in our study when looking at the promoter regions of all genes of each individual separately and for each cell-type (Figure 2—figure supplement 1). The across-individual methylation-gene expression associations (eQTMs), however, appear to be either positive or negative, even for DNA methylation sites in promoter regions.
Hence, we hypothesized that methylation sites in promoter regions from positive (pos) and negative (neg) eQTMs contribute differently (positively and negatively, respectively) to gene expression levels across genes. Contrary to our hypothesis, we find that methylation sites correlate negatively with gene expression across genes independently of whether they correlate positively or negatively with gene expression across individuals (Figure 2A; Spearman correlation coefficient, rho = −0.11, p=1.1 × 10−4, and rho = −0.10, p=1.7 × 10−13, respectively; see also Figure 2—figure supplement 2). The strength of these negative correlations, despite involving a subset of genes, is comparable to that found at a genome-wide level, correlating all expressed genes with their promoter DNA methylation status per individual (Figure 2—figure supplement 1C). These results suggest that the mechanisms and processes underlying inter-individual DNA methylation variation associated with gene expression are at least partly independent of the mechanisms involved in establishing the repressive mark of promoter DNA methylation across genes during development and differentiation.
Figure 2. Inter-individual DNA methylation variation in cell-type differentiation and in different genomic contexts. (A) The median methylation level of promoter eQTM sites (x-axis) correlates negatively with the across-gene median number of reads per kilobase per million reads (RPKM), irrespective of whether they are pos-eQTMs (yellow; N = 1149) or neg-eQTMs (blue; N = 5112). The Spearman correlation coefficient rho is indicated in the plot, with p=1.1 × 10−4 and p=1.7 × 10−13 for pos- and neg-eQTMs, respectively. See Figure 2—figure supplements 1 and 2. (B) As the level of cell-type methylation differentiation increases (x-axis), a larger proportion of sites are associated with gene expression (eQTMs, left y-axis) and affected by genetic variation (mQTLs, right y-axis).
Proportions are plotted in 10 bins, each containing 10% of the data (0.1 quantiles). The level of methylation differentiation is measured for each site as the coefficient of variation of the median methylation level per cell-type. See Figure 2—figure supplement 3. (C) Proportion of eQTMs that are positive (pos-eQTMs, yellow) or negative (neg-eQTMs, blue) overlapping vs not overlapping (expected) distinct genomic features (promoters, CTCF binding sites, enhancers), or overlapping CpG island promoters (CGI prom) vs non-CpG island promoters (non-CGI prom). For T-cells, no CTCF or chromatin ChIP-seq data are available, so data from an LCL were used instead (see Materials and methods). (D) For non-CGI and CGI promoters (x-axis), the proportion (y-axis) of overlapping mQTLs was calculated (red bars) and compared to the proportion of overlapping null SNPs (black bars). One star indicates p<0.05; two stars indicate p<1 × 10−6 (Fisher’s exact test). https://doi.org/10.7554/eLife.00523.013
Differentially methylated regions among cell-types have been shown to play important roles in cell differentiation and tissue-specific regulation (Meissner et al., 2008; Schmidl et al., 2009; Hodges et al., 2011; Li et al., 2012). Hence, we asked whether sites differentially methylated across the three cell-types are more likely to be functionally relevant when variable within a population, and thus more often associated with gene expression (eQTMs) or genetic variation (mQTLs) than sites not differentiated across cell-types. We measured the level of differentiation for each site by calculating its median methylation level in each cell-type and then calculating the coefficient of variation of those three medians, which is a measure of variability that controls for the mean level of methylation (Figure 2—figure supplement 3A). We find that as differentiation per methylation site increases, a higher proportion of eQTMs and mQTLs are observed (Figure 2B).
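The differentiation measure described above (the coefficient of variation of a site's three per-cell-type median methylation levels) and the 10-bin quantile summary used for Figure 2B can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the input layout, the function names, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, median, pstdev


def differentiation_level(beta_by_cell_type):
    """Coefficient of variation of a CpG site's per-cell-type median beta-values.

    beta_by_cell_type maps cell type -> beta-values (0..1) across individuals.
    Using the population SD (pstdev) is an assumption; the text does not say
    which standard deviation was used.
    """
    medians = [median(values) for values in beta_by_cell_type.values()]
    mu = mean(medians)
    if mu == 0:
        return float("nan")  # all medians zero: CV undefined
    return pstdev(medians) / mu


def proportion_by_quantile_bin(levels, is_hit, n_bins=10):
    """Sort sites by differentiation level, split them into n_bins equal-size
    bins, and return the fraction of sites flagged as hits (e.g. eQTM or mQTL)
    in each bin."""
    order = sorted(range(len(levels)), key=lambda i: levels[i])
    n = len(order)
    proportions = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if members:
            proportions.append(sum(is_hit[i] for i in members) / len(members))
        else:
            proportions.append(float("nan"))
    return proportions


if __name__ == "__main__":
    site = {"fibroblasts": [0.2, 0.2, 0.2], "LCLs": [0.4], "T-cells": [0.6]}
    print(round(differentiation_level(site), 4))  # 0.4082
    levels = list(range(20))
    hits = [level >= 10 for level in levels]
    print(proportion_by_quantile_bin(levels, hits))  # five 0.0 bins, then five 1.0 bins
```

Dividing the spread of the three medians by their mean is what makes the measure comparable between mostly methylated and mostly unmethylated sites, which is the property the text attributes to the coefficient of variation.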
The trend in Figure 2B is statistically supported by the fact that the tissue differentiation level of methylation sites involved in eQTMs and mQTLs is significantly higher than that of non-eQTM and non-mQTL sites, respectively (all p values<2.2 × 10−16, Wilcoxon test, Figure 2—figure supplement 3B,C). These results show that the same methylation sites marking or contributing to tissue differentiation participate in inter-individual variability that is highly determined by genetic variation and associated with gene expression. This could indicate that the differentially methylated regions established during development delineate a backbone on which local inter-individual changes may occur.
Context-specific DNA methylation
To further understand the nature of these inter-individual changes, we analyzed their participation in different genomic contexts. We find a significantly increased presence of negative compared to positive eQTMs in CTCF binding sites, enhancers and promoters, with this increase being higher in non-CpG island (CGI) promoters than in CGI promoters (Figure 2C). Interestingly, we discover a significant depletion of mQTLs in CGI promoters and a significant enrichment of mQTLs in non-CGI promoters (Figure 2D). This shows that methylation sites in non-CGI promoters are under stronger genetic control than those at CGI promoters, where methylation is in general robustly maintained at low levels. Nevertheless, genetic variation in CpG islands commonly affects gene expression, since we find that eQTLs are significantly enriched in these genomic regions (all p values<0.03). Overall, these results suggest that the role of DNA methylation can be highly dependent on the genomic and functional context.
DNA methylation and allele specific expression Allele specific expression (ASE), seen as a signal of regulatory difference between the two haplotypes of an individual, can in theory be driven either by genetic regulatory variation or by epigenetic (in)activation of one of the two alleles, for example, by DNA methylation. In order to test whether ASE is caused by genetic variation or by differential DNA methylation, we compared the magnitude of allelic imbalance in eQTL genes between individuals who are heterozygous for the eQTL and their respective homozygotes. Consistent with other studies (Dimas et al., 2009; Pickrell et al., 2010; Montgomery et al., 2010), at a genome-wide level significantly greater allelic imbalance is associated with heterozygous eQTLs in all three cell-types (p values≤3.0 × 10−8, Figure 3A). In order to test whether ASE is driven by haplotype differences in DNA methylation alone, potentially represented by semimethylated sites (partially methylated, β-value >0.3 and <0.7), we analyzed the genes having eQTM methylation sites that are not associated with SNPs (after filtering out SNP-methylation correlations with nominal p<0.01). A comparison of allelic imbalance between individuals with semimethylated eQTMs and those with homomethylated (i.e., fully methylated, β-value > 0.7, or unmethylated, β-value < 0.3) eQTMs revealed no significant differences (p values≥0.42, Figure 3A). Additionally, ASE driven by heterozygous eQTLs is significantly higher than that of semimethylated eQTMs (p values≤7.5 × 10−3, Figure 3A). Furthermore, as a positive control, we observe that allelic imbalance in imprinted genes, known to be driven by allelic DNA methylation, is significantly higher than in homomethylated eQTMs (p values≤9.7 × 10−4, Figure 3A). Thus, while ASE is shown to be significantly driven by genetic variation (or by DNA methylation in imprinted genes), we find no evidence in our data of methylation alone contributing to ASE.
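The two per-individual quantities compared in this analysis are simple to compute. As a sketch, assuming allelic read counts at an assayable heterozygous site and a β-value at the eQTM site, allelic imbalance is the absolute distance of the reference-allele ratio from 0.5, and the methylation state follows the β thresholds given above. The text does not say how values of exactly 0.3 or 0.7 are handled, so returning `None` for them is an assumption; the function names and sample values are hypothetical.

```python
def allelic_imbalance(ref_reads, alt_reads):
    """Absolute distance of the reference-allele ratio from the expected 0.5."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no reads")
    return abs(ref_reads / total - 0.5)

def methylation_state(beta):
    """Classify a beta-value using the thresholds stated in the text."""
    if beta < 0.3 or beta > 0.7:
        return "HOMOMETH"  # unmethylated (< 0.3) or fully methylated (> 0.7)
    if 0.3 < beta < 0.7:
        return "SEMIMETH"  # partially methylated
    return None  # exactly 0.3 or 0.7: not defined by the text (assumption)

# Hypothetical individuals: (reference reads, alternative reads, beta at eQTM)
samples = [(40, 60, 0.45), (52, 48, 0.85), (30, 70, 0.10)]
for ref, alt, beta in samples:
    print(methylation_state(beta), round(allelic_imbalance(ref, alt), 3))
```

The distributions of these imbalance values, grouped by methylation state, are what the Wilcoxon tests in Figure 3A compare.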
This argues that DNA methylation is rarely allele-specific in the absence of DNA sequence variation effects, and while we cannot exclude other possible epigenetic sources of ASE, the results suggest that the widespread ASE across the genome may be primarily driven by common and rare genetic regulatory variants. Figure 3. DNA methylation associated with gene expression is not significantly allelic and can interact with genetic variation. (A) In green are depicted the distributions of allelic imbalance (i.e., absolute distance from the expected 0.5 ratio) of assayable heterozygote sites in eQTL genes of individuals that are homozygous (HOM) or heterozygous (HET) for the eQTL. The difference between distributions is significant in all cell-types with p<2.2 × 10−16, p=2.7 × 10−15 and p=3.0 × 10−8 in fibroblasts (F), LCLs (L) and T-cells (T), respectively (Wilcoxon test). This strongly indicates that allele specific expression is significantly driven by genetic variation. In purple are shown the distributions of allelic imbalance of assayable heterozygote sites in eQTM genes (excluding methylation sites affected by genetic variation) of individuals that are homomethylated (HOMOMETH, i.e., fully methylated, β-value > 0.7, or unmethylated β-value < 0.3) or semimethylated (SEMIMETH, i.e., β-value >0.3 and <0.7) for the eQTM site. The difference between distributions is not significant in any cell-type, with p=0.79, p=0.42 and p=0.49 in F, L, T, respectively. The difference between distributions of HET eQTLs and SEMIMETH eQTMs is significant in all cell-types with p=7.5 × 10−3, p=3.4 × 10−7, p=1.1 × 10−13, in F, L, T, respectively. The difference between distributions of IMPRINTED genes and HOMOMETH eQTMs is significant in all cell-types with p=6.9 × 10−34, p=9.7 × 10−4, p=8.5 × 10−5 in F, L, and T, respectively. This shows that allele specific expression is not significantly driven by DNA methylation that is not affected by genetic variation.
(B) Using linear regression, we tested the interaction term shown in the illustrated formula for the effects of SNPs (green tick) and methylation sites (purple lollipop) on gene expression (black squared arrow and boxes). QQ plots show the enrichment of low observed p values for synergistic interactions (black), together with the 5th and 95th percentile confidence limits based on expression permutations (gray), relative to the expected uniform distribution. https://doi.org/10.7554/eLife.00523.017 Synergistic interactions between DNA methylation and genetic variation on gene expression We then sought to explore the mechanistic relationships among DNA methylation, gene expression and genetic variation. We first used linear regression to test whether there was a significant enrichment of synergistic interactions between genetic variants and DNA methylation on gene expression. For each exon, we selected all eQTLs with p<1 × 10−4 that fell in independent recombination intervals and all eQTMs with p<0.001. To avoid artificial inflation of significant interactions, we filtered out any exon-SNP-meth triplets in which the SNP and methylation site were correlated with p<0.05. To avoid spurious interactions caused by outliers, we further filtered out cases with fewer than four individuals homozygous for the minor allele of a SNP. Finally, to account for any remaining correlation between the SNP and the methylation site, we permuted the expression values 1000 times to infer the 95% confidence intervals and assess the significance of the enrichment of low p values. Synergistic interactions are enriched in LCLs and T-cells (Figure 3B), with π1 of 4.3% and 9.3%, respectively (all p<0.001), where π1 is the estimated proportion of true positives derived from the p value distribution (Storey and Tibshirani, 2003).
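π1 is one minus π0, the estimated proportion of true null hypotheses. The sketch below uses a simplified form of the Storey and Tibshirani (2003) estimator with a single tuning value λ, rather than the smoothing over a grid of λ values described in that paper. λ = 0.5 and the function name are assumptions for illustration.

```python
def estimate_pi1(p_values, lam=0.5):
    """Estimate pi1 = 1 - pi0 from a list of p-values.

    Null p-values are uniform on [0, 1], so the fraction of p-values above
    lam, rescaled by 1 / (1 - lam), estimates pi0. This is a single-lambda
    simplification of the Storey and Tibshirani (2003) estimator.
    """
    if not 0 <= lam < 1:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    pi0 = sum(p > lam for p in p_values) / (m * (1 - lam))
    pi0 = min(pi0, 1.0)  # a proportion cannot exceed 1
    return 1.0 - pi0

# Half uniform (null-like) p-values, half very small (signal-like) p-values.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(estimate_pi1(null_like + [1e-6] * 1000))
```

Applying the same function to each set of p-values from permuted expression gives a null distribution of π1, which is the kind of comparison the text makes for the fibroblast result.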
In fibroblasts, although the p value distribution is not above the 95th percentile confidence limit (Figure 3B), the observed π1 is 12.4% (p<0.001), 7.2 times larger than the top permuted π1, which suggests a significant enrichment of interactions. Finally, we find 3, 91 and 14 individually significant interactions in fibroblasts, LCLs and T-cells, respectively, at 10% false discovery rate (FDR). Overall, these results reflect the interdependence of genetic and epigenetic variation in determining gene expression levels. Passive and active roles of DNA methylation in gene regulation We further dissected the causal relationships between DNA methylation and gene expression by assuming that the SNP initiates the causal network, since genotype does not change over time. We used Bayesian Network (BN) construction and relative likelihood (see ‘Materials and methods’) to test which of the three possible causal models depicted in Figure 4A and described below is the most likely given the data for each set of variables. Under the ‘INDEP’ (independent) model, a SNP independently affects gene expression and DNA methylation (passive role of methylation). Under the ‘SME’ (SNP-methylation-expression) model, the SNP affects methylation, which then affects gene expression (active role). The ‘SEM’ (SNP-expression-methylation) model requires that a SNP affects gene expression, and that expression then affects methylation (passive role). We tested the relative likelihood of these models using an unbiased selection in which at least two of the three pairwise correlations between genetic variation, gene expression and DNA methylation are significant, yielding 831–2928 SNP-methyl-exon triplets tested per cell-type. All three models occur, depending on cell-type and on the sign of the correlation between DNA methylation and gene expression (Figure 4B, Figure 4—figure supplement 1A). The INDEP model, which reflects a passive role for DNA methylation, is the most likely model in fibroblasts and LCLs.
However, in T-cells the SME model, where methylation takes an active causal role, is the major contributor. Note that in the SEM model, where methylation is passive, being influenced by gene expression levels, a higher likelihood of pos-eQTMs is found compared to the SME model. Overall, these results suggest that DNA methylation can be either active, as a likely cause of variation in gene expression levels, or passive, as a consequence or an independent mark of gene expression levels. Figure 4. Passive and active roles of DNA methylation in gene regulation. (A) Illustration of the three possible causal models tested of mechanistic relationships between genetic variation (SNP), DNA methylation (methyl) and gene expression (expr). Arrows indicate the causal direction of effects. The name of each model is underlined. (B) Mosaic plots illustrate the relative likelihoods of each model (x-axis), partitioned by the relative likelihoods of those involving pos-eQTMs (yellow) and neg-eQTMs (blue; y-axis), in fibroblasts (F), LCLs (L) and T-cells (T). All three types of models (INDEP, SME and SEM) are present in all three cell-types, suggesting that DNA methylation can have both active and passive roles in gene regulation. See Figure 4 –figure supplement 1 and Figure 4–Source data 1. (C) Heatmap of p value relative frequency distributions of Spearman correlations between transcription factors (TF) and DNA methylation levels of eQTMs at their binding sites, sorted by π1. The enrichment of significant associations can be seen in the accumulation of reddish colors, reflecting higher relative frequencies, at low p values, and yellowish colors, reflecting lower relative frequencies, at higher p values. These results highlight one possible mechanism for a passive role of DNA methylation in gene expression. See Figure 4—figure supplements 2 and 3.
https://doi.org/10.7554/eLife.00523.018 Figure 4—source data 1 High confidence calls for INDEP, SME and SEM models in each cell-type. https://doi.org/10.7554/eLife.00523.019 In order to obtain a set of high confidence calls for each of the models, we required that the BN calls be confirmed by an independent method, the Causal Inference Test (Millstein et al., 2009) (see details in ‘Materials and methods’). Of all tests, 61%, 36% and 27% were called as high confidence (HC) in fibroblasts, LCLs and T-cells, respectively (Figure 4—source data 1). The relative frequencies of these HC calls are similar to the overall relative likelihood space of the models (Figure 4—figure supplement 1B). As an example of an HC INDEP model, we identified in fibroblasts a case occurring at the promoter of gene DPYSL4 and involving a methylation site associated with age and aging rate in blood samples (Hannum et al., 2013). In this example, SNP rs12772795 independently affects the DNA methylation status of site cg05652533 and the expression level of gene DPYSL4, possibly via an effect on the binding of CTCF nearby (a binding peak has been reported), a factor known to alter DNA methylation levels locally (Stadler et al., 2011). As an example of an SME model, we identified in T-cells SNP rs1362125, which could be affecting the binding of SP1 (an overlapping peak has been reported), a factor shown to protect against methylation (Boumber et al., 2008). This could then alter the methylation state of site cg24703717 (78 bp away), which falls in a YY1 binding peak; YY1 binding is known to be sensitive to DNA methylation levels (Kim et al., 2003). Hence methylation could actively alter binding of YY1, negatively affecting the expression of the gene HLA-F.
Finally, we would like to highlight in LCLs an SEM scenario in which SNP rs3733346, located in a DNase hypersensitive site, affects the expression of gene DGKQ, whose transcriptional activity could then positively influence DNA methylation levels of site cg00846425 located in its gene body, a mechanism that has been suggested for gene body DNA methylation (Hahn et al., 2011). DNA methylation associated with transcription factor abundance To understand potential molecular causes for the passive role of methylation in the INDEP model, we postulated that SNPs could be influencing binding levels of DNA-binding factors (hereafter called transcription factors or TFs), which
DOI: 10.1038/srep19384
2016
Cited 6 times
Correction: Corrigendum: Synchronized age-related gene expression changes across multiple tissues in human and the link to complex diseases
Aging is one of the most important biological processes and is a known risk factor for many age-related diseases in humans. Studying age-related transcriptomic changes in tissues across the whole body can provide valuable information for a holistic understanding of this fundamental process. In this work, we catalogue age-related gene expression changes in nine tissues from nearly two hundred individuals collected by the Genotype-Tissue Expression (GTEx) project. In general, we find that aging gene expression signatures are highly tissue-specific. However, enrichment for some well-known aging components, such as mitochondrial biology, is observed in many tissues. Different levels of cross-tissue synchronization of age-related gene expression changes are observed, and some essential tissues (e.g., heart and lung) show much stronger “co-aging” than other tissues based on a principal component analysis. The aging gene signatures and complex disease genes show a complex pattern of overlap, and only in some cases do they overlap significantly in the tissues affected by the corresponding diseases. In summary, our analyses provide novel insights into the co-regulation of age-related gene expression in multiple tissues and present a tissue-specific view of the link between aging and age-related diseases.
DOI: 10.1038/ng0217-317c
2017
Cited 6 times
Erratum: Corrigendum: Integrative genomic analysis implicates limited peripheral adipose storage capacity in the pathogenesis of human insulin resistance
Nat. Genet.; 10.1038/ng.3714; corrected online 5 December 2016 In the version of this article initially published online, the middle initial of collaborator Maarten R. Soeters was inadvertently omitted. The error has been corrected for the print, PDF and HTML versions of this article.
DOI: 10.1038/s41380-022-01768-4
2022
Cited 3 times
Leveraging interindividual variability of regulatory activity for refining genetic regulation of gene expression in schizophrenia
Schizophrenia is a polygenic psychiatric disorder, and the mechanistic changes in gene expression regulation that underlie it remain poorly understood. To elucidate these, we integrate interindividual variability of regulatory activity (ChIP-sequencing for the H3K27ac histone mark) with gene expression and genotype data captured from the prefrontal cortex of 272 cases and controls. By measuring interindividual correlation among proximal chromatin peaks, we show that regulatory element activity is structured into 10,936 and 10,376 cis-regulatory domains in cases and controls, respectively. The schizophrenia-specific cis-regulatory domains are enriched for fetal-specific (p = 0.0014, OR = 1.52) and depleted of adult-specific regulatory activity (p = 3.04 × 10−50, OR = 0.57) and are enriched for SCZ heritability (p = 0.001). By studying the interplay among genetic variants, gene expression, and cis-regulatory domains, we ascertain that changes in coordinated regulatory activity tag alterations in gene expression levels (p = 3.43 × 10−5, OR = 1.65), unveil case-specific QTL effects, and identify regulatory machinery changes for genes affecting synaptic function and dendritic spine morphology in schizophrenia. Altogether, we show that accounting for coordinated regulatory activity provides a novel mechanistic approach to reduce the search space for unveiling genetically perturbed regulation of gene expression in schizophrenia.
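As a rough intuition for grouping proximal chromatin peaks by their interindividual correlation, the sketch below splits position-sorted peaks into runs in which each pair of neighbouring peaks is correlated across individuals above a threshold. This is a deliberately simplified, hypothetical illustration: the abstract does not describe the domain-calling procedure, which is likely more elaborate (for example, clustering on a full correlation matrix). The 0.5 threshold, the function names and the toy signals are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def call_domains(peaks, threshold=0.5):
    """Split position-sorted peaks into runs of correlated neighbours.

    peaks: list of per-individual H3K27ac signal vectors, sorted by position.
    Returns a list of domains, each a list of peak indices.
    """
    if not peaks:
        return []
    domains = [[0]]
    for i in range(1, len(peaks)):
        if pearson(peaks[i - 1], peaks[i]) >= threshold:
            domains[-1].append(i)  # extend the current domain
        else:
            domains.append([i])    # correlation breaks: start a new domain
    return domains

# Four hypothetical peaks measured in four individuals.
peaks = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]
print(call_domains(peaks))
```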
DOI: 10.1038/s41467-020-16000-6
2020
Cited 5 times
MethCORR modelling of methylomes from formalin-fixed paraffin-embedded tissue enables characterization and prognostication of colorectal cancer
Transcriptional characterization and classification have the potential to resolve the inter-tumor heterogeneity of colorectal cancer and improve patient management. Yet robust transcriptional profiling is difficult using formalin-fixed, paraffin-embedded (FFPE) samples, which complicates testing in clinical and archival material. We present MethCORR, an approach that allows uniform molecular characterization and classification of fresh-frozen and FFPE samples. MethCORR identifies genome-wide correlations between RNA expression and DNA methylation in fresh-frozen samples. This information is used to infer gene expression information in FFPE samples from their methylation profiles. Here, MethCORR is applied to methylation profiles from 877 fresh-frozen/FFPE samples, and comparative analysis identifies the same two subtypes in four independent cohorts. Furthermore, subtype-specific prognostic biomarkers that better predict relapse-free survival (HR = 2.66, 95% CI [1.67-4.22], P value < 0.001 (log-rank test)) than UICC tumor, node, metastasis (TNM) staging and microsatellite instability status are identified and validated using DNA methylation-specific PCR. The MethCORR approach is general, and may be similarly successful for other cancer types.
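The core idea, learning expression-methylation relationships in fresh-frozen samples and then predicting expression from methylation alone, can be illustrated with a toy single-CpG linear predictor. This is a hypothetical simplification, not the MethCORR model itself, which the abstract does not describe in enough detail to reproduce; the function names, data layout and example values are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def fit_gene_predictor(expr, meth_by_cpg):
    """Toy model for one gene, trained on fresh-frozen samples.

    expr: expression values, one per training sample.
    meth_by_cpg: dict CpG id -> beta-values for the same samples.
    Picks the CpG with the largest absolute correlation to expression and
    fits expr = slope * beta + intercept by ordinary least squares.
    """
    best = max(meth_by_cpg, key=lambda cpg: abs(pearson(meth_by_cpg[cpg], expr)))
    x = meth_by_cpg[best]
    n = len(x)
    mx, my = sum(x) / n, sum(expr) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("selected CpG has constant methylation")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, expr)) / sxx
    return best, slope, my - slope * mx

def predict_expression(model, meth_profile):
    """Infer expression for a new (e.g. FFPE) sample from its methylation."""
    cpg, slope, intercept = model
    return slope * meth_profile[cpg] + intercept

# Hypothetical training data for one gene (four fresh-frozen samples).
model = fit_gene_predictor(
    [10, 8, 6, 4],
    {"cg_a": [0.1, 0.2, 0.3, 0.4], "cg_b": [0.5, 0.1, 0.9, 0.3]},
)
print(model, predict_expression(model, {"cg_a": 0.25, "cg_b": 0.0}))
```

A realistic model would combine many correlated CpGs per gene; a single CpG is used here only to keep the train-then-predict structure visible.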
DOI: 10.1101/014126
2015
Cited 4 times
Alternative splicing QTLs in European and African populations using Altrans, a novel method for splice junction quantification
With the advent of RNA-sequencing technology we now have the power to detect different types of alternative splicing and to assess how DNA variation affects splicing. However, given the short read lengths used in most population-based RNA-sequencing experiments, quantifying transcripts accurately remains a challenge. Here we present a novel method, Altrans, for discovery of alternative splicing quantitative trait loci (asQTLs). To assess the performance of Altrans we compared it to Cufflinks, a well-established transcript quantification method. Simulations show that in the presence of transcripts absent from the annotation, Altrans performs better in quantifications than Cufflinks. We have applied Altrans and Cufflinks to the Geuvadis dataset, which comprises samples from European and African populations, and discovered (FDR = 1%) 1806 and 243 asQTLs with Altrans, and 1596 and 288 asQTLs with Cufflinks for Europeans and Africans, respectively. Although Cufflinks results replicated better across the two populations, this is likely due to the increased sensitivity of Altrans in detecting harder-to-detect associations. We show that, by discovering a set of asQTLs in a smaller subset of European samples and replicating these in the remaining larger subset of Europeans, both methods achieve similar replication levels (94% and 98% replication in Altrans and Cufflinks, respectively). We find that method-specific asQTLs are largely due to different types of alternative splicing events detected by each method. We overlapped the asQTLs with biochemically active regions of the genome and observed significant enrichments for many functional marks and variants in splicing regions, highlighting the biological relevance of the asQTLs identified. Altogether, we present a novel approach for discovering asQTLs that is a more direct assessment of splicing compared to other methods and is complementary to other transcript quantification methods.
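Calling asQTLs at FDR = 1% requires a multiple-testing correction across all tested associations. The abstract does not say which procedure was used; as one standard option, the Benjamini-Hochberg step-up procedure can be sketched as below. The function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return sorted indices of hypotheses rejected by Benjamini-Hochberg.

    Finds the largest rank k such that the k-th smallest p-value is
    <= k / m * fdr and rejects the k smallest p-values (step-up rule).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical association p-values.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p, fdr=0.05))
```

Because the rule is step-up, a p-value that fails its own rank threshold can still be rejected if a larger p-value further down the ranking passes.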
DOI: 10.5152/eurjbiol.2017.1711
2017
Cited 4 times
Alterations at the Synthesis and Degradation of E-cadherin in the Human Lungs with Emphysema
Pulmonary emphysema leads to a cascade of events, starting with enlarged alveoli and loss of alveoli, and subsequently to the damage and disruption of the pulmonary epithelium. The integrity of the pulmonary epithelium, which is constituted by pneumocytes linked to each other through E-cadherin proteins, is important for respiration. The aim of the present study was to measure the content and degradation of E-cadherin protein and to investigate the contribution of E-cadherin to pulmonary emphysema pathogenesis. The structural changes, reparative capacity of the pulmonary epithelium, amount of E-cadherin protein, and the immunoreactivity of neural precursor cell expressed developmentally down-regulated protein 9 (NEDD9) were evaluated in emphysematous (n=7) and non-emphysematous (n=6) areas of lung samples taken from patients with chronic obstructive pulmonary disease. Emphysematous areas are characterized by enlarged alveoli, disrupted alveolar walls and epithelium, increased type 2 pneumocytes and NEDD9 immunoreactivity, and reduced E-cadherin proteins. Our data show that E-cadherin levels are decreased in emphysematous areas due to its degradation by NEDD9. Decreased E-cadherin levels also lead to the disintegration of the pulmonary epithelium by causing weak intercellular connections or the absence of intercellular connections. The repair of the pulmonary epithelium could not be completed due to the reduced E-cadherin, because type 2 pneumocytes could not differentiate into type 1 pneumocytes. In conclusion, reduced E-cadherin levels lead to emphysematous alterations in human lungs and contribute to pulmonary emphysema pathogenesis.
DOI: 10.1101/022301
2015
Cited 3 times
Fast and efficient QTL mapper for thousands of molecular phenotypes
Motivation: In order to discover quantitative trait loci (QTLs), multi-dimensional genomic data sets combining DNA-seq and ChIP-/RNA-seq require methods that rapidly correlate tens of thousands of molecular phenotypes with millions of genetic variants while appropriately controlling for multiple testing. Results: We have developed FastQTL, a method that implements a popular cis-QTL mapping strategy in a user- and cluster-friendly tool. FastQTL also proposes an efficient permutation procedure to control for multiple testing. The outcome of permutations is modeled using beta distributions trained from a few permutations, from which adjusted p-values can be estimated at any level of significance with little computational cost. The Geuvadis & GTEx pilot data sets can now be analyzed an order of magnitude faster than with previous approaches. Availability: Source code, binaries and comprehensive documentation of FastQTL are freely available to download at http://fastqtl.sourceforge.net/
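The beta approximation works because the minimum of k independent uniform p-values follows a Beta(1, k) distribution; with correlated variants, the effective number of tests is unknown, so both beta parameters are fitted to the minimum p-values obtained from a modest number of permutations. The sketch below shows the idea in pure Python. It swaps in a method-of-moments fit for brevity, so the tool's own fitting procedure may differ, and all function names are hypothetical. The incomplete beta function is evaluated with the standard continued-fraction (modified Lentz) method.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(x, a, b):
    """Regularised incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b

def fit_beta_moments(perm_min_p):
    """Method-of-moments Beta(a, b) fit to permutation minimum p-values."""
    n = len(perm_min_p)
    mean = sum(perm_min_p) / n
    var = sum((p - mean) ** 2 for p in perm_min_p) / (n - 1)
    common = mean * (1.0 - mean) / var - 1.0
    if common <= 0:
        raise ValueError("sample variance too large for a beta fit")
    return mean * common, (1.0 - mean) * common

def adjusted_p(nominal_min_p, perm_min_p):
    """Beta-approximated permutation p-value for one molecular phenotype."""
    a, b = fit_beta_moments(perm_min_p)
    return beta_cdf(nominal_min_p, a, b)
```

For example, `adjusted_p(1e-4, perms)` converts an observed minimum nominal p-value for one phenotype into an adjusted p-value using that phenotype's permutation minima `perms`. Because the fitted distribution is continuous, the adjusted p-value can be smaller than 1 divided by the number of permutations, which is what allows estimation "at any level of significance".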
DOI: 10.21203/rs.3.rs-2805343/v1
2023
Identifying novel regulatory effects for clinically relevant genes through the study of the Greek population
Abstract Background Expression quantitative trait loci (eQTL) studies provide insights into regulatory mechanisms underlying disease risk. Expanding studies of gene regulation to underexplored populations and to medically relevant tissues offers the potential to reveal as-yet unknown regulatory variants and to better understand disease mechanisms. Here, we performed eQTL mapping in subcutaneous (S) and visceral (V) adipose tissue from 106 Greek individuals (Greek Metabolic study, GM) and compared our findings to those from the Genotype-Tissue Expression (GTEx) resource. Results We identified 1,930 and 1,515 eGenes in S and V, respectively; over 13% of these are not observed in GTEx adipose tissue and do not arise from differences in ancestry. We report additional context-specific regulatory effects in genes of clinical interest (e.g. oncogene ST7) and in genes regulating responses to environmental stimuli (e.g. MIR21, SNX33). We suggest that a fraction of the reported differences across populations is due to environmental effects on gene expression, driving context-specific eQTLs, and that environmental effects can determine the penetrance of disease variants, thus shaping disease risk. We report that over half of GM eQTLs colocalize with GWAS SNPs, and 41% of these colocalizations are not detected in GTEx. We also highlight the clinical relevance of S adipose tissue by revealing that inflammatory processes are upregulated in obese individuals not only in V but also in S tissue. Conclusions By focusing on an understudied population, our results provide further candidate genes for investigation regarding their role in adipose tissue biology and their contribution to disease risk and pathogenesis.
DOI: 10.6084/m9.figshare.23977029
2023
Files for mito-nuclear eQTLs analysis
Files that were used for mito-nuclear eQTL analyses.
DOI: 10.6084/m9.figshare.23896213
2023
Additional file 6 of Identifying novel regulatory effects for clinically relevant genes through the study of the Greek population
Additional file 6. Main scripts for eQTL mapping using fastQTL.
DOI: 10.6084/m9.figshare.23896204
2023
Additional file 3 of Identifying novel regulatory effects for clinically relevant genes through the study of the Greek population
Additional file 3: Supplementary Table S17. DEGs by tissue in GM, GTEx-am and GTEx-sm. Supplementary Table S18. S vs V DEGs with discordant direction of gene expression between GM and GTEx-am. Supplementary Table S19. DEGs by obesity status in GM tissues. Supplementary Table S20. DEGs by population in S and V tissue. Supplementary Table S21. ATAC-Seq peaks in GM.
DOI: 10.6084/m9.figshare.23896201
2023
Additional file 2 of Identifying novel regulatory effects for clinically relevant genes through the study of the Greek population
Additional file 2: Supplementary Table S1. eQTLs in GM adipose tissues. Supplementary Table S3. Genotype×Obesity significant (p<0.05) interactions in GM. Supplementary Table S4. Genotype×Sex significant (p<0.05) interactions in GM. Supplementary Table S5. eQTLs in GM tissues, when including obesity status as a covariate. Supplementary Table S6. Missense significant (FDR<5%) ASE SNPs identified in each tissue of GM. Supplementary Table S7. Replication of eQTLs across GM tissues. Supplementary Table S8. Detection of GM eQTLs using a linear mixed model with a Genotype×Tissue interaction term (FDR<0.05). Supplementary Table S9. Thirty-five eGenes identified in one GM tissue only by both p-value enrichment analysis and linear mixed model with a Genotype×Tissue interaction term. Supplementary Table S10. Replication of eQTLs across populations for each tissue. Supplementary Table S11. Thirty-two eGenes identified in both GM tissues, but not in GTEx-am. Supplementary Table S12. GWAS-eQTL colocalizations (RTC≥0.9) in GM for each tissue. Supplementary Table S14. Sharing of eGene-trait pairs in GM and GTEx-am. Supplementary Table S15. Expression levels of eGenes corresponding to GM specific eGene-trait pairs in GM and GTEx-am. Supplementary Table S16. Secondary, and not primary, eQTL colocalizations with GWAS signals in GM.
DOI: 10.1038/s41467-020-16538-5
2020
Publisher Correction: MethCORR modelling of methylomes from formalin-fixed paraffin-embedded tissue enables characterization and prognostication of colorectal cancer
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
DOI: 10.1158/1538-7445.am2016-2630
2016
Abstract 2630: Integration of tumor microenvironment and molecular subclassification of colorectal cancer identifies patient subsets with poor prognosis
Abstract The patho-molecular diversity of colorectal cancer (CRC) complicates the prediction of patient postoperative prognosis and response to treatment. This calls for methods for CRC stratification beyond clinical features and the few molecular biomarkers currently implemented. Recent studies have highlighted the importance of the tumor microenvironment in CRC. Here we show by integrative transcriptome/methylome analysis of >300 patient samples that CRC generates five archetypal microenvironments encompassing three molecularly distinct types of cancer cells. The microenvironments, which we denote “Secretory”, “Stroma”, “Immune”, “ARE”, and “CIN”, could be independently identified in large external CRC datasets and are in concordance with the CRC subtypes recently defined by the CRC Subtyping Consortium (1). Detailed analysis of the five microenvironments reveals striking differences in cell composition and pathway activity. We observe that the microenvironment, particularly immune and stromal cell activity, is closely associated with patient prognosis. In particular, tumors associated with low anti-tumoral immune activity due to poor recruitment of immune cells or immune inhibition by activated stromal cells have poor prognosis. Importantly, we find that microenvironment subtyping is in fact pivotal for molecular prediction of patient prognosis, as no single molecular marker carries robust prognostic information in all CRC subtypes and environments. Instead, potent prognostic biomarkers are environment-specific: for example, interferon signaling is deregulated in several microenvironment types, yet the specific interferon response genes deregulated (i.e. biomarker candidates) differ between microenvironments in accordance with their cellular composition.
Our approach has allowed us to identify superior single prognostic biomarkers for each CRC type and, importantly, their prognostic value in external datasets is only observed after environment subtyping. Collectively, these observations may well explain the poor validation rate of prognostic biomarker candidates in the past. In conclusion, we provide a CRC microenvironment framework that has the potential to guide future clinical decision-making, primarily in relation to prognosis but likely also in relation to therapy choice. This work is performed within the framework of the SYSCOL consortium funded by EU FP7. (1) Guinney, J. et al. “The consensus molecular subtypes of colorectal cancer”, Nature Medicine (2015) doi:10.1038/nm.3967 Citation Format: Jesper B. Bramsen, Mads H. Rasmussen, Halit Ongen, Søren Vang, Philippe Lamy, Manel B. Esteller, Emmanouil T. Dermitzakis, Torben F. Orntoft, Claus L. Andersen. Integration of tumor microenvironment and molecular subclassification of colorectal cancer identifies patient subsets with poor prognosis. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 2630.
2011
A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease
Genome-wide association studies have identified 11 common variants convincingly associated with coronary artery disease (CAD)(1-7), a modest number considering the apparent heritability of CAD(8). ...
DOI: 10.1016/j.ajhg.2012.05.017
2012
Meta-analysis of Dense Genecentric Association Studies Reveals Common and Uncommon Variants Associated with Height
(The American Journal of Human Genetics 88, pages 6–18; December 30, 2010) Maciej Tomaszewski's name was misspelled in the original author list and has been corrected here. The authors regret the error. Meta-analysis of Dense Genecentric Association Studies Reveals Common and Uncommon Variants Associated with Height. Lanktree et al. The American Journal of Human Genetics, December 30, 2010. In Brief: Height is a classic complex trait with common variants in a growing list of genes known to contribute to the phenotype. Using a genecentric genotyping array targeted toward cardiovascular-related loci, comprising 49,320 SNPs across approximately 2000 loci, we evaluated the association of common and uncommon SNPs with adult height in 114,223 individuals from 47 studies and six ethnicities. A total of 64 loci contained a SNP associated with height at array-wide significance (p < 2.4 × 10−6), with 42 loci surpassing the conventional genome-wide significance threshold (p < 5 × 10−8).
DOI: 10.17615/b0dx-tm14
2011
Meta-analysis of Dense Genecentric Association Studies Reveals Common and Uncommon Variants Associated with Height
DOI: 10.1101/174219
2017
Hundreds of Putative Non-Coding Cis-Regulatory Drivers in Chronic Lymphocytic Leukaemia and Skin Cancer
ABSTRACT Perturbations of the coding genome and their role in cancer development have been studied extensively. However, the non-coding genome’s contribution to cancer is poorly understood (1), not only because it is difficult to define the non-coding regulatory regions and the genes they regulate, but also because power is limited by the regulatory regions’ small size. In this study, we try to resolve this issue by defining modules of coordinated non-coding regulatory regions of genes (Cis Regulatory Domains or CRDs). To do so, we use the correlation between histone modifications, assayed by ChIP-seq, in population samples of immortalized B-cells and skin fibroblasts. We screen for CRDs that accumulate an excess of somatic mutations in chronic lymphocytic leukaemia (CLL) and skin cancer, which affect these cell types, after accounting for somatic mutational patterns and biases. At 5% FDR, we find 90 CRDs with a significant excess of somatic mutations in CLL, 60 of which regulate 126 genes, and 59 significant CRDs in skin cancer, 25 of which regulate 37 genes. The genes these CRDs regulate include ones already implicated in tumorigenesis, and are enriched in pathways already implicated in the respective cancers, like the B-cell receptor signalling pathway in CLL and the TGFβ signalling pathway in skin cancer. We discover that the somatic mutations in the significant CRDs of CLL hit bases that are more likely to be functional than the mutations in non-significant CRDs. Moreover, in both cancers, mutational signatures observed in the regulatory regions of significant CRDs deviate significantly from their null sequences. Both results indicate selection acting on CRDs during tumorigenesis. Finally, we find that the transcription factor binding sites disturbed by the somatic mutations in significant CRDs are enriched for factors known to be involved in cancer development.
We are describing a new powerful approach to discover non-coding regions involved in tumorigenesis in CLL and skin cancer and this approach could be generalized to other cancers.
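The abstract does not state which statistical test was used to call CRDs with an excess of somatic mutations. As a generic illustration only, the Python sketch below compares each region's observed mutation count with an expected count under a Poisson model. It then controls the false discovery rate with the Benjamini-Hochberg procedure at 5%. The expected counts are assumed to come from a background model that already corrects for mutational patterns and biases. The function names and toy counts are hypothetical and are not taken from the paper.

```python
import math


def poisson_sf(k: int, mu: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(mu), with mu > 0."""
    if k <= 0:
        return 1.0
    # Sum the pmf for 0..k-1 term by term in log space, so exp(-mu) alone never underflows.
    cdf = sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1)) for i in range(k))
    return min(1.0, max(0.0, 1.0 - cdf))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def excess_mutation_screen(observed, expected, fdr=0.05):
    """Flag regions whose observed mutation count exceeds a Poisson expectation.

    observed: somatic mutation counts per region (hypothetical input).
    expected: expected counts per region, assumed to come from a background
              model that already accounts for mutational patterns and biases.
    """
    pvals = [poisson_sf(o, e) for o, e in zip(observed, expected)]
    qvals = benjamini_hochberg(pvals)
    hits = [i for i, q in enumerate(qvals) if q <= fdr]
    return hits, pvals, qvals


# Toy example: four regions, only the second carries a strong excess.
hits, pvals, qvals = excess_mutation_screen([2, 15, 3, 1], [2.0, 4.0, 2.5, 1.2])
print(hits)  # [1]
```

A real screen would need a far richer background model, for example one based on trinucleotide context, replication timing or local mutation rate. The sketch shows only the final testing and multiple-testing step.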
DOI: 10.17615/p82n-gb73
2017
Genetic effects on gene expression across human tissues
DOI: 10.1101/255109
2018
Genomic dissection of Systemic Lupus Erythematosus: Distinct Susceptibility, Activity and Severity Signatures
Recent genetic and genomic approaches have yielded novel insights into the pathogenesis of Systemic Lupus Erythematosus (SLE), but its diagnosis, monitoring and treatment remain largely empirical (1,2). We reasoned that molecular characterization of SLE by whole-blood transcriptomics may facilitate early diagnosis and personalized therapy. To this end, we analyzed genotypes and RNA-seq data in 142 patients and 58 matched healthy individuals to define the global transcriptional signature of SLE. By controlling for the estimated proportions of circulating immune cell types, we show that the Interferon (IFN) and p53 pathways are robustly expressed. We also report cell-specific, disease-dependent regulation of gene expression. In addition, we define a core/susceptibility and a flare/activity disease expression signature, with oxidative phosphorylation, ribosome regulation and cell cycle pathways enriched in lupus flares. Using these data, we define a novel index of disease activity/severity by combining the validated Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) (1) with a new variable derived from principal component analysis (PCA) of RNA-seq data. We also delineate unique signatures across disease endophenotypes. Active nephritis exhibits the most extensive changes in the transcriptome, including prominent druggable signatures such as granulocyte and plasmablast/plasma cell activation. The substantial differences in gene expression between SLE patients and healthy individuals enable classification of disease versus healthy status with a median sensitivity of 83% and a median specificity of 100%. We explored the genetic regulation of the blood transcriptome in SLE and found 3,142 cis-expression quantitative trait loci (eQTLs).
By integration of SLE genome-wide association study (GWAS) signals and eQTLs from 44 tissues from the Genotype-Tissue Expression (GTEx) consortium, we demonstrate that the genetic causality of SLE arises from multiple tissues with the top causal tissue being the liver, followed by brain basal ganglia, adrenal gland and whole blood. Collectively, our study defines distinct susceptibility and activity/severity signatures in SLE that may facilitate diagnosis, monitoring, and personalized therapy.
DOI: 10.1158/1538-7445.sabcs18-466
2019
Abstract 466: A novel DNA methylation-based approach for molecular subtyping and improved prognostication of colorectal cancer using formalin-fixed and paraffin-embedded tissue
Background: Colorectal cancer (CRC) is characterized by marked inter-tumor heterogeneity, both molecularly and clinically. Patient stratification using transcriptional subtyping shows promise for resolving this heterogeneity and guiding precision medicine. However, it requires high-quality RNA, which can be purified from fresh frozen (FF) tissue but not from clinically collected formalin-fixed paraffin-embedded (FFPE) tissue. Consequently, transcriptional subtyping is not applicable to most CRC patients. Aim: To establish a DNA methylation-based approach for molecular characterization and subtyping that is compatible with both FF and FFPE tissue, and to use it to improve patient prognostication compared with histopathological TNM staging. Methods: Using paired RNA expression and DNA methylation profiles (450K array) from 394 CRCs, we identified 200 CpG sites genome-wide whose methylation levels correlated best with expression for each gene. With this information, we carried out three steps. First, we devised an approach, methCORR, that can infer RNA expression in both FF and FFPE samples using only DNA methylation profiles. Second, we established a methCORR network that clusters genes according to overlap in the CpG sites associated with their expression levels. Third, we used the network to determine and characterize biological traits associated with tumor aggressiveness. Results: The methCORR-inferred RNA expression profiles in FF tissue consistently showed high correlation with matched RNA-seq profiles (median Pearson r=0.97; range 0.91-0.98). Correlation was higher between FF RNA-seq and FFPE methCORR-inferred RNA profiles (median r=0.97; range 0.96-0.98) than between FF and FFPE RNA-seq profiles (median r=0.78; range 0.51-0.92). Gene expression profiles were inferred in two FF cohorts (n=231, n=203) and two FFPE cohorts (n=113, n=56). Clustering of these profiles independently identified the same two CRC subtypes in all cohorts.
Characterization using the methCORR network showed that the two subtypes resembled conventional and serrated CRC. Moreover, methCORR network analysis identified subtype-specific traits strongly associated with tumor aggressiveness, such as T-cell, fibroblast, and epithelial-mesenchymal transition activity. This allowed identification of subtype-specific prognostic biomarkers. These biomarkers predicted relapse-free survival better (HR=3.22, 95% CI 2.00-5.17) than TNM staging (HR=1.99, 95% CI 1.28-3.08). Finally, we derived four simple, clinically applicable DNA methylation-specific PCR assays for subtyping and prognostication of CRC FFPE samples. Conclusion: We have developed a novel method for characterization, subtyping, and prognostication of CRC that is compatible with FF and FFPE samples. We expect that applying the methCORR approach to other cancer types will yield similarly fruitful results. Citation Format: Trine B. Mattesen, Mads H. Rasmussen, Juan Sandoval, Halit Ongen, Sigrid S. Árnadóttir, Anders H. Madsen, Søren Laurberg, Emmanouil T. Dermitzakis, Manel Esteller, Claus L. Andersen, Jesper B. Bramsen. A novel DNA methylation-based approach for molecular subtyping and improved prognostication of colorectal cancer using formalin-fixed and paraffin-embedded tissue [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 466.
DOI: 10.1016/j.annder.2020.08.016
2020
Integrative analyses of 100 basal cell carcinoma genomes in the context of transcription and methylation profiles
Basal cell carcinoma (BCC) of the skin is the most common human malignant tumour, with a lifetime risk of 30% and a rising incidence. BCC is characterized by activation of the Sonic Hedgehog pathway and by a UV-induced hypermutated phenotype. In this study, we present the sequencing of 96 genomes and 259 exomes of BCC. These data are complemented by RNA-seq and methylation profiling of BCC and skin. More than 20 million point mutations enabled an analysis of UV-induced mutagenesis and revealed differences between BCC and melanoma. The variation in mutational spectra was combined with the clustering of mutations across genomic regions with different genomic covariates, and together they were used to decompose the mutations into distinct biological processes. Driver discovery validated known genes, including PTCH1, SMO, MYCN and PTPN14, and identified new genes in other cancer pathways. RNA analysis revealed the pathways involved in BCC pathogenesis and histological heterogeneity. This work represents a first large-scale genomic study of the mutational mechanisms of BCC and of the functional consequences of coding and non-coding mutations.
DOI: 10.1158/1538-7445.am2019-466
2019
Abstract 466: A novel DNA methylation-based approach for molecular subtyping and improved prognostication of colorectal cancer using formalin-fixed and paraffin-embedded tissue
Background: Colorectal cancer (CRC) is characterized by marked inter-tumor heterogeneity, both molecularly and clinically. Patient stratification using transcriptional subtyping shows promise for resolving this heterogeneity and guiding precision medicine. However, it requires high-quality RNA, which can be purified from fresh frozen (FF) tissue but not from clinically collected formalin-fixed paraffin-embedded (FFPE) tissue. Consequently, transcriptional subtyping is not applicable to most CRC patients. Aim: To establish a DNA methylation-based approach for molecular characterization and subtyping that is compatible with both FF and FFPE tissue, and to use it to improve patient prognostication compared with histopathological TNM staging. Methods: Using paired RNA expression and DNA methylation profiles (450K array) from 394 CRCs, we identified 200 CpG sites genome-wide whose methylation levels correlated best with expression for each gene. With this information, we carried out three steps. First, we devised an approach, methCORR, that can infer RNA expression in both FF and FFPE samples using only DNA methylation profiles. Second, we established a methCORR network that clusters genes according to overlap in the CpG sites associated with their expression levels. Third, we used the network to determine and characterize biological traits associated with tumor aggressiveness. Results: The methCORR-inferred RNA expression profiles in FF tissue consistently showed high correlation with matched RNA-seq profiles (median Pearson r=0.97; range 0.91-0.98). Correlation was higher between FF RNA-seq and FFPE methCORR-inferred RNA profiles (median r=0.97; range 0.96-0.98) than between FF and FFPE RNA-seq profiles (median r=0.78; range 0.51-0.92). Gene expression profiles were inferred in two FF cohorts (n=231, n=203) and two FFPE cohorts (n=113, n=56). Clustering of these profiles independently identified the same two CRC subtypes in all cohorts.
Characterization using the methCORR network showed that the two subtypes resembled conventional and serrated CRC. Moreover, methCORR network analysis identified subtype-specific traits strongly associated with tumor aggressiveness, such as T-cell, fibroblast, and epithelial-mesenchymal transition activity. This allowed identification of subtype-specific prognostic biomarkers. These biomarkers predicted relapse-free survival better (HR=3.22, 95% CI 2.00-5.17) than TNM staging (HR=1.99, 95% CI 1.28-3.08). Finally, we derived four simple, clinically applicable DNA methylation-specific PCR assays for subtyping and prognostication of CRC FFPE samples. Conclusion: We have developed a novel method for characterization, subtyping, and prognostication of CRC that is compatible with FF and FFPE samples. We expect that applying the methCORR approach to other cancer types will yield similarly fruitful results. Citation Format: Trine B. Mattesen, Mads H. Rasmussen, Juan Sandoval, Halit Ongen, Sigrid S. Árnadóttir, Anders H. Madsen, Søren Laurberg, Emmanouil T. Dermitzakis, Manel Esteller, Claus L. Andersen, Jesper B. Bramsen. A novel DNA methylation-based approach for molecular subtyping and improved prognostication of colorectal cancer using formalin-fixed and paraffin-embedded tissue [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 466.
DOI: 10.5281/zenodo.1222172
2018
Contribution of Allelic Imbalance to Colorectal Cancer
Point mutations in cancer have been extensively studied, but chromosomal gains and losses have been more challenging to interpret because of their nonspecific nature. Here we examine the high-resolution allelic imbalance (AI) landscape in 1,699 colorectal cancers, 256 of which have been whole-genome sequenced (WGS). Based on previous knowledge, the imbalances pinpoint 38 genes as plausible AI targets. In addition, unbiased CRISPR-Cas9 knockout and activation screens identified a total of 79 genes within AI peaks that regulate cell growth. Genetic and functional data implicate loss of TP53 as a sufficient driver of AI. The WGS data highlight an influence of copy number aberrations on the rate of detected somatic point mutations. Importantly, the data reveal several associations between AI target genes, suggesting a role for a network of lineage-determining transcription factors in colorectal tumorigenesis. Overall, the results unravel the contribution of AI to colorectal cancer and provide a plausible explanation for why so few genes are commonly affected by point mutations in cancers.
DOI: 10.5281/zenodo.1222171
2018
Contribution of Allelic Imbalance to Colorectal Cancer
Point mutations in cancer have been extensively studied, but chromosomal gains and losses have been more challenging to interpret because of their nonspecific nature. Here we examine the high-resolution allelic imbalance (AI) landscape in 1,699 colorectal cancers, 256 of which have been whole-genome sequenced (WGS). Based on previous knowledge, the imbalances pinpoint 38 genes as plausible AI targets. In addition, unbiased CRISPR-Cas9 knockout and activation screens identified a total of 79 genes within AI peaks that regulate cell growth. Genetic and functional data implicate loss of TP53 as a sufficient driver of AI. The WGS data highlight an influence of copy number aberrations on the rate of detected somatic point mutations. Importantly, the data reveal several associations between AI target genes, suggesting a role for a network of lineage-determining transcription factors in colorectal tumorigenesis. Overall, the results unravel the contribution of AI to colorectal cancer and provide a plausible explanation for why so few genes are commonly affected by point mutations in cancers.
DOI: 10.1101/2021.10.18.21264945
2021
Leveraging interindividual variability of regulatory activity refines genetic regulation of gene expression in schizophrenia
Schizophrenia is a polygenic psychiatric disorder whose underlying mechanistic changes in gene expression regulation are poorly understood. To elucidate these changes, we integrate interindividual variability of regulatory activity with gene expression and genotype data captured from the prefrontal cortex of 272 cases and controls. We show that regulatory element activity is structured into 10,936 and 10,376 cis-regulatory domains in cases and controls, respectively. These domains display distinct regulatory element coordination structures in the two states. We studied the interplay among genetic variants, gene expression and cis-regulatory domains. We ascertain that changes in coordinated regulatory activity tag alterations in gene expression levels (p=8.62e-06, OR=1.60). We also unveil case-specific QTL effects and identify regulatory machinery changes for genes affecting synaptic function and dendritic spine morphology in schizophrenia. Altogether, we show that accounting for coordinated regulatory activity provides a novel mechanistic approach to reduce the search space for unveiling genetically perturbed regulation of gene expression in schizophrenia.
DOI: 10.21203/rs.3.rs-1138340/v1
2021
Transposable elements mediate genetic effects altering the expression of nearby genes in colorectal cancer
Transposable elements (TEs) are interspersed repeats that make up more than half of the human genome, and TE-embedded regulatory sequences are increasingly recognized as major components of the human regulome. Perturbations of this system can contribute to tumorigenesis, but the impact of TEs on gene expression in cancer cells remains to be fully assessed. Here, we analyzed 275 normal colon and 276 colorectal cancer (CRC) samples from the SYSCOL colorectal cancer cohort. We discovered 10,111 and 5,152 TE expression quantitative trait loci (eQTLs) in normal and tumor tissues, respectively. Of the latter, 376 were exclusive to CRC, likely driven by changes in methylation patterns. We found that transcription factors are more enriched in tumor-specific TE-eQTLs than in shared TE-eQTLs, indicating that TEs are more specifically regulated in tumor than in normal tissue. We used Bayesian networks to assess the causal relationships between eQTL variants, TEs and genes. We found that 1,758 TEs mediate genetic effects, altering the expression of 1,626 nearby genes significantly more in tumor than in normal tissue; 51 of these genes are cancer driver genes. We show that tumor-specific TE-eQTLs trigger the driver capability of TEs, subsequently affecting the expression of nearby genes. Collectively, our results highlight a global profile of a new class of cancer drivers. They enhance our understanding of tumorigenesis and provide potential new candidate mechanisms for therapeutic target development.
DOI: 10.1101/2021.12.03.471093
2021
Transposable elements mediate genetic effects altering the expression of nearby genes in colorectal cancer
Transposable elements (TEs) are interspersed repeats that make up more than half of the human genome, and TE-embedded regulatory sequences are increasingly recognized as major components of the human regulome. Perturbations of this system can contribute to tumorigenesis, but the impact of TEs on gene expression in cancer cells remains to be fully assessed. Here, we analyzed 275 normal colon and 276 colorectal cancer (CRC) samples from the SYSCOL colorectal cancer cohort. We discovered 10,111 and 5,152 TE expression quantitative trait loci (eQTLs) in normal and tumor tissues, respectively. Of the latter, 376 were exclusive to CRC, likely driven by changes in methylation patterns. We found that transcription factors are more enriched in tumor-specific TE-eQTLs than in shared TE-eQTLs, indicating that TEs are more specifically regulated in tumor than in normal tissue. We used Bayesian networks to assess the causal relationships between eQTL variants, TEs and genes. We found that 1,758 TEs mediate genetic effects, altering the expression of 1,626 nearby genes significantly more in tumor than in normal tissue; 51 of these genes are cancer driver genes. We show that tumor-specific TE-eQTLs trigger the driver capability of TEs, subsequently affecting the expression of nearby genes. Collectively, our results highlight a global profile of a new class of cancer drivers. They enhance our understanding of tumorigenesis and provide potential new candidate mechanisms for therapeutic target development.
DOI: 10.21203/rs.3.rs-989712/v1
2021
Leveraging interindividual variability of regulatory activity refines genetic regulation of gene expression in schizophrenia
Schizophrenia is a polygenic psychiatric disorder whose underlying mechanistic changes in gene expression regulation are poorly understood. To elucidate these changes, we integrate interindividual variability of regulatory activity with gene expression and genotype data captured from the prefrontal cortex of 272 cases and controls. We show that regulatory element activity is structured into 10,936 and 10,376 cis-regulatory domains in cases and controls, respectively. These domains display distinct regulatory element coordination structures in the two states. We studied the interplay among genetic variants, gene expression and cis-regulatory domains. We ascertain that changes in coordinated regulatory activity tag alterations in gene expression levels (p=8.62e-06, OR=1.60). We also unveil case-specific QTL effects and identify regulatory machinery changes for genes affecting synaptic function and dendritic spine morphology in schizophrenia. Altogether, we show that accounting for coordinated regulatory activity provides a novel mechanistic approach to reduce the search space for unveiling genetically perturbed regulation of gene expression in schizophrenia.