Peter M. Visscher

Here are all the papers by Peter M. Visscher that you can download and read on OA.mg.

DOI: 10.1038/nature08494
2009
Cited 7,462 times
Finding the missing heritability of complex diseases
Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, 'missing' heritability can be explained. Here we examine potential sources of missing heritability and propose research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics of complex diseases and enhance its potential to enable effective disease prevention or treatment.
DOI: 10.1016/j.ajhg.2010.11.011
2011
Cited 6,128 times
GCTA: A Tool for Genome-wide Complex Trait Analysis
For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the "missing heritability" problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.
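As a rough illustration of the "estimation of the genetic relationships from SNPs" function, the sketch below computes a relationship value for every pair of individuals from allele counts. It assumes the commonly cited standardized form, in which each SNP's allele count (0, 1, or 2) is centered by twice the allele frequency, scaled by the square root of 2p(1-p), and the products are averaged over SNPs. The function name, the toy genotypes, and the use of sample allele frequencies are assumptions made for illustration; this is not GCTA's own code, and its handling of the diagonal, missing genotypes, and the X chromosome is not reproduced here.

```python
def genetic_relationship_matrix(genotypes):
    """Return an n x n relationship matrix from allele counts.

    genotypes: list of individuals, each a list of allele counts (0, 1, 2),
    one entry per SNP. Illustrative sketch only (assumed formula, see text).
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])

    # Sample allele frequency of each SNP (assumption: no missing data).
    freqs = [sum(ind[i] for ind in genotypes) / (2 * n_ind) for i in range(n_snp)]

    # Monomorphic SNPs carry no information and would divide by zero.
    used = [i for i, p in enumerate(freqs) if 0.0 < p < 1.0]
    if not used:
        raise ValueError("no polymorphic SNPs")

    # Standardize: (x - 2p) / sqrt(2p(1 - p)).
    z = [
        [(ind[i] - 2 * freqs[i]) / (2 * freqs[i] * (1 - freqs[i])) ** 0.5 for i in used]
        for ind in genotypes
    ]

    # Average the cross-products over SNPs for every pair of individuals.
    return [
        [sum(a * b for a, b in zip(z[j], z[k])) / len(used) for k in range(n_ind)]
        for j in range(n_ind)
    ]


if __name__ == "__main__":
    toy = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]  # hypothetical data
    for row in genetic_relationship_matrix(toy):
        print([round(v, 3) for v in row])
```

Because the counts are centered with sample frequencies, each row of the resulting matrix sums to zero; with real data the frequencies and quality-control filters would come from the analysis pipeline rather than from a toy list.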
DOI: 10.1038/ng.608
2010
Cited 3,977 times
Common SNPs explain a large proportion of the heritability for human height
Peter Visscher and colleagues report an analysis of the heritability explained by common variants identified through genome-wide association studies. They find that 45% of the variance for height can be explained by using a linear model to simultaneously consider the combined effect of common SNPs. SNPs discovered by genome-wide association studies (GWASs) account for only a small fraction of the genetic variation of complex traits in human populations. Where is the remaining heritability? We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis, and validated the estimation method with simulations based on the observed genotype data. We show that 45% of variance can be explained by considering all SNPs simultaneously. Thus, most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests. We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency than the SNPs explored to date.
DOI: 10.1016/j.ajhg.2017.06.005
2017
Cited 2,860 times
10 Years of GWAS Discovery: Biology, Function, and Translation
Application of the experimental design of genome-wide association studies (GWASs) is now 10 years old (young), and here we review the remarkable range of discoveries it has facilitated in population and complex-trait genetics, the biology of diseases, and translation toward new therapeutics. We predict the likely discoveries in the next 10 years, when GWASs will be based on millions of samples with array data imputed to a large fully sequenced reference panel and on hundreds of thousands of samples with whole-genome sequencing data.
DOI: 10.1016/j.ajhg.2011.11.029
2012
Cited 2,198 times
Five Years of GWAS Discovery
The past five years have seen many scientific and biological discoveries made through the experimental design of genome-wide association studies (GWASs). These studies were aimed at detecting variants at genomic loci that are associated with complex traits in the population and, in particular, at detecting associations between common single-nucleotide polymorphisms (SNPs) and common diseases such as heart disease, diabetes, auto-immune diseases, and psychiatric disorders. We start by giving a number of quotes from scientists and journalists about perceived problems with GWASs. We will then briefly give the history of GWASs and focus on the discoveries made through this experimental design, what those discoveries tell us and do not tell us about the genetics and biology of complex traits, and what immediate utility has come out of these studies. Rather than giving an exhaustive review of all reported findings for all diseases and other complex traits, we focus on the results for auto-immune diseases and metabolic diseases. We return to the perceived failure or disappointment about GWASs in the concluding section.
DOI: 10.1038/nature12873
2013
Cited 1,981 times
Genetics of rheumatoid arthritis contributes to biology and drug discovery
A major challenge in human genetics is to devise a systematic strategy to integrate disease-associated variants with diverse genomic and biological data sets to provide insight into disease pathogenesis and guide drug discovery for complex traits such as rheumatoid arthritis (RA). Here we performed a genome-wide association study meta-analysis in a total of >100,000 subjects of European and Asian ancestries (29,880 RA cases and 73,758 controls), by evaluating ∼10 million single-nucleotide polymorphisms. We discovered 42 novel RA risk loci at a genome-wide level of significance, bringing the total to 101 (refs 2 - 4). We devised an in silico pipeline using established bioinformatics methods based on functional annotation, cis-acting expression quantitative trait loci and pathway analyses--as well as novel methods based on genetic overlap with human primary immunodeficiency, haematological cancer somatic mutations and knockout mouse phenotypes--to identify 98 biological candidate genes at these 101 risk loci. We demonstrate that these genes are the targets of approved therapies for RA, and further suggest that drugs approved for other indications may be repurposed for the treatment of RA. Together, this comprehensive genetic study sheds light on fundamental genes, pathways and cell types that contribute to RA pathogenesis, and provides empirical evidence that the genetics of RA can provide important information for drug discovery.
DOI: 10.1038/s41588-018-0147-3
2018
Cited 1,907 times
Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals
Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
DOI: 10.1038/ng.3285
2015
Cited 1,875 times
Meta-analysis of the heritability of human traits based on fifty years of twin studies
DOI: 10.1038/ng.3538
2016
Cited 1,764 times
Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with human complex traits. However, the genes or functional DNA elements through which these variants exert their effects on the traits are often unknown. We propose a method (called SMR) that integrates summary-level data from GWAS with data from expression quantitative trait locus (eQTL) studies to identify genes whose expression levels are associated with a complex trait because of pleiotropy. We apply the method to five human complex traits using GWAS data on up to 339,224 individuals and eQTL data on 5,311 individuals, and we prioritize 126 genes (for example, TRAF1 and ANKRD55 for rheumatoid arthritis and SNX19 and NMRAL1 for schizophrenia), of which 25 genes are new candidates; 77 genes are not the nearest annotated gene to the top associated GWAS SNP. These genes provide important leads to design future functional studies to understand the mechanism whereby DNA variation leads to complex trait variation.
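To make the integration of GWAS and eQTL summary data concrete, the sketch below combines the two effect estimates for a single SNP. It assumes the form in which SMR is usually described: the effect of expression on the trait is estimated as the ratio b_xy = b_zy / b_zx (GWAS effect over eQTL effect), and the test statistic T = z_zy² z_zx² / (z_zy² + z_zx²) is referred to a chi-square distribution with 1 degree of freedom. The function name, argument names, and example numbers are hypothetical, and any further filtering steps of the published method are not covered.

```python
import math


def smr_test(b_gwas, se_gwas, b_eqtl, se_eqtl):
    """Approximate SMR-style test for one SNP (illustrative sketch).

    b_gwas, se_gwas: SNP effect on the trait and its standard error.
    b_eqtl, se_eqtl: SNP effect on gene expression and its standard error.
    Returns (b_xy, t_stat, p_value).
    """
    if b_eqtl == 0:
        raise ValueError("eQTL effect must be non-zero")

    z_gwas = b_gwas / se_gwas
    z_eqtl = b_eqtl / se_eqtl

    # Ratio estimate of the effect of expression on the trait.
    b_xy = b_gwas / b_eqtl

    # Approximate chi-square (1 df) statistic built from the two z-scores.
    t_stat = (z_gwas ** 2 * z_eqtl ** 2) / (z_gwas ** 2 + z_eqtl ** 2)

    # Upper tail of a 1-df chi-square: P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_stat / 2))
    return b_xy, t_stat, p_value


if __name__ == "__main__":
    # Hypothetical summary statistics, not taken from the paper.
    print(smr_test(b_gwas=0.05, se_gwas=0.01, b_eqtl=0.5, se_eqtl=0.05))
```

The statistic can never exceed the smaller of the two squared z-scores, which is why a strong eQTL signal is needed before a GWAS signal can translate into a significant result.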
DOI: 10.1038/nrg2322
2008
Cited 1,601 times
Heritability in the genomics era — concepts and misconceptions
DOI: 10.1093/hmg/ddy271
2018
Cited 1,592 times
Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry
Recent genome-wide association studies (GWAS) of height and body mass index (BMI) in ∼250000 European participants have led to the discovery of ∼700 and ∼100 nearly independent single nucleotide polymorphisms (SNPs) associated with these traits, respectively. Here we combine summary statistics from those two studies with GWAS of height and BMI performed in ∼450000 UK Biobank participants of European ancestry. Overall, our combined GWAS meta-analysis reaches N ∼700000 individuals and substantially increases the number of GWAS signals associated with these traits. We identified 3290 and 941 near-independent SNPs associated with height and BMI, respectively (at a revised genome-wide significance threshold of P < 1 × 10⁻⁸), including 1185 height-associated SNPs and 751 BMI-associated SNPs located within loci not previously identified by these two GWAS. The near-independent genome-wide significant SNPs explain ∼24.6% of the variance of height and ∼6.0% of the variance of BMI in an independent sample from the Health and Retirement Study (HRS). Correlations between polygenic scores based upon these SNPs with actual height and BMI in HRS participants were ∼0.44 and ∼0.22, respectively. From analyses of integrating GWAS and expression quantitative trait loci (eQTL) data by summary-data-based Mendelian randomization, we identified an enrichment of eQTLs among lead height and BMI signals, prioritizing 610 and 138 genes, respectively. Our study demonstrates that, as previously predicted, increasing GWAS sample sizes continues to deliver, by the discovery of new loci, increasing prediction accuracy and providing additional data to achieve deeper insight into complex trait biology. All summary statistics are made available for follow-up studies.
DOI: 10.1038/ng.2756
2013
Cited 1,538 times
Systematic identification of trans eQTLs as putative drivers of known disease associations
Identifying the downstream effects of disease-associated SNPs is challenging. To help overcome this problem, we performed expression quantitative trait locus (eQTL) meta-analysis in non-transformed peripheral blood samples from 5,311 individuals with replication in 2,775 individuals. We identified and replicated trans eQTLs for 233 SNPs (reflecting 103 independent loci) that were previously associated with complex traits at genome-wide significance. Some of these SNPs affect multiple genes in trans that are known to be altered in individuals with disease: rs4917014, previously associated with systemic lupus erythematosus (SLE), altered gene expression of C1QB and five type I interferon response genes, both hallmarks of SLE. DeepSAGE RNA sequencing showed that rs4917014 strongly alters the 3' UTR levels of IKZF1 in cis, and chromatin immunoprecipitation and sequencing analysis of the trans-regulated genes implicated IKZF1 as the causal gene. Variants associated with cholesterol metabolism and type 1 diabetes showed similar phenomena, indicating that large-scale eQTL mapping provides insight into the downstream effects of many trait-associated variants.
DOI: 10.1016/s1474-4422(19)30320-5
2019
Cited 1,480 times
Identification of novel risk loci, causal insights, and heritable risk for Parkinson's disease: a meta-analysis of genome-wide association studies
Genome-wide association studies (GWAS) in Parkinson's disease have increased the scope of biological knowledge about the disease over the past decade. We aimed to use the largest aggregate of GWAS data to identify novel risk loci and gain further insight into the causes of Parkinson's disease. We did a meta-analysis of 17 datasets from Parkinson's disease GWAS available from European ancestry samples to nominate novel loci for disease risk. These datasets incorporated all available data. We then used these data to estimate heritable risk and develop predictive models of this heritability. We also used large gene expression and methylation resources to examine possible functional consequences as well as tissue, cell type, and biological pathway enrichments for the identified risk factors. Additionally, we examined shared genetic risk between Parkinson's disease and other phenotypes of interest via genetic correlations followed by Mendelian randomisation. Between Oct 1, 2017, and Aug 9, 2018, we analysed 7·8 million single nucleotide polymorphisms in 37 688 cases, 18 618 UK Biobank proxy-cases (ie, individuals who do not have Parkinson's disease but have a first degree relative that does), and 1·4 million controls. We identified 90 independent genome-wide significant risk signals across 78 genomic regions, including 38 novel independent risk signals in 37 loci. These 90 variants explained 16-36% of the heritable risk of Parkinson's disease depending on prevalence. Integrating methylation and expression data within a Mendelian randomisation framework identified putatively associated genes at 70 risk signals underlying GWAS loci for follow-up functional studies. Tissue-specific expression enrichment analyses suggested Parkinson's disease loci were heavily brain-enriched, with specific neuronal cell types being implicated from single cell data. We found significant genetic correlations with brain volumes (false discovery rate-adjusted p=0·0035 for intracranial volume, p=0·024 for putamen volume), smoking status (p=0·024), and educational attainment (p=0·038). Mendelian randomisation between cognitive performance and Parkinson's disease risk showed a robust association (p=8·00 × 10⁻⁷). These data provide the most comprehensive survey of genetic risk within Parkinson's disease to date, to the best of our knowledge, by revealing many additional Parkinson's disease risk loci, providing a biological context for these risk factors, and showing that a considerable genetic component of this disease remains unidentified. These associations derived from European ancestry datasets will need to be followed-up with more diverse data. Funding: The National Institute on Aging at the National Institutes of Health (USA), The Michael J Fox Foundation, and The Parkinson's Foundation (see appendix for full list of funding sources).
DOI: 10.1038/ng.2213
2012
Cited 1,366 times
Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits
We present an approximate conditional and joint association analysis that can use summary-level statistics from a meta-analysis of genome-wide association studies (GWAS) and estimated linkage disequilibrium (LD) from a reference sample with individual-level genotype data. Using this method, we analyzed meta-analysis summary data from the GIANT Consortium for height and body mass index (BMI), with the LD structure estimated from genotype data in two independent cohorts. We identified 36 loci with multiple associated variants for height (38 leading and 49 additional SNPs, 87 in total) via a genome-wide SNP selection procedure. The 49 new SNPs explain approximately 1.3% of variance, nearly doubling the heritability explained at the 36 loci. We did not find any locus showing multiple associated SNPs for BMI. The method we present is computationally fast and is also applicable to case-control data, which we demonstrate in an example from meta-analysis of type 2 diabetes by the DIAGRAM Consortium.
DOI: 10.1093/ije/dyt179
2012
Cited 1,125 times
Calculating statistical power in Mendelian randomization studies
In Mendelian randomization (MR) studies, where genetic variants are used as proxy measures for an exposure trait of interest, obtaining adequate statistical power is frequently a concern due to the small amount of variation in a phenotypic trait that is typically explained by genetic variants. A range of power estimates based on simulations and specific parameters for two-stage least squares (2SLS) MR analyses based on continuous variables has previously been published. However there are presently no specific equations or software tools one can implement for calculating power of a given MR study. Using asymptotic theory, we show that in the case of continuous variables and a single instrument, for example a single-nucleotide polymorphism (SNP) or multiple SNP predictor, statistical power for a fixed sample size is a function of two parameters: the proportion of variation in the exposure variable explained by the genetic predictor and the true causal association between the exposure and outcome variable. We demonstrate that power for 2SLS MR can be derived using the non-centrality parameter (NCP) of the statistical test that is employed to test whether the 2SLS regression coefficient is zero. We show that the previously published power estimates from simulations can be represented theoretically using this NCP-based approach, with similar estimates observed when the simulation-based estimates are compared with our NCP-based approach. General equations for calculating statistical power for 2SLS MR using the NCP are provided in this note, and we implement the calculations in a web-based application.
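The NCP-based power calculation can be sketched in a few lines. The sketch assumes a single instrument that explains a proportion r2_xz of the variance in the exposure, a true causal effect beta_yx, and the usual large-sample variance of the 2SLS estimate, var_resid / (N · r2_xz · var_x), so that the non-centrality parameter is N · r2_xz · beta_yx² · var_x / var_resid. The exact expressions and defaults used in the published note and its web application may differ; all function and parameter names here are illustrative.

```python
from statistics import NormalDist


def mr_power(n, r2_xz, beta_yx, var_x=1.0, var_resid=1.0, alpha=0.05):
    """Approximate power of a 2SLS MR test (illustrative sketch).

    n:         sample size
    r2_xz:     proportion of exposure variance explained by the instrument
    beta_yx:   true causal effect of exposure on outcome
    var_x:     variance of the exposure (assumed 1 for a standardized trait)
    var_resid: residual variance of the outcome (an assumed input)
    Returns (ncp, power).
    """
    # Non-centrality parameter of the 1-df chi-square test of beta = 0.
    ncp = n * r2_xz * beta_yx ** 2 * var_x / var_resid

    # A non-central chi-square with 1 df is (Z + sqrt(ncp))**2, so the
    # power follows from two normal tail probabilities.
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    root = ncp ** 0.5
    power = std_normal.cdf(root - z_crit) + std_normal.cdf(-root - z_crit)
    return ncp, power


if __name__ == "__main__":
    # Hypothetical inputs: 10,000 people, instrument R^2 of 2%, effect 0.2.
    print(mr_power(n=10_000, r2_xz=0.02, beta_yx=0.2))
```

For a fixed sample size the result depends only on r2_xz and beta_yx (given the variances), which matches the two-parameter dependence described in the abstract.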
DOI: 10.1016/j.ajhg.2011.02.002
2011
Cited 984 times
Estimating Missing Heritability for Disease from Genome-wide Association Studies
Genome-wide association studies are designed to discover SNPs that are associated with a complex trait. Employing strict significance thresholds when testing individual SNPs avoids false positives at the expense of increasing false negatives. Recently, we developed a method for quantitative traits that estimates the variation accounted for when fitting all SNPs simultaneously. Here we develop this method further for case-control studies. We use a linear mixed model for analysis of binary traits and transform the estimates to a liability scale by adjusting both for scale and for ascertainment of the case samples. We show by theory and simulation that the method is unbiased. We apply the method to data from the Wellcome Trust Case Control Consortium and show that a substantial proportion of variation in liability for Crohn disease, bipolar disorder, and type I diabetes is tagged by common SNPs.
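The transformation to the liability scale can be illustrated with a short calculation. The sketch assumes the commonly cited form h²_l = h²_o · K(1−K)/z² · K(1−K)/(P(1−P)), where K is the population prevalence, P is the proportion of cases in the sample, and z is the height of the standard normal density at the liability threshold. The first factor handles the change of scale and the second the ascertainment of cases. The function name and the example numbers are hypothetical and are not values from the paper.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Convert an observed-scale (0/1) variance estimate to the liability scale.

    h2_observed:   estimate from the linear model on the binary trait
    prevalence:    population prevalence K of the disease
    case_fraction: proportion P of cases in the analysed sample
    Illustrative sketch under the assumed formula described in the text.
    """
    std_normal = NormalDist()

    # Liability threshold above which individuals are affected.
    threshold = std_normal.inv_cdf(1 - prevalence)
    # Height of the standard normal density at that threshold.
    z = std_normal.pdf(threshold)

    scale = prevalence * (1 - prevalence) / z ** 2
    ascertainment = prevalence * (1 - prevalence) / (
        case_fraction * (1 - case_fraction)
    )
    return h2_observed * scale * ascertainment


if __name__ == "__main__":
    # Hypothetical inputs: 1% prevalence, equal numbers of cases and controls.
    print(liability_scale_h2(h2_observed=0.3, prevalence=0.01, case_fraction=0.5))
```

When the sample is not ascertained (P equals K), the second factor is 1 and only the scale change remains.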
DOI: 10.1186/s13059-015-0584-6
2015
Cited 960 times
DNA methylation age of blood predicts all-cause mortality in later life
DNA methylation levels change with age. Recent studies have identified biomarkers of chronological age based on DNA methylation levels. It is not yet known whether DNA methylation age captures aspects of biological age. Here we test whether differences between people’s chronological ages and estimated ages, DNA methylation age, predict all-cause mortality in later life. The difference between DNA methylation age and chronological age (Δage) was calculated in four longitudinal cohorts of older people. Meta-analysis of proportional hazards models from the four cohorts was used to determine the association between Δage and mortality. A 5-year higher Δage is associated with a 21% higher mortality risk, adjusting for age and sex. After further adjustments for childhood IQ, education, social class, hypertension, diabetes, cardiovascular disease, and APOE e4 status, there is a 16% increased mortality risk for those with a 5-year higher Δage. A pedigree-based heritability analysis of Δage was conducted in a separate cohort. The heritability of Δage was 0.43. DNA methylation-derived measures of accelerated aging are heritable traits that predict mortality independently of health status, lifestyle factors, and known genetic factors.
DOI: 10.1371/journal.pgen.1000008
2008
Cited 897 times
Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits
The relative proportion of additive and non-additive variation for complex traits is important in evolutionary biology, medicine, and agriculture. We address a long-standing controversy and paradox about the contribution of non-additive genetic variation, namely that knowledge about biological pathways and gene networks imply that epistasis is important. Yet empirical data across a range of traits and species imply that most genetic variance is additive. We evaluate the evidence from empirical studies of genetic variance components and find that additive variance typically accounts for over half, and often close to 100%, of the total genetic variance. We present new theoretical results, based upon the distribution of allele frequencies under neutral and other population genetic models, that show why this is the case even if there are non-additive effects at the level of gene action. We conclude that interactions at the level of genes are not likely to generate much interaction at the level of variance.
DOI: 10.1038/ng.823
2011
Cited 852 times
Genome partitioning of genetic variation for complex traits using common SNPs
We estimate and partition genetic variation for height, body mass index (BMI), von Willebrand factor and QT interval (QTi) using 586,898 SNPs genotyped on 11,586 unrelated individuals. We estimate that ∼45%, ∼17%, ∼25% and ∼21% of the variance in height, BMI, von Willebrand factor and QTi, respectively, can be explained by all autosomal SNPs and a further ∼0.5-1% can be explained by X chromosome SNPs. We show that the variance explained by each chromosome is proportional to its length, and that SNPs in or near genes explain more variation than SNPs between genes. We propose a new approach to estimate variation due to cryptic relatedness and population stratification. Our results provide further evidence that a substantial proportion of heritability is captured by common SNPs, that height, BMI and QTi are highly polygenic traits, and that the additive variation explained by a part of the genome is approximately proportional to the total length of DNA contained within genes therein.
DOI: 10.1038/ng.2876
2014
Cited 840 times
Advantages and pitfalls in the application of mixed-model association methods
Alkes Price, Peter Visscher and colleagues provide recommendations on the application of mixed-linear-model association methods across a range of study designs. Mixed linear models are emerging as a method of choice for conducting genetic association studies in humans and other organisms. The advantages of the mixed-linear-model association (MLMA) method include the prevention of false positive associations due to population or relatedness structure and an increase in power obtained through the application of a correction that is specific to this structure. An underappreciated point is that MLMA can also increase power in studies without sample structure by implicitly conditioning on associated loci other than the candidate locus. Numerous variations on the standard MLMA approach have recently been published, with a focus on reducing computational cost. These advances provide researchers applying MLMA methods with many options to choose from, but we caution that MLMA methods are still subject to potential pitfalls. Here we describe and quantify the advantages and pitfalls of MLMA methods as a function of study design and provide recommendations for the application of these methods in practical settings.
DOI: 10.1016/j.ajhg.2010.06.009
2010
Cited 772 times
A Versatile Gene-Based Test for Genome-wide Association Studies
We have derived a versatile gene-based test for genome-wide association studies (GWAS). Our approach, called VEGAS (versatile gene-based association study), is applicable to all GWAS designs, including family-based GWAS, meta-analyses of GWAS on the basis of summary data, and DNA-pooling-based GWAS, where existing approaches based on permutation are not possible, as well as singleton data, where they are. The test incorporates information from a full set of markers (or a defined subset) within a gene and accounts for linkage disequilibrium between markers by using simulations from the multivariate normal distribution. We show that for an association study using singletons, our approach produces results equivalent to those obtained via permutation in a fraction of the computation time. We demonstrate proof-of-principle by using the gene-based test to replicate several genes known to be associated on the basis of results from a family-based GWAS for height in 11,536 individuals and a DNA-pooling-based GWAS for melanoma in approximately 1300 cases and controls. Our method has the potential to identify novel associated genes; provide a basis for selecting SNPs for replication; and be directly used in network (pathway) approaches that require per-gene association test statistics. We have implemented the approach in both an easy-to-use web interface, which only requires the uploading of markers with their association p-values, and a separate downloadable application.
DOI: 10.18632/aging.101020
2016
Cited 765 times
DNA methylation-based measures of biological age: meta-analysis predicting time to death
Estimates of biological age based on DNA methylation patterns, often referred to as "epigenetic age" or "DNAm age", have been shown to be robust biomarkers of age in humans. We previously demonstrated that independent of chronological age, epigenetic age assessed in blood predicted all-cause mortality in four human cohorts. Here, we expanded our original observation to 13 different cohorts for a total sample size of 13,089 individuals, including three racial/ethnic groups. In addition, we examined whether incorporating information on blood cell composition into the epigenetic age metrics improves their predictive power for mortality. All considered measures of epigenetic age acceleration were predictive of mortality (p≤8.2×10⁻⁹), independent of chronological age, even after adjusting for additional risk factors (p<5.4×10⁻⁴), and within the racial/ethnic groups that we examined (non-Hispanic whites, Hispanics, African Americans). Epigenetic age estimates that incorporated information on blood cell composition led to the smallest p-values for time to death (p=7.5×10⁻⁴³). Overall, this study a) strengthens the evidence that epigenetic age predicts all-cause mortality above and beyond chronological age and traditional risk factors, and b) demonstrates that epigenetic age estimates that incorporate information on blood cell counts lead to highly significant associations with all-cause mortality.
DOI: 10.1038/s41588-017-0009-4
2018
Cited 722 times
Multi-trait analysis of genome-wide association summary statistics using MTAG
We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (N_eff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.
DOI: 10.1038/ng.3390
2015
Cited 718 times
Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index
Jian Yang and colleagues present a method, GREML-LDMS, to estimate heritability for complex human traits using whole-genome sequencing data or imputation with the 1000 Genomes Project reference panel. Using the heritability estimates from GREML-LDMS, they find that there is negligible missing heritability for human height and BMI. We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing data. We demonstrate using simulations based on whole-genome sequencing data that ∼97% and ∼68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ∼17 million imputed variants explain 56% (standard error (s.e.) = 2.3%) of variance for height and 27% (s.e. = 2.5%) of variance for body mass index (BMI), and we find evidence that height- and BMI-associated variants have been under natural selection. Considering the imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60–70% for height and 30–40% for BMI. Therefore, the missing heritability is small for both traits. For further discovery of genes associated with complex traits, a study design with SNP arrays followed by imputation is more cost-effective than whole-genome sequencing at current prices.
DOI: 10.1161/circgenetics.116.001506
2016
Cited 688 times
Epigenetic Signatures of Cigarette Smoking
Background— DNA methylation leaves a long-term signature of smoking exposure and is one potential mechanism by which tobacco exposure predisposes to adverse health outcomes, such as cancers, osteoporosis, lung, and cardiovascular disorders. Methods and Results— To comprehensively determine the association between cigarette smoking and DNA methylation, we conducted a meta-analysis of genome-wide DNA methylation assessed using the Illumina BeadChip 450K array on 15 907 blood-derived DNA samples from participants in 16 cohorts (including 2433 current, 6518 former, and 6956 never smokers). Comparing current versus never smokers, 2623 cytosine–phosphate–guanine sites (CpGs), annotated to 1405 genes, were statistically significantly differentially methylated at Bonferroni threshold of P < 1×10⁻⁷ (18 760 CpGs at false discovery rate < 0.05). Genes annotated to these CpGs were enriched for associations with several smoking-related traits in genome-wide studies including pulmonary function, cancers, inflammatory diseases, and heart disease. Comparing former versus never smokers, 185 of the CpGs that differed between current and never smokers were significant at P < 1×10⁻⁷ (2623 CpGs at false discovery rate < 0.05), indicating a pattern of persistent altered methylation, with attenuation, after smoking cessation. Transcriptomic integration identified effects on gene expression at many differentially methylated CpGs. Conclusions— Cigarette smoking has a broad impact on genome-wide methylation that, at many loci, persists many years after smoking cessation. Many of the differentially methylated genes were novel genes with respect to biological effects of smoking and might represent therapeutic targets for prevention or treatment of tobacco-related diseases. Methylation at these sites could also serve as sensitive and stable biomarkers of lifetime exposure to tobacco smoke.
DOI: 10.1038/s41467-017-02317-2
2018
Cited 667 times
Causal associations between risk factors and common diseases inferred from GWAS summary data
Health risk factors such as body mass index (BMI) and serum cholesterol are associated with many common diseases. It often remains unclear whether the risk factors are cause or consequence of disease, or whether the associations are the result of confounding. We develop and apply a method (called GSMR) that performs a multi-SNP Mendelian randomization analysis using summary-level data from genome-wide association studies to test the causal associations of BMI, waist-to-hip ratio, serum cholesterols, blood pressures, height, and years of schooling (EduYears) with common diseases (sample sizes of up to 405,072). We identify a number of causal associations including a protective effect of LDL-cholesterol against type-2 diabetes (T2D) that might explain the side effects of statins on T2D, a protective effect of EduYears against Alzheimer's disease, and bidirectional associations with opposite effects (e.g., higher BMI increases the risk of T2D but the effect of T2D on BMI is negative).
DOI: 10.1038/nrg3457
2013
Cited 637 times
Pitfalls of predicting complex traits from SNPs
The success of genome-wide association studies (GWASs) has led to increasing interest in making predictions of complex trait phenotypes, including disease, from genotype data. Rigorous assessment of the value of predictors is crucial before implementation. Here we discuss some of the limitations and pitfalls of prediction analysis and show how naive implementations can lead to severe bias and misinterpretation of results.
DOI: 10.1038/ng.286
2009
Cited 633 times
DNA methylation profiles in monozygotic and dizygotic twins
DOI: 10.1101/gr.6665407
2007
Cited 609 times
Prediction of individual genetic risk to disease from genome-wide association studies
Empirical studies suggest that the effect sizes of individual causal risk alleles underlying complex genetic diseases are small, with most genotype relative risks in the range of 1.1-2.0. Although the increased risk of disease for a carrier is small for any single locus, knowledge of multiple-risk alleles throughout the genome could allow the identification of individuals that are at high risk. In this study, we investigate the number and effect size of risk loci that underlie complex disease constrained by the disease parameters of prevalence and heritability. Then we quantify the value of prediction of genetic risk to disease using a range of realistic combinations of the number, size, and distribution of risk effects that underlie complex diseases. We propose an approach to assess the genetic risk of a disease in healthy individuals, based on dense genome-wide SNP panels. We test this approach using simulation. When the number of loci contributing to the disease is >50, a large case-control study is needed to identify a set of risk loci for use in predicting the disease risk of healthy people not included in the case-control study. For diseases controlled by 1000 loci of mean relative risk of only 1.04, a case-control study with 10,000 cases and controls can lead to selection of approximately 75 loci that explain >50% of the genetic variance. The 5% of people with the highest predicted risk are three to seven times more likely to suffer the disease than the population average, depending on heritability and disease prevalence. Whether an individual with known genetic risk develops the disease depends on known and unknown environmental factors.
DOI: 10.1038/s41467-018-04951-w
2018
Cited 594 times
Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes
Type 2 diabetes (T2D) is a very common disease in humans. Here we conduct a meta-analysis of genome-wide association studies (GWAS) with ~16 million genetic variants in 62,892 T2D cases and 596,424 controls of European ancestry. We identify 139 common and 4 rare variants associated with T2D, 42 of which (39 common and 3 rare variants) are independent of the known variants. Integration of the gene expression data from blood (n = 14,115 and 2765) with the GWAS results identifies 33 putative functional genes for T2D, 3 of which were targeted by approved drugs. A further integration of DNA methylation (n = 1980) and epigenomic annotation data highlight 3 genes (CAMK1D, TP53INP1, and ATP5G1) with plausible regulatory mechanisms, whereby a genetic variant exerts an effect on T2D through epigenetic regulation of gene expression. Our study uncovers additional loci, proposes putative genetic regulatory mechanisms for T2D, and provides evidence of purifying selection for T2D-associated variants.
DOI: 10.1093/bioinformatics/bts474
2012
Cited 576 times
Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood
Summary: Genetic correlations are the genome-wide aggregate effects of causal variants affecting multiple traits. Traditionally, genetic correlations between complex traits are estimated from pedigree studies, but such estimates can be confounded by shared environmental factors. Moreover, for diseases, low prevalence rates imply that even if the true genetic correlation between disorders was high, co-aggregation of disorders in families might not occur or could not be distinguished from chance. We have developed and implemented statistical methods based on linear mixed models to obtain unbiased estimates of the genetic correlation between pairs of quantitative traits or pairs of binary traits of complex diseases using population-based case–control studies with genome-wide single-nucleotide polymorphism data. The method is validated in a simulation study and applied to estimate genetic correlation between various diseases from Wellcome Trust Case Control Consortium data in a series of bivariate analyses. We estimate a significant positive genetic correlation between risk of Type 2 diabetes and hypertension of ~0.31 (SE 0.14, P = 0.024). Availability: Our methods, appropriate for both quantitative and binary traits, are implemented in the freely available software GCTA (http://www.complextraitgenomics.com/software/gcta/reml_bivar.html). Contact: hong.lee@uq.edu.au Supplementary Information: Supplementary data are available at Bioinformatics online.
DOI: 10.1038/ng.1108
2012
Cited 572 times
Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs
Schizophrenia is a complex disorder caused by both genetic and environmental factors. Using 9,087 affected individuals, 12,171 controls and 915,354 imputed SNPs from the Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium (PGC-SCZ), we estimate that 23% (s.e. = 1%) of variation in liability to schizophrenia is captured by SNPs. We show that a substantial proportion of this variation must be the result of common causal variants, that the variance explained by each chromosome is linearly related to its length (r = 0.89, P = 2.6 × 10⁻⁸), that the genetic basis of schizophrenia is the same in males and females, and that a disproportionate proportion of variation is attributable to a set of 2,725 genes expressed in the central nervous system (CNS; P = 7.6 × 10⁻⁸). These results are consistent with a polygenic genetic architecture and imply more individual SNP associations will be detected for this disease as sample size increases.
DOI: 10.1371/journal.pgen.0020041
2006
Cited 571 times
Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings
The study of continuously varying, quantitative traits is important in evolutionary biology, agriculture, and medicine. Variation in such traits is attributable to many, possibly interacting, genes whose expression may be sensitive to the environment, which makes their dissection into underlying causative factors difficult. An important population parameter for quantitative traits is heritability, the proportion of total variance that is due to genetic factors. Response to artificial and natural selection and the degree of resemblance between relatives are all a function of this parameter. Following the classic paper by R. A. Fisher in 1918, the estimation of additive and dominance genetic variance and heritability in populations is based upon the expected proportion of genes shared between different types of relatives, and explicit, often controversial and untestable models of genetic and non-genetic causes of family resemblance. With genome-wide coverage of genetic markers it is now possible to estimate such parameters solely within families using the actual degree of identity-by-descent sharing between relatives. Using genome scans on 4,401 quasi-independent sib pairs of which 3,375 pairs had phenotypes, we estimated the heritability of height from empirical genome-wide identity-by-descent sharing, which varied from 0.374 to 0.617 (mean 0.498, standard deviation 0.036). The variance in identity-by-descent sharing per chromosome and per genome was consistent with theory. The maximum likelihood estimate of the heritability for height was 0.80 with no evidence for non-genetic causes of sib resemblance, consistent with results from independent twin and family studies but using an entirely separate source of information. Our application shows that it is feasible to estimate genetic variance solely from within-family segregation and provides an independent validation of previously untestable assumptions. 
Given sufficient data, our new paradigm will allow the estimation of genetic variation for disease susceptibility and quantitative traits that is free from confounding with non-genetic factors and will allow partitioning of genetic variation into additive and non-additive components.
DOI: 10.1038/mp.2011.85
2011
Cited 569 times
Genome-wide association studies establish that human intelligence is highly heritable and polygenic
General intelligence is an important human quantitative trait that accounts for much of the variation in diverse cognitive abilities. Individual differences in intelligence are strongly associated with many important life outcomes, including educational and occupational attainments, income, health and lifespan. Data from twin and family studies are consistent with a high heritability of intelligence, but this inference has been controversial. We conducted a genome-wide analysis of 3511 unrelated adults with data on 549 692 single nucleotide polymorphisms (SNPs) and detailed phenotypes on cognitive traits. We estimate that 40% of the variation in crystallized-type intelligence and 51% of the variation in fluid-type intelligence between individuals is accounted for by linkage disequilibrium between genotyped common SNP markers and unknown causal variants. These estimates provide lower bounds for the narrow-sense heritability of the traits. We partitioned genetic variation on individual chromosomes and found that, on average, longer chromosomes explain more variation. Finally, using just SNP data we predicted ∼1% of the variance of crystallized and fluid cognitive phenotypes in an independent sample (P=0.009 and 0.028, respectively). Our results unequivocally confirm that a substantial proportion of individual differences in human intelligence is due to genetic variation, and are consistent with many genes of small effects underlying the additive genetic influences on intelligence.
DOI: 10.1017/s0016672308009981
2009
Cited 565 times
Increased accuracy of artificial selection by using the realized relationship matrix
Dense marker genotypes allow the construction of the realized relationship matrix between individuals, with elements the realized proportion of the genome that is identical by descent (IBD) between pairs of individuals. In this paper, we demonstrate that by replacing the average relationship matrix derived from pedigree with the realized relationship matrix in best linear unbiased prediction (BLUP) of breeding values, the accuracy of the breeding values can be substantially increased, especially for individuals with no phenotype of their own. We further demonstrate that this method of predicting breeding values is exactly equivalent to the genomic selection methodology where the effects of quantitative trait loci (QTLs) contributing to variation in the trait are assumed to be normally distributed. The accuracy of breeding values predicted using the realized relationship matrix in the BLUP equations can be deterministically predicted for known family relationships, for example half sibs. The deterministic method uses the effective number of independently segregating loci controlling the phenotype that depends on the type of family relationship and the length of the genome. The accuracy of predicted breeding values depends on this number of effective loci, the family relationship and the number of phenotypic records. The deterministic prediction demonstrates that the accuracy of breeding values can approach unity if enough relatives are genotyped and phenotyped. For example, when 1000 full sibs per family were genotyped and phenotyped, and the heritability of the trait was 0.5, the reliability of predicted genomic breeding values (GEBVs) for individuals in the same full sib family without phenotypes was 0.82. These results were verified by simulation. A deterministic prediction was also derived for random mating populations, where the effective population size is the key parameter determining the effective number of independently segregating loci. 
If the effective population size is large, a very large number of individuals must be genotyped and phenotyped in order to accurately predict breeding values for unphenotyped individuals from the same population. If the heritability of the trait is 0.3, and N_e = 100, approximately 12474 individuals with genotypes and phenotypes are required in order to predict GEBVs of un-phenotyped individuals in the same population with an accuracy of 0.7.
DOI: 10.1093/bioinformatics/18.2.339
2002
Cited 478 times
QTL Express: mapping quantitative trait loci in simple and complex pedigrees
QTL Express is the first application for Quantitative Trait Locus (QTL) mapping in outbred populations with a web-based user interface. User input of three files containing a marker map, trait data and marker genotypes allows mapping of single or multiple QTL by the regression approach, with the option to perform permutation or bootstrap tests.
DOI: 10.1093/ije/dyu277
2015
Cited 474 times
The epigenetic clock is correlated with physical and cognitive fitness in the Lothian Birth Cohort 1936
The DNA methylation-based 'epigenetic clock' correlates strongly with chronological age, but it is currently unclear what drives individual differences. We examine cross-sectional and longitudinal associations between the epigenetic clock and four mortality-linked markers of physical and mental fitness: lung function, walking speed, grip strength and cognitive ability. DNA methylation-based age acceleration (residuals of the epigenetic clock estimate regressed on chronological age) was estimated in the Lothian Birth Cohort 1936 at ages 70 (n = 920), 73 (n = 299) and 76 (n = 273) years. General cognitive ability, walking speed, lung function and grip strength were measured concurrently. Cross-sectional correlations between age acceleration and the fitness variables were calculated. Longitudinal change in the epigenetic clock estimates and the fitness variables were assessed via linear mixed models and latent growth curves. Epigenetic age acceleration at age 70 was used as a predictor of longitudinal change in fitness. Epigenome-wide association studies (EWASs) were conducted on the four fitness measures. Cross-sectional correlations were significant between greater age acceleration and poorer performance on the lung function, cognition and grip strength measures (r range: -0.07 to -0.05, P range: 9.7 × 10⁻³ to 0.024). All of the fitness variables declined over time but age acceleration did not correlate with subsequent change over 6 years. There were no EWAS hits for the fitness traits. Markers of physical and mental fitness are associated with the epigenetic clock (lower abilities associated with age acceleration). However, age acceleration does not associate with decline in these measures, at least over a relatively short follow-up.
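The abstract above defines age acceleration as the residual from regressing the epigenetic clock estimate on chronological age. A minimal sketch of that definition follows, assuming a simple one-predictor least-squares fit; the function name and the input values are hypothetical and are not taken from the study.

```python
def age_acceleration(dnam_age, chrono_age):
    """Return residuals of DNAm age regressed on chronological age.

    Assumes an ordinary least-squares fit with a single predictor;
    a positive residual means the clock estimate is older than expected.
    """
    n = len(chrono_age)
    mean_x = sum(chrono_age) / n
    mean_y = sum(dnam_age) / n
    # Slope and intercept of the least-squares line.
    sxx = sum((x - mean_x) ** 2 for x in chrono_age)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chrono_age, dnam_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed DNAm age minus the age-predicted DNAm age.
    return [y - (intercept + slope * x) for x, y in zip(chrono_age, dnam_age)]


# Hypothetical illustration values (years), not data from the cohort.
chrono = [69.5, 70.1, 70.4, 69.8, 70.9]
dnam = [66.0, 72.3, 68.9, 71.0, 74.2]
acc = age_acceleration(dnam, chrono)
```

By construction the residuals average to zero and are uncorrelated with chronological age, which is why they can be compared across participants of slightly different ages.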
DOI: 10.1038/ejhg.2011.39
2011
Cited 466 times
Genomic inflation factors under polygenic inheritance
Population structure, including population stratification and cryptic relatedness, can cause spurious associations in genome-wide association studies (GWAS). Usually, the scaled median or mean test statistic for association calculated from multiple single-nucleotide-polymorphisms across the genome is used to assess such effects, and 'genomic control' can be applied subsequently to adjust test statistics at individual loci by a genomic inflation factor. Published GWAS have clearly shown that there are many loci underlying genetic variation for a wide range of complex diseases and traits, implying that a substantial proportion of the genome should show inflation of the test statistic. Here, we show by theory, simulation and analysis of data that in the absence of population structure and other technical artefacts, but in the presence of polygenic inheritance, substantial genomic inflation is expected. Its magnitude depends on sample size, heritability, linkage disequilibrium structure and the number of causal variants. Our predictions are consistent with empirical observations on height in independent samples of ~4000 and ~133,000 individuals.
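The scaled median statistic mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions: the test statistics are taken to be 1-degree-of-freedom chi-square values, the function names are hypothetical, and the choice not to deflate when the factor is below 1 is a common convention rather than something stated in the abstract.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Genomic inflation factor: observed median over the expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide each statistic by the inflation factor (never below 1)."""
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats]


# Hypothetical association statistics, one per SNP.
stats = [0.1, 0.3, 0.5, 0.9, 2.0, 6.5, 0.05]
lam = genomic_inflation(stats)
adjusted = genomic_control(stats)
```

The point made in the abstract is that a factor above 1 does not by itself indicate population structure, since polygenic inheritance is expected to inflate the median statistic as well.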
DOI: 10.1038/ng0508-489
2008
Cited 426 times
Sizing up human height variation
DOI: 10.1038/s41398-018-0150-6
2018
Cited 423 times
GWAS on family history of Alzheimer’s disease
Alzheimer’s disease (AD) is a public health priority for the 21st century. Risk reduction currently revolves around lifestyle changes with much research trying to elucidate the biological underpinnings. We show that self-report of parental history of Alzheimer’s dementia for case ascertainment in a genome-wide association study of 314,278 participants from UK Biobank (27,696 maternal cases, 14,338 paternal cases) is a valid proxy for an AD genetic study. After meta-analysing with published consortium data (n = 74,046 with 25,580 cases across the discovery and replication analyses), three new AD-associated loci (P < 5 × 10⁻⁸) are identified. These contain genes relevant for AD and neurodegeneration: ADAM10, BCKDK/KAT8 and ACE. Novel gene-based loci include drug targets such as VKORC1 (warfarin dose). We report evidence that the association of SNPs in the TOMM40 gene with AD is potentially mediated by both gene expression and DNA methylation in the prefrontal cortex. However, it is likely that multiple variants are affecting the trait and gene methylation/expression. Our discovered loci may help to elucidate the biological mechanisms underlying AD and, as they contain genes that are drug targets for other diseases and disorders, warrant further exploration for potential precision medicine applications.
DOI: 10.1186/1471-2318-7-28
2007
Cited 418 times
The Lothian Birth Cohort 1936: a study to examine influences on cognitive ageing from age 11 to age 70 and beyond
Cognitive ageing is a major burden for society and a major influence in lowering people's independence and quality of life. It is the most feared aspect of ageing. There are large individual differences in age-related cognitive changes. Seeking the determinants of cognitive ageing is a research priority. A limitation of many studies is the lack of a sufficiently long period between cognitive assessments to examine determinants. Here, the aim is to examine influences on cognitive ageing between childhood and old age. The study is designed as a follow-up cohort study. The participants comprise surviving members of the Scottish Mental Survey of 1947 (SMS1947; N = 70,805) who reside in the Edinburgh area (Lothian) of Scotland. The SMS1947 applied a valid test of general intelligence to all children born in 1936 and attending Scottish schools in June 1947. A total of 1091 participants make up the Lothian Birth Cohort 1936. They undertook: a medical interview and examination; physical fitness testing; extensive cognitive testing (reasoning, memory, speed of information processing, and executive function); personality, quality of life and other psycho-social questionnaires; and a food frequency questionnaire. They have taken the same mental ability test (the Moray House Test No. 12) at age 11 and age 70. They provided blood samples for DNA extraction and testing and other biomarker analyses. Here we describe the background and aims of the study, the recruitment procedures and details of numbers tested, and the details of all examinations. The principal strength of this cohort is the rarely captured phenotype of lifetime cognitive change. There is additional rich information to examine the determinants of individual differences in this lifetime cognitive change. This protocol report is important in alerting other researchers to the data available in the cohort.
DOI: 10.1086/376547
2003
Cited 404 times
Genome Scan Meta-Analysis of Schizophrenia and Bipolar Disorder, Part III: Bipolar Disorder
Genome scans of bipolar disorder (BPD) have not produced consistent evidence for linkage. The rank-based genome scan meta-analysis (GSMA) method was applied to 18 BPD genome scan data sets in an effort to identify regions with significant support for linkage in the combined data. The two primary analyses considered available linkage data for "very narrow" (i.e., BP-I and schizoaffective disorder-BP) and "narrow" (i.e., adding BP-II disorder) disease models, with the ranks weighted for sample size. A "broad" model (i.e., adding recurrent major depression) and unweighted analyses were also performed. No region achieved genomewide statistical significance by several simulation-based criteria. The most significant P values (<.01) were observed on chromosomes 9p22.3-21.1 (very narrow), 10q11.21-22.1 (very narrow), and 14q24.1-32.12 (narrow). Nominally significant P values were observed in adjacent bins on chromosomes 9p and 18p-q, across all three disease models on chromosomes 14q and 18p-q, and across two models on chromosome 8q. Relatively few BPD pedigrees have been studied under narrow disease models relative to the schizophrenia GSMA data set, which produced more significant results. There was no overlap of the highest-ranked regions for the two disorders. The present results for the very narrow model are promising but suggest that more and larger data sets are needed. Alternatively, linkage might be detected in certain populations or subsets of pedigrees. The narrow and broad data sets had considerable power, according to simulation studies, but did not produce more highly significant evidence for linkage. We note that meta-analysis can sometimes provide support for linkage but cannot disprove linkage in any candidate region.
DOI: 10.1101/gr.387103
2003
Cited 399 times
Novel Multilocus Measure of Linkage Disequilibrium to Estimate Past Effective Population Size
Linkage disequilibrium (LD) between densely spaced, polymorphic genetic markers in humans and other species contains information about historical population size. Inferring past population size is of interest both from an evolutionary perspective (e.g., testing the "out of Africa" hypothesis of human evolution) and to improve models for mapping of disease and quantitative trait genes. We propose a novel multilocus measure of LD, the chromosome segment homozygosity (CSH). CSH is defined for a specific chromosome segment, up to the full length of the chromosome. In computer simulations CSH was generally less variable than the r² measure of LD, and variability of CSH decreased as the number of markers in the chromosome segment was increased. The essence and utility of our novel measure is that CSH over long distances reflects recent effective population size (N), whereas CSH over small distances reflects the effective size in the more distant past. We illustrate the utility of CSH by calculating CSH from human and dairy cattle SNP and microsatellite marker data, and predicting N at various times in the past for each species. Results indicated an exponentially increasing N in humans and a declining N in dairy cattle. CSH is a valuable statistic for inferring population histories from haplotype data, and has implications for mapping of disease loci.
DOI: 10.1101/gr.6023607
2007
Cited 397 times
Recent human effective population size estimated from linkage disequilibrium
Effective population size (N_e) determines the amount of genetic variation, genetic drift, and linkage disequilibrium (LD) in populations. Here, we present the first genome-wide estimates of human effective population size from LD data. Chromosome-specific effective population size was estimated for all autosomes and the X chromosome from estimated LD between SNP pairs <100 kb apart. We account for variation in recombination rate by using coalescent-based estimates of fine-scale recombination rate from one sample and correlating these with LD in an independent sample. Phase I of the HapMap project produced between 18 and 22 million SNP pairs in samples from four populations: Yoruba from Ibadan (YRI), Nigeria; Japanese from Tokyo (JPT); Han Chinese from Beijing (HCB); and residents from Utah with ancestry from northern and western Europe (CEU). For CEU, JPT, and HCB, the estimate of effective population size, adjusted for SNP ascertainment bias, was approximately 3100, whereas the estimate for the YRI was approximately 7500, consistent with the out-of-Africa theory of ancestral human population expansion and concurrent bottlenecks. We show that the decay in LD over distance between SNPs is consistent with recent population growth. The estimates of N_e are lower than previously published estimates based on heterozygosity, possibly because they represent one or more bottlenecks in human population size that occurred approximately 10,000 to 200,000 years ago.
DOI: 10.1101/447367
2018
Cited 379 times
Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis
While many disease-associated variants have been identified through genome-wide association studies, their downstream molecular consequences remain unclear. To identify these effects, we performed cis- and trans-expression quantitative trait locus (eQTL) analysis in blood from 31,684 individuals through the eQTLGen Consortium. We observed that cis-eQTLs can be detected for 88% of the studied genes, but that they have a different genetic architecture compared to disease-associated variants, limiting our ability to use cis-eQTLs to pinpoint causal genes within susceptibility loci. In contrast, trans-eQTLs (detected for 37% of 10,317 studied trait-associated variants) were more informative. Multiple unlinked variants, associated to the same complex trait, often converged on trans-genes that are known to play central roles in disease etiology. We observed the same when ascertaining the effect of polygenic scores calculated for 1,263 genome-wide association study (GWAS) traits. Expression levels of 13% of the studied genes correlated with polygenic scores, and many resulting genes are known to drive these traits.
DOI: 10.1534/genetics.111.130922
2011
Cited 371 times
Quantification of Inbreeding Due to Distant Ancestors and Its Detection Using Dense Single Nucleotide Polymorphism Data
Inbreeding depression, which refers to reduced fitness among offspring of related parents, has traditionally been studied using pedigrees. In practice, pedigree information is difficult to obtain, potentially unreliable, and rarely assessed for inbreeding arising from common ancestors who lived more than a few generations ago. Recently, there has been excitement about using SNP data to estimate inbreeding (F) arising from distant common ancestors in apparently "outbred" populations. Statistical power to detect inbreeding depression using SNP data depends on the actual variation in inbreeding in a population, the accuracy of detecting that with marker data, the effect size, and the sample size. No one has yet investigated what variation in F is expected in SNP data as a function of population size, and it is unclear which estimate of F is optimal for detecting inbreeding depression. In the present study, we use theory, simulated genetic data, and real genetic data to find the optimal estimate of F, to quantify the likely variation in F in populations of various sizes, and to estimate the power to detect inbreeding depression. We find that F estimated from runs of homozygosity (Froh), which reflects shared ancestry of genetic haplotypes, retains variation in even large populations (e.g., SD=0.5% when Ne=10,000) and is likely to be the most powerful method of detecting inbreeding effects from among several alternative estimates of F. However, large samples (e.g., 12,000-65,000) will be required to detect inbreeding depression for likely effect sizes, and so studies using Froh to date have probably been underpowered.
DOI: 10.1038/ng.3941
2017
Cited 359 times
Concepts, estimation and interpretation of SNP-based heritability
DOI: 10.1371/journal.pgen.1004969
2015
Cited 349 times
Simultaneous Discovery, Estimation and Prediction Analysis of Complex Traits Using a Bayesian Mixture Model
Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Wellcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.
DOI: 10.1016/s0140-6736(11)60874-x
2011
Cited 334 times
Identification of IL6R and chromosome 11q13.5 as risk loci for asthma
We aimed to identify novel genetic variants affecting asthma risk, since these might provide novel insights into molecular mechanisms underlying the disease. We did a genome-wide association study (GWAS) in 2669 physician-diagnosed asthmatics and 4528 controls from Australia. Seven loci were prioritised for replication after combining our results with those from the GABRIEL consortium (n=26,475), and these were tested in an additional 25,358 independent samples from four in-silico cohorts. Quantitative multi-marker scores of genetic load were constructed on the basis of results from the GABRIEL study and tested for association with asthma in our Australian GWAS dataset. Two loci were confirmed to associate with asthma risk in the replication cohorts and reached genome-wide significance in the combined analysis of all available studies (n=57,800): rs4129267 (OR 1·09, combined p=2·4×10⁻⁸) in the interleukin-6 receptor (IL6R) gene and rs7130588 (OR 1·09, p=1·8×10⁻⁸) on chromosome 11q13.5 near the leucine-rich repeat containing 32 gene (LRRC32, also known as GARP). The 11q13.5 locus was significantly associated with atopic status among asthmatics (OR 1·33, p=7×10⁻⁴), suggesting that it is a risk factor for allergic but not non-allergic asthma. Multi-marker association results are consistent with a highly polygenic contribution to asthma risk, including loci with weak effects that might be shared with other immune-related diseases, such as NDFIP1, HLA-B, LPP, and BACH2. The IL6R association further supports the hypothesis that cytokine signalling dysregulation affects asthma risk, and raises the possibility that an IL6R antagonist (tocilizumab) may be effective to treat the disease, perhaps in a genotype-dependent manner. Results for the 11q13.5 locus suggest that it directly increases the risk of allergic sensitisation which, in turn, increases the risk of subsequent development of asthma. 
Larger or more functionally focused studies are needed to characterise the many loci with modest effects that remain to be identified for asthma. Funding: National Health and Medical Research Council of Australia. A full list of funding sources is provided in the webappendix.
DOI: 10.1038/nn.3708
2014
Cited 324 times
Large-scale genomics unveils the genetic architecture of psychiatric disorders
Family study results are consistent with genetic effects making substantial contributions to risk of psychiatric disorders such as schizophrenia, yet robust identification of specific genetic variants that explain variation in population risk had been disappointing until the advent of technologies that assay the entire genome in large samples. We highlight recent progress that has led to a better understanding of the number of risk variants in the population and the interaction of allele frequency and effect size. The emerging genetic architecture implies a large number of contributing loci (that is, a high genome-wide mutational target) and suggests that genetic risk of psychiatric disorders involves the combined effects of many common variants of small effect, as well as rare and de novo variants of large effect. The capture of a substantial proportion of genetic risk facilitates new study designs to investigate the combined effects of genes and the environment.
DOI: 10.1371/journal.pgen.1004269
2014
Cited 321 times
Statistical Power to Detect Genetic (Co)Variance of Complex Traits Using SNP Data in Unrelated Samples
We have recently developed analysis methods (GREML) to estimate the genetic variance of a complex trait/disease and the genetic correlation between two complex traits/diseases using genome-wide single nucleotide polymorphism (SNP) data in unrelated individuals. Here we use analytical derivations and simulations to quantify the sampling variance of the estimate of the proportion of phenotypic variance captured by all SNPs for quantitative traits and case-control studies. We also derive the approximate sampling variance of the estimate of a genetic correlation in a bivariate analysis, when two complex traits are either measured on the same or different individuals. We show that the sampling variance is inversely proportional to the number of pairwise contrasts in the analysis and to the variance in SNP-derived genetic relationships. For bivariate analysis, the sampling variance of the genetic correlation additionally depends on the harmonic mean of the proportion of variance explained by the SNPs for the two traits and the genetic correlation between the traits, and depends on the phenotypic correlation when the traits are measured on the same individuals. We provide an online tool for calculating the power of detecting genetic (co)variation using genome-wide SNP data. The new theory and online tool will be helpful to plan experimental designs to estimate the missing heritability that has not yet been fully revealed through genome-wide association studies, and to estimate the genetic overlap between complex traits (diseases) in particular when the traits (diseases) are not measured on the same samples.
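The scaling described in this abstract can be illustrated with a minimal Python sketch. It assumes the sampling variance of the SNP-heritability estimate is approximately 1 / (number of pairwise contrasts × variance of the off-diagonal genetic relationships). The proportionality constant of 1, the default `var_a = 2e-5` (a value often quoted for common SNPs in unrelated individuals), and the function name `greml_h2_se` are illustrative assumptions, not taken from the abstract or from the authors' online tool.

```python
import math


def greml_h2_se(n, var_a=2e-5):
    """Approximate standard error of a SNP-heritability estimate.

    n      : number of unrelated individuals
    var_a  : variance of the off-diagonal SNP-derived genetic
             relationships (assumed default, not from the abstract)
    """
    # Number of pairwise contrasts among n individuals.
    pairs = n * (n - 1) / 2
    # Assumed approximation: variance is inversely proportional to the
    # number of pairs and to var_a, with a proportionality constant of 1.
    variance = 1.0 / (pairs * var_a)
    return math.sqrt(variance)


if __name__ == "__main__":
    for n in (2_500, 5_000, 10_000):
        print(n, round(greml_h2_se(n), 4))
```

Under these assumptions, doubling the sample size roughly halves the standard error, which is why estimates of this kind need samples in the thousands to be informative.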
DOI: 10.1038/s41588-018-0101-4
2018
Cited 320 times
Signatures of negative selection in the genetic architecture of human complex traits
We develop a Bayesian mixed linear model that simultaneously estimates single-nucleotide polymorphism (SNP)-based heritability, polygenicity (proportion of SNPs with nonzero effects), and the relationship between SNP effect size and minor allele frequency for complex traits in conventionally unrelated individuals using genome-wide SNP data. We apply the method to 28 complex traits in the UK Biobank data (N = 126,752) and show that on average, 6% of SNPs have nonzero effects, which in total explain 22% of phenotypic variance. We detect significant (P < 0.05/28) signatures of natural selection in the genetic architecture of 23 traits, including reproductive, cardiovascular, and anthropometric traits, as well as educational attainment. The significant estimates of the relationship between effect size and minor allele frequency in complex traits are consistent with a model of negative (or purifying) selection, as confirmed by forward simulation. We conclude that negative selection acts pervasively on the genetic variants associated with human complex traits. BayesS estimates SNP-based heritability, polygenicity, and the relationship between effect size and minor allele frequency using genome-wide SNP data. Applying BayesS to UK Biobank data identifies signatures of natural selection for 23 complex traits.
DOI: 10.1038/s41467-019-12653-0
2019
Cited 310 times
Improved polygenic prediction by Bayesian multiple regression on summary statistics
Accurate prediction of an individual’s phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.
DOI: 10.1038/nrg2865
2010
Cited 306 times
Reconciling the analysis of IBD and IBS in complex trait studies
DOI: 10.1371/journal.pgen.1000864
2010
Cited 302 times
The Genetic Interpretation of Area under the ROC Curve in Genomic Profiling
Genome-wide association studies in human populations have facilitated the creation of genomic profiles which combine the effects of many associated genetic variants to predict risk of disease. The area under the receiver operator characteristic (ROC) curve is a well established measure for determining the efficacy of tests in correctly classifying diseased and non-diseased individuals. We use quantitative genetics theory to provide insight into the genetic interpretation of the area under the ROC curve (AUC) when the test classifier is a predictor of genetic risk. Even when the proportion of genetic variance explained by the test is 100%, there is a maximum value for AUC that depends on the genetic epidemiology of the disease, i.e. either the sibling recurrence risk or heritability and disease prevalence. We derive an equation relating maximum AUC to heritability and disease prevalence. The expression can be reversed to calculate the proportion of genetic variance explained given AUC, disease prevalence, and heritability. We use published estimates of disease prevalence and sibling recurrence risk for 17 complex genetic diseases to calculate the proportion of genetic variance that a test must explain to achieve AUC = 0.75; this varied from 0.10 to 0.74. We provide a genetic interpretation of AUC for use with predictors of genetic risk based on genomic profiles. We provide a strategy to estimate proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability (or sibling recurrence risk) available as an online calculator.
DOI: 10.1093/hmg/ddp295
2009
Cited 302 times
Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk
The current paradigm within genetic diagnostics is to test individuals only at loci known to affect risk of complex disease, yet the technology exists to genotype an individual at thousands of loci across the genome. We investigated whether information from genome-wide association studies could be harnessed to improve discrimination of complex disease affection status. We employed genome-wide data from the Wellcome Trust Case Control Consortium to test this hypothesis. Each disease cohort, together with the same set of controls, was split into two samples: a 'Training Set', where thousands of SNPs that might predispose to disease risk were identified, and a 'Prediction Set', where the discriminatory ability of these SNPs was assessed. Genome-wide scores consisting of, for example, the total number of risk alleles an individual carries were calculated for each individual in the prediction set. Case-control status was regressed on this score and the area under the receiver operator characteristic curve (AUC) estimated. In most cases, a liberal inclusion of SNPs in the genome-wide score improved AUC compared with a more stringent selection of top SNPs, but did not perform as well as selection based upon established variants. The addition of genome-wide scores to known variant information produced only a limited increase in discriminative accuracy but was most effective for bipolar disorder, coronary heart disease and type II diabetes. We conclude that this small increase in discriminative accuracy is unlikely to be of diagnostic or predictive utility at the present time.
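The scoring step described in this abstract can be sketched in a few lines of Python. This is a simplified illustration, not the authors' pipeline. Genotypes are assumed to be coded as risk-allele counts (0, 1 or 2) per SNP. The AUC is computed directly as the probability that a randomly chosen case outscores a randomly chosen control, with ties counted as one half, rather than through the regression step the abstract mentions. The function names and the toy data are hypothetical.

```python
def risk_allele_score(genotype):
    """Unweighted genome-wide score: total risk alleles carried.

    genotype: list of risk-allele counts (0, 1 or 2), one per SNP.
    """
    return sum(genotype)


def auc(case_scores, control_scores):
    """Rank-based AUC: P(case score > control score), ties count 0.5."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Toy prediction set (hypothetical data): three SNPs per individual.
    cases = [[2, 1, 2], [1, 1, 2]]
    controls = [[0, 1, 1], [1, 1, 2]]
    case_scores = [risk_allele_score(g) for g in cases]
    control_scores = [risk_allele_score(g) for g in controls]
    print(auc(case_scores, control_scores))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls. A weighted score, for example using log odds ratios estimated in the training set, would replace the plain sum in `risk_allele_score`.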
DOI: 10.1038/s41588-019-0530-8
2019
Cited 302 times
A resource-efficient tool for mixed model association analysis of large-scale data
The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test statistics and hence to spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we develop an MLM-based tool (fastGWA) that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrate by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then apply fastGWA to 2,173 traits on array-genotyped and imputed samples from 456,422 individuals and to 2,048 traits on whole-exome-sequenced samples from 46,191 individuals in the UKB. fastGWA is a mixed linear model–based approach for performing genome-wide association analyses at biobank scale, while controlling for population stratification and relatedness.
DOI: 10.1038/s41467-018-04558-1
2018
Cited 292 times
Identifying gene targets for brain-related traits using transcriptomic and methylomic data from blood
Understanding the difference in genetic regulation of gene expression between brain and blood is important for discovering genes for brain-related traits and disorders. Here, we estimate the correlation of genetic effects at the top-associated cis-expression or -DNA methylation (DNAm) quantitative trait loci (cis-eQTLs or cis-mQTLs) between brain and blood (r_b). Using publicly available data, we find that genetic effects at the top cis-eQTLs or mQTLs are highly correlated between independent brain and blood samples ([Formula: see text] for cis-eQTLs and [Formula: see text] for cis-mQTLs). Using meta-analyzed brain cis-eQTL/mQTL data (n = 526 to 1194), we identify 61 genes and 167 DNAm sites associated with four brain-related phenotypes, most of which are a subset of the discoveries (97 genes and 295 DNAm sites) using data from blood with larger sample sizes (n = 1980 to 14,115). Our results demonstrate the gain of power in gene discovery for brain-related phenotypes using blood cis-eQTL/mQTL data with large sample sizes.
DOI: 10.1002/gepi.21614
2012
Cited 270 times
A Better Coefficient of Determination for Genetic Profile Analysis
Genome‐wide association studies have facilitated the construction of risk predictors for disease from multiple Single Nucleotide Polymorphism markers. The ability of such “genetic profiles” to predict outcome is usually quantified in an independent data set. Coefficients of determination (R²) have been a useful measure to quantify the goodness‐of‐fit of the genetic profile. Various pseudo‐R² measures for binary responses have been proposed. However, there is no standard or consensus measure because the concept of residual variance is not easily defined on the observed probability scale. Unlike other nongenetic predictors such as environmental exposure, there is prior information on genetic predictors because for most traits there are estimates of the proportion of variation in risk in the population due to all genetic factors, the heritability. It is this useful ability to benchmark that makes the choice of a measure of goodness‐of‐fit in genetic profiling different from that of nongenetic predictors. In this study, we use a liability threshold model to establish the relationship between the observed probability scale and underlying liability scale in measuring R² for binary responses. We show that currently used R² measures are difficult to interpret, biased by ascertainment, and not comparable to heritability. We suggest a novel and globally standard measure of R² that is interpretable on the liability scale. Furthermore, even when using ascertained case‐control studies that are typical in human disease studies, we can obtain an R² measure on the liability scale that can be compared directly to heritability.
DOI: 10.1038/mp.2016.192
2016
Cited 268 times
A DNA methylation biomarker of alcohol consumption
The lack of reliable measures of alcohol intake is a major obstacle to the diagnosis and treatment of alcohol-related diseases. Epigenetic modifications such as DNA methylation may provide novel biomarkers of alcohol use. To examine this possibility, we performed an epigenome-wide association study of methylation of cytosine-phosphate-guanine dinucleotide (CpG) sites in relation to alcohol intake in 13 population-based cohorts (total n=13 317; 54% women; mean age across cohorts 42-76 years) using whole blood (9643 European and 2423 African ancestries) or monocyte-derived DNA (588 European, 263 African and 400 Hispanic ancestry) samples. We performed meta-analysis and variable selection in whole-blood samples of people of European ancestry (n=6926) and identified 144 CpGs that provided substantial discrimination (area under the curve=0.90-0.99) for current heavy alcohol intake (⩾42 g per day in men and ⩾28 g per day in women) in four replication cohorts. The ancestry-stratified meta-analysis in whole blood identified 328 (9643 European ancestry samples) and 165 (2423 African ancestry samples) alcohol-related CpGs at Bonferroni-adjusted P<1 × 10⁻⁷. Analysis of the monocyte-derived DNA (n=1251) identified 62 alcohol-related CpGs at P<1 × 10⁻⁷. In whole-blood samples of people of European ancestry, we detected differential methylation in two neurotransmitter receptor genes, the γ-Aminobutyric acid-A receptor delta and γ-aminobutyric acid B receptor subunit 1; their differential methylation was associated with expression levels of a number of genes involved in immune function. In conclusion, we have identified a robust alcohol-related DNA methylation signature and shown the potential utility of DNA methylation as a clinically useful diagnostic test to detect current heavy alcohol consumption.
DOI: 10.1038/ng.731
2010
Cited 267 times
Genome-wide association study identifies a locus at 7p15.2 associated with endometriosis
Krina Zondervan and colleagues report a genome-wide association study for endometriosis. The authors identify a susceptibility locus on chromosome 7p15. Endometriosis is a common gynecological disease associated with pelvic pain and subfertility. We conducted a genome-wide association study (GWAS) in 3,194 individuals with surgically confirmed endometriosis (cases) and 7,060 controls from Australia and the UK. Polygenic predictive modeling showed significantly increased genetic loading among 1,364 cases with moderate to severe endometriosis. The strongest association signal was on 7p15.2 (rs12700667) for 'all' endometriosis (P = 2.6 × 10⁻⁷, odds ratio (OR) = 1.22, 95% CI 1.13–1.32) and for moderate to severe disease (P = 1.5 × 10⁻⁹, OR = 1.38, 95% CI 1.24–1.53). We replicated rs12700667 in an independent cohort from the United States of 2,392 self-reported, surgically confirmed endometriosis cases and 2,271 controls (P = 1.2 × 10⁻³, OR = 1.17, 95% CI 1.06–1.28), resulting in a genome-wide significant P value of 1.4 × 10⁻⁹ (OR = 1.20, 95% CI 1.13–1.27) for 'all' endometriosis in our combined datasets of 5,586 cases and 9,331 controls. rs12700667 is located in an intergenic region upstream of the plausible candidate genes NFE2L3 and HOXA10.
DOI: 10.1002/mds.27659
2019
Cited 258 times
Parkinson's disease age at onset genome‐wide association study: Defining heritability, genetic loci, and α‐synuclein mechanisms
Increasing evidence supports an extensive and complex genetic contribution to PD. Previous genome-wide association studies (GWAS) have shed light on the genetic basis of risk for this disease. However, the genetic determinants of PD age at onset are largely unknown. To identify the genetic determinants of PD age at onset. Using genetic data of 28,568 PD cases, we performed a genome-wide association study based on PD age at onset. We estimated that the heritability of PD age at onset attributed to common genetic variation was ∼0.11, lower than the overall heritability of risk for PD (∼0.27), likely, in part, because of the subjective nature of this measure. We found two genome-wide significant association signals, one at SNCA and the other a protein-coding variant in TMEM175, both of which are known PD risk loci and a Bonferroni-corrected significant effect at other known PD risk loci, GBA, INPP5F/BAG3, FAM47E/SCARB2, and MCCC1. Notably, SNCA, TMEM175, SCARB2, BAG3, and GBA have all been shown to be implicated in α-synuclein aggregation pathways. Remarkably, other well-established PD risk loci, such as GCH1 and MAPT, did not show a significant effect on age at onset of PD. Overall, we have performed the largest age at onset of PD genome-wide association studies to date, and our results show that not all PD risk loci influence age at onset with significant differences between risk alleles for age at onset. This provides a compelling picture, both within the context of functional characterization of disease-linked genetic variability and in defining differences between risk alleles for age at onset, or frank risk for disease. © 2019 International Parkinson and Movement Disorder Society.
DOI: 10.1371/journal.pmed.1002215
2017
Cited 253 times
Association of Body Mass Index with DNA Methylation and Gene Expression in Blood Cells and Relations to Cardiometabolic Disease: A Mendelian Randomization Approach
The link between DNA methylation, obesity, and adiposity-related diseases in the general population remains uncertain. We conducted an association study of body mass index (BMI) and differential methylation for over 400,000 CpGs assayed by microarray in whole-blood-derived DNA from 3,743 participants in the Framingham Heart Study and the Lothian Birth Cohorts, with independent replication in three external cohorts of 4,055 participants. We examined variations in whole blood gene expression and conducted Mendelian randomization analyses to investigate the functional and clinical relevance of the findings. We identified novel and previously reported BMI-related differential methylation at 83 CpGs that replicated across cohorts; BMI-related differential methylation was associated with concurrent changes in the expression of genes in lipid metabolism pathways. Genetic instrumental variable analysis of alterations in methylation at one of the 83 replicated CpGs, cg11024682 (intronic to sterol regulatory element binding transcription factor 1 [SREBF1]), demonstrated links to BMI, adiposity-related traits, and coronary artery disease. Independent genetic instruments for expression of SREBF1 supported the findings linking methylation to adiposity and cardiometabolic disease. Methylation at a substantial proportion (16 of 83) of the identified loci was found to be secondary to differences in BMI. However, the cross-sectional nature of the data limits definitive causal determination. We present robust associations of BMI with differential DNA methylation at numerous loci in blood cells. BMI-related DNA methylation and gene expression provide mechanistic insights into the relationship between DNA methylation, obesity, and adiposity-related diseases.
DOI: 10.1038/s41467-018-03371-0
2018
Cited 251 times
Integrative analysis of omics summary data reveals putative mechanisms underlying complex traits
The identification of genes and regulatory elements underlying the associations discovered by GWAS is essential to understanding the aetiology of complex traits (including diseases). Here, we demonstrate an analytical paradigm of prioritizing genes and regulatory elements at GWAS loci for follow-up functional studies. We perform an integrative analysis that uses summary-level SNP data from multi-omics studies to detect DNA methylation (DNAm) sites associated with gene expression and phenotype through shared genetic effects (i.e., pleiotropy). We identify pleiotropic associations between 7858 DNAm sites and 2733 genes. These DNAm sites are enriched in enhancers and promoters, and >40% of them are mapped to distal genes. Further pleiotropic association analyses, which link both the methylome and transcriptome to 12 complex traits, identify 149 DNAm sites and 66 genes, indicating a plausible mechanism whereby the effect of a genetic variant on phenotype is mediated by genetic regulation of transcription through DNAm.
DOI: 10.1073/pnas.1404623111
2014
Cited 250 times
Common genetic variants associated with cognitive performance identified using the proxy-phenotype method
We identify common genetic variants associated with cognitive performance using a two-stage approach, which we call the proxy-phenotype method. First, we conduct a genome-wide association study of educational attainment in a large sample (n = 106,736), which produces a set of 69 education-associated SNPs. Second, using independent samples (n = 24,189), we measure the association of these education-associated SNPs with cognitive performance. Three SNPs (rs1487441, rs7923609, and rs2721173) are significantly associated with cognitive performance after correction for multiple hypothesis testing. In an independent sample of older Americans (n = 8,652), we also show that a polygenic score derived from the education-associated SNPs is associated with memory and absence of dementia. Convergent evidence from a set of bioinformatics analyses implicates four specific genes (KNCMA1, NRXN1, POU2F3, and SCRT). All of these genes are associated with a particular neurotransmitter pathway involved in synaptic plasticity, the main cellular mechanism for learning and memory.
DOI: 10.1101/gr.136598.111
2012
Cited 249 times
Neonatal DNA methylation profile in human twins is specified by a complex interplay between intrauterine environmental and genetic factors, subject to tissue-specific influence
Comparison between groups of monozygotic (MZ) and dizygotic (DZ) twins enables an estimation of the relative contribution of genetic and shared and nonshared environmental factors to phenotypic variability. Using DNA methylation profiling of ∼20,000 CpG sites as a phenotype, we have examined discordance levels in three neonatal tissues from 22 MZ and 12 DZ twin pairs. MZ twins exhibit a wide range of within-pair differences at birth, but show discordance levels generally lower than DZ pairs. Within-pair methylation discordance was lowest in CpG islands in all twins and increased as a function of distance from islands. Variance component decomposition analysis of DNA methylation in MZ and DZ pairs revealed a low mean heritability across all tissues, although a wide range of heritabilities was detected for specific genomic CpG sites. The largest component of variation was attributed to the combined effects of nonshared intrauterine environment and stochastic factors. Regression analysis of methylation on birth weight revealed a general association between methylation of genes involved in metabolism and biosynthesis, providing further support for epigenetic change in the previously described link between low birth weight and increasing risk for cardiovascular, metabolic, and other complex diseases. Finally, comparison of our data with that of several older twins revealed little evidence for genome-wide epigenetic drift with increasing age. This is the first study to analyze DNA methylation on a genome scale in twins at birth, further highlighting the importance of the intrauterine environment on shaping the neonatal epigenome.
DOI: 10.1186/s13059-016-1119-5
2016
Cited 245 times
DNA methylation signatures of chronic low-grade inflammation are associated with complex diseases
Chronic low-grade inflammation reflects a subclinical immune response implicated in the pathogenesis of complex diseases. Identifying genetic loci where DNA methylation is associated with chronic low-grade inflammation may reveal novel pathways or therapeutic targets for inflammation. We performed a meta-analysis of epigenome-wide association studies (EWAS) of serum C-reactive protein (CRP), which is a sensitive marker of low-grade inflammation, in a large European population (n = 8863) and trans-ethnic replication in African Americans (n = 4111). We found differential methylation at 218 CpG sites to be associated with CRP (P < 1.15 × 10⁻⁷) in the discovery panel of European ancestry and replicated (P < 2.29 × 10⁻⁴) 58 CpG sites (45 unique loci) among African Americans. To further characterize the molecular and clinical relevance of the findings, we examined the association with gene expression, genetic sequence variants, and clinical outcomes. DNA methylation at nine (16%) CpG sites was associated with whole blood gene expression in cis (P < 8.47 × 10⁻⁵), ten (17%) CpG sites were associated with a nearby genetic variant (P < 2.50 × 10⁻³), and 51 (88%) were also associated with at least one related cardiometabolic entity (P < 9.58 × 10⁻⁵). An additive weighted score of replicated CpG sites accounted for up to 6% inter-individual variation (R²) of age-adjusted and sex-adjusted CRP, independent of known CRP-related genetic variants. We have completed an EWAS of chronic low-grade inflammation and identified many novel genetic loci underlying inflammation that may serve as targets for the development of novel therapeutic interventions for inflammation.
DOI: 10.1016/j.cell.2018.05.051
2018
Cited 242 times
Common Disease Is More Complex Than Implied by the Core Gene Omnigenic Model
The evidence that most adult-onset common diseases have a polygenic genetic architecture fully consistent with robust biological systems supported by multiple back-up mechanisms is now overwhelming. In this context, we consider the recent "omnigenic" or "core genes" model. A key assumption of the model is that there is a relatively small number of core genes relevant to any disease. While intuitively appealing, this model may underestimate the biological complexity of common disease, and therefore, the goal to discover core genes should not guide experimental design. We consider other implications of polygenicity, concluding that a focus on patient stratification is needed to achieve the goals of precision medicine.
DOI: 10.1038/ng.3401
2015
Cited 240 times
Population genetic differentiation of height and body mass index across Europe
Matthew Robinson and colleagues report an analysis of population genetic differences in human height and body mass index (BMI) across 14 European populations. They estimate the proportion of additive genetic variance attributable to population genetic differences and find evidence for selection increasing height while reducing BMI in European nations. Across-nation differences in the mean values for complex traits are common (refs 1–8), but the reasons for these differences are unknown. Here we find that many independent loci contribute to population genetic differences in height and body mass index (BMI) in 9,416 individuals across 14 European countries. Using discovery data on over 250,000 individuals and unbiased effect size estimates from 17,500 sibling pairs, we estimate that 24% (95% credible interval (CI) = 9%, 41%) and 8% (95% CI = 4%, 16%) of the captured additive genetic variance for height and BMI, respectively, reflect population genetic differences. Population genetic divergence differed significantly from that in a null model (height, P < 3.94 × 10⁻⁸; BMI, P < 5.95 × 10⁻⁴), and we find an among-population genetic correlation for tall and slender individuals (r = −0.80, 95% CI = −0.95, −0.60), consistent with correlated selection for both phenotypes. Observed differences in height among populations reflected the predicted genetic means (r = 0.51; P < 0.001), but environmental differences across Europe masked genetic differentiation for BMI (P < 0.58).
DOI: 10.1186/gb-2014-15-5-r73
2014
Cited 239 times
Contribution of genetic variation to transgenerational inheritance of DNA methylation
Despite the important role DNA methylation plays in transcriptional regulation, the transgenerational inheritance of DNA methylation is not well understood. The genetic heritability of DNA methylation has been estimated using twin pairs, although concern has been expressed whether the underlying assumption of equal common environmental effects are applicable due to intrauterine differences between monozygotic and dizygotic twins. We estimate the heritability of DNA methylation on peripheral blood leukocytes using Illumina HumanMethylation450 array using a family based sample of 614 people from 117 families, allowing comparison both within and across generations. The correlations from the various available relative pairs indicate that on average the similarity in DNA methylation between relatives is predominantly due to genetic effects with any common environmental or zygotic effects being limited. The average heritability of DNA methylation measured at probes with no known SNPs is estimated as 0.187. The ten most heritable methylation probes were investigated with a genome-wide association study, all showing highly statistically significant cis mQTLs. Further investigation of one of these cis mQTL, found in the MHC region of chromosome 6, showed the most significantly associated SNP was also associated with over 200 other DNA methylation probes in this region and the gene expression level of 9 genes. The majority of transgenerational similarity in DNA methylation is attributable to genetic effects, and approximately 20% of individual differences in DNA methylation in the population are caused by DNA sequence variation that is not located within CpG sites.
DOI: 10.1073/pnas.1120666109
2012
Cited 237 times
The genetic architecture of economic and political preferences
Preferences are fundamental building blocks in all models of economic and political behavior. We study a new sample of comprehensively genotyped subjects with data on economic and political preferences and educational attainment. We use dense single nucleotide polymorphism (SNP) data to estimate the proportion of variation in these traits explained by common SNPs and to conduct genome-wide association study (GWAS) and prediction analyses. The pattern of results is consistent with findings for other complex traits. First, the estimated fraction of phenotypic variation that could, in principle, be explained by dense SNP arrays is around one-half of the narrow heritability estimated using twin and family samples. The molecular-genetic-based heritability estimates, therefore, partially corroborate evidence of significant heritability from behavior genetic studies. Second, our analyses suggest that these traits have a polygenic architecture, with the heritable variation explained by many genes with small effects. Our results suggest that most published genetic association studies with economic and political traits are dramatically underpowered, which implies a high false discovery rate. These results convey a cautionary message for whether, how, and how soon molecular genetic data can contribute to, and potentially transform, research in social science. We propose some constructive responses to the inferential challenges posed by the small explanatory power of individual SNPs.
DOI: 10.1038/mp.2012.184
2013
Cited 237 times
Childhood intelligence is heritable, highly polygenic and associated with FNBP1L
Intelligence in childhood, as measured by psychometric cognitive tests, is a strong predictor of many important life outcomes, including educational attainment, income, health and lifespan. Results from twin, family and adoption studies are consistent with general intelligence being highly heritable and genetically stable throughout the life course. No robustly associated genetic loci or variants for childhood intelligence have been reported. Here, we report the first genome-wide association study (GWAS) on childhood intelligence (age range 6–18 years) from 17 989 individuals in six discovery and three replication samples. Although no individual single-nucleotide polymorphisms (SNPs) were detected with genome-wide significance, we show that the aggregate effects of common SNPs explain 22–46% of phenotypic variation in childhood intelligence in the three largest cohorts (P=3.9 × 10−15, 0.014 and 0.028). FNBP1L, previously reported to be the most significantly associated gene for adult intelligence, was also significantly associated with childhood intelligence (P=0.003). Polygenic prediction analyses resulted in a significant correlation between predictor and outcome in all replication cohorts. The proportion of childhood intelligence explained by the predictor reached 1.2% (P=6 × 10−5), 3.5% (P=10−3) and 0.5% (P=6 × 10−5) in three independent validation cohorts. Given the sample sizes, these genetic prediction results are consistent with expectations if the genetic architecture of childhood intelligence is like that of body mass index or height. Our study provides molecular support for the heritability and polygenic nature of childhood intelligence. Larger sample sizes will be required to detect individual variants with genome-wide significance.
DOI: 10.1038/ng.456
2009
Cited 235 times
Common variants in TMPRSS6 are associated with iron status and erythrocyte volume
We report a genome-wide association study of iron status. We identify an association of SNPs in TMPRSS6 with serum iron (rs855791, combined P = 1.5 × 10−20), transferrin saturation (combined P = 2.2 × 10−23) and erythrocyte mean cell volume (MCV, combined P = 1.1 × 10−10). We also find suggestive evidence of association with blood hemoglobin levels (combined P = 5.3 × 10−7). These findings demonstrate the involvement of TMPRSS6 in control of iron homeostasis and in normal erythropoiesis.
DOI: 10.1038/nature10781
2012
Cited 232 times
Genetic contributions to stability and change in intelligence from childhood to old age
Understanding the determinants of healthy mental ageing is a priority for society today. So far, we know that intelligence differences show high stability from childhood to old age and there are estimates of the genetic contribution to intelligence at different ages. However, attempts to discover whether genetic causes contribute to differences in cognitive ageing have been relatively uninformative. Here we provide an estimate of the genetic and environmental contributions to stability and change in intelligence across most of the human lifetime. We used genome-wide single nucleotide polymorphism (SNP) data from 1,940 unrelated individuals whose intelligence was measured in childhood (age 11 years) and again in old age (age 65, 70 or 79 years). We use a statistical method that allows genetic (co)variance to be estimated from SNP data on unrelated individuals. We estimate that causal genetic variants in linkage disequilibrium with common SNPs account for 0.24 of the variation in cognitive ability change from childhood to old age. Using bivariate analysis, we estimate a genetic correlation between intelligence at age 11 years and in old age of 0.62. These estimates, derived from rarely available data on lifetime cognitive measures, warrant the search for genetic causes of cognitive stability and change.
DOI: 10.1093/ije/dyw041
2016
Cited 221 times
The epigenetic clock and telomere length are independently associated with chronological age and mortality
Background: Telomere length and DNA methylation have been proposed as biological clock measures that track chronological age. Whether they change in tandem, or contribute independently to the prediction of chronological age, is not known.
DOI: 10.1038/s41467-020-15421-7
2020
Cited 221 times
Genome-wide association study identifies 143 loci associated with 25 hydroxyvitamin D concentration
Vitamin D deficiency is a candidate risk factor for a range of adverse health outcomes. In a genome-wide association study of 25 hydroxyvitamin D (25OHD) concentration in 417,580 Europeans we identify 143 independent loci in 112 1-Mb regions, providing insights into the physiology of vitamin D and implicating genes involved in lipid and lipoprotein metabolism, dermal tissue properties, and the sulphonation and glucuronidation of 25OHD. Mendelian randomization models find no robust evidence that 25OHD concentration has causal effects on candidate phenotypes (e.g. BMI, psychiatric disorders), but many phenotypes have (direct or indirect) causal effects on 25OHD concentration, clarifying the epidemiological relationship between 25OHD status and the health outcomes examined in this study.
DOI: 10.1038/s41588-018-0108-x
2018
Cited 214 times
Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits
Multiple methods have been developed to estimate narrow-sense heritability, h2, using single nucleotide polymorphisms (SNPs) in unrelated individuals. However, a comprehensive evaluation of these methods has not yet been performed, leading to confusion and discrepancy in the literature. We present the most thorough and realistic comparison of these methods to date. We used thousands of real whole-genome sequences to simulate phenotypes under varying genetic architectures and confounding variables, and we used array, imputed, or whole genome sequence SNPs to obtain ‘SNP-heritability’ estimates. We show that SNP-heritability can be highly sensitive to assumptions about the frequencies, effect sizes, and levels of linkage disequilibrium of underlying causal variants, but that methods that bin SNPs according to minor allele frequency and linkage disequilibrium are less sensitive to these assumptions across a wide range of genetic architectures and possible confounding factors. These findings provide guidance for best practices and proper interpretation of published estimates.
DOI: 10.1038/ncomms5926
2014
Cited 205 times
Novel loci affecting iron homeostasis and their effects in individuals at risk for hemochromatosis
Variation in body iron is associated with or causes diseases, including anaemia and iron overload. Here, we analyse genetic association data on biochemical markers of iron status from 11 European-population studies, with replication in eight additional cohorts (total up to 48,972 subjects). We find 11 genome-wide-significant (P < 5 × 10−8) loci, some including known iron-related genes (HFE, SLC40A1, TF, TFR2, TFRC, TMPRSS6) and others novel (ABO, ARNTL, FADS2, NAT2, TEX14). SNPs at ARNTL, TF, and TFR2 affect iron markers in HFE C282Y homozygotes at risk for hemochromatosis. There is substantial overlap between our iron loci and loci affecting erythrocyte and lipid phenotypes. These results will facilitate investigation of the roles of iron in disease.
DOI: 10.1007/978-1-62703-447-0_9
2013
Cited 202 times
Genome-Wide Complex Trait Analysis (GCTA): Methods, Data Analyses, and Interpretations
Estimating genetic variance is traditionally performed using pedigree analysis. Using high-throughput DNA marker data measured across the entire genome it is now possible to estimate and partition genetic variation from population samples. In this chapter, we introduce methods and a software tool called Genome-wide Complex Trait Analysis (GCTA) to estimate genomic relationships between pairs of conventionally unrelated individuals using genome-wide single nucleotide polymorphism (SNP) data, to estimate variance explained by all SNPs simultaneously on genomic or chromosomal segments or over the whole genome, and to perform a joint and conditional multiple SNPs association analysis using summary statistics from a meta-analysis of genome-wide association studies and linkage disequilibrium between SNPs estimated from a reference sample.
DOI: 10.1016/j.ajhg.2015.01.001
2015
Cited 202 times
Dominance Genetic Variation Contributes Little to the Missing Heritability for Human Complex Traits
For human complex traits, non-additive genetic variation has been invoked to explain "missing heritability," but its discovery is often neglected in genome-wide association studies. Here we propose a method of using SNP data to partition and estimate the proportion of phenotypic variance attributed to additive and dominance genetic variation at all SNPs (h²_SNP and δ²_SNP) in unrelated individuals based on an orthogonal model where the estimate of h²_SNP is independent of that of δ²_SNP. With this method, we analyzed 79 quantitative traits in 6,715 unrelated European Americans. The estimate of δ²_SNP averaged across all the 79 quantitative traits was 0.03, approximately a fifth of that for additive variation (average h²_SNP = 0.15). There were a few traits that showed substantial estimates of δ²_SNP, none of which were replicated in a larger sample of 11,965 individuals. We further performed genome-wide association analyses of the 79 quantitative traits and detected SNPs with genome-wide significant dominance effects only at the ABO locus for factor VIII and von Willebrand factor. All these results suggest that dominance variation at common SNPs explains only a small fraction of phenotypic variation for human complex traits and contributes little to the missing narrow-sense heritability problem.
DOI: 10.1001/jamapsychiatry.2020.3049
2021
Cited 202 times
From Basic Science to Clinical Application of Polygenic Risk Scores
Polygenic risk scores (PRS) are predictors of the genetic susceptibilities of individuals to diseases. All individuals have DNA risk variants for all common diseases, but genetic susceptibility differences between people reflect the cumulative burden of these. Polygenic risk scores for an individual are calculated as weighted counts of thousands of risk variants that they carry, where the risk variants and their weights have been identified in genome-wide association studies. Here, we review the underlying basic science of PRS, providing a foundation for understanding the potential clinical utility and limitations of PRS. Polygenic risk scores can be calculated for a wide range of diseases from a saliva or blood sample using genotyping technologies that are inexpensive. While genotyping only needs to be done once for each individual in their lifetime, the PRS can be recalculated as identification of risk variants improves. On their own, PRS will never be able to establish or definitively predict future diagnoses of common complex conditions because genetic factors only contribute part of the risk, and PRS will only ever capture part of the genetic contributions. Nonetheless, just as clinical medicine uses a multitude of other predictive measures, PRS either on their own or as part of multivariable predictive algorithms could play a role. Utility of PRS in clinical medicine and ethical issues related to their use should be evaluated in the context of realistic expectations of what PRS can and cannot deliver. For different diseases, PRS could have utility in community settings (stratification to better triage people into established screening programs) or could contribute to clinical decision-making for those presenting with symptoms but where formal diagnosis is unclear. In principle, PRS could contribute to treatment choices, but more data are needed to allow development of PRS in this context.
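The "weighted counts of risk variants" described in the abstract above can be illustrated with a minimal sketch. The function name, the three-variant toy genotype, and the weights are all hypothetical and chosen only for illustration; real scores use thousands of variants with weights taken from genome-wide association studies.

```python
def polygenic_score(allele_counts, weights):
    """Weighted count of risk alleles: sum over variants of weight * count.

    allele_counts: number of risk alleles carried at each variant (0, 1 or 2).
    weights: per-variant effect sizes (hypothetical values in this sketch).
    """
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is needed per variant")
    return sum(w * c for w, c in zip(weights, allele_counts))


# Hypothetical toy data: three variants for one individual.
counts = [0, 1, 2]
weights = [0.10, 0.05, 0.20]
score = polygenic_score(counts, weights)
```

Because the score is a plain weighted sum, it can be recomputed from the same genotype whenever updated weights become available, which matches the abstract's point that genotyping is needed only once.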
DOI: 10.1186/s13073-019-0667-1
2019
Cited 199 times
Improved precision of epigenetic clock estimates across tissues and its implication for biological ageing
DNA methylation changes with age. Chronological age predictors built from DNA methylation are termed ‘epigenetic clocks’. The deviation of predicted age from the actual age (‘age acceleration residual’, AAR) has been reported to be associated with death. However, it is currently unclear how a better prediction of chronological age affects such association. In this study, we build multiple predictors based on training DNA methylation samples selected from 13,661 samples (13,402 from blood and 259 from saliva). We use the Lothian Birth Cohorts of 1921 (LBC1921) and 1936 (LBC1936) to examine whether the association between AAR (from these predictors) and death is affected by (1) improving prediction accuracy of an age predictor as its training sample size increases (from 335 to 12,710) and (2) additionally correcting for confounders (i.e., cellular compositions). In addition, we investigated the performance of our predictor in non-blood tissues. We found that in principle, a near-perfect age predictor could be developed when the training sample size is sufficiently large. The association between AAR and mortality attenuates as prediction accuracy increases. AAR from our best predictor (based on Elastic Net, https://github.com/qzhang314/DNAm-based-age-predictor ) exhibits no association with mortality in both LBC1921 (hazard ratio = 1.08, 95% CI 0.91–1.27) and LBC1936 (hazard ratio = 1.00, 95% CI 0.79–1.28). Predictors based on small sample size are prone to confounding by cellular compositions relative to those from large sample size. We observed comparable performance of our predictor in non-blood tissues with a multi-tissue-based predictor. This study indicates that the epigenetic clock can be improved by increasing the training sample size and that its association with mortality attenuates with increased prediction of chronological age.
DOI: 10.1016/j.cell.2021.10.015
2021
Cited 191 times
Autism-related dietary preferences mediate autism-gut microbiome associations
There is increasing interest in the potential contribution of the gut microbiome to autism spectrum disorder (ASD). However, previous studies have been underpowered and have not been designed to address potential confounding factors in a comprehensive way. We performed a large autism stool metagenomics study (n = 247) based on participants from the Australian Autism Biobank and the Queensland Twin Adolescent Brain project. We found negligible direct associations between ASD diagnosis and the gut microbiome. Instead, our data support a model whereby ASD-related restricted interests are associated with less-diverse diet, and in turn reduced microbial taxonomic diversity and looser stool consistency. In contrast to ASD diagnosis, our dataset was well powered to detect microbiome associations with traits such as age, dietary intake, and stool consistency. Overall, microbiome differences in ASD may reflect dietary preferences that relate to diagnostic features, and we caution against claims that the microbiome has a driving role in ASD.
DOI: 10.1038/nature13005
2014
Cited 190 times
RETRACTED ARTICLE: Detection and replication of epistasis influencing transcription in humans
Epistasis has rarely been shown among natural polymorphisms in human traits; this research using advanced computation and gene expression data reveals many instances of epistasis between common single nucleotide polymorphisms in humans, with epistasis and the direction of its effect replicating in independent cohorts. Although frequently demonstrated in model organisms and domesticated species, very few examples of epistasis (where the effect of one polymorphism on a trait depends on other polymorphisms present in the genome) have been demonstrated in humans. Using advanced computation and a gene expression study design, these authors reveal hundreds of instances of epistasis between common single nucleotide polymorphisms (SNPs) in humans. They show that epistasis and direction of its effect replicate in independent cohorts. Epistatic networks of three or more SNPs are shown to influence the expression levels of many genes, whereby one cis-acting SNP is modulated by several trans-acting SNPs. Epistasis is the phenomenon whereby one polymorphism’s effect on a trait depends on other polymorphisms present in the genome. The extent to which epistasis influences complex traits and contributes to their variation is a fundamental question in evolution and human genetics. Although often demonstrated in artificial gene manipulation studies in model organisms, and some examples have been reported in other species, few examples exist for epistasis among natural polymorphisms in human traits. Its absence from empirical findings may simply be due to low incidence in the genetic control of complex traits, but an alternative view is that it has previously been too technically challenging to detect owing to statistical and computational issues. Here we show, using advanced computation and a gene expression study design, that many instances of epistasis are found between common single nucleotide polymorphisms (SNPs).
In a cohort of 846 individuals with 7,339 gene expression levels measured in peripheral blood, we found 501 significant pairwise interactions between common SNPs influencing the expression of 238 genes (P < 2.91 × 10−16). Replication of these interactions in two independent data sets showed both concordance of direction of epistatic effects (P = 5.56 × 10−31) and enrichment of interaction P values, with 30 being significant at a conservative threshold of P < 9.98 × 10−5. Forty-four of the genetic interactions are located within 5 megabases of regions of known physical chromosome interactions (P = 1.8 × 10−10). Epistatic networks of three SNPs or more influence the expression levels of 129 genes, whereby one cis-acting SNP is modulated by several trans-acting SNPs. For example, MBNL1 is influenced by an additive effect at rs13069559, which itself is masked by trans-SNPs on 14 different chromosomes, with nearly identical genotype–phenotype maps for each cis–trans interaction. This study presents the first evidence, to our knowledge, for many instances of segregating common polymorphisms interacting to influence human traits.
DOI: 10.1038/s41562-019-0757-5
2019
Cited 184 times
Genetic correlates of social stratification in Great Britain
Human DNA polymorphisms vary across geographic regions, with the most commonly observed variation reflecting distant ancestry differences. Here we investigate the geographic clustering of common genetic variants that influence complex traits in a sample of ~450,000 individuals from Great Britain. Of 33 traits analysed, 21 showed significant geographic clustering at the genetic level after controlling for ancestry, probably reflecting migration driven by socioeconomic status (SES). Alleles associated with educational attainment (EA) showed the most clustering, with EA-decreasing alleles clustering in lower SES areas such as coal mining areas. Individuals who leave coal mining areas carry more EA-increasing alleles on average than those in the rest of Great Britain. The level of geographic clustering is correlated with genetic associations between complex traits and regional measures of SES, health and cultural outcomes. Our results are consistent with the hypothesis that social stratification leaves visible marks in geographic arrangements of common allele frequencies and gene-environment correlations.
DOI: 10.1093/hmg/dds491
2012
Cited 183 times
Estimation and partitioning of polygenic variation captured by common SNPs for Alzheimer's disease, multiple sclerosis and endometriosis
Common diseases such as endometriosis (ED), Alzheimer's disease (AD) and multiple sclerosis (MS) account for a significant proportion of the health care burden in many countries. Genome-wide association studies (GWASs) for these diseases have identified a number of individual genetic variants contributing to the risk of those diseases. However, the effect size for most variants is small and collectively the known variants explain only a small proportion of the estimated heritability. We used a linear mixed model to fit all single nucleotide polymorphisms (SNPs) simultaneously, and estimated genetic variances on the liability scale using SNPs from GWASs in unrelated individuals for these three diseases. For each of the three diseases, case and control samples were not all genotyped in the same laboratory. We demonstrate that a careful analysis can obtain robust estimates, but also that insufficient quality control (QC) of SNPs can lead to spurious results and that too stringent QC is likely to remove real genetic signals. Our estimates show that common SNPs on commercially available genotyping chips capture significant variation contributing to liability for all three diseases. The estimated proportion of total variation tagged by all SNPs was 0.26 (SE 0.04) for ED, 0.24 (SE 0.03) for AD and 0.30 (SE 0.03) for MS. Further, we partitioned the genetic variance explained into five categories by a minor allele frequency (MAF), by chromosomes and gene annotation. We provide strong evidence that a substantial proportion of variation in liability is explained by common SNPs, and thereby give insights into the genetic architecture of the diseases.
DOI: 10.1016/j.ajhg.2016.12.008
2017
Cited 180 times
The Genetic Architecture of Gene Expression in Peripheral Blood
We analyzed the mRNA levels for 36,778 transcript expression traits (probes) from 2,765 individuals to comprehensively investigate the genetic architecture and degree of missing heritability for gene expression in peripheral blood. We identified 11,204 cis and 3,791 trans independent expression quantitative trait loci (eQTL) by using linear mixed models to perform genome-wide association analyses. Furthermore, using information on both closely and distantly related individuals, heritability was estimated for all expression traits. Of the set of expressed probes (15,966), 10,580 (66%) had an estimated narrow-sense heritability (h2) greater than zero with a mean (median) value of 0.192 (0.142). Across these probes, on average the proportion of genetic variance explained by all eQTL (hCOJO2) was 31% (0.060/0.192), meaning that 69% is missing, with the sentinel SNP of the largest eQTL explaining 87% (0.052/0.060) of the variance attributed to all identified cis- and trans-eQTL. For the same set of probes, the genetic variance attributed to genome-wide common (MAF > 0.01) HapMap 3 SNPs (hg2) accounted for on average 48% (0.093/0.192) of h2. Taken together, the evidence suggests that approximately half the genetic variance for gene expression is not tagged by common SNPs, and of the variance that is tagged by common SNPs, a large proportion can be attributed to identifiable eQTL of large effect, typically in cis. Finally, we present evidence that, compared with a meta-analysis, using individual-level data results in an increase of approximately 50% in power to detect eQTL.
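The percentages quoted in the abstract above follow directly from the ratios it gives in parentheses. The short sketch below reproduces that arithmetic; the variable and function names are illustrative labels, not notation from the paper.

```python
def share(part, whole):
    """Return part / whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Mean values reported in the abstract (variable names are illustrative).
h2 = 0.192           # narrow-sense heritability of the expressed probes
h2_eqtl = 0.060      # variance explained by all identified eQTL
h2_sentinel = 0.052  # variance explained by the sentinel SNP of the largest eQTL
h2_common = 0.093    # variance attributed to common HapMap 3 SNPs

explained_by_eqtl = share(h2_eqtl, h2)         # share of h2 explained by eQTL
missing = 100 - explained_by_eqtl              # share of h2 not explained by eQTL
sentinel_share = share(h2_sentinel, h2_eqtl)   # sentinel SNP share of eQTL variance
tagged_by_common = share(h2_common, h2)        # share of h2 tagged by common SNPs
```

Under these inputs the four quantities come out as 31, 69, 87 and 48 percent, matching the figures stated in the abstract.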
DOI: 10.7554/elife.39856
2019
Cited 172 times
Genomics of 1 million parent lifespans implicates novel pathways and common diseases and distinguishes survival chances
We use a genome-wide association of 1 million parental lifespans of genotyped subjects and data on mortality risk factors to validate previously unreplicated findings near CDKN2B-AS1, ATXN2/BRAP, FURIN/FES, ZW10, PSORS1C3, and 13q21.31, and identify and replicate novel findings near ABO, ZC3HC1, and IGF2R. We also validate previous findings near 5q33.3/EBF1 and FOXO3, whilst finding contradictory evidence at other loci. Gene set and cell-specific analyses show that expression in foetal brain cells and adult dorsolateral prefrontal cortex is enriched for lifespan variation, as are gene pathways involving lipid proteins and homeostasis, vesicle-mediated transport, and synaptic function. Individual genetic variants that increase dementia, cardiovascular disease, and lung cancer - but not other cancers - explain the most variance. Resulting polygenic scores show a mean lifespan difference of around five years of life across the deciles.
Ageing happens to us all, and as the cabaret singer Maurice Chevalier pointed out, "old age is not that bad when you consider the alternative". Yet, the growing ageing population of most developed countries presents challenges to healthcare systems and government finances. For many older people, long periods of ill health are part of the end of life, and so a better understanding of ageing could offer the opportunity to prolong healthy living into old age. Ageing is complex and takes a long time to study – a lifetime in fact. This makes it difficult to discern its causes, among the countless possibilities based on an individual’s genes, behaviour or environment.
While thousands of regions in an individual’s genetic makeup are known to influence their risk of different diseases, those that affect how long they will live have proved harder to disentangle. Timmers et al. sought to pinpoint such regions, and then use this information to predict, based on their DNA, whether someone had a better or worse chance of living longer than average. The DNA of over 500,000 people was read to reveal the specific ‘genetic fingerprints’ of each participant. Then, after asking each of the participants how long both of their parents had lived, Timmers et al. pinpointed 12 DNA regions that affect lifespan. Five of these regions were new and had not been linked to lifespan before. Across the twelve as a whole, several were known to be involved in Alzheimer’s disease, smoking-related cancer or heart disease. Looking at the entire genome, Timmers et al. could then predict a lifespan score for each individual, and when they sorted participants into ten groups based on these scores they found that the top group lived five years longer than the bottom, on average. Many factors besides genetics influence how long a person will live, and our lifespan cannot be read from our DNA alone. Nevertheless, Timmers et al. had hoped to narrow down their search and discover specific genes that directly influence how quickly people age, beyond diseases. If such genes exist, their effects were too small to be detected in this study. The next step will be to expand the study to include more participants, which will hopefully pinpoint further genomic regions and help disentangle the biology of ageing and disease.
DOI: 10.1073/pnas.1617042114
2017
Cited 168 times
Genetic signatures of high-altitude adaptation in Tibetans
Indigenous Tibetan people have lived on the Tibetan Plateau for millennia. There is a long-standing question about the genetic basis of high-altitude adaptation in Tibetans. We conduct a genome-wide study of 7.3 million genotyped and imputed SNPs of 3,008 Tibetans and 7,287 non-Tibetan individuals of Eastern Asian ancestry. Using this large dataset, we detect signals of high-altitude adaptation at nine genomic loci, of which seven are unique. The alleles under natural selection at two of these loci [methylenetetrahydrofolate reductase (MTHFR) and EPAS1] are strongly associated with blood-related phenotypes, such as hemoglobin, homocysteine, and folate in Tibetans. The folate-increasing allele of rs1801133 at the MTHFR locus has an increased frequency in Tibetans more than expected under a drift model, which is probably a consequence of adaptation to high UV radiation. These findings provide important insights into understanding the genomic consequences of high-altitude adaptation in Tibetans.
DOI: 10.1038/s41598-018-35871-w
2018
Cited 166 times
Identification of 55,000 Replicated DNA Methylation QTL
DNA methylation plays an important role in the regulation of transcription. Genetic control of DNA methylation is a potential candidate for explaining the many identified SNP associations with disease that are not found in coding regions. We replicated 52,916 cis and 2,025 trans DNA methylation quantitative trait loci (mQTL) using methylation from whole blood measured on Illumina HumanMethylation450 arrays in the Brisbane Systems Genetics Study (n = 614 from 177 families) and the Lothian Birth Cohorts of 1921 and 1936 (combined n = 1366). The trans mQTL SNPs were found to be over-represented in 1 Mbp subtelomeric regions, and on chromosomes 16 and 19. There was a significant increase in trans mQTL DNA methylation sites in upstream and 5′ UTR regions. The genetic heritability of a number of complex traits and diseases was partitioned into components due to mQTL and the remainder of the genome. Significant enrichment was observed for height (p = 2.1 × 10−10), ulcerative colitis (p = 2 × 10−5), Crohn’s disease (p = 6 × 10−8) and coronary artery disease (p = 5.5 × 10−6) when compared to a random sample of SNPs with matched minor allele frequency, although this enrichment is explained by the genomic location of the mQTL SNPs.
DOI: 10.1186/s13059-018-1514-1
2018
Cited 155 times
Epigenetic prediction of complex traits and death
Genome-wide DNA methylation (DNAm) profiling has allowed for the development of molecular predictors for a multitude of traits and diseases. Such predictors may be more accurate than the self-reported phenotypes and could have clinical applications. Here, penalized regression models are used to develop DNAm predictors for ten modifiable health and lifestyle factors in a cohort of 5087 individuals. Using an independent test cohort comprising 895 individuals, the proportion of phenotypic variance explained in each trait is examined for DNAm-based and genetic predictors. Receiver operator characteristic curves are generated to investigate the predictive performance of DNAm-based predictors, using dichotomized phenotypes. The relationship between DNAm scores and all-cause mortality (n = 212 events) is assessed via Cox proportional hazards models. DNAm predictors for smoking, alcohol, education, and waist-to-hip ratio are shown to predict mortality in multivariate models. The predictors show moderate discrimination of obesity, alcohol consumption, and HDL cholesterol. There is excellent discrimination of current smoking status, poorer discrimination of college-educated individuals and those with high total cholesterol, LDL with remnant cholesterol, and total:HDL cholesterol ratios. DNAm predictors correlate with lifestyle factors that are associated with health and mortality. They may supplement DNAm-based predictors of age to identify the lifestyle profiles of individuals and predict disease risk.
DOI: 10.1038/s41467-019-09572-5
2019
Cited 151 times
Genome-wide association study of medication-use and associated disease in the UK Biobank
Genome-wide association studies (GWASs) of medication use may contribute to understanding of disease etiology, could generate new leads relevant for drug discovery and can be used to quantify future risk of medication taking. Here, we conduct GWASs of self-reported medication use from 23 medication categories in approximately 320,000 individuals from the UK Biobank. A total of 505 independent genetic loci that meet stringent criteria (P < 10−8/23) for statistical significance are identified. We investigate the implications of these GWAS findings in relation to biological mechanism, potential drug target identification and genetic risk stratification of disease. Amongst the medication-associated genes are 16 known therapeutic-effect target genes for medications from 9 categories. Two of the medication classes studied are for disorders that have not previously been subject to large GWAS (hypothyroidism and gastro-oesophageal reflux disease).
DOI: 10.1038/s41467-017-02697-5
2018
Cited 149 times
GWAS of epigenetic aging rates in blood reveals a critical role for TERT
DNA methylation age is an accurate biomarker of chronological age and predicts lifespan, but its underlying molecular mechanisms are unknown. In this genome-wide association study of 9907 individuals, we find gene variants mapping to five loci associated with intrinsic epigenetic age acceleration (IEAA) and gene variants in three loci associated with extrinsic epigenetic age acceleration (EEAA). Mendelian randomization analysis suggests causal influences of menarche and menopause on IEAA and lipoproteins on IEAA and EEAA. Variants associated with longer leukocyte telomere length (LTL) in the telomerase reverse transcriptase gene (TERT) paradoxically confer higher IEAA (P < 2.7 × 10⁻¹¹). Causal modeling indicates TERT-specific and independent effects on LTL and IEAA. Experimental hTERT-expression in primary human fibroblasts engenders a linear increase in DNA methylation age with cell population doubling number. Together, these findings indicate a critical role for hTERT in regulating the epigenetic clock, in addition to its established role of compensating for cell replication-dependent telomere shortening.
DOI: 10.1038/s41467-020-17719-y
2020
Cited 148 times
Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations
Polygenic scores (PGS) have been widely used to predict disease risk using variants identified from genome-wide association studies (GWAS). To date, most GWAS have been conducted in populations of European ancestry, which limits the use of GWAS-derived PGS in non-European ancestry populations. Here, we derive a theoretical model of the relative accuracy (RA) of PGS across ancestries. We show through extensive simulations that the RA of PGS based on genome-wide significant SNPs can be predicted accurately from modelling linkage disequilibrium (LD), minor allele frequencies (MAF), cross-population correlations of causal SNP effects and heritability. We find that LD and MAF differences between ancestries can explain between 70 and 80% of the loss of RA of European-based PGS in African ancestry for traits like body mass index and type 2 diabetes. Our results suggest that causal variants underlying common genetic variation identified in European ancestry GWAS are mostly shared across continents.
DOI: 10.1038/s41467-017-02769-6
2018
Cited 142 times
Improving genetic prediction by leveraging genetic correlations among human diseases and traits
Genomic prediction has the potential to contribute to precision medicine. However, to date, the utility of such predictors is limited due to low accuracy for most traits. Here theory and simulation study are used to demonstrate that widespread pleiotropy among phenotypes can be utilised to improve genomic risk prediction. We show how a genetic predictor can be created as a weighted index that combines published genome-wide association study (GWAS) summary statistics across many different traits. We apply this framework to predict risk of schizophrenia and bipolar disorder in the Psychiatric Genomics consortium data, finding substantial heterogeneity in prediction accuracy increases across cohorts. For six additional phenotypes in the UK Biobank data, we find increases in prediction accuracy ranging from 0.7% for height to 47% for type 2 diabetes, when using a multi-trait predictor that combines published summary statistics from multiple traits, as compared to a predictor based only on one trait.
DOI: 10.1186/s13073-016-0332-x
2016
Cited 140 times
Genetic pleiotropy in complex traits and diseases: implications for genomic medicine
Editorial summary: Several recent papers have used summary results from genome-wide association studies to characterize genetic overlap between human complex traits and common diseases. The emerging evidence is that individual DNA variants frequently influence multiple phenotypes, often in unexpected ways. This has important implications for genomic medicine and for the application of genome editing.
DOI: 10.1126/sciadv.aaw3538
2019
Cited 129 times
Genotype-by-environment interactions inferred from genetic effects on phenotypic variability in the UK Biobank
Genotype-by-environment interaction (GEI) is a fundamental component in understanding complex trait variation. However, it remains challenging to identify genetic variants with GEI effects in humans largely because of the small effect sizes and the difficulty of monitoring environmental fluctuations. Here, we demonstrate that GEI can be inferred from genetic variants associated with phenotypic variability in a large sample without the need of measuring environmental factors. We performed a genome-wide variance quantitative trait locus (vQTL) analysis of ~5.6 million variants on 348,501 unrelated individuals of European ancestry for 13 quantitative traits in the UK Biobank and identified 75 significant vQTLs with P < 2.0 × 10⁻⁹ for 9 traits, especially for those related to obesity. Direct GEI analysis with five environmental factors showed that the vQTLs were strongly enriched with GEI effects. Our results indicate pervasive GEI effects for obesity-related traits and demonstrate the detection of GEI without environmental data.
DOI: 10.1038/s41467-020-18534-1
2020
Cited 114 times
Risk prediction of late-onset Alzheimer’s disease implies an oligogenic architecture
Genetic association studies have identified 44 common genome-wide significant risk loci for late-onset Alzheimer’s disease (LOAD). However, LOAD genetic architecture and prediction are unclear. Here we estimate the optimal P-threshold (P_optimal) of a genetic risk score (GRS) for prediction of LOAD in three independent datasets comprising 676 cases and 35,675 family history proxy cases. We show that the discriminative ability of GRS in LOAD prediction is maximised when selecting a small number of SNPs. Both simulation results and direct estimation indicate that the number of causal common SNPs for LOAD may be less than 100, suggesting LOAD is more oligogenic than polygenic. The best GRS explains approximately 75% of SNP-heritability, and individuals in the top decile of GRS have ten-fold increased odds when compared to those in the bottom decile. In addition, 14 variants are identified that contribute to both LOAD risk and age at onset of LOAD.
DOI: 10.1016/j.biopsych.2021.04.018
2021
Cited 111 times
A Comparison of Ten Polygenic Score Methods for Psychiatric Disorders Applied Across Multiple Cohorts
Polygenic scores (PGSs), which assess the genetic risk of individuals for a disease, are calculated as a weighted count of risk alleles identified in genome-wide association studies. PGS methods differ in which DNA variants are included and the weights assigned to them; some require an independent tuning sample to help inform these choices. PGSs are evaluated in independent target cohorts with known disease status. Variability between target cohorts is observed in applications to real data sets, which could reflect a number of factors, e.g., phenotype definition or technical factors. The Psychiatric Genomics Consortium Working Groups for schizophrenia and major depressive disorder bring together many independently collected case-control cohorts. We used these resources (31,328 schizophrenia cases, 41,191 controls; 248,750 major depressive disorder cases, 563,184 controls) in repeated application of leave-one-cohort-out meta-analyses, each used to calculate and evaluate PGS in the left-out (target) cohort. Ten PGS methods (the baseline PC+T method and 9 methods that model genetic architecture more formally: SBLUP, LDpred2-Inf, LDpred-funct, LDpred2, Lassosum, PRS-CS, PRS-CS-auto, SBayesR, MegaPRS) were compared. Compared with PC+T, the other 9 methods gave higher prediction statistics, MegaPRS, LDpred2, and SBayesR significantly so, explaining up to 9.2% variance in liability for schizophrenia across 30 target cohorts, an increase of 44%. For major depressive disorder across 26 target cohorts, these statistics were 3.5% and 59%, respectively. Although the methods that more formally model genetic architecture have similar performance, MegaPRS, LDpred2, and SBayesR rank highest in most comparisons and are recommended in applications to psychiatric disorders.
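As a rough illustration of the "weighted count of risk alleles" described in this abstract, the sketch below computes a score for one individual. It is not taken from the paper or from any of the ten methods compared; the function name `polygenic_score`, the toy allele counts, and the toy weights are all hypothetical, and real methods differ precisely in how variants and weights are chosen.

```python
def polygenic_score(allele_counts, weights):
    """Return the weighted count of risk alleles for one individual.

    allele_counts: risk-allele counts per variant (0, 1, or 2).
    weights: per-variant weights, e.g. GWAS effect-size estimates.
    Both inputs are illustrative; no specific PGS method is implied.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have equal length")
    # Sum of (count x weight) over all variants included in the score.
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Toy example with three variants (made-up numbers, not real data).
toy_counts = [0, 1, 2]
toy_weights = [0.10, -0.05, 0.20]
toy_score = polygenic_score(toy_counts, toy_weights)
```

In practice the score is computed for every individual in a target cohort and then evaluated against known disease status, as the abstract describes.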
DOI: 10.1056/nejmsr2105065
2021
Cited 106 times
Problems with Using Polygenic Scores to Select Embryos
Companies have recently begun to sell a new service to patients considering in vitro fertilization: embryo selection based on polygenic scores (ESPS). These scores represent individualized predictions of health and other outcomes derived from genomewide association studies in adults to partially predict these outcomes. This article includes a discussion of many factors that lower the predictive power of polygenic scores in the context of embryo selection and quantifies these effects for a variety of clinical and nonclinical traits. Also discussed are potential unintended consequences of ESPS (including selecting for adverse traits, altering population demographics, exacerbating inequalities in society, and devaluing certain traits). Recommendations for the responsible communication about ESPS by practitioners are provided, and a call for a society-wide conversation about this technology is made. (Funded by the National Institute on Aging and others.)