
Jacqueline I. Goldstein

Here are all the papers by Jacqueline I. Goldstein that you can download and read on OA.mg.

DOI: 10.1038/s41588-018-0269-7
2018
Cited 1,692 times
Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder
Attention deficit/hyperactivity disorder (ADHD) is a highly heritable childhood behavioral disorder affecting 5% of children and 2.5% of adults. Common genetic variants contribute substantially to ADHD susceptibility, but no variants have been robustly associated with ADHD. We report a genome-wide association meta-analysis of 20,183 individuals diagnosed with ADHD and 35,191 controls that identifies variants surpassing genome-wide significance in 12 independent loci, finding important new information about the underlying biology of ADHD. Associations are enriched in evolutionarily constrained genomic regions and loss-of-function intolerant genes and around brain-expressed regulatory marks. Analyses of three replication studies: a cohort of individuals diagnosed with ADHD, a self-reported ADHD sample and a meta-analysis of quantitative measures of ADHD symptoms in the population, support these findings while highlighting study-specific differences on genetic overlap with educational attainment. Strong concordance with GWAS of quantitative population measures of ADHD symptoms supports that clinical diagnosis of ADHD is an extreme expression of continuous heritable traits.
DOI: 10.1038/s41588-019-0344-8
2019
Cited 1,616 times
Identification of common genetic risk variants for autism spectrum disorder
Autism spectrum disorder (ASD) is a highly heritable and heterogeneous group of neurodevelopmental phenotypes diagnosed in more than 1% of children. Common genetic variants contribute substantially to ASD susceptibility, but to date no individual variants have been robustly associated with ASD. With a marked sample-size increase from a unique Danish population resource, we report a genome-wide association meta-analysis of 18,381 individuals with ASD and 27,969 controls that identified five genome-wide-significant loci. Leveraging GWAS results from three phenotypes with significantly overlapping genetic architectures (schizophrenia, major depression, and educational attainment), we identified seven additional loci shared with other traits at equally strict significance levels. Dissecting the polygenic architecture, we found both quantitative and qualitative polygenic heterogeneity across ASD subtypes. These results highlight biological insights, particularly relating to neuronal function and corticogenesis, and establish that GWAS performed at scale will be much more productive in the near term in ASD.
DOI: 10.1126/science.aad6469
2018
Cited 844 times
Shared molecular neuropathology across major psychiatric disorders parallels polygenic overlap
The predisposition to neuropsychiatric disease involves a complex, polygenic, and pleiotropic genetic architecture. However, little is known about how genetic variants impart brain dysfunction or pathology. We used transcriptomic profiling as a quantitative readout of molecular brain-based phenotypes across five major psychiatric disorders (autism, schizophrenia, bipolar disorder, depression, and alcoholism) compared with matched controls. We identified patterns of shared and distinct gene-expression perturbations across these conditions. The degree of sharing of transcriptional dysregulation is related to polygenic (single-nucleotide polymorphism-based) overlap across disorders, suggesting a substantial causal genetic component. This comprehensive systems-level view of the neurobiological architecture of major neuropsychiatric illness demonstrates pathways of molecular convergence and specificity.
DOI: 10.1038/ng.3725
2016
Cited 843 times
Contribution of copy number variants to schizophrenia from a genome-wide study of 41,321 subjects
Copy number variants (CNVs) have been strongly implicated in the genetic etiology of schizophrenia (SCZ). However, genome-wide investigation of the contribution of CNV to risk has been hampered by limited sample sizes. We sought to address this obstacle by applying a centralized analysis pipeline to a SCZ cohort of 21,094 cases and 20,227 controls. A global enrichment of CNV burden was observed in cases (odds ratio (OR) = 1.11, P = 5.7 × 10^-15), which persisted after excluding loci implicated in previous studies (OR = 1.07, P = 1.7 × 10^-6). CNV burden was enriched for genes associated with synaptic function (OR = 1.68, P = 2.8 × 10^-11) and neurobehavioral phenotypes in mouse (OR = 1.18, P = 7.3 × 10^-5). Genome-wide significant evidence was obtained for eight loci, including 1q21.1, 2p16.3 (NRXN1), 3q29, 7q11.2, 15q13.3, distal 16p11.2, proximal 16p11.2 and 22q11.2. Suggestive support was found for eight additional candidate susceptibility and protective loci, which consisted predominantly of CNVs mediated by nonallelic homologous recombination.
DOI: 10.1038/ng.3863
2017
Cited 408 times
Polygenic transmission disequilibrium confirms that common and rare variation act additively to create risk for autism spectrum disorders
Autism spectrum disorder (ASD) risk is influenced by common polygenic and de novo variation. We aimed to clarify the influence of polygenic risk for ASD and to identify subgroups of ASD cases, including those with strongly acting de novo variants, in which polygenic risk is relevant. Using a novel approach called the polygenic transmission disequilibrium test and data from 6,454 families with a child with ASD, we show that polygenic risk for ASD, schizophrenia, and greater educational attainment is over-transmitted to children with ASD. These findings hold independent of proband IQ. We find that polygenic variation contributes additively to risk in ASD cases who carry a strongly acting de novo variant. Lastly, we show that elements of polygenic risk are independent and differ in their relationship with phenotype. These results confirm that the genetic influences on ASD are additive and suggest that they create risk through at least partially distinct etiologic pathways.
DOI: 10.1038/ng.2741
2013
Cited 317 times
Rare variants in CFI, C3 and C9 are associated with high risk of advanced age-related macular degeneration
To define the role of rare variants in advanced age-related macular degeneration (AMD) risk, we sequenced the exons of 681 genes within all reported AMD loci and related pathways in 2,493 cases and controls. We first tested each gene for increased or decreased burden of rare variants in cases compared to controls. We found that 7.8% of AMD cases compared to 2.3% of controls are carriers of rare missense CFI variants (odds ratio (OR) = 3.6; P = 2 × 10^-8). There was a predominance of dysfunctional variants in cases compared to controls. We then tested individual variants for association with disease. We observed significant association with rare missense alleles in genes other than CFI. Genotyping in 5,115 independent samples confirmed associations with AMD of an allele in C3 encoding p.Lys155Gln (replication P = 3.5 × 10^-5, OR = 2.8; joint P = 5.2 × 10^-9, OR = 3.8) and an allele in C9 encoding p.Pro167Ser (replication P = 2.4 × 10^-5, OR = 2.2; joint P = 6.5 × 10^-7, OR = 2.2). Finally, we show that the allele of C3 encoding Gln155 results in resistance to proteolytic inactivation by CFH and CFI. These results implicate loss of C3 protein regulation and excessive alternative complement activation in AMD pathogenesis, thus informing both the direction of effect and mechanistic underpinnings of this disorder.
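As a rough arithmetic check on the CFI burden result, the two carrier frequencies quoted above (7.8% of cases, 2.3% of controls) can be turned into a crude odds ratio. This is only a sketch under the assumption of a simple unadjusted comparison; the study's own estimate may come from a model with covariates, and the function name is illustrative rather than anything used by the authors.

```python
def odds_ratio_from_carrier_freqs(freq_cases: float, freq_controls: float) -> float:
    """Unadjusted odds ratio from carrier frequencies in cases and controls."""
    if not (0.0 < freq_cases < 1.0 and 0.0 < freq_controls < 1.0):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    odds_cases = freq_cases / (1.0 - freq_cases)          # carriers : non-carriers in cases
    odds_controls = freq_controls / (1.0 - freq_controls)  # carriers : non-carriers in controls
    return odds_cases / odds_controls


# Carrier frequencies quoted in the abstract (7.8% vs 2.3%).
crude_or = odds_ratio_from_carrier_freqs(0.078, 0.023)
print(round(crude_or, 1))
```

With these inputs the crude value is close to the reported OR of 3.6, which suggests the published estimate is not far from the simple ratio of odds.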
DOI: 10.1093/bioinformatics/bts479
2012
Cited 190 times
zCall: a rare variant caller for array-based genotyping
zCall is a variant caller specifically designed for calling rare single-nucleotide polymorphisms from array-based technology. This caller is implemented as a post-processing step after a default calling algorithm has been applied. The algorithm uses the intensity profile of the common allele homozygote cluster to define the location of the other two genotype clusters. We demonstrate improved detection of rare alleles when applying zCall to samples that have both Illumina Infinium HumanExome BeadChip and exome sequencing data available. Availability: http://atguweb.mgh.harvard.edu/apps/zcall. Contact: bneale@broadinstitute.org. Supplementary information: Supplementary data are available at Bioinformatics online.
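The idea of using the common homozygote cluster to place the other clusters can be illustrated with a simplified threshold scheme. The sketch below is an assumption-laden illustration, not the published zCall implementation: the function names, the use of "mean plus z standard deviations" as the threshold rule, and the way the second threshold `t_x` is supplied by the caller are all hypothetical choices made for the example.

```python
import numpy as np


def noise_threshold(intensities, z):
    """Threshold set at mean + z * SD of an observed intensity distribution."""
    values = np.asarray(intensities, dtype=float)
    return values.mean() + z * values.std(ddof=1)


def recall_genotype(x, y, t_x, t_y):
    """Assign a genotype from allele A intensity (x) and allele B intensity (y).

    A sample counts as carrying an allele when its intensity for that
    allele reaches the corresponding threshold.
    """
    has_a = x >= t_x
    has_b = y >= t_y
    if has_a and has_b:
        return "AB"
    if has_a:
        return "AA"
    if has_b:
        return "BB"
    return "NC"  # no call: neither allele signal clears its threshold


# Allele B intensities seen in the common (AA) homozygote cluster act as "noise".
aa_cluster_y = [0.02, 0.03, 0.05, 0.04, 0.01]
t_y = noise_threshold(aa_cluster_y, z=3.0)
t_x = 0.2  # hypothetical threshold for allele A, supplied directly in this sketch

print(recall_genotype(1.0, 0.50, t_x, t_y))  # strong signal on both channels
```

The design point is that a rare heterozygote does not need its own well-populated cluster: it is recognised because its minor-allele intensity sits far above the noise level of the common homozygotes.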
DOI: 10.1016/j.biopsych.2021.04.018
2021
Cited 111 times
A Comparison of Ten Polygenic Score Methods for Psychiatric Disorders Applied Across Multiple Cohorts
Polygenic scores (PGSs), which assess the genetic risk of individuals for a disease, are calculated as a weighted count of risk alleles identified in genome-wide association studies. PGS methods differ in which DNA variants are included and the weights assigned to them; some require an independent tuning sample to help inform these choices. PGSs are evaluated in independent target cohorts with known disease status. Variability between target cohorts is observed in applications to real data sets, which could reflect a number of factors, e.g., phenotype definition or technical factors. The Psychiatric Genomics Consortium Working Groups for schizophrenia and major depressive disorder bring together many independently collected case-control cohorts. We used these resources (31,328 schizophrenia cases, 41,191 controls; 248,750 major depressive disorder cases, 563,184 controls) in repeated application of leave-one-cohort-out meta-analyses, each used to calculate and evaluate PGS in the left-out (target) cohort. Ten PGS methods (the baseline PC+T method and 9 methods that model genetic architecture more formally: SBLUP, LDpred2-Inf, LDpred-funct, LDpred2, Lassosum, PRS-CS, PRS-CS-auto, SBayesR, MegaPRS) were compared. Compared with PC+T, the other 9 methods gave higher prediction statistics, MegaPRS, LDpred2, and SBayesR significantly so, explaining up to 9.2% variance in liability for schizophrenia across 30 target cohorts, an increase of 44%. For major depressive disorder across 26 target cohorts, these statistics were 3.5% and 59%, respectively. Although the methods that more formally model genetic architecture have similar performance, MegaPRS, LDpred2, and SBayesR rank highest in most comparisons and are recommended in applications to psychiatric disorders.
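The "weighted count of risk alleles" that defines a PGS can be written as a matrix-vector product. The following is a minimal sketch with made-up allele counts and weights; the function name and the toy numbers are assumptions for illustration, and the ten methods compared in the study differ precisely in how the weights are chosen, which this sketch does not attempt to reproduce.

```python
import numpy as np


def polygenic_score(allele_counts, weights):
    """Weighted count of risk alleles per individual.

    allele_counts: array of shape (n_individuals, n_variants), entries 0, 1 or 2
    weights:       array of shape (n_variants,), one effect-size weight per variant
    """
    counts = np.asarray(allele_counts, dtype=float)
    w = np.asarray(weights, dtype=float)
    if counts.ndim != 2 or counts.shape[1] != w.shape[0]:
        raise ValueError("allele_counts must be (n_individuals, n_variants) matching weights")
    return counts @ w


# Toy data: two individuals, three variants (values are illustrative only).
counts = [[0, 1, 2],
          [2, 0, 1]]
weights = [0.10, -0.20, 0.05]
scores = polygenic_score(counts, weights)
print(scores)
```

In practice the scores are then compared against known disease status in a target cohort that played no part in estimating the weights, as the abstract describes.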
DOI: 10.1016/j.xgen.2022.100168
2022
Cited 100 times
Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes
Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variations in human disease has not been explored at scale. Exome-sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variations across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene tests of 394,841 individuals in the UK Biobank with exome-sequence data. We find that the discovery of genetic associations is tightly linked to frequency and is correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare-variant association results.
DOI: 10.1038/ncomms5757
2014
Cited 150 times
Clozapine-induced agranulocytosis is associated with rare HLA-DQB1 and HLA-B alleles
Clozapine is a particularly effective antipsychotic medication but its use is curtailed by the risk of clozapine-induced agranulocytosis/granulocytopenia (CIAG), a severe adverse drug reaction occurring in up to 1% of treated individuals. Identifying genetic risk factors for CIAG could enable safer and more widespread use of clozapine. Here we perform the largest and most comprehensive genetic study of CIAG to date by interrogating 163 cases using genome-wide genotyping and whole-exome sequencing. We find that two loci in the major histocompatibility complex are independently associated with CIAG: a single amino acid in HLA-DQB1 (126Q) (P = 4.7 × 10^-14, odds ratio (OR) = 0.19, 95% confidence interval (CI) = 0.12–0.29) and an amino acid change in the extracellular binding pocket of HLA-B (158T) (P = 6.4 × 10^-10, OR = 3.3, 95% CI = 2.3–4.9). These associations dovetail with the roles of these genes in immunogenetic phenotypes and adverse drug responses for other medications, and provide insight into the pathophysiology of CIAG. Clozapine-induced agranulocytosis/granulocytopenia, or CIAG, is characterised by a rare and potentially fatal reaction to antipsychotic drugs. Here, the authors identify genetic variants in two immune-related genes that may contribute to the pathophysiology of CIAG.
DOI: 10.1101/145581
2017
Cited 130 times
Discovery of the first genome-wide significant risk loci for ADHD
Attention-Deficit/Hyperactivity Disorder (ADHD) is a highly heritable childhood behavioral disorder affecting 5% of school-age children and 2.5% of adults. Common genetic variants contribute substantially to ADHD susceptibility, but no individual variants have been robustly associated with ADHD. We report a genome-wide association meta-analysis of 20,183 ADHD cases and 35,191 controls that identifies variants surpassing genome-wide significance in 12 independent loci, revealing new and important information on the underlying biology of ADHD. Associations are enriched in evolutionarily constrained genomic regions and loss-of-function intolerant genes, as well as around brain-expressed regulatory marks. These findings, based on clinical interviews and/or medical records, are supported by additional analyses of a self-reported ADHD sample and a study of quantitative measures of ADHD symptoms in the population. Meta-analyzing these data with our primary scan yielded a total of 16 genome-wide significant loci. The results support the hypothesis that clinical diagnosis of ADHD is an extreme expression of one or more continuous heritable traits.
DOI: 10.1016/j.ajhg.2018.03.021
2018
Cited 125 times
Estimation of Genetic Correlation via Linkage Disequilibrium Score Regression and Genomic Restricted Maximum Likelihood
Genetic correlation is a key population parameter that describes the shared genetic architecture of complex traits and diseases. It can be estimated by current state-of-art methods, i.e., linkage disequilibrium score regression (LDSC) and genomic restricted maximum likelihood (GREML). The massively reduced computing burden of LDSC compared to GREML makes it an attractive tool, although the accuracy (i.e., magnitude of standard errors) of LDSC estimates has not been thoroughly studied. In simulation, we show that the accuracy of GREML is generally higher than that of LDSC. When there is genetic heterogeneity between the actual sample and reference data from which LD scores are estimated, the accuracy of LDSC decreases further. In real data analyses estimating the genetic correlation between schizophrenia (SCZ) and body mass index, we show that GREML estimates based on ∼150,000 individuals give a higher accuracy than LDSC estimates based on ∼400,000 individuals (from combined meta-data). A GREML genomic partitioning analysis reveals that the genetic correlation between SCZ and height is significantly negative for regulatory regions, which whole genome or LDSC approach has less power to detect. We conclude that LDSC estimates should be carefully interpreted as there can be uncertainty about homogeneity among combined meta-datasets. We suggest that any interesting findings from massive LDSC analysis for a large number of complex traits should be followed up, where possible, with more detailed analyses with GREML methods, even if sample sizes are lesser.
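For orientation, genetic correlation is conventionally defined as the additive genetic covariance between two traits divided by the square root of the product of their genetic variances. The sketch below simply encodes that definition; the function name and the numeric inputs are hypothetical, and it says nothing about how GREML or LDSC obtain the variance and covariance estimates in the first place.

```python
import math


def genetic_correlation(cov_g: float, var_g1: float, var_g2: float) -> float:
    """r_g = cov_g / sqrt(var_g1 * var_g2) for two traits."""
    if var_g1 <= 0.0 or var_g2 <= 0.0:
        raise ValueError("genetic variances must be positive")
    return cov_g / math.sqrt(var_g1 * var_g2)


# Illustrative inputs only: a negative covariance gives a negative correlation.
r_g = genetic_correlation(cov_g=-0.06, var_g1=0.4, var_g2=0.5)
print(round(r_g, 3))
```

The sign of the result carries the direction of sharing, which is why a negative estimate such as the one reported between SCZ and body mass index is read as opposing genetic effects on the two traits.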
The massively reduced computing burden of LDSC compared to GREML makes it an attractive tool, although the accuracy (i.e., magnitude of standard errors) of LDSC estimates has not been thoroughly studied. In simulation, we show that the accuracy of GREML is generally higher than that of LDSC. When there is genetic heterogeneity between the actual sample and reference data from which LD scores are estimated, the accuracy of LDSC decreases further. In real data analyses estimating the genetic correlation between schizophrenia (SCZ) and body mass index, we show that GREML estimates based on ∼150,000 individuals give a higher accuracy than LDSC estimates based on ∼400,000 individuals (from combined meta-data). A GREML genomic partitioning analysis reveals that the genetic correlation between SCZ and height is significantly negative for regulatory regions, which whole genome or LDSC approach has less power to detect. We conclude that LDSC estimates should be carefully interpreted as there can be uncertainty about homogeneity among combined meta-datasets. We suggest that any interesting findings from massive LDSC analysis for a large number of complex traits should be followed up, where possible, with more detailed analyses with GREML methods, even if sample sizes are lesser. Genetic correlation is a key population parameter that describes the shared genetic architecture of complex traits and diseases.1Mehta D. Tropf F.C. Gratten J. Bakshi A. Zhu Z. Bacanu S.-A. Hemani G. Magnusson P.K.E. Barban N. Esko T. et al.Schizophrenia Working Group of the Psychiatric Genomics Consortium, LifeLines Cohort Study, and TwinsUKEvidence for genetic overlap between schizophrenia and age at first birth in women.JAMA Psychiatry. 2016; 73: 497-505Crossref PubMed Scopus (30) Google Scholar, 2Lee S.H. Byrne E.M. Hultman C.M. Kähler A. Vinkhuyzen A.A.E. Ripke S. Andreassen O.A. Frisell T. Gusev A. Hu X. 
et al.Schizophrenia Working Group of the Psychiatric Genomics Consortium and Rheumatoid Arthritis Consortium InternationalSchizophrenia Working Group of the Psychiatric Genomics Consortium AuthorsSchizophrenia Working Group of the Psychiatric Genomics Consortium CollaboratorsRheumatoid Arthritis Consortium International AuthorsRheumatoid Arthritis Consortium International CollaboratorsNew data and an old puzzle: the negative association between schizophrenia and rheumatoid arthritis.Int. J. Epidemiol. 2015; 44: 1706-1721Crossref PubMed Scopus (49) Google Scholar, 3Lee S.H. DeCandia T.R. Ripke S. Yang J. Sullivan P.F. Goddard M.E. Keller M.C. Visscher P.M. Wray N.R. Schizophrenia Psychiatric Genome-Wide Association Study Consortium (PGC-SCZ)International Schizophrenia Consortium (ISC)Molecular Genetics of Schizophrenia Collaboration (MGS)Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs.Nat. Genet. 2012; 44: 247-250Crossref PubMed Scopus (439) Google Scholar The genetic correlation is the additive genetic covariance between two traits scaled by the square root of the product of the genetic variance for each trait (i.e., the geometric mean of the trait variances). The sign of the correlation shows the direction of sharing, and the parameter definition is based on genetic variants across the allelic spectrum. Methods to estimate genetic correlation based on genetic covariance structure are well established for both quantitative and disease traits, e.g., (restricted) maximum likelihood for linear mixed models (LMM).4Lee S.H. Yang J. Goddard M.E. Visscher P.M. Wray N.R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood.Bioinformatics. 2012; 28: 2540-2542Crossref PubMed Scopus (379) Google Scholar, 5Harville D.A. Maximum likelihood approaches to variance component estimation and to related problems.J. Am. Stat. Assoc. 
1977; 72: 320-338Crossref Scopus (1543) Google Scholar, 6Patterson H.D. Thompson R. Recovery of inter-block information when block sizes are unequal.Biometrika. 1971; 58: 545-554Crossref Scopus (2718) Google Scholar Genetic covariance structure can be derived from phenotypic records using pedigree information in twin or family-based designs.7Neale M. Cardon L. Methodology for Genetic Studies of Twins and Families. Springer Science & Business Media, 2013Google Scholar Recently, genome-wide single-nucleotide polymorphism (SNP) data have been used to construct a genomic relationship matrix for the genetic covariance structure in LMM that captures the contribution of causal variants that are in linkage disequilibrium (LD) with the genotyped SNPs.4Lee S.H. Yang J. Goddard M.E. Visscher P.M. Wray N.R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood.Bioinformatics. 2012; 28: 2540-2542Crossref PubMed Scopus (379) Google Scholar, 8VanRaden P.M. Efficient methods to compute genomic predictions.J. Dairy Sci. 2008; 91: 4414-4423Abstract Full Text Full Text PDF PubMed Scopus (3096) Google Scholar, 9Yang J. Lee S.H. Goddard M.E. Visscher P.M. GCTA: a tool for genome-wide complex trait analysis.Am. J. Hum. Genet. 2011; 88: 76-82Abstract Full Text Full Text PDF PubMed Scopus (3814) Google Scholar Such estimates assume that the genetic correlation estimated from common SNPs is representative of the parameter that depends on all genetic variants; this seems like a reasonable assumption. In contrast to the genomic restricted maximum likelihood (GREML) approach, a linkage disequilibrium score regression (LDSC)10Bulik-Sullivan B.K. Loh P.R. Finucane H.K. Ripke S. Yang J. Patterson N. Daly M.J. Price A.L. Neale B.M. Schizophrenia Working Group of the Psychiatric Genomics ConsortiumLD Score regression distinguishes confounding from polygenicity in genome-wide association studies.Nat. 
Genet. 2015; 47: 291-295Crossref PubMed Scopus (1923) Google Scholar, 11Bulik-Sullivan B. Finucane H.K. Anttila V. Gusev A. Day F.R. Loh P.R. Duncan L. Perry J.R. Patterson N. Robinson E.B. et al.ReproGen ConsortiumPsychiatric Genomics ConsortiumGenetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3An atlas of genetic correlations across human diseases and traits.Nat. Genet. 2015; 47: 1236-1241Crossref PubMed Scopus (1656) Google Scholar method does not require individual-level genotype data but instead uses GWAS summary statistics, regressing association test statistics of SNPs on their LD scores. The LD score of a SNP is the sum of LD r2 measured with all other SNPs and can be calculated in a reference sample of the same ethnicity when individual genotype data are not available for the GWAS sample, under the assumption that the GWAS sample has been drawn from the same ethnic population as the reference sample used to calculate the LD scores. The method exploits the relationship between association test statistic and LD score expected under polygenicity. Because of this simplicity, and the massively reduced computing burden in terms of memory and time, it is feasible for LDSC to be applied to a large number of multiple traits, e.g., Bulik-Sullivan et al.,11Bulik-Sullivan B. Finucane H.K. Anttila V. Gusev A. Day F.R. Loh P.R. Duncan L. Perry J.R. Patterson N. Robinson E.B. et al.ReproGen ConsortiumPsychiatric Genomics ConsortiumGenetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3An atlas of genetic correlations across human diseases and traits.Nat. Genet. 2015; 47: 1236-1241Crossref PubMed Scopus (1656) Google Scholar Zheng et al.,12Zheng J. Erzurumluoglu A.M. Elsworth B.L. Kemp J.P. Howe L. Haycock P.C. Hemani G. Tansey K. Laurin C. Pourcain B.S. 
et al.Early Genetics and Lifecourse Epidemiology (EAGLE) Eczema ConsortiumLD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis.Bioinformatics. 2017; 33: 272-279Crossref PubMed Scopus (484) Google Scholar Finucane et al.13Finucane H.K. Bulik-Sullivan B. Gusev A. Trynka G. Reshef Y. Loh P.R. Anttila V. Xu H. Zang C. Farh K. et al.ReproGen ConsortiumSchizophrenia Working Group of the Psychiatric Genomics ConsortiumRACI ConsortiumPartitioning heritability by functional annotation using genome-wide association summary statistics.Nat. Genet. 2015; 47: 1228-1235Crossref PubMed Scopus (980) Google Scholar Given the attractiveness of LDSC for a massive analysis of many sets of GWAS summary statistics, it has been widely used in the community. However, genetic correlations estimated by LDSC are often reported without caution although the approach is known to be less accurate, compared to GREML.11Bulik-Sullivan B. Finucane H.K. Anttila V. Gusev A. Day F.R. Loh P.R. Duncan L. Perry J.R. Patterson N. Robinson E.B. et al.ReproGen ConsortiumPsychiatric Genomics ConsortiumGenetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3An atlas of genetic correlations across human diseases and traits.Nat. Genet. 2015; 47: 1236-1241Crossref PubMed Scopus (1656) Google Scholar In fact, the accuracies of LDSC estimates have not been thoroughly studied. In this report, we compare both the bias (difference between the simulated true value and estimated value) and accuracy (magnitude of the standard error of an estimate [SE]) between GREML and LDSC for estimation of genetic correlation. We find that both methods show little evidence of bias. However, LDSC is less accurate as reported in Bulik Sullivan et al.,11Bulik-Sullivan B. Finucane H.K. Anttila V. Gusev A. Day F.R. Loh P.R. Duncan L. Perry J.R. Patterson N. Robinson E.B. 
et al.ReproGen ConsortiumPsychiatric Genomics ConsortiumGenetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3An atlas of genetic correlations across human diseases and traits.Nat. Genet. 2015; 47: 1236-1241Crossref PubMed Scopus (1656) Google Scholar with SE at least more than 1.5-fold higher than that of GREML regardless of the number of samples in data used to estimate the genetic correlation. When decreasing the number of SNPs, the accuracy of LDSC decreases further. When increasing the degree of genetic heterogeneity between the actual sample and reference data from which LD scores are estimated, the SE of LDSC estimates are up to 3-fold larger than those of the GREML estimates. We also show that GREML is more accurate in genomic partitioning analyses over LDSC or stratified LDSC (sLDSC). In genomic partitioning analyses, the genetic parameters are estimated for genomic subsets defined by user-specified annotations. In analyses of real data, we show that GREML is more accurate and powerful, e.g., GREML estimates based on ∼150,000 individuals give a higher accuracy than LDSC estimates based on 400,000 individuals in estimating genetic correlation between schizophrenia (SCZ) and body mass index (BMI) (−0.136 [SE = 0.017] and p value = 4.54E−15 for GREML versus −0.087 [SE = 0.019] and p value = 4.91E−06 for LDSC). In these analyses, the GREML estimate is based on UK sample only whereas the LDSC estimate is based on combined meta-datasets among which there is uncertainty about homogeneity. Furthermore, a GREML genomic partitioning analysis reveals that the genetic correlation between SCZ and height is significantly negative for regulatory regions, which is less obvious by LDSC when using both whole-genome and partitioned estimates of genetic correlation. In the main methods, we used GREML14Lee S.H. van der Werf J.H. 
MTG2: an efficient algorithm for multivariate linear mixed model analysis based on genomic information.Bioinformatics. 2016; 32: 1420-1422Crossref PubMed Scopus (90) Google Scholar, 15Maier R. Moser G. Chen G.-B. Ripke S. Coryell W. Potash J.B. Scheftner W.A. Shi J. Weissman M.M. Hultman C.M. et al.Cross-Disorder Working Group of the Psychiatric Genomics ConsortiumJoint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder.Am. J. Hum. Genet. 2015; 96: 283-294Abstract Full Text Full Text PDF PubMed Scopus (162) Google Scholar and LDSC10Bulik-Sullivan B.K. Loh P.R. Finucane H.K. Ripke S. Yang J. Patterson N. Daly M.J. Price A.L. Neale B.M. Schizophrenia Working Group of the Psychiatric Genomics ConsortiumLD Score regression distinguishes confounding from polygenicity in genome-wide association studies.Nat. Genet. 2015; 47: 291-295Crossref PubMed Scopus (1923) Google Scholar, 11Bulik-Sullivan B. Finucane H.K. Anttila V. Gusev A. Day F.R. Loh P.R. Duncan L. Perry J.R. Patterson N. Robinson E.B. et al.ReproGen ConsortiumPsychiatric Genomics ConsortiumGenetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3An atlas of genetic correlations across human diseases and traits.Nat. Genet. 2015; 47: 1236-1241Crossref PubMed Scopus (1656) Google Scholar to compare their estimates of genetic correlation using simulated as well as real data. Simulations were based on UK Biobank imputed genotype data (UKBB16Collins R. What makes UK Biobank special?.Lancet. 2012; 379: 1173-1174Abstract Full Text Full Text PDF PubMed Scopus (449) Google Scholar) after stringent quality control (QC) (see Supplemental Methods). We calculated a ratio of empirical SE and its 95% confidence interval (CI) to assess the accuracy of the methods for each set of simulated data. The 95% CIs of SE were estimated based on the delta method.17Lynch M. Walsh B. 
Genetics and Analysis of Quantitative Traits. Sinauer Sunderland, MA1998Google Scholar When estimating genetic correlation using simulated phenotypes based on UKBB genotype data, we found that the estimates were unbiased for both GREML and LDSC (Figure S1), but the SE of GREML was at least 1.5 times smaller than that of LDSC (Figure 1). The ratio of the empirical SE from LDSC to GREML was increased up to 3.5-fold when using a smaller number of SNPs (Figure 1). All values of the ratio were significantly different from 1. It is notable that the SE of GREML estimates showed almost no difference across different numbers of SNPs whereas that of LDSC estimates gradually increased with a smaller number of SNPs (Figure S2). The ratio was invariant to sample size (Figure S3). As expected, when using the intercept constrained to zero, LDSC estimates were substantially biased when there were overlapping samples (Figure S4). We also explored alternative genetic architectures (Figure S5), which consistently showed that GREML gives a smaller SE than LDSC in any scenario. To explore the stability of the accuracy for both methods, we used two additional genotype datasets without imputation, Wellcome Trust Case Control Consortium 2 (WTCCC218Sawcer S. Hellenthal G. Pirinen M. Spencer C.C. Patsopoulos N.A. Moutsianas L. Dilthey A. Su Z. Freeman C. Hunt S.E. et al.International Multiple Sclerosis Genetics ConsortiumWellcome Trust Case Control Consortium 2Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis.Nature. 2011; 476: 214-219Crossref PubMed Scopus (2024) Google Scholar, 19Mells G.F. Floyd J.A.B. Morley K.I. Cordell H.J. Franklin C.S. Shin S.-Y. Heneghan M.A. Neuberger J.M. Donaldson P.T. Day D.B. et al.UK PBC ConsortiumWellcome Trust Case Control Consortium 3Genome-wide association study identifies 12 new susceptibility loci for primary biliary cirrhosis.Nat. Genet. 
2011; 43: 329-332Crossref PubMed Scopus (379) Google Scholar, 20Bellenguez C. Bevan S. Gschwendtner A. Spencer C.C. Burgess A.I. Pirinen M. Jackson C.A. Traylor M. Strange A. Su Z. et al.International Stroke Genetics Consortium (ISGC)Wellcome Trust Case Control Consortium 2 (WTCCC2)Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke.Nat. Genet. 2012; 44: 328-333Crossref PubMed Scopus (332) Google Scholar, 21Tsoi L.C. Spain S.L. Knight J. Ellinghaus E. Stuart P.E. Capon F. Ding J. Li Y. Tejasvi T. Gudjonsson J.E. et al.Collaborative Association Study of Psoriasis (CASP)Genetic Analysis of Psoriasis ConsortiumPsoriasis Association Genetics ExtensionWellcome Trust Case Control Consortium 2Identification of 15 new psoriasis susceptibility loci highlights the role of innate immunity.Nat. Genet. 2012; 44: 1341-1348Crossref PubMed Scopus (702) Google Scholar) and genetic epidemiology research on adult health and aging cohort (GERA22Banda Y. Kvale M.N. Hoffmann T.J. Hesselson S.E. Ranatunga D. Tang H. Sabatti C. Croen L.A. Dispensa B.P. Henderson M. et al.Characterizing race/ethnicity and genetic ancestry for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort.Genetics. 2015; 200: 1285-1295Crossref PubMed Scopus (191) Google Scholar, 23Lee S.H. Weerasinghe W.M.S.P. Wray N.R. Goddard M.E. van der Werf J.H.J. Using information of relatives in genomic prediction to apply effective stratified medicine.Sci. Rep. 2017; 7: 42091Crossref PubMed Scopus (29) Google Scholar), which are publicly available (see Supplemental Methods for detailed data descriptions). We also used UKBB raw (non-imputed) genotype data (UKBBr). We calculated the correlation between the LD scores for the HapMap3 SNPs estimated based on the 1KG CEU reference sample (see Web Resources) and those based on in-sample genotype data, i.e., UKBB, WTCCC2, GERA, and UKBBr dataset (Table 1). 
We found that the WTCCC2, GERA, and UKBBr (raw) genotypes were less similar to the 1KG reference genotypes than the UKBB (imputed) genotypes were (noting that UKBB samples had been imputed to the combined data of the 1KG reference and UK10K data). Table 2 shows that the ratio of the SE of the LDSC estimate to that of the GREML estimate was higher for WTCCC2, GERA, and UKBBr than for UKBB. Figure 2 shows that the accuracy of GREML was consistent across different datasets, whereas that of LDSC was lower for WTCCC2, GERA, and UKBBr than for the UKBB dataset. This was probably due to the higher (or lower) correlation between LD scores based on the 1KG reference and the in-sample genotype datasets (Table 1), which might positively (or negatively) affect the accuracy of LDSC estimates. For WTCCC2, GERA, and UKBBr data, the SE ratio of LDSC to GREML based on different numbers of individuals is shown in Figures S6–S8.
Table 1. Correlation between LD Scores Estimated Based on the HapMap3 SNPs using the 1KG CEU Reference Sample and Those from Different Target Populations
Dataset      Correlation   Nr. SNPs
UKBB (a)     0.946         858,991
UKBBr (b)    0.720         123,615 (c)
WTCCC2       0.899         421,035 (c)
GERA         0.661         238,089 (c)
(a) UKBB was imputed to the combined data of the 1KG reference and UK10K data.
(b) UKBBr was based on the raw genotype data of UK Biobank data.
(c) The number of SNPs was reduced further from the set of QCed SNPs because only SNPs matched with the HapMap3 SNPs were used in calculating LD scores.
Table 2. The Ratio of SE of LDSC Estimate to That of GREML Estimate using Simulated Phenotypes Based on UKBB, WTCCC2, GERA, and UKBBr Genotypes in the Scenarios without Overlapping Individuals
Dataset      800k          400k          200k          100k
UKBB         1.60 (0.15)   1.70 (0.18)   1.85 (0.25)   2.04 (0.33)
WTCCC2       NA            2.15 (0.31)   2.35 (0.43)   2.68 (0.61)
GERA         NA            NA            2.87 (0.56)   3.31 (1.17)
UKBBr        NA            NA            NA            3.74 (0.79)
Genome partitioning analyses are an emerging tool to estimate the genetic variance and covariance explained by functional categories (e.g., DNase I hypersensitive sites [DHS] and non-DHS; ref. 24: Gusev et al., Am. J. Hum. Genet. 2014; 95: 535-552). Currently, genomic partitioning analyses focus on SNP-heritability enrichment analyses, formally testing for enrichment of signal compared to the expectation that the estimates are proportional to the number of SNPs allocated to each annotation. Considering genomic partitioning in cross-disorder analyses is a natural extension to identify regions where genetic correlations between disorders are highest and lowest. Here, we assessed the performance of the methods in the context of genome partitioning analyses using simulated phenotypes based on UKBB genotype data. A better LDSC approach to estimate genetic correlation for each category might be sLDSC, stratifying by genomic annotation; however, this method is currently under development (i.e., there is software [see Web Resources], but there is no published document or paper verifying the method).
Nonetheless, since sLDSC is available to the research community, we applied both LDSC and sLDSC to estimate partitioned genetic correlations for the simulated data (Supplemental Methods). For genome partitioning analyses, we showed that LDSC estimates of genetic correlation were biased whether using LD scores estimated from the 1KG reference or from in-sample data (UKBB), whereas GREML gave unbiased estimates for each functional category (Figure 3). sLDSC estimates were unbiased only when using LD scores from the in-sample data, and their SEs were larger than those of GREML or LDSC (Figure 3). This was probably because the different distributions of causal variants and their effects between DHS and non-DHS regions were better captured by the explicit covariance structure fitted in GREML. We also applied the methods to a range of simulation scenarios and found similar results in that GREML performed better than LDSC or sLDSC (Figure S9 and Table S1), which was consistent with the previous results (Figures 1 and 2). It is notable that in a deliberately severe scenario (e.g., causal variants are simulated only within a few kb of a boundary), GREML could give biased estimates of genetic correlation (refs. 13, 24: Finucane et al., Nat. Genet. 2015; 47: 1228-1235; Gusev et al., Am. J. Hum. Genet. 2014; 95: 535-552).
Although we focused on the accuracy of genetic correlation estimates, there is also an important implication for the bias in SNP-heritability estimates for both GREML and LDSC (Figure S10). When using the WTCCC2, GERA, and UKBBr data, which were less similar to the 1KG reference genotypes than the UKBB data, LDSC estimates of SNP heritability were substantially biased, whereas GREML estimates were close to the true value (Figure S10). However, this result is well known, and LDSC was not recommended for SNP heritability by the original authors (ref. 10: Bulik-Sullivan et al., Nat. Genet. 2015; 47: 291-295) but rather only for relative enrichment analysis. Despite this, LDSC is widely used for SNP-heritability estimation (because it is quick and simple). Thus, for completeness we include analyses for different scenarios to quantify the properties of the methods. When reducing the number of SNPs, estimated SNP heritabilities from LDSC were consistently unbiased; however, those from GREML were proportionally underestimated (Figure S11). When using non-HapMap3 SNPs, LDSC estimates were consistently biased (Figure S12) and less accurate than GREML estimates (Figures S13 and S14), which probably explains why LDSC is implemented using only HapMap3 SNPs.
Although the genetic correlation is robust to such bias (refs. 4, 11: Lee et al., Bioinformatics 2012; 28: 2540-2542; Bulik-Sullivan et al., Nat. Genet. 2015; 47: 1236-1241), SNP heritability itself should be carefully interpreted for both GREML and LDSC. We also noted that LDSC and sLDSC estimates for SNP heritability were biased in the genome partitioning analysis (Figure S15), although the estimated enrichment was close to the true value when using sLDSC and in-sample LD scores (Figure S15).
We used real phenotype and individual genotype data from the Psychiatric Genomics Consortium (PGC) and UKBB to estimate genetic variance and covariance between SCZ and BMI using LDSC and GREML (Table 3 and Figure S16). We also used publicly available GWAS summary statistics for LDSC to see how much the SE of estimates could be reduced by increasing the number of samples and the number of SNPs. For real data analyses, we obtained theoretical SEs to assess the accuracy of the methods. GREML and LDSC estimates for the SNP heritability were 0.192 (SE 0.004) and 0.280 (SE 0.016) for SCZ and 0.184 (SE 0.004) and 0.255 (SE 0.014) for BMI. The notable difference between GREML and LDSC was probably due to the relatively small number of SNPs (500K), which might result in underestimated GREML SNP heritability (see Figure S11).
This is one of the caveats of using GREML with real data, which usually comprise multiple cohorts genotyped on different platforms, such that, even with imputation, the overlapping set of SNPs imputed with high confidence may be limited. The estimated genetic correlation for GREML and LDSC was −0.136 (SE 0.017) and −0.173 (SE 0.031), respectively. This indicated that the GREML estimate was 3.5 and 1.8 times more precise than LDSC estimates for the SNP heritability and genetic correlation, respectively. For LDSC, we also considered using additional GWAS summary statistics from publicly available resources. Refs. 25, 26: Schizophrenia Working Group of the Psychiatric Genomics Consortium, Nature 2014; 511: 421-427; Locke et al., LifeLines Cohort Study, ADIPOGen Consortium, AGEN-BMI Working Group, CARDIOGRAMplusC
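The precision comparison above can be retraced with a few lines of arithmetic on the reported estimates and standard errors. This is only a sketch: reading "more precise" as the ratio of standard errors is an assumption made here, and the dictionary layout and the `se_ratio` helper are hypothetical names introduced for the example.

```python
# Reported (estimate, SE) pairs for the SCZ and BMI real-data analysis.
reported = {
    "SNP heritability, SCZ": {"GREML": (0.192, 0.004), "LDSC": (0.280, 0.016)},
    "SNP heritability, BMI": {"GREML": (0.184, 0.004), "LDSC": (0.255, 0.014)},
    "genetic correlation, SCZ-BMI": {"GREML": (-0.136, 0.017), "LDSC": (-0.173, 0.031)},
}

def se_ratio(entry):
    """Ratio of the LDSC standard error to the GREML standard error."""
    return entry["LDSC"][1] / entry["GREML"][1]

ratios = {name: se_ratio(entry) for name, entry in reported.items()}
for name, ratio in ratios.items():
    print(f"{name}: SE(LDSC) / SE(GREML) = {ratio:.2f}")
```

Under that reading, the ratios can be compared directly with the figures quoted in the text.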
DOI: 10.1038/nn.4404
2016
Cited 92 times
Ultra-rare disruptive and damaging mutations influence educational attainment in the general population
Rare genetic mutations that disrupt the functionality of important genes increase the risk of psychiatric and neurodevelopmental disorder. This study found that, in the general population not diagnosed with such disorders, these same mutations affect the average educational level. Carriers of these mutations have on average half a semester less of education than noncarriers. Disruptive, damaging ultra-rare variants in highly constrained genes are enriched in individuals with neurodevelopmental disorders. In the general population, this class of variants was associated with a decrease in years of education (YOE). This effect was stronger among highly brain-expressed genes and explained more YOE variance than pathogenic copy number variation but less than common variants. Disruptive, damaging ultra-rare variants in highly constrained genes influence the determinants of YOE in the general population.
DOI: 10.1038/s41598-022-26845-0
2023
Cited 10 times
Genome-wide association study of school grades identifies genetic overlap between language ability, psychopathology and creativity
Cognitive functions of individuals with psychiatric disorders differ from those of the general population. Such cognitive differences often manifest early in life as differential school performance and have a strong genetic basis. Here we measured genetic predictors of school performance in 30,982 individuals in English, Danish and mathematics via a genome-wide association study (GWAS) and studied their relationship with risk for six major psychiatric disorders. When decomposing school performance into math-specific and language-specific performance, we observed, phenotypically and genetically, a strong negative correlation between math performance and risk for most psychiatric disorders. Language performance, however, correlated positively with risk for certain disorders, especially schizophrenia, which we replicated in an independent sample (n = 4547). We also found that the genetic variants relating to increased risk for schizophrenia and better language performance are overrepresented in individuals involved in creative professions (n = 2953) compared to the general population (n = 164,622). Together, the findings suggest that language ability, creativity and psychopathology might stem from overlapping genetic roots.
DOI: 10.1101/224774
2017
Cited 60 times
Common risk variants identified in autism spectrum disorder
Autism spectrum disorder (ASD) is a highly heritable and heterogeneous group of neurodevelopmental phenotypes diagnosed in more than 1% of children. Common genetic variants contribute substantially to ASD susceptibility, but to date no individual variants have been robustly associated with ASD. With a marked sample size increase from a unique Danish population resource, we report a genome-wide association meta-analysis of 18,381 ASD cases and 27,969 controls that identifies five genome-wide significant loci. Leveraging GWAS results from three phenotypes with significantly overlapping genetic architectures (schizophrenia, major depression, and educational attainment), seven additional loci shared with other traits are identified at equally strict significance levels. Dissecting the polygenic architecture, we find both quantitative and qualitative polygenic heterogeneity across ASD subtypes, in contrast to what is typically seen in other complex disorders. These results highlight biological insights, particularly relating to neuronal function and corticogenesis, and establish that GWAS performed at scale will be much more productive in the near term in ASD, just as it has been in a broad range of important psychiatric and diverse medical phenotypes.
DOI: 10.2337/db16-1329
2017
Cited 48 times
A Low-Frequency Inactivating AKT2 Variant Enriched in the Finnish Population Is Associated With Fasting Insulin Levels and Type 2 Diabetes Risk
To identify novel coding association signals and facilitate characterization of mechanisms influencing glycemic traits and type 2 diabetes risk, we analyzed 109,215 variants derived from exome array genotyping together with an additional 390,225 variants from exome sequence in up to 39,339 normoglycemic individuals from five ancestry groups. We identified a novel association between the coding variant (p.Pro50Thr) in AKT2 and fasting plasma insulin (FI), a gene in which rare fully penetrant mutations are causal for monogenic glycemic disorders. The low-frequency allele is associated with a 12% increase in FI levels. This variant is present at 1.1% frequency in Finns but virtually absent in individuals from other ancestries. Carriers of the FI-increasing allele had increased 2-h insulin values, decreased insulin sensitivity, and increased risk of type 2 diabetes (odds ratio 1.05). In cellular studies, the AKT2-Thr50 protein exhibited a partial loss of function. We extend the allelic spectrum for coding variants in AKT2 associated with disorders of glucose homeostasis and demonstrate bidirectional effects of variants within the pleckstrin homology domain of AKT2.
DOI: 10.1016/j.xgen.2023.100356
2023
Cited 6 times
Schizophrenia-associated somatic copy-number variants from 12,834 cases reveal recurrent NRXN1 and ABCB11 disruptions
While germline copy-number variants (CNVs) contribute to schizophrenia (SCZ) risk, the contribution of somatic CNVs (sCNVs), which are present in some but not all cells, remains unknown. We identified sCNVs using blood-derived genotype arrays from 12,834 SCZ cases and 11,648 controls, filtering sCNVs at loci recurrently mutated in clonal blood disorders. Likely early-developmental sCNVs were more common in cases (0.91%) than controls (0.51%, p = 2.68e-4), with recurrent somatic deletions of exons 1-5 of the NRXN1 gene in five SCZ cases. Hi-C maps revealed ectopic, allele-specific loops forming between a potential cryptic promoter and non-coding cis-regulatory elements upon 5' deletions in NRXN1. We also observed recurrent intragenic deletions of ABCB11, encoding a transporter implicated in anti-psychotic response, in five treatment-resistant SCZ cases and showed that ABCB11 is specifically enriched in neurons forming mesocortical and mesolimbic dopaminergic projections. Our results indicate potential roles of sCNVs in SCZ risk.
DOI: 10.1038/s41588-023-01607-4
2024
Distinct and shared genetic architectures of gestational diabetes mellitus and type 2 diabetes
Gestational diabetes mellitus (GDM) is a common metabolic disorder affecting more than 16 million pregnancies annually worldwide [1,2]. GDM is related to an increased lifetime risk of type 2 diabetes (T2D) [1–3], with over a third of women developing T2D within 15 years of their GDM diagnosis. The diseases are hypothesized to share a genetic predisposition [1–7], but few studies have sought to uncover the genetic underpinnings of GDM. Most studies have evaluated the impact of T2D loci only [8–10], and the three prior genome-wide association studies of GDM [11–13] have identified only five loci, limiting the power to assess to what extent variants or biological pathways are specific to GDM. We conducted the largest genome-wide association study of GDM to date in 12,332 cases and 131,109 parous female controls in the FinnGen study and identified 13 GDM-associated loci, including nine new loci. Genetic features distinct from T2D were identified both at the locus and genomic scale. Our results suggest that the genetics of GDM risk falls into the following two distinct categories: one part conventional T2D polygenic risk and one part predominantly influencing mechanisms disrupted in pregnancy. Loci with GDM-predominant effects map to genes related to islet cells, central glucose homeostasis, steroidogenesis and placental expression.
DOI: 10.1101/2021.06.19.21259117
2021
Cited 21 times
Systematic single-variant and gene-based association testing of thousands of phenotypes in 426,370 UK Biobank exomes
Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variation in human disease has not been explored at scale. Exome sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variation across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene tests of 426,370 individuals in the UK Biobank with exome sequence data. We find that the discovery of genetic associations is tightly linked to frequency as well as correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare variant association results.
DOI: 10.1038/s41398-018-0324-2
2019
Cited 24 times
Disentangling polygenic associations between attention-deficit/hyperactivity disorder, educational attainment, literacy and language
Interpreting polygenic overlap between ADHD and both literacy-related and language-related impairments is challenging as genetic associations might be influenced by indirectly shared genetic factors. Here, we investigate genetic overlap between polygenic ADHD risk and multiple literacy-related and/or language-related abilities (LRAs), as assessed in UK children (N ≤ 5919), accounting for genetically predictable educational attainment (EA). Genome-wide summary statistics on clinical ADHD and years of schooling were obtained from large consortia (N ≤ 326,041). Our findings show that ADHD-polygenic scores (ADHD-PGS) were inversely associated with LRAs in ALSPAC, most consistently with reading-related abilities, and explained ≤1.6% phenotypic variation. These polygenic links were then dissected into both ADHD effects shared with and independent of EA, using multivariable regressions (MVR). Conditional on EA, polygenic ADHD risk remained associated with multiple reading and/or spelling abilities, phonemic awareness and verbal intelligence, but not listening comprehension and non-word repetition. Using conservative ADHD-instruments (P-threshold < 5 × 10⁻⁸), this corresponded, for example, to a 0.35 SD decrease in pooled reading performance per log-odds in ADHD-liability (P = 9.2 × 10⁻⁵). Using subthreshold ADHD-instruments (P-threshold < 0.0015), these effects became smaller, with a 0.03 SD decrease per log-odds in ADHD risk (P = 1.4 × 10⁻⁶), although the predictive accuracy increased. However, polygenic ADHD-effects shared with EA were of equal strength and at least equal magnitude compared to those independent of EA, for all LRAs studied, and detectable using subthreshold instruments. Thus, ADHD-related polygenic links with LRAs are to a large extent due to shared genetic effects with EA, although there is evidence for an ADHD-specific association profile, independent of EA, that primarily involves literacy-related impairments.
DOI: 10.1038/s41598-018-28160-z
2018
Cited 18 times
Age at first birth in women is genetically associated with increased risk of schizophrenia
Previous studies have shown an increased risk for mental health problems in children born to both younger and older parents compared to children of average-aged parents. We previously used a novel design to reveal a latent mechanism of genetic association between schizophrenia and age at first birth in women (AFB). Here, we use independent data from the UK Biobank (N = 38,892) to replicate the finding of an association between predicted genetic risk of schizophrenia and AFB in women, and to estimate the genetic correlation between schizophrenia and AFB in women stratified into younger and older groups. We find evidence for an association between predicted genetic risk of schizophrenia and AFB in women (P-value = 1.12E-05), and we show genetic heterogeneity between younger and older AFB groups (P-value = 3.45E-03). The genetic correlation between schizophrenia and AFB in the younger AFB group is -0.16 (SE = 0.04) while that between schizophrenia and AFB in the older AFB group is 0.14 (SE = 0.08). Our results suggest that early, and perhaps also late, age at first birth in women is associated with increased genetic risk for schizophrenia in the UK Biobank sample. These findings contribute new insights into factors contributing to the complex bio-social risk architecture underpinning the association between parental age and offspring mental health.
DOI: 10.1038/s41398-018-0229-0
2018
Cited 17 times
Genetic risk for schizophrenia and autism, social impairment and developmental pathways to psychosis
While psychotic experiences (PEs) are assumed to represent psychosis liability, general population studies have not been able to establish significant associations between polygenic risk scores (PRS) and PEs. Previous work suggests that PEs may only represent significant risk when accompanied by social impairment. Leveraging data from the large longitudinal IMAGEN cohort, including 2096 14-year-old adolescents who were followed up to age 18, we tested whether the association between polygenic risk and PEs is mediated by (increasing) impairments in social functioning and social cognitive processes. Using structural equation modeling (SEM) for the subset of participants (n = 643) with complete baseline and follow-up data, we examined pathways to PEs. We found that high polygenic risk for schizophrenia (p = 0.014), reduced brain activity to emotional stimuli (p = 0.009) and social impairments in late adolescence (p < 0.001; controlling for functioning in early adolescence) each independently contributed to the severity of PEs at age 18. The pathway between polygenic risk for autism spectrum disorder and PEs was mediated by social impairments in late adolescence (indirect pathway; p = 0.025). These findings point to multiple direct and indirect pathways to PEs, suggesting that different processes are in play, depending on genetic loading and environment. Our results suggest that treatments targeting prevention of social impairment may be particularly promising for individuals at genetic risk for autism in order to minimize risk for psychosis.
DOI: 10.1038/mp.2017.214
2017
Cited 17 times
Erratum: Genome-wide common and rare variant analysis provides novel insights into clozapine-associated neutropenia
Correction to: Molecular Psychiatry (2017) 22: 1502-1508; advance online publication, 12 July 2017; doi: 10.1038/mp.2016.97 In the first paragraph of the Results section and Figure 1, the authors incorrectly referred to the finding of SNP rs77897117. The correct SNP is rs77897177. The corrected figure appears in previous page.
DOI: 10.1101/2024.01.09.574205
2024
The Scalable Variant Call Representation: Enabling Genetic Analysis Beyond One Million Genomes
The Variant Call Format (VCF) is widely used in genome sequencing but scales poorly. For instance, we estimate a 150,000 genome VCF would occupy 900 TiB, making it both costly and complicated to produce and analyze. The issue stems from VCF's requirement to densely represent both reference-genotypes and allele-indexed arrays. These requirements lead to unnecessary data duplication and, ultimately, very large files. To address these challenges, we introduce the Scalable Variant Call Representation (SVCR). This representation reduces file sizes by ensuring they scale linearly with samples. SVCR achieves this by adopting reference blocks from the Genomic Variant Call Format (GVCF) and employing local allele indices. SVCR is also lossless and mergeable, allowing for N+1 and N+K incremental joint-calling. We present two implementations of SVCR: SVCR-VCF, which encodes SVCR in VCF format, and VDS, which uses Hail's native format. Our experiments confirm the linear scalability of SVCR-VCF and VDS, in contrast to the super-linear growth seen with standard VCF files. We also discuss the VDS Combiner, a scalable, open-source tool for producing a VDS from GVCFs and unique features of VDS which enable rapid data analysis. SVCR, and VDS in particular, ensure the scientific community can generate, analyze, and disseminate genetics datasets with millions of samples.
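The idea of local allele indices mentioned in this abstract can be sketched as follows. This is an illustration only, not the SVCR or VDS implementation: the `LocalCall` class, its field names (`la`, `lgt`, `lad`) and the choice to fill unobserved alleles with a depth of zero when expanding to a dense record are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LocalCall:
    la: List[int]         # indices into the site's global allele list (0 = reference)
    lgt: Tuple[int, int]  # diploid genotype, expressed as indices into `la`
    lad: List[int]        # allele depths, one value per entry of `la`

def to_global(call: LocalCall, n_alleles: int):
    """Expand a locally indexed call to dense, globally indexed arrays."""
    gt = tuple(call.la[i] for i in call.lgt)
    ad = [0] * n_alleles  # assumption: alleles the sample does not list get depth 0
    for local_idx, global_idx in enumerate(call.la):
        ad[global_idx] = call.lad[local_idx]
    return gt, ad

def n_genotype_likelihoods(n_alleles: int) -> int:
    """Number of diploid genotype likelihoods for a given number of alleles."""
    return n_alleles * (n_alleles + 1) // 2

site_alleles = ["A", "C", "G", "T", "AT"]            # five alleles across all samples
call = LocalCall(la=[0, 3], lgt=(0, 1), lad=[12, 9])  # this sample carries only A and T

dense_gt, dense_ad = to_global(call, len(site_alleles))
print(dense_gt, dense_ad)
print(n_genotype_likelihoods(len(site_alleles)), "dense vs",
      n_genotype_likelihoods(len(call.la)), "local likelihood entries")
```

The sample stores two depth values instead of five, and a likelihood array sized by its two local alleles instead of by all five alleles at the site, which is the kind of saving that per-sample indexing is meant to deliver as the number of alleles at a site grows with sample size.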
DOI: 10.1101/2024.03.13.24303864
2024
Pan-UK Biobank GWAS improves discovery, analysis of genetic architecture, and resolution into ancestry-enriched effects
Large biobanks, such as the UK Biobank (UKB), enable massive phenome by genome-wide association studies that elucidate genetic etiology of complex traits. However, individuals from diverse genetic ancestry groups are often excluded from association analyses due to concerns about population structure introducing false positive associations. Here, we generate mixed model associations and meta-analyses across genetic ancestry groups, inclusive of a larger fraction of the UKB than previous efforts, to produce freely-available summary statistics for 7,271 traits. We build a quality control and analysis framework informed by genetic architecture. Overall, we identify 14,676 significant loci in the meta-analysis that were not found in the European genetic ancestry group alone, including novel associations for example between CAMK2D and triglycerides. We also highlight associations from ancestry-enriched variation, including a known pleiotropic missense variant in G6PD associated with several biomarker traits. We release these results publicly alongside FAQs that describe caveats for interpretation of results, enhancing available resources for interpretation of risk variants across diverse populations.
DOI: 10.1101/2024.04.11.588920
2024
The landscape of regional missense mutational intolerance quantified from 125,748 exomes
Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation [1-12]. Here, we leverage the patterns of rare missense variation in 125,748 individuals in the Genome Aggregation Database (gnomAD) [13] against a null mutational model to identify transcripts that display regional differences in missense constraint. Missense-depleted regions are enriched for ClinVar [14] pathogenic variants, de novo missense variants from individuals with neurodevelopmental disorders (NDDs) [15,16], and complex trait heritability. Following ClinGen calibration recommendations for the ACMG/AMP guidelines, we establish that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity. We create a missense deleteriousness metric (MPC) that incorporates regional constraint and outperforms other deleteriousness scores at stratifying case and control de novo missense variation, with a strong enrichment in NDDs. These results provide additional tools to aid in missense variant interpretation.
DOI: 10.1101/040493
2016
Cited 8 times
A contribution of novel CNVs to schizophrenia from a genome-wide study of 41,321 subjects
Genomic copy number variants (CNVs) have been strongly implicated in the etiology of schizophrenia (SCZ). However, apart from a small number of risk variants, elucidation of the CNV contribution to risk has been difficult due to the rarity of risk alleles, all occurring in less than 1% of cases. We sought to address this obstacle through a collaborative effort in which we applied a centralized analysis pipeline to a SCZ cohort of 21,094 cases and 20,227 controls. We observed a global enrichment of CNV burden in cases (OR=1.11, P=5.7e-15), which persisted after excluding loci implicated in previous studies (OR=1.07, P=1.7e-6). CNV burden is also enriched for genes associated with synaptic function (OR = 1.68, P = 2.8e-11) and neurobehavioral phenotypes in mouse (OR = 1.18, P= 7.3e-5). We identified genome-wide significant support for eight loci, including 1q21.1, 2p16.3 (NRXN1), 3q29, 7q11.2, 15q13.3, distal 16p11.2, proximal 16p11.2 and 22q11.2. We find support at a suggestive level for nine additional candidate susceptibility and protective loci, which consist predominantly of CNVs mediated by non-allelic homologous recombination (NAHR).
DOI: 10.1101/146670
2017
Cited 6 times
The iPSYCH2012 case-cohort sample: New directions for unravelling genetic and environmental architectures of severe mental disorders
The iPSYCH consortium has established a large Danish population-based Case-Cohort sample (iPSYCH2012) aimed at unravelling the genetic and environmental architecture of severe mental disorders. The iPSYCH2012 sample is nested within the entire Danish population born 1981-2005 including 1,472,762 persons. This paper introduces the iPSYCH2012 sample and outlines key future research directions. Cases were identified as persons with schizophrenia (N=3,540), autism (N=16,146), ADHD (N=18,726), and affective disorder (N=26,380), of which 1928 had bipolar affective disorder. Controls were randomly sampled individuals (N=30,000). Within the sample of 86,189 individuals, a total of 57,377 individuals had at least one major mental disorder. DNA was extracted from the neonatal dried blood spot samples obtained from the Danish Neonatal Screening Biobank and genotyped using the Illumina PsychChip. Genotyping was successful for 90% of the sample. The assessments of exome sequencing, methylation profiling, metabolome profiling, vitamin-D, inflammatory and neurotrophic factors are in progress. For each individual, the iPSYCH2012 sample also includes longitudinal information on health, prescribed medicine, social and socioeconomic information and analogous information among relatives. To the best of our knowledge, the iPSYCH2012 sample is the largest and most comprehensive data source for the combined study of genetic and environmental aetiologies of severe mental disorders.
DOI: 10.1101/068999
2016
Cited 3 times
Bootstrat: Population Informed Bootstrapping for Rare Variant Tests
Recent advances in genotyping and sequencing technologies have made detecting rare variants in large cohorts possible. Various analytic methods for associating disease to rare variants have been proposed, including burden tests, C-alpha and SKAT. Most of these methods, however, assume that samples come from a homogeneous population, which is not realistic for analyses of large samples. Not correcting for population stratification causes inflated p-values and false-positive associations. Here we propose a population-informed bootstrap resampling method that controls for population stratification (Bootstrat) in rare variant tests. In essence, the Bootstrat procedure uses genetic distance to create a phenotype probability for each sample. We show that this empirical approach can effectively correct for population stratification while maintaining statistical power comparable to established methods of controlling for population stratification. The Bootstrat scheme can be easily applied to existing rare variant testing methods with reasonable computational complexity. Author Summary: Recent technology advances have enabled large-scale analysis of rare variants, but properly testing rare variants remains a significant challenge as most rare variant testing methods assume a sample of homogeneous ethnicity, an assumption often not true for large cohorts. Failure to account for this heterogeneity increases the type I error rate. Here we propose a bootstrap scheme applicable to most existing rare variant testing methods to control for population heterogeneity. This scheme uses a randomization layer to establish a null distribution of the test statistics while preserving the sample genetic relationships. The null distribution is then used to calculate an empirical p-value that accounts for population heterogeneity. We demonstrate how this scheme successfully controls the type I error rate without loss of statistical power.
DOI: 10.1101/089342
2016
Polygenic transmission disequilibrium confirms that common and rare variation act additively to create risk for autism spectrum disorders
Autism spectrum disorder (ASD) risk is influenced by both common polygenic and de novo variation. The purpose of this analysis was to clarify the influence of common polygenic risk for ASDs and to identify subgroups of cases, including those with strong acting de novo variants, in which different types of polygenic risk are relevant. To do so, we extend the transmission disequilibrium approach to encompass polygenic risk scores, and introduce the polygenic transmission disequilibrium test. Using data from more than 6,400 children with ASDs and 15,000 of their family members, we show that polygenic risk for ASDs, schizophrenia, and educational attainment is over-transmitted to children with ASDs in two independent samples, but not to their unaffected siblings. These findings hold independent of proband IQ. We find that common polygenic variation contributes additively to ASD risk in cases that carry a very strong acting de novo variant. Lastly, we find evidence that elements of polygenic risk are independent and differ in their relationship with proband phenotype. These results confirm that ASDs' genetic influences are highly additive and suggest that they create risk through at least partially distinct etiologic pathways.
DOI: 10.1007/978-1-4939-2824-8_1
2015
Calling Rare Variants from Genotype Data
DOI: 10.1101/050195
2016
Ultra-rare disruptive and damaging mutations influence educational attainment in the general population
Ultra-rare inherited and de novo disruptive variants in highly constrained (HC) genes are enriched in neurodevelopmental disorders [1–5]. However, their impact on cognition in the general population has not been explored. We hypothesize that disruptive and damaging ultra-rare variants (URVs) in HC genes not only confer risk to neurodevelopmental disorders, but also influence general cognitive abilities measured indirectly by years of education (YOE). We tested this hypothesis in 14,133 individuals with whole exome or genome sequencing data. The presence of one or more URVs was associated with a decrease in YOE (3.1 months less for each additional mutation; P-value=3.3×10⁻⁸) and the effect was stronger in HC genes enriched for brain expression (6.5 months less, P-value=3.4×10⁻⁵). The effect of these variants was more pronounced than the estimated effects of runs of homozygosity and pathogenic copy number variation [6–9]. Our findings suggest that effects of URVs in HC genes are not confined to severe neurodevelopmental disorder, but influence the cognitive spectrum in the general population.
2016
A contribution of novel CNVs to schizophrenia from a genome-wide study of 41,321 subjects: CNV Analysis Group and the Schizophrenia Working Group of the Psychiatric Genomics Consortium
Genomic copy number variants (CNVs) have been strongly implicated in the etiology of schizophrenia (SCZ). However, apart from a small number of risk variants, elucidation of the CNV contribution to risk has been difficult due to the rarity of risk alleles, all occurring in less than 1% of cases. We sought to address this obstacle through a collaborative effort in which we applied a centralized analysis pipeline to a SCZ cohort of 21,094 cases and 20,227 controls. We observed a global enrichment of CNV burden in cases (OR=1.11, P=5.7e-15), which persisted after excluding loci implicated in previous studies (OR=1.07, P=1.7e-6). CNV burden is also enriched for genes associated with synaptic function (OR=1.68, P=2.8e-11) and neurobehavioral phenotypes in mouse (OR=1.18, P=7.3e-5). We identified genome-wide significant support for eight loci, including 1q21.1, 2p16.3 (NRXN1), 3q29, 7q11.2, 15q13.3, distal 16p11.2, proximal 16p11.2 and 22q11.2. We find support at a suggestive level for nine additional candidate susceptibility and protective loci, which consist predominantly of CNVs mediated by non-allelic homologous recombination (NAHR).
2016
Ultra-rare disruptive and damaging mutations influence educational attainment in the general population
DOI: 10.17615/06sq-3t61
2016
Ultra-rare disruptive and damaging mutations influence educational attainment in the general population
DOI: 10.17615/wz06-t411
2014
Clozapine-induced agranulocytosis is associated with rare HLA-DQB1 and HLA-B alleles
DOI: 10.17615/4xr8-gp35
2016
Genetic influences on schizophrenia and subcortical brain volumes: large-scale proof of concept
DOI: 10.17615/2r6r-p404
2012
zCall: a rare variant caller for array-based genotyping
2017
Contribution of copy number variants to schizophrenia from a genome-wide study of 41,321 subjects
2017
Evaluating Hypertrophic Cardiomyopathy Disease-Gene Associations Using the Clinical Genome Resource (ClinGen) Gene Curation Framework
DOI: 10.17615/gg0m-d074
2017
Polygenic transmission disequilibrium confirms that common and rare variation act additively to create risk for autism spectrum disorders
DOI: 10.17615/hs58-7865
2017
A Low-Frequency Inactivating AKT2 Variant Enriched in the Finnish Population Is Associated With Fasting Insulin Levels and Type 2 Diabetes Risk
DOI: 10.17615/m6qv-yz19
2017
Contribution of copy number variants to schizophrenia from a genome-wide study of 41,321 subjects