
Vasily Ramensky

Here are all the papers by Vasily Ramensky that you can download and read on OA.mg.

DOI: 10.1038/nmeth0410-248
2010
Cited 11,355 times
A method and server for predicting damaging missense mutations
To the Editor: Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow. Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which differs from the earlier tool PolyPhen (ref. 1) in the set of predictive features, the alignment pipeline, and the method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1), which were selected automatically by an iterative greedy algorithm (Supplementary Methods). The majority of these features involve comparison of a property of the wild-type (ancestral, normal) allele and the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. The most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site (ref. 2). The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by a Naive Bayes classifier (Supplementary Methods).
Figure 1. PolyPhen-2 pipeline and prediction accuracy. (a) Overview of the algorithm. (b) Receiver operating characteristic (ROC) curves for predictions made by PolyPhen-2 using five-fold cross-validation on HumDiv (red) and HumVar (ref. 3) (light green). UniRef100 (solid ...
We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProt database, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar (ref. 3), consists of all the 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging. We found that PolyPhen-2 performance, as presented by its receiver operating characteristic curves, was consistently superior to that of PolyPhen (Fig. 1b), and it also compared favorably with the three other popular prediction tools (refs. 4–6) (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieves true positive rates of 92% and 73% on HumDiv and HumVar, respectively (Supplementary Table 2). One reason for the lower accuracy of predictions on HumVar is that nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most amino acid replacements assumed non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to the opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations.
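The operating point quoted above (the true positive rate at a 20% false positive rate) is one point on a ROC curve. The following is a minimal Python sketch of how such a point could be read off a set of scores, assuming each variant has a score where higher means more likely damaging. The function name and the toy scores are hypothetical and are not taken from PolyPhen-2 or its datasets.

```python
def tpr_at_fpr(damaging_scores, neutral_scores, max_fpr=0.20):
    """Best true positive rate achievable while keeping FPR <= max_fpr.

    A variant is called damaging when its score is >= the threshold.
    """
    best_tpr = 0.0
    # Every observed score is a candidate decision threshold.
    for threshold in sorted(set(damaging_scores) | set(neutral_scores)):
        fpr = sum(s >= threshold for s in neutral_scores) / len(neutral_scores)
        tpr = sum(s >= threshold for s in damaging_scores) / len(damaging_scores)
        if fpr <= max_fpr:
            best_tpr = max(best_tpr, tpr)
    return best_tpr


# Toy scores, invented purely for illustration.
damaging = [0.95, 0.90, 0.80, 0.60, 0.30]
neutral = [0.70, 0.40, 0.20, 0.10, 0.05]
print(tpr_at_fpr(damaging, neutral, max_fpr=0.20))
```

With these toy inputs, the threshold 0.60 admits one of five neutral variants (FPR 20%) and recovers four of five damaging ones.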
For a mutation, PolyPhen-2 calculates the Naive Bayes posterior probability that this mutation is damaging and reports estimates of the false positive rate (the chance that the mutation is classified as damaging when it is in fact non-damaging) and the true positive rate (the chance that the mutation is classified as damaging when it is indeed damaging). A mutation is also appraised qualitatively, as benign, possibly damaging, or probably damaging (Supplementary Methods). The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnostics of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.
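As an illustration of the posterior probability mentioned above, here is a generic Naive Bayes calculation in Python. It is a sketch under stated assumptions: the per-feature likelihoods, the 0.5 prior, and the function name are all hypothetical, and the actual PolyPhen-2 features and cutoffs for the qualitative categories (described in the Supplementary Methods) are not reproduced here.

```python
import math


def naive_bayes_posterior(likelihoods_damaging, likelihoods_benign,
                          prior_damaging=0.5):
    """P(damaging | features), assuming features are conditionally independent.

    likelihoods_damaging[i] = P(feature_i | damaging)
    likelihoods_benign[i]   = P(feature_i | benign)
    """
    # Sum logs instead of multiplying probabilities to avoid underflow.
    log_d = math.log(prior_damaging) + sum(math.log(p) for p in likelihoods_damaging)
    log_b = math.log(1.0 - prior_damaging) + sum(math.log(p) for p in likelihoods_benign)
    # Normalise: e^log_d / (e^log_d + e^log_b), rewritten for stability.
    return 1.0 / (1.0 + math.exp(log_b - log_d))


# Two made-up features that are each more likely under "damaging".
posterior = naive_bayes_posterior([0.8, 0.6], [0.2, 0.3])
print(round(posterior, 3))
```

Here the unnormalised weights are 0.5 × 0.8 × 0.6 = 0.24 for damaging and 0.5 × 0.2 × 0.3 = 0.03 for benign, so the posterior is 0.24 / 0.27.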
DOI: 10.1093/nar/gkf493
2002
Cited 2,135 times
Human non-synonymous SNPs: server and survey
Human single nucleotide polymorphisms (SNPs) represent the most frequent type of human population DNA variation. One of the main goals of SNP research is to understand the genetics of the human phenotype variation and especially the genetic basis of human complex diseases. Non-synonymous coding SNPs (nsSNPs) comprise a group of SNPs that, together with SNPs in regulatory regions, are believed to have the highest impact on phenotype. Here we present a World Wide Web server to predict the effect of an nsSNP on protein structure and function. The prediction method enabled analysis of the publicly available SNP database HGVbase, which gave rise to a dataset of nsSNPs with predicted functionality. The dataset was further used to compare the effect of various structural and functional characteristics of amino acid substitutions responsible for phenotypic display of nsSNPs. We also studied the dependence of selective pressure on the structural and functional properties of proteins. We found that in our dataset the selection pressure against deleterious SNPs depends on the molecular function of the protein, although it is insensitive to several other protein features considered. The strongest selective pressure was detected for proteins involved in transcription regulation.
DOI: 10.1038/s41586-019-1457-z
2019
Cited 161 times
Exome sequencing of Finnish isolates enhances rare-variant association power
Exome-sequencing studies have generally been underpowered to identify deleterious alleles with a large effect on complex traits as such alleles are mostly rare. Because the population of northern and eastern Finland has expanded considerably and in isolation following a series of bottlenecks, individuals of these populations have numerous deleterious alleles at a relatively high frequency. Here, using exome sequencing of nearly 20,000 individuals from these regions, we investigate the role of rare coding variants in clinically relevant quantitative cardiometabolic traits. Exome-wide association studies for 64 quantitative traits identified 26 newly associated deleterious alleles. Of these 26 alleles, 19 are either unique to or more than 20 times more frequent in Finnish individuals than in other Europeans and show geographical clustering comparable to Mendelian disease mutations that are characteristic of the Finnish population. We estimate that sequencing studies of populations without this unique history would require hundreds of thousands to millions of participants to achieve comparable association power. Exome-wide sequencing studies of populations in Finland identified 26 deleterious alleles associated with 64 quantitative traits that are clinically relevant to cardiovascular and metabolic diseases.
DOI: 10.1016/j.neuron.2017.06.010
2017
Cited 140 times
Rare Copy Number Variants in NRXN1 and CNTN6 Increase Risk for Tourette Syndrome
Tourette syndrome (TS) is a model neuropsychiatric disorder thought to arise from abnormal development and/or maintenance of cortico-striato-thalamo-cortical circuits. TS is highly heritable, but its underlying genetic causes are still elusive, and no genome-wide significant loci have been discovered to date. We analyzed a European ancestry sample of 2,434 TS cases and 4,093 ancestry-matched controls for rare (< 1% frequency) copy-number variants (CNVs) using SNP microarray data. We observed an enrichment of global CNV burden that was prominent for large (> 1 Mb), singleton events (OR = 2.28, 95% CI [1.39–3.79], p = 1.2 × 10^-3) and known, pathogenic CNVs (OR = 3.03 [1.85–5.07], p = 1.5 × 10^-5). We also identified two individual, genome-wide significant loci, each conferring a substantial increase in TS risk (NRXN1 deletions, OR = 20.3, 95% CI [2.6–156.2]; CNTN6 duplications, OR = 10.1, 95% CI [2.3–45.4]). Approximately 1% of TS cases carry one of these CNVs, indicating that rare structural variation contributes significantly to the genetic architecture of TS.
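For readers unfamiliar with the odds ratios (OR) and 95% confidence intervals quoted above, the Python sketch below shows the textbook calculation from a 2×2 carrier table using a Wald interval on the log scale. This is an assumption for illustration only: the abstract does not state how the study computed its estimates, and the counts used here are invented, not the study's data.

```python
import math


def odds_ratio_wald(case_carriers, case_noncarriers,
                    control_carriers, control_noncarriers, z=1.96):
    """Odds ratio with a Wald confidence interval (z=1.96 gives ~95%).

    Requires all four cell counts to be non-zero.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 10 carriers among 1,000 cases, 5 among 2,000 controls.
print(odds_ratio_wald(10, 990, 5, 1995))
```

The wide intervals reported for rare events such as the NRXN1 deletions are what this formula produces when one or more cells are small, since each 1/count term then dominates the standard error.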
DOI: 10.1016/s0168-9525(00)01988-0
2000
Cited 329 times
Towards a structural basis of human non-synonymous single nucleotide polymorphisms
About 90% of human genetic variation has been ascribed to single nucleotide polymorphism (SNP) allelic variants that occur at a frequency of >1% (ref. 1: Collins F.S. et al. A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 1998; 8: 1229-1231). Owing to the application of high-throughput SNP detection techniques, the number of identified SNPs is growing rapidly, enabling detailed statistical studies (ref. 2: Halushka M.K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat. Genet. 1999; 22: 239-247; ref. 3: Cargill M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 1999; 22: 231-238; ref. 4: Buetow K.H. et al. Reliable identification of large numbers of candidate SNPs from public EST data. Nat. Genet. 1999; 21: 323-325; ref. 5: Sunyaev S.R. et al. Individual variation in protein-coding sequences of the human genome. Adv. Protein Chem. 2000; in press). These include studies of SNPs that affect the amino acid sequence of a gene product (non-synonymous SNPs); they complement the large body of literature on mutations that cause Mendelian diseases, which represent the usually rare non-synonymous mutations with an allele frequency far below one percent (ref. 3).
DOI: 10.1101/gr.192922.115
2015
Cited 118 times
The genome of the vervet (Chlorocebus aethiops sabaeus)
We describe a genome reference of the African green monkey or vervet (Chlorocebus aethiops). This member of the Old World monkey (OWM) superfamily is uniquely valuable for genetic investigations of simian immunodeficiency virus (SIV), for which it is the most abundant natural host species, and of a wide range of health-related phenotypes assessed in Caribbean vervets (C. a. sabaeus), whose numbers have expanded dramatically since Europeans introduced small numbers of their ancestors from West Africa during the colonial era. We use the reference to characterize the genomic relationship between vervets and other primates, the intra-generic phylogeny of vervet subspecies, and genome-wide structural variations of a pedigreed C. a. sabaeus population. Through comparative analyses with human and rhesus macaque, we characterize at high resolution the unique chromosomal fission events that differentiate the vervets and their close relatives from most other catarrhine primates, in whom karyotype is highly conserved. We also provide a summary of transposable elements and contrast these with the rhesus macaque and human. Analysis of sequenced genomes representing each of the main vervet subspecies supports previously hypothesized relationships between these populations, which range across most of sub-Saharan Africa, while uncovering high levels of genetic diversity within each. Sequence-based analyses of major histocompatibility complex (MHC) polymorphisms reveal extremely low diversity in Caribbean C. a. sabaeus vervets, compared to vervets from putatively ancestral West African regions. In the C. a. sabaeus research population, we discover the first structural variations that are, in some cases, predicted to have a deleterious effect; future studies will determine the phenotypic impact of these variations.
DOI: 10.1038/ng.3980
2017
Cited 115 times
Ancient hybridization and strong adaptation to viruses across African vervet monkey populations
Vervet monkeys are among the most widely distributed nonhuman primates, show considerable phenotypic diversity, and have long been an important biomedical model for a variety of human diseases and in vaccine research. Using whole-genome sequencing data from 163 vervets sampled from across Africa and the Caribbean, we find high diversity within and between taxa and clear evidence that taxonomic divergence was reticulate rather than following a simple branching pattern. A scan for diversifying selection across taxa identifies strong and highly polygenic selection signals affecting viral processes. Furthermore, selection scores are elevated in genes whose human orthologs interact with HIV and in genes that show a response to experimental simian immunodeficiency virus (SIV) infection in vervet monkeys but not in rhesus macaques, suggesting that part of the signal reflects taxon-specific adaptation to SIV.
DOI: 10.1038/ng.3959
2017
Cited 61 times
Genetic variation and gene expression across multiple tissues and developmental stages in a nonhuman primate
By analyzing multitissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalog of expression quantitative trait loci (eQTLs) in a nonhuman primate model. This catalog contains more genome-wide significant eQTLs per sample than comparable human resources and identifies sex- and age-related expression patterns. Findings include a master regulatory locus that likely has a role in immune function and a locus regulating hippocampal long noncoding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders.
DOI: 10.1016/j.ajhg.2018.09.013
2018
Cited 49 times
Understanding the Hidden Complexity of Latin American Population Isolates
Most population isolates examined to date were founded from a single ancestral population. Consequently, there is limited knowledge about the demographic history of admixed population isolates. Here we investigate genomic diversity of recently admixed population isolates from Costa Rica and Colombia and compare their diversity to a benchmark population isolate, the Finnish. These Latin American isolates originated during the 16th century from admixture between a few hundred European males and Amerindian females, with a limited contribution from African founders. We examine whole-genome sequence data from 449 individuals, ascertained as families to build multigenerational pedigrees, with a mean sequencing depth of coverage of approximately 36×. We find that Latin American isolates have increased genetic diversity relative to the Finnish. However, there is an increase in the amount of identity by descent (IBD) segments in the Latin American isolates relative to the Finnish. The increase in IBD segments is likely a consequence of a very recent and severe population bottleneck during the founding of the admixed population isolates. Furthermore, the proportion of the genome that falls within a long run of homozygosity (ROH) in Costa Rican and Colombian individuals is significantly greater than that in the Finnish, suggesting more recent consanguinity in the Latin American isolates relative to that seen in the Finnish. Lastly, we find that recent consanguinity increased the number of deleterious variants found in the homozygous state, which is relevant if deleterious variants are recessive. Our study suggests that there is no single genetic signature of a population isolate.
DOI: 10.3390/genes12010066
2021
Cited 33 times
The LDLR, APOB, and PCSK9 Variants of Index Patients with Familial Hypercholesterolemia in Russia
Familial hypercholesterolemia (FH) is a common autosomal codominant disorder, characterized by elevated low-density lipoprotein cholesterol levels causing premature atherosclerotic cardiovascular disease. About 2900 variants of LDLR, APOB, and PCSK9 genes potentially associated with FH have been described earlier. Nevertheless, the genetics of FH in a Russian population is poorly understood. The aim of this study is to present data on the spectrum of LDLR, APOB, and PCSK9 gene variants in a cohort of 595 index Russian patients with FH, as well as an additional systematic analysis of the literature for the period of 1995–2020 on LDLR, APOB and PCSK9 gene variants described in Russian patients with FH. We used targeted and whole genome sequencing to search for variants. Accordingly, when combining our novel data and the data of a systematic literature review, we described 224 variants: 187 variants in LDLR, 14 variants in APOB, and 23 variants in PCSK9. A significant proportion of variants, 81 of 224 (36.1%), were not described earlier in FH patients in other populations and may be specific for Russia. Thus, this study significantly supplements knowledge about the spectrum of variants causing FH in Russia and may contribute to a wider implementation of genetic diagnostics in FH patients in Russia.
DOI: 10.1016/j.sbi.2010.03.006
2010
Cited 66 times
Human allelic variation: perspective from protein function, structure, and evolution
It is widely anticipated that the coming year will be marked by the complete characterization of DNA sequence of protein-coding regions of thousands of human individuals. A number of existing computational methods use comparative protein sequence analysis and analysis of protein structure to predict the functional effect of coding human alleles. Functional and structural analysis of coding allelic variants can inform various aspects of research on human genetic variation. In population and evolutionary genetics it helps estimate the strength of purifying selection against deleterious missense mutations and study the imprint of demographic history on deleterious genetic variation. In medical genetics it may assist in the interpretation of uncharacterized mutations in genes involved in monogenic and oligogenic diseases. It has a potential to facilitate medical sequencing studies searching for genes underlying Mendelian diseases or harboring rare alleles involved in complex traits.
DOI: 10.1371/journal.pgen.1004147
2014
Cited 52 times
Re-sequencing Expands Our Understanding of the Phenotypic Impact of Variants at GWAS Loci
Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20-30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5' and 3' untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.
DOI: 10.1186/s12915-015-0152-2
2015
Cited 48 times
Sequencing strategies and characterization of 721 vervet monkey genomes for future genetic analyses of medically relevant traits
We report here the first genome-wide high-resolution polymorphism resource for non-human primate (NHP) association and linkage studies, constructed for the Caribbean-origin vervet monkey, or African green monkey (Chlorocebus aethiops sabaeus), one of the most widely used NHPs in biomedical research. We generated this resource by whole genome sequencing (WGS) of monkeys from the Vervet Research Colony (VRC), an NIH-supported research resource for which extensive phenotypic data are available. We identified genome-wide single nucleotide polymorphisms (SNPs) by WGS of 721 members of an extended pedigree from the VRC. From high-depth WGS data we identified more than 4 million polymorphic unequivocal segregating sites; by pruning these SNPs based on heterozygosity, quality control filters, and the degree of linkage disequilibrium (LD) between SNPs, we constructed genome-wide panels suitable for genetic association (about 500,000 SNPs) and linkage analysis (about 150,000 SNPs). To further enhance the utility of these resources for linkage analysis, we used a further pruned subset of the linkage panel to generate multipoint identity by descent matrices. The genetic and phenotypic resources now available for the VRC and other Caribbean-origin vervets enable their use for genetic investigation of traits relevant to human diseases.
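The LD-based pruning step mentioned above can be pictured with a simple greedy procedure: walk through the SNPs in order and keep one only if its squared correlation (r²) with every SNP already kept stays below a cutoff. The Python sketch below is a hedged illustration of that general idea, not the pipeline used in the study; the 0.5 cutoff, the SNP names, and the genotype dosages are hypothetical, and the heterozygosity and quality control filters are not modeled.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic site carries no LD information
    return (sxy * sxy) / (sxx * syy)


def ld_prune(genotypes, max_r2=0.5):
    """Greedily keep SNPs whose r^2 with every kept SNP is <= max_r2.

    genotypes maps a SNP name to per-individual dosages (0, 1 or 2).
    """
    kept = []
    for snp, dosages in genotypes.items():
        if all(r_squared(dosages, genotypes[k]) <= max_r2 for k in kept):
            kept.append(snp)
    return kept


# Toy data: snp2 duplicates snp1 (r^2 = 1), snp3 is only weakly correlated.
toy = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [2, 0, 1, 1, 2, 0],
}
print(ld_prune(toy))
```

A stricter cutoff yields a sparser panel, which is consistent with the linkage panel above being smaller than the association panel, although the actual thresholds are not given in the abstract.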
DOI: 10.1186/s12864-018-5387-1
2019
Cited 34 times
CpG traffic lights are markers of regulatory regions in human genome
DNA methylation is involved in the regulation of gene expression. Although bisulfite-sequencing based methods profile DNA methylation at a single CpG resolution, methylation levels are usually averaged over genomic regions in the downstream bioinformatic analysis. We demonstrate that on the genome level the methylation of a single CpG can serve as a more accurate predictor of gene expression than average promoter or gene body methylation. We define CpG traffic lights (CpG TL) as CpG dinucleotides with a significant correlation between methylation and expression of a gene nearby. CpG TL are enriched in all regulatory regions. Among all promoters, CpG TL are especially enriched in poised ones, suggesting involvement of DNA methylation in their regulation. Yet, binding of only a handful of transcription factors, such as NRF1, ETS, STAT and IRF-family members, could be regulated by direct methylation of transcription factor binding sites (TFBS) or their close proximity. For the majority of TFs, an alternative scenario is more likely: methylation and inactivation of the whole regulatory element indirectly represses functional TF binding, with a CpG TL being a reliable marker of such inactivation. CpG TL provide a promising insight into mechanisms of enhancer activity and gene regulation, linking methylation of a single CpG to gene expression. CpG TL methylation can be used as a reliable marker of enhancer activity and gene expression in applications, e.g. in the clinic, where measuring DNA methylation is easier than directly measuring gene expression because DNA is more stable.
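The definition of a CpG traffic light rests on correlating the methylation level of one CpG with the expression of a nearby gene across samples. The Python sketch below illustrates that idea by ranking CpGs by the strength of a Pearson correlation. It is an assumption-laden example: the abstract does not specify the correlation measure or the significance test, so no test is applied here, and the CpG names and values are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant input: correlation is undefined, report 0
    return cov / math.sqrt(var_x * var_y)


def rank_cpgs(methylation_by_cpg, expression):
    """Return (cpg, r) pairs sorted by |r|, strongest correlation first."""
    scored = [(cpg, pearson_r(levels, expression))
              for cpg, levels in methylation_by_cpg.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Toy data across five samples (hypothetical values).
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
methylation = {
    "cpg_a": [0.9, 0.7, 0.5, 0.3, 0.1],  # falls as expression rises
    "cpg_b": [0.5, 0.4, 0.6, 0.5, 0.5],  # roughly flat
}
print(rank_cpgs(methylation, expression))
```

In this toy case, averaging cpg_a and cpg_b would dilute the strong negative signal carried by cpg_a alone, which is the intuition behind preferring single-CpG predictors over region averages.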
DOI: 10.1016/s0168-9525(00)02058-8
2000
Cited 81 times
SNP frequencies in human genes
The origin and maintenance of DNA polymorphism in populations has been a subject of debate and conjecture for several decades. Strictly selectionist arguments suggest that polymorphisms must be maintained solely as a result of balancing selection, whereas neutral mutation theory states that polymorphism is maintained via mutation and stochastic mechanisms. The cause of polymorphisms is actually probably more complex and, although some small-scale studies for human polymorphism have been reported (ref. 1: Cargill M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 1999; 22: 231-238; ref. 2: Halushka M.K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat. Genet. 1999; 22: 239-247; ref. 3: Li W.H., Sadler L.A. Low nucleotide diversity in man. Genetics. 1991; 129: 513-523), no large-scale analyses have yet been done.
DOI: 10.1093/nar/gkw364
2016
Cited 32 times
StructMAn: annotation of single-nucleotide polymorphisms in the structural context
Next-generation sequencing technologies produce unprecedented amounts of data on the genetic sequence of individual organisms. These sequences carry a substantial amount of variation that may or may not be related to a phenotype. The phenotypically important part of this variation often comes in the form of protein-sequence altering (non-synonymous) single nucleotide variants (nsSNVs). Here we present StructMAn, a Web-based tool for annotation of human and non-human nsSNVs in the structural context. StructMAn analyzes the spatial location of the amino acid residue corresponding to nsSNVs in the three-dimensional (3D) protein structure relative to other proteins, nucleic acids and low molecular-weight ligands. We make use of all experimentally available 3D structures of query proteins, and also, unlike other tools in the field, of structures of proteins with detectable sequence identity to them. This allows us to provide a structural context for around 20% of all nsSNVs in a typical human sequencing sample, for up to 60% of nsSNVs in genes related to human diseases and for around 35% of nsSNVs in a typical bacterial sample. Each nsSNV can be visualized and inspected by the user in the corresponding 3D structure of a protein or protein complex. The StructMAn server is available at http://structman.mpi-inf.mpg.de.
DOI: 10.1038/s41398-020-0758-1
2020
Cited 26 times
Contribution of common and rare variants to bipolar disorder susceptibility in extended pedigrees from population isolates
Current evidence from case/control studies indicates that genetic risk for psychiatric disorders derives primarily from numerous common variants, each with a small phenotypic impact. The literature describing apparent segregation of bipolar disorder (BP) in numerous multigenerational pedigrees suggests that, in such families, large-effect inherited variants might play a greater role. To identify roles of rare and common variants in BP, we conducted genetic analyses in 26 Colombia and Costa Rica pedigrees ascertained for bipolar disorder 1 (BP1), the most severe and heritable form of BP. In these pedigrees, we performed microarray SNP genotyping of 838 individuals and high-coverage whole-genome sequencing of 449 individuals. We compared polygenic risk scores (PRS), estimated using the latest BP1 genome-wide association study (GWAS) summary statistics, between BP1 individuals and related controls. We also evaluated whether BP1 individuals had a higher burden of rare deleterious single-nucleotide variants (SNVs) and rare copy number variants (CNVs) in a set of genes related to BP1. We found that compared with unaffected relatives, BP1 individuals had higher PRS estimated from BP1 GWAS statistics (P = 0.001 ~ 0.007) and displayed a modest increase in burdens of rare deleterious SNVs (P = 0.047) and rare CNVs (P = 0.002 ~ 0.033) in genes related to BP1. We did not observe rare variants segregating in the pedigrees. These results suggest that small-to-moderate effect rare and common variants are more likely to contribute to BP1 risk in these extended pedigrees than a few large-effect rare variants.
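A polygenic risk score of the kind compared above is, in its simplest form, a weighted sum of risk-allele dosages, with weights taken from GWAS effect sizes. The Python sketch below shows that basic form only. The variant names, effect sizes, and the choice to treat missing genotypes as dosage 0 are assumptions for illustration; the study's actual PRS method is not described in the abstract.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of allele dosages.

    dosages:      variant -> number of effect alleles carried (0, 1 or 2)
    effect_sizes: variant -> per-allele effect (e.g. log odds ratio)
    Variants missing from `dosages` are treated as dosage 0 (an assumption).
    """
    return sum(beta * dosages.get(variant, 0)
               for variant, beta in effect_sizes.items())


# Hypothetical effect sizes and one individual's genotypes.
effects = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}
person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
print(polygenic_risk_score(person, effects))
```

Scores computed this way are only meaningful relative to each other, which is why the comparison above is between affected individuals and their unaffected relatives rather than against an absolute cutoff.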
DOI: 10.3389/fgene.2021.709419
2021
Cited 20 times
Targeted Sequencing of 242 Clinically Important Genes in the Russian Population From the Ivanovo Region
We performed a targeted sequencing of 242 clinically important genes mostly associated with cardiovascular diseases in a representative population sample of 1,658 individuals from the Ivanovo region northeast of Moscow. Approximately 11% of 11,876 detected variants were not found in the Single Nucleotide Polymorphism Database (dbSNP) or reported earlier in the Russian population. Most novel variants were singletons and doubletons in our sample, and virtually no novel alleles presumably specific for the Russian population were able to reach the frequencies above 0.1-0.2%. The overwhelming majority (99.3%) of variants detected in this study in three or more copies were shared with other populations. We found two dominant and seven recessive known pathogenic variants with allele frequencies significantly increased compared to those in the gnomAD non-Finnish Europeans. Of the 242 targeted genes, 28 were in the list of 59 genes for which the American College of Medical Genetics and Genomics (ACMG) recommended the reporting of incidental findings. Based on the number of variants detected in the sequenced subset of ACMG59 genes, we approximated the prevalence of known pathogenic and novel or rare protein-truncating variants in the complete set of ACMG59 genes in the Ivanovo population at 1.4 and 2.8%, respectively. We analyzed the available clinical data and observed the incomplete penetrance of known pathogenic variants in the 28 ACMG59 genes: only 1 individual out of 12 with such variants had the phenotype most likely related to the variant. When known pathogenic and novel or rare protein-truncating variants were considered together, the overall rate of confirmed phenotypes was about 19%, with maximum in the subset of novel protein-truncating variants. We report three novel protein truncating variants in APOB and one in MYH7 observed in individuals with hypobetalipoproteinemia and hypertrophic cardiomyopathy, respectively. 
Our results provide a valuable reference for the clinical interpretation of gene sequencing in Russian and other populations.
DOI: 10.3389/fcvm.2023.1205787
2023
Cited 4 times
Genetic landscape in Russian patients with familial left ventricular noncompaction
Left ventricular noncompaction (LVNC) cardiomyopathy is a disorder that can be complicated by heart failure, arrhythmias, thromboembolism, and sudden cardiac death. The aim of this study is to clarify the genetic landscape of LVNC in a large cohort of well-phenotyped Russian patients with LVNC, including 48 families (n = 214). All index patients underwent clinical examination and genetic analysis, as well as family members who agreed to participate in the clinical study and/or in the genetic testing. The genetic testing included next generation sequencing and genetic classification according to ACMG guidelines. A total of 55 alleles of 54 pathogenic and likely pathogenic variants in 24 genes were identified, with the largest number in the MYH7 and TTN genes. A significant proportion of variants, 8 of 54 (14.8%), have not been described earlier in other populations and may be specific to LVNC patients in Russia. In LVNC patients, the presence of each subsequent variant is associated with increased odds of having more severe LVNC subtypes than isolated LVNC with preserved ejection fraction. The corresponding odds ratio is 2.77 (1.37–7.37; p < 0.001) per variant after adjustment for sex, age, and family. Overall, the genetic analysis of LVNC patients, accompanied by cardiomyopathy-related family history analysis, resulted in a high diagnostic yield of 89.6%. These results suggest that genetic screening should be applied to the diagnosis and prognosis of LVNC patients.
DOI: 10.1371/journal.pone.0235106
2020
Cited 21 times
ACE2 and TMPRSS2 variation in savanna monkeys (Chlorocebus spp.): Potential risk for zoonotic/anthroponotic transmission of SARS-CoV-2 and a potential model for functional studies
The COVID-19 pandemic, caused by the coronavirus SARS-CoV-2, has devastated health infrastructure around the world. Both ACE2 (an entry receptor) and TMPRSS2 (used by the virus for spike protein priming) are key proteins to SARS-CoV-2 cell entry, enabling progression to COVID-19 in humans. Comparative genomic research into critical ACE2 binding sites, associated with the spike receptor binding domain, has suggested that African and Asian primates may also be susceptible to disease from SARS-CoV-2 infection. Savanna monkeys (Chlorocebus spp.) are a widespread non-human primate with well-established potential as a bi-directional zoonotic/anthroponotic agent due to high levels of human interaction throughout their range in sub-Saharan Africa and the Caribbean. To characterize potential functional variation in savanna monkey ACE2 and TMPRSS2, we inspected recently published genomic data from 245 savanna monkeys, including 163 wild monkeys from Africa and the Caribbean and 82 captive monkeys from the Vervet Research Colony (VRC). We found several missense variants. One missense variant in ACE2 (X:14,077,550; Asp30Gly), common in Ch. sabaeus, causes a change in amino acid residue that has been inferred to reduce binding efficiency of SARS-CoV-2, suggesting potentially reduced susceptibility. The remaining populations appear as susceptible as humans, based on these criteria for receptor usage. All missense variants observed in wild Ch. sabaeus populations are also present in the VRC, along with two splice acceptor variants (at X:14,065,076) not observed in the wild sample that are potentially disruptive to ACE2 function. The presence of these variants in the VRC suggests a promising model for SARS-CoV-2 infection and vaccine and therapy development. 
In keeping with a One Health approach, characterizing actual susceptibility and potential for bi-directional zoonotic/anthroponotic transfer in savanna monkey populations may be an important consideration for controlling COVID-19 epidemics in communities with frequent human/non-human primate interactions that, in many cases, may have limited health infrastructure.
DOI: 10.1128/jb.00999-08
2009
Cited 36 times
Maturation of the Translation Inhibitor Microcin C
Microcin C (McC), an inhibitor of the growth of enteric bacteria, consists of a heptapeptide with a modified AMP residue attached to the backbone of the C-terminal aspartate through an N-acyl phosphamidate bond. Here we identify maturation intermediates produced by cells lacking individual mcc McC biosynthesis genes. We show that the products of the mccD and mccE genes are required for attachment of a 3-aminopropyl group to the phosphate of McC and that this group increases the potency of inhibition of the McC target, aspartyl-tRNA synthetase.
DOI: 10.1186/1745-6150-5-54
2010
Cited 29 times
Asymmetric and non-uniform evolution of recently duplicated human genes
Gene duplications are a source of new genes and protein functions. The innovative role of duplication events makes families of paralogous genes an interesting target for studies in evolutionary biology. Here we study global trends in the evolution of human genes that resulted from recent duplications. The pressure of negative selection is weaker during a short time immediately after a duplication event. Roughly one fifth of genes in paralogous gene families are evolving asymmetrically: one of the proteins encoded by two closest paralogs accumulates amino acid substitutions significantly faster than its partner. This asymmetry cannot be explained by differences in gene expression levels. In asymmetric gene pairs the number of deleterious mutations is increased in one copy, while decreased in the other copy as compared to genes constituting non-asymmetrically evolving pairs. The asymmetry in the rate of synonymous substitutions is much weaker and not significant. The increase of negative selection pressure over time after a duplication event seems to be a major trend in the evolution of human paralogous gene families. The observed asymmetry in the evolution of paralogous genes shows that in many cases one of two gene copies remains practically unchanged, while the other accumulates functional mutations. This supports the hypothesis that slowly evolving gene copies preserve their original functions, while fast evolving copies obtain new specificities or functions.
DOI: 10.1038/oncsis.2017.79
2017
Cited 21 times
Spatial distribution of disease-associated variants in three-dimensional structures of protein complexes
Next-generation sequencing enables simultaneous analysis of hundreds of human genomes associated with a particular phenotype, for example, a disease. These genomes naturally contain a lot of sequence variation that ranges from single-nucleotide variants (SNVs) to large-scale structural rearrangements. In order to establish a functional connection between genotype and disease-associated phenotypes, one needs to distinguish disease drivers from neutral passenger variants. Functional annotation based on experimental assays is feasible only for a limited number of candidate mutations. Thus alternative computational tools are needed. A possible approach to annotating mutations functionally is to consider their spatial location relative to functionally relevant sites in three-dimensional (3D) structures of the harboring proteins. This is impeded by the lack of available protein 3D structures. Complementing experimentally resolved structures with reliable computational models is an attractive alternative. We developed a structure-based approach to characterizing comprehensive sets of non-synonymous single-nucleotide variants (nsSNVs): associated with cancer, non-cancer diseases and putatively functionally neutral. We searched experimentally resolved protein 3D structures for potential homology-modeling templates for proteins harboring corresponding mutations. We found such templates for all proteins with disease-associated nsSNVs, and 51 and 66% of proteins carrying common polymorphisms and annotated benign variants. Many mutations caused by nsSNVs can be found in protein-protein, protein-nucleic acid or protein-ligand complexes. Correction for the number of available templates per protein reveals that protein-protein interaction interfaces are not enriched in either cancer nsSNVs, or nsSNVs associated with non-cancer diseases. Whereas cancer-associated mutations are enriched in DNA-binding proteins, they are rarely located directly in DNA-interacting interfaces. 
In contrast, mutations associated with non-cancer diseases are in general rare in DNA-binding proteins, but enriched in DNA-interacting interfaces in these proteins. All disease-associated nsSNVs are overrepresented in ligand-binding pockets, and nsSNVs associated with non-cancer diseases are additionally enriched in protein core, where they probably affect overall protein stability.
DOI: 10.1089/10665270050081487
2000
Cited 42 times
DNA Segmentation Through the Bayesian Approach
We present a new approach to DNA segmentation into compositionally homogeneous blocks. The Bayesian estimator, which is applicable for both short and long segments, is used to obtain the measure of homogeneity. An exact optimal segmentation is found via the dynamic programming technique. After completion of the segmentation procedure, the sequence composition on different scales can be analyzed with filtration of boundaries via the partition function approach.
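The dynamic programming step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: instead of the Bayesian estimator of homogeneity, it scores each block by its maximum-likelihood composition and subtracts a fixed per-block `penalty`. The function name `segment`, the penalty value, and the toy sequence are all assumptions made for illustration.

```python
import math

def segment(seq, penalty=3.0, alphabet="ACGT"):
    """Return interior block boundaries of an optimal segmentation of seq.

    Score of a segmentation = sum of block log-likelihoods (each block uses
    its own maximum-likelihood letter frequencies) minus `penalty` per block.
    The penalized-likelihood score is an assumption, not the paper's estimator.
    """
    n = len(seq)
    # prefix[a][i] = number of occurrences of letter a in seq[:i]
    prefix = {a: [0] * (n + 1) for a in alphabet}
    for i, ch in enumerate(seq):
        for a in alphabet:
            prefix[a][i + 1] = prefix[a][i] + (ch == a)

    def loglik(i, j):
        # Log-likelihood of seq[i:j] under its own letter frequencies.
        length = j - i
        total = 0.0
        for a in alphabet:
            c = prefix[a][j] - prefix[a][i]
            if c:
                total += c * math.log(c / length)
        return total

    # best[j] = best score for seq[:j]; back[j] = start of the last block.
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            score = best[i] + loglik(i, j) - penalty
            if score > best[j]:
                best[j], back[j] = score, i

    # Walk the back-pointers to recover block ends, then drop the final end n.
    ends = []
    j = n
    while j > 0:
        ends.append(j)
        j = back[j]
    return sorted(ends)[:-1]

# Toy sequence: an A-rich block followed by a G-rich block.
print(segment("A" * 20 + "G" * 20))
```

The double loop makes the search exact but quadratic in sequence length, so this form only suits short sequences; a larger `penalty` yields fewer, longer blocks.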
DOI: 10.1093/hmg/ddg359
2003
Cited 33 times
Impact of selection, mutation rate and genetic drift on human genetic variation
The accumulation of genome-wide information on single nucleotide polymorphisms in humans provides an unprecedented opportunity to detect the evolutionary forces responsible for heterogeneity of the level of genetic variability across loci. Previous studies have shown that history of recombination events has produced long haplotype blocks in the human genome, which contribute to this heterogeneity. Other factors, however, such as natural selection or the heterogeneity of mutation rates across loci, may also lead to heterogeneity of genetic variability. We compared synonymous and non-synonymous variability within human genes with their divergence from murine orthologs. We separately analyzed the non-synonymous variants predicted to damage protein structure or function and the variants predicted to be functionally benign. The predictions were based on comparative sequence analysis and, in some cases, on the analysis of protein structure. A strong correlation between non-synonymous, benign variability and non-synonymous human–mouse divergence suggests that selection played an important role in shaping the pattern of variability in coding regions of human genes. However, the lack of correlation between deleterious variability and evolutionary divergence shows that a substantial proportion of the observed non-synonymous single-nucleotide polymorphisms reduces fitness and never reaches fixation. Evolutionary and medical implications of the impact of selection on human polymorphisms are discussed.
DOI: 10.1016/j.ajhg.2008.05.017
2008
Cited 25 times
Positive Selection in Alternatively Spliced Exons of Human Genes
Alternative splicing is a well-recognized mechanism of accelerated genome evolution. We have studied single-nucleotide polymorphisms and human-chimpanzee divergence in the exons of 6672 alternatively spliced human genes, with the aim of understanding the forces driving the evolution of alternatively spliced sequences. Here, we show that alternatively spliced exons and exon fragments (alternative exons) from minor isoforms experience lower selective pressure at the amino acid level, accompanied by selection against synonymous sequence variation. The results of the McDonald-Kreitman test suggest that alternatively spliced exons, unlike exons constitutively included in the mRNA, are also subject to positive selection, with up to 27% of amino acids fixed by positive selection.
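As a worked illustration of the McDonald-Kreitman logic, the fraction of amino acid substitutions fixed by positive selection is commonly estimated as α = 1 − (Ds·Pn)/(Dn·Ps), where D and P denote divergence and polymorphism counts at non-synonymous (n) and synonymous (s) sites. The sketch below uses that standard estimator; whether the paper applied exactly this form is an assumption, and the counts are hypothetical, not taken from the study.

```python
def mk_alpha(dn, ds, pn, ps):
    """Estimate the fraction of non-synonymous substitutions fixed by
    positive selection from McDonald-Kreitman counts.

    dn, ds: non-synonymous and synonymous fixed differences (divergence)
    pn, ps: non-synonymous and synonymous polymorphisms
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)

# Hypothetical counts for illustration only.
alpha = mk_alpha(dn=40, ds=80, pn=30, ps=80)
print(f"alpha = {alpha:.2f}")
```

Under strict neutrality the two ratios Dn/Ds and Pn/Ps are equal and the estimate is zero; an excess of non-synonymous divergence pushes it above zero.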
DOI: 10.15829/1728-8800-2022-3464
2023
Assessment of polygenic risk of hypertension
Hypertension (HTN) is a leading risk factor for the development of cardiovascular diseases. In recent decades, the rapid development of genetic tests, in particular genome-wide association study (GWAS), has made it possible to identify hundreds of nucleotide sequence variants associated with the development of HTN. One approach to improve the predictive power of genetic testing is to combine information about many nucleotide sequence variants into a single risk assessment system, often referred to as a genetic risk score. Within the framework of this review, the most significant publications on the study of the genetic risk score for HTN will be considered, and the features of their development and application will be discussed.
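The idea of combining many variants into a single score can be sketched as a weighted sum of risk-allele dosages. This is a generic illustration, not a score from any of the reviewed publications: the variant identifiers, effect weights, and genotype below are hypothetical.

```python
def genetic_risk_score(dosages, weights):
    """Weighted genetic risk score: sum of (effect weight x allele dosage).

    dosages: variant id -> number of effect alleles carried (0, 1 or 2)
    weights: variant id -> per-allele effect size (e.g. a GWAS beta)
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for variants: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)

# Hypothetical variants and effect sizes, for illustration only.
weights = {"var_A": 0.10, "var_B": -0.05, "var_C": 0.20}
person = {"var_A": 2, "var_B": 1, "var_C": 0}
print(genetic_risk_score(person, weights))
```

Raising an error on missing genotypes is a design choice for the sketch; real pipelines often impute missing dosages instead.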
DOI: 10.15829/1728-8800-2023-3801
2024
Validation of genetic risk scores for hypertension in the Central Russian population
Aim. To validate and evaluate the accuracy of 4 genetic risk scores (GRSs) for hypertension (HTN), previously created on European samples, on a population sample of the Ivanovo Oblast. Material and methods. For genetic analysis, targeted next-generation sequencing was performed on a sample from Central Russia (n=1682) based on the biobank collection. Four GRSs associated with HTN, previously developed for the European population, were selected for validation. The coefficient of determination and the area under the ROC curve were used as quality metrics for regression models. Additional validation was carried out to include all nucleotide sequence variants, regardless of linkage disequilibrium level. A combined GRS was compiled based on coefficients from individual GRSs using the clumping + thresholding (C+T) method. Results. The study demonstrated that the predictive value of previously developed GRSs when used for the Central Russian population is lower than in the original studies. The proportion of explained variance was 0.5-0.8%. The best predictive ability (proportion of explained variance of 2.5%) was demonstrated by a previously developed GRS (Evangelou E, et al., 2018), which includes the largest number of nucleotide sequence variants (n=852). Conclusion. GRSs for HTN developed on European samples are not recommended for the Russian population without preliminary validation. To create original GRSs, combining statistical parameters (β-coefficients and p-values) from different GRSs is not recommended.
DOI: 10.15829/1728-8800-2023-3871
2024
Search and replication of associations of genome variants with lipid levels in a Russian sample
Aim. To search for associations with lipid profile parameters (low- and high-density lipoprotein cholesterol levels, triglycerides and total cholesterol) in population samples from two Russian regions and to perform a replication analysis of a previously published genome-wide association study (GWAS) of residents of three other Russian regions. Material and methods. The study included representative samples from the Vologda (n=689) and Ivanovo (n=1675) regions collected for the Epidemiology of Cardiovascular Diseases and their Risk Factors in Regions of Russian Federation (ESSE-RF) study. We assessed lipid profile parameters and performed targeted sequencing. A linear regression model adjusted for sex, age, and statin use was used to assess the associations of genomic variants with lipid profiles. The work replicated the results of a study by Usoltsev D, et al., 2023, carried out on population samples of individuals from St. Petersburg and the Orenburg and Samara regions. Results. We identified variants for which associations with lipid parameters had previously been identified in a Russian sample. The proportion of replicated variants was 89% and 92% for the samples from the Vologda and Ivanovo regions, respectively. The directions of effects of all replicated variants in the previously published study (samples from the Orenburg and Samara regions and St. Petersburg) and in both studied samples (samples from the Ivanovo and Vologda regions) coincide. Conclusion. The results of the search for associations with lipid parameters in different Russian samples are consistent with each other.
DOI: 10.15829/1728-8800-2023-3856
2024
Validation of genetic risk scores for coronary artery disease, developed on European population samples, in Russian population
Aim. To evaluate the information content of genetic risk scores (GRSs) for coronary artery disease (CAD), previously developed on European populations, in representatives of the Russian population. Material and methods. The work involved 1685 people from the ESSE-Ivanovo epidemiological study. CAD was verified in 3.1% of individuals. The coronary composite endpoint was assessed annually during 8-year follow-up. Next generation sequencing was performed using a targeted panel. Logistic regression analysis and area under the ROC curve (AUC) were used. Age, sex, and smoking status were taken into account in the multivariate model. Results. Of the 16 GRSs included in the analysis, only 2 GRSs demonstrated significance in the univariate analysis of association with CAD (highest AUC: 0.577). In a multivariate model, an increase of 1 standard deviation (SD) was significantly associated with CAD for 6 of the studied GRSs, with odds ratios in the range of 1.31-1.47. Two GRSs demonstrated significant differences in the incidence of CAD between the groups corresponding to the upper and lower quintiles. Forty-five endpoints were registered. The risk ratio for the endpoint per 1 SD increase in GRS, taking into account cofactors, reached statistical significance for 9 of the analyzed GRSs and was in the range of 1.36-1.54. Conclusion. For the first time in Russia, 16 CAD GRSs, previously developed on European samples, were validated. The results were reproduced only for a few of the studied CAD GRSs.
DOI: 10.15829/1728-8800-2023-3846
2024
Genetic aspects of decreased low-density lipoprotein cholesterol values
Aim. To study genetic causes of decreased low-density lipoprotein cholesterol (LDL-C) in Russian patients. Material and methods. The study included the following Epidemiology of Cardiovascular Diseases and their Risk Factors in Regions of Russian Federation (ESSE-RF) participants: individuals with LDL-C <5th percentile, taking into account sex and age (n=52), who underwent targeted sequencing of protein-coding regions of 6 genes (APOB, PCSK9, MTTP, ANGPTL3, SAR1B, APOC3) and determination of the genetic risk score (GRS) for hypercholesterolemia; and a representative sample of the Ivanovo region population (ESSE-Ivanovo, n=1667), for which only GRS was determined. Genetic testing was performed using next generation sequencing. Results. In 10 (19.2%) of 52 participants with decreased LDL-C levels, the following rare variants potentially associated with hypocholesterolemia were identified: 8 leading to a premature termination codon in the APOB gene, 1 leading to a premature termination codon in the APOC3 gene, and 1 missense variant in the PCSK9 gene. Of the 10 identified variants, 6 are described by us for the first time. GRS in the decreased LDL-C group (0.27±0.25) was significantly lower than in the ESSE-Ivanovo population sample (0.43±0.27) (p=4.7×10⁻⁶). Conclusion. Genetic reasons explain decreased LDL-C levels (<5th percentile) in 32.7% of patients, of which only monogenic variants were identified in 13.5%, a combination of monogenic and polygenic hypocholesterolemia in 5.7%, and polygenic hypocholesterolemia in 13.5%.
DOI: 10.1002/prot.21487
2007
Cited 24 times
A novel approach to local similarity of protein binding sites substantially improves computational drug design results
We present a novel notion of binding site local similarity based on the analysis of complete protein environments of ligand fragments. Comparison of a query protein binding site (target) against the 3D structure of another protein (analog) in complex with a ligand enables ligand fragments from the analog complex to be transferred to positions in the target site, so that the complete protein environments of the fragment and its image are similar. The revealed environments are similarity regions and the fragments transferred to the target site are considered as binding patterns. The set of such binding patterns derived from a database of analog complexes forms a cloud-like structure (fragment cloud), which is a powerful tool for computational drug design. It has been shown on independent test sets that the combined use of a traditional energy-based score together with the cloud-based score responsible for the quality of embedding of a ligand into the fragment cloud improves the self-docking and screening results dramatically. The usage of a fragment cloud as a source of positioned molecular fragments fitting the binding protein environment has been validated by reproduction of experimental ligand optimization results.
DOI: 10.3390/genes13020309
2022
Cited 5 times
A Case of Severe Left-Ventricular Noncompaction Associated with Splicing Altering Variant in the FHOD3 Gene
Left ventricular noncompaction (LVNC) is a highly heterogeneous primary disorder of the myocardium. Its clinical features and genetic spectrum strongly overlap with other types of primary cardiomyopathies, in particular, hypertrophic cardiomyopathy. Study and the accumulation of genotype-phenotype correlations are the way to improve the precision of our diagnostics. We present a familial case of LVNC with arrhythmic and thrombotic complications, myocardial fibrosis and heart failure, cosegregating with the splicing variant in the FHOD3 gene. This is the first description of FHOD3-dependent LVNC to our knowledge. We also revise the assumed mechanism of pathogenesis in the case of FHOD3 splicing alterations.
DOI: 10.3390/jpm12071132
2022
Cited 5 times
Identification of Pathogenic Variant Burden and Selection of Optimal Diagnostic Method Is a Way to Improve Carrier Screening for Autosomal Recessive Diseases
Cystic fibrosis, phenylketonuria, alpha-1 antitrypsin deficiency, and sensorineural hearing loss are among the most common autosomal recessive diseases, which require carrier screening. The evaluation of population allele frequencies (AF) of pathogenic variants in genes associated with these conditions and the choice of the best genotyping method are the necessary steps toward development and practical implementation of carrier-screening programs. We performed custom panel genotyping of 3821 unrelated participants from two Russian population representative samples and three patient groups using real-time polymerase chain reaction (PCR) and next generation sequencing (NGS). The custom panel included 115 known pathogenic variants in the CFTR, PAH, SERPINA1, and GJB2 genes. Overall, 38 variants were detected. The comparison of genotyping platforms revealed the following advantages of real-time PCR: relatively low cost, simple genotyping data analysis, and easier detection of large indels, while NGS showed better accuracy of variant identification and the capability to detect additional pathogenic variants in adjacent regions. A total of 23 variants had significant differences in estimated AF compared with non-Finnish Europeans from gnomAD. This study provides new AF data for variants associated with the studied disorders and the comparison of genotyping methods for carrier screening.
DOI: 10.1186/1471-2164-8-24
2007
Cited 17 times
Mouse SNP Miner: an annotated database of mouse functional single nucleotide polymorphisms
The mapping of quantitative trait loci in rat and mouse has been extremely successful in identifying chromosomal regions associated with human disease-related phenotypes. However, identifying the specific phenotype-causing DNA sequence variations within a quantitative trait locus has been much more difficult. The recent availability of genomic sequence from several mouse inbred strains (including C57BL/6J, 129X1/SvJ, 129S1/SvImJ, A/J, and DBA/2J) has made it possible to catalog DNA sequence differences within a quantitative trait locus derived from crosses between these strains. However, even for well-defined quantitative trait loci (<10 Mb) the identification of candidate functional DNA sequence changes remains challenging due to the high density of sequence variation between strains. To help identify functional DNA sequence variations within quantitative trait loci we have used the Ensembl annotated genome sequence to compile a database of mouse single nucleotide polymorphisms (SNPs) that are predicted to cause missense, nonsense, frameshift, or splice site mutations (available at http://bioinfo.embl.it/SnpApplet/ ). For missense mutations we have used the PolyPhen and PANTHER algorithms to predict whether amino acid changes are likely to disrupt protein function. We have developed a database of mouse SNPs predicted to cause missense, nonsense, frameshift, and splice-site mutations. Our analysis revealed that 20% and 14% of missense SNPs are likely to be deleterious according to PolyPhen and PANTHER, respectively, and 6% are considered deleterious by both algorithms. The database also provides gene expression and functional annotations from the Symatlas, Gene Ontology, and OMIM databases to further assess candidate phenotype-causing mutations. To demonstrate its utility, we show that Mouse SNP Miner successfully finds a previously identified candidate SNP in the taste receptor, Tas1r3, that underlies sucrose preference in the C57BL/6J strain. 
We also use Mouse SNP Miner to derive a list of candidate phenotype-causing mutations within a previously uncharacterized QTL for response to morphine in the 129/Sv strain.
DOI: 10.3324/haematol.2020.249193
2020
Cited 8 times
MYB bi-allelic targeting abrogates primitive clonogenic progenitors while the emergence of primitive blood cells is not affected
MYB is a key regulator of definitive hematopoiesis and it is dispensable for the development of primitive hematopoietic cells in vertebrates. To delineate definitive versus primitive hematopoiesis during differentiation of human embryonic stem cells, we have introduced reporters into the MYB locus and inactivated the gene by bi-allelic targeting. To recapitulate the early developmental events more adequately, the mutant and wild type human embryonic stem cell lines were differentiated in defined culture conditions without the addition of hematopoietic cytokines. The differentiation of the reporter cell lines demonstrated that MYB is specifically expressed throughout emerging hematopoietic cell populations. Here we show that the disruption of the MYB gene leads to severe defects in the development and proliferation of primitive hematopoietic progenitors while the emergence of primitive blood cells is not affected. We also provide evidence that MYB is essential for neutrophil and T cell development and the upregulation of innate immunity genes during hematopoietic differentiation. Our results suggest that the endothelial origin of primitive blood cells is direct and does not include the intermediate step of primitive hematopoietic progenitors.
DOI: 10.1007/s12551-022-01005-w
2022
Cited 4 times
Intragenic compensation through the lens of deep mutational scanning
DOI: 10.20944/preprints202306.0140.v1
2023
Applicability of Diagnostic Criteria and Prevalence of Familial Dysbetalipoproteinemia in Russia
Familial dysbetalipoproteinemia (FD) is a highly atherogenic, genetically based lipid disorder with an underestimated actual prevalence. In recent years, several biochemical algorithms have been developed to diagnose FD using available laboratory tests. However, there are not enough data on their use in real-world clinical practice. We studied the applicability of the most accessible biochemical algorithms to diagnose FD in clinical practice. We also investigated the prevalence of FD in one of the European regions of Russia based on a population sample. We detected a high prevalence of FD: 1 in 151. We demonstrated that the diagnostic algorithms of FD that include diagnostic apoB levels require correction, taking into account the characteristics of the distribution of apoB levels in the population. At the same time, a triglycerides cutoff ≥1.5 mmol/L may be a useful tool in identifying subjects with FD. We also analyzed the presence and pathogenicity of APOE variants associated with autosomal dominant FD in a large research sample.
DOI: 10.3390/ijms241713159
2023
Applicability of Diagnostic Criteria and High Prevalence of Familial Dysbetalipoproteinemia in Russia: A Pilot Study
Familial dysbetalipoproteinemia (FD) is a highly atherogenic genetically based lipid disorder with an underestimated actual prevalence. In recent years, several biochemical algorithms have been developed to diagnose FD using available laboratory tests. The practical applicability of FD diagnostic criteria and the prevalence of FD in Russia have not been previously assessed. We demonstrated that the diagnostic algorithms of FD, including the diagnostic apoB levels, require correction, taking into account the distribution of apoB levels in the population. At the same time, a triglycerides cutoff ≥ 1.5 mmol/L may be a useful tool in identifying subjects with FD. In this study, a high prevalence of FD was detected: 0.67% (one in 150) based on the ε2ε2 haplotype and triglycerides levels ≥ 1.5 mmol/L. We also analyzed the presence and pathogenicity of APOE variants associated with autosomal dominant FD in a large research sample.
DOI: 10.1101/784132
2019
Cited 5 times
The burden of deleterious variants in a non-human primate biomedical model
Genome sequencing studies of nonhuman primate (NHP) pedigree and population samples are discovering variants on a large and rapidly growing scale. These studies are increasing the utility of several NHP species as model systems for human disease. In particular, by identifying homozygous protein truncating variants (hPTVs) in genes hypothesized to play a role in causing human diseases, it may be possible to elucidate mechanisms for the phenotypic impact of such variants through investigations that are infeasible in humans. The Caribbean vervet (Chlorocebus aethiops sabaeus) is uniquely valuable for this purpose, as the dramatic expansion of its population following severe bottlenecks has enabled PTVs that passed through the bottleneck to attain a relatively high frequency. Using whole genome sequence (WGS) data from 719 monkeys of the Vervet Research Colony (VRC) extended pedigree, we found 2,802 protein-truncating alleles in 1,747 protein-coding genes present in homozygous state in at least one monkey. Polymorphic sites for 923 SNV hPTVs were also observed in natural Caribbean populations from which the VRC descends. The vervet genome browser (VGB) includes information on these PTVs, together with a catalog of phenotypes and biological samples available for monkeys who carry them. We describe initial explorations of the possible impact of vervet PTVs on early infant mortality.
DOI: 10.1101/340158
2018
Cited 4 times
Understanding the Hidden Complexity of Latin American Population Isolates
Most population isolates examined to date were founded from a single ancestral population. Consequently, there is limited knowledge about the demographic history of admixed population isolates. Here we investigate genomic diversity of recently admixed population isolates from Costa Rica and Colombia and compare their diversity to a benchmark population isolate, the Finnish. These Latin American isolates originated during the 16th century from admixture between a few hundred European males and Amerindian females, with a limited contribution from African founders. We examine whole genome sequence data from 449 individuals, ascertained as families to build multigenerational pedigrees, with a mean sequencing depth of coverage of approximately 24X. We find that Latin American isolates have increased genetic diversity relative to the Finnish. However, there is an increase in the amount of identity by descent (IBD) segments in the Latin American isolates relative to the Finnish. The increase in IBD segments is likely a consequence of a very recent and severe population bottleneck during the founding of the admixed population isolates. Furthermore, the proportion of the genome that falls within a long run of homozygosity (ROH) in Costa Rican and Colombian individuals was significantly greater than that in the Finnish, suggesting more recent consanguinity in the Latin American isolates relative to that seen in the Finnish. Lastly, we found that recent consanguinity increased the number of deleterious variants found in the homozygous state, which is relevant if deleterious variants are recessive. Our study suggests there is no single genetic signature of a population isolate.
DOI: 10.1038/s41586-019-1726-x
2019
Cited 4 times
Author Correction: Exome sequencing of Finnish isolates enhances rare-variant association power
An Amendment to this paper has been published and can be accessed via a link at the top of the paper.
DOI: 10.1101/062471
2016
Cited 3 times
Rare copy number variants in NRXN1 and CNTN6 increase risk for Tourette syndrome
Tourette syndrome (TS) is highly heritable, although identification of its underlying genetic cause(s) has remained elusive. We examined a European ancestry sample composed of 2,435 TS cases and 4,100 controls for copy-number variants (CNVs) using SNP microarrays and identified two genome-wide significant loci that confer a substantial increase in risk for TS (NRXN1, OR=20.3, 95% CI [2.6-156.2], p=6.0 × 10⁻⁶; CNTN6, OR=10.1, 95% CI [2.3-45.4], p=3.7 × 10⁻⁵). Approximately 1% of TS cases carried one of these CNVs, indicating that rare structural variation contributes significantly to the genetic architecture of TS.
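For readers unfamiliar with how an odds ratio and its confidence interval arise from case-control counts, the following sketch computes both from a 2×2 table using the Woolf (log-based) approximation. The abstract does not state which method the authors used, so this is an assumption, and the carrier counts below are hypothetical rather than the study's data.

```python
import math

def odds_ratio_ci(case_carriers, case_non, ctrl_carriers, ctrl_non, z=1.96):
    """Odds ratio with an approximate 95% CI (Woolf's log method).

    Counts form a 2x2 table: carriers and non-carriers among cases and
    controls. All four counts must be positive for this approximation.
    """
    a, b, c, d = case_carriers, case_non, ctrl_carriers, ctrl_non
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high

# Hypothetical counts, not taken from the study.
or_, low, high = odds_ratio_ci(10, 90, 5, 95)
print(f"OR = {or_:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With very small carrier counts, as is typical for rare CNVs, the interval becomes wide and exact methods are usually preferred over this approximation.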
DOI: 10.1134/s0026893309020095
2009
Cited 4 times
Computational analysis of human genome polymorphism
DOI: 10.1093/bioinformatics/17.11.1065
2001
Cited 7 times
Segmentation of long genomic sequences into domains with homogeneous composition with BASIO software
We present a software system, BASIO, that allows one to segment a sequence into regions with homogeneous nucleotide composition at a desired length scale. The system can work with an arbitrary alphabet and therefore can be applied to various (e.g. protein) sequences. Several sequences of complete genomes of eukaryotes are used to demonstrate the efficiency of the software. The BASIO suite is available for non-commercial users free of charge as a set of executables and accompanying segmentation scenarios from http://www.imb.ac.ru/compbio/basio. To obtain the source code, contact the authors.
DOI: 10.1016/j.atherosclerosis.2023.06.399
2023
Carotid and femoral plaques in patients with familial dysbetalipoproteinemia, familial hypercholesterolemia, polygenic and severe hypercholesterolemia
Background and Aims: To compare the severity of carotid and femoral atherosclerosis in patients with severe hypercholesterolemia of different etiologies. Methods: Patients with a median age of 54 (47-61) years had familial dysbetalipoproteinemia (FD) (APOE ε2ε2 haplotype and TG ≥1.5 mmol/L; n=26), familial hypercholesterolemia (FH) (pathogenic or probably pathogenic variants in LDLR, APOB or PCSK9 and DLCN criteria ≥9 points; n=61), polygenic hypercholesterolemia (LDL-C polygenic score >80th percentile and LDL-C >4.9 mmol/L, without tendon xanthomas; n=49) or severe hypercholesterolemia (LDL-C polygenic score <50th percentile and LDL-C >4.9 mmol/L, without tendon xanthomas; n=41). Logistic regression (adjusted for age, sex, BMI, arterial hypertension, diabetes, smoking and statin treatment duration) and the Holm–Bonferroni method were used. Carotid and femoral arteries were analyzed for plaque number (total number of plaques) using Samsung Medison MySono U6 and Philips iU22. Results: Patients with FH had a higher carotid plaque number (MED 4 (1-5)) compared with patients with FD (MED 2 (1-3)), polygenic (MED 2 (1-4)) or severe (MED 2 (1-3)) hypercholesterolemia (p=0.018). The patients with FD, polygenic or severe hypercholesterolemia were comparable in carotid plaque number. Femoral plaque number in patients with FH (MED 3 (0-5)), FD (MED 2 (0-3.8)), polygenic (MED 1 (0-3)) or severe (MED 1 (0-2)) hypercholesterolemia did not differ. Conclusions: When the severity of carotid and femoral atherosclerosis in patients with FH, FD, polygenic or severe hypercholesterolemia was compared, differences were found only for patients with FH, who had a significantly higher number of carotid plaques.
DOI: 10.1016/j.atherosclerosis.2023.06.355
2023
The prevalence of familial dysbetalipoproteinemia in one of the European regions of the Russian Federation
Background and Aims: Familial dysbetalipoproteinemia (FD) is a highly atherogenic, genetically based lipid disorder. The prevalence of FD in Russia is unknown. The aim was to investigate the prevalence of FD in one of the European regions of Russia. Methods: Subjects aged 25-64 years (n=1858) were drawn from the population-based cohort of the ESSE-RF study, conducted in the Ivanovo region. Genetic data and lipid profiles were available. FD is defined by the combination of the APOE ε2ε2 haplotype and biochemical criteria available in routine clinical practice. For this reason, the applicability of previously developed biochemical FD criteria (apoB algorithm: apoB <1.2 g/L, TG ≥1.5 mmol/L, TG/apoB <10.0 mmol/g, TC/apoB ≥6.2 mmol/g; non-HDL-C/apoB ≥3.69 mmol/g) was determined in the study cohort (for subjects without lipid-lowering therapy, n=1748). Statistical analyses were done using R 4.1. Results: The apoB algorithm analysis showed low specificity of apoB level (11.6%) and TG/apoB ratio (0.2%) for identifying subjects without the ε2ε2 haplotype. TG level ≥1.5 mmol/L had sufficient sensitivity (78.6%) and specificity (65.4%) to identify subjects with the ε2ε2 haplotype. The non-HDL-C/apoB ratio had a low specificity (1.1%) for identifying subjects without the ε2ε2 haplotype in the study population (Table 1). The prevalence of the ε2ε2 haplotype was 0.8% (1 in 124) (95% CI: 0.45%-1.32%). When the TG ≥1.5 mmol/L criterion was added to ε2ε2 carriage, FD was identified in 12 subjects. Thus, the prevalence of FD was 0.6% (1 in 155) (95% CI: 0.33%-1.12%). Conclusions: A high prevalence of FD was detected in one of the European regions of Russia.
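The prevalence arithmetic in the abstract above can be checked with a short sketch. The counts (12 FD cases among 1858 subjects) come from the abstract; the Wilson score interval is an assumption made here for illustration, since the abstract does not say which interval method was used, so the bounds may differ slightly from the reported 0.33%-1.12%.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

n_subjects = 1858  # cohort size stated in the abstract
n_fd = 12          # subjects with the ε2ε2 haplotype and TG >= 1.5 mmol/L

prevalence = n_fd / n_subjects
low, high = wilson_interval(n_fd, n_subjects)  # interval method is an assumption

print(f"prevalence = {prevalence:.2%}, about 1 in {round(1 / prevalence)}")
print(f"approximate 95% CI: {low:.2%} to {high:.2%}")
```

An exact (Clopper-Pearson) interval would be an equally reasonable choice and tends to be slightly wider for rare outcomes.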
DOI: 10.17116/profmed20232610136
2023
Comparison of three approaches to autosomal recessive diseases carrier screening
DOI: 10.15829/1728-8800-2023-3755
2023
Validation of genetic risk scores for obesity on a sample of the population of Russian regions
Aim. To validate and evaluate the accuracy of 15 genetic risk scores (GRSs) for obesity, created in populations of European origin, in a sample from two regions of European Russia. Material and methods. Genetic testing was performed using next generation sequencing on a sample from the Russian population (n=1179). The study included 15 GRSs associated with body mass index (BMI) or waist-to-hip ratio adjusted for BMI (WHRadjBMI). Results. The predictive power of 8 out of 9 GRSs for obesity based on BMI remains the same for the Russian population. The predictive power of 6 GRSs for obesity based on WHRadjBMI is lower in the Russian population than in the reference sample. GRS reproducibility increases with the size of the initial samples and the number of variants included in the GRS. The use in the Russian population of BMI-based obesity GRSs created in European populations is justified. Conclusion. For the first time in Russia, 15 obesity GRSs developed in European populations have been validated. The data obtained on the effectiveness of the considered GRSs can be used in the future to improve obesity prediction and prevention in Russia.
DOI: 10.15829/1728-8800-20233746
2023
Validation of genetic risk scores for type 2 diabetes on a Russian population sample from the biobank of the National Medical Research Center for Therapy and Preventive Medicine
Aim. To validate and evaluate the accuracy of 14 genetic risk scores (GRSs) for type 2 diabetes (T2D), created earlier in other countries, using a Russian population sample from the biobank of the National Medical Research Center for Therapy and Preventive Medicine. Material and methods. For genetic analysis, next generation sequencing data were used on a sample from the Russian population (n=1165) based on the biobank collection. The study included 14 GRSs associated with T2D. Results. The study demonstrated that the predictive power of 12 out of 14 GRSs for T2D was replicated in the Russian population. As the quality metric, we used the area under the ROC curve, which for models including only GRS varied from 54.49 to 59.46%; for models including GRS, sex and age, it ranged from 77.56 to 78.75%. Conclusion. For the first time in Russia, a study of 14 T2D GRSs developed on other populations was conducted. Twelve GRSs have been validated and can be used in the future to improve risk prediction and prevention of T2D in Russia.
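To make the quality metric above concrete, here is a minimal sketch of how a GRS and its area under the ROC curve can be computed. It assumes the common definition of a GRS as a weighted sum of risk-allele counts; the weights, genotypes and labels are hypothetical toy values, not data from the study, and the function names are illustrative only.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per variant)."""
    return sum(d * w for d, w in zip(dosages, weights))

def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical per-variant effect weights and genotypes for six individuals.
weights = [0.10, 0.25, 0.05]
genotypes = [[1, 2, 1], [2, 2, 1], [1, 0, 0], [2, 1, 2], [0, 0, 1], [1, 2, 2]]
labels = [0, 1, 0, 1, 0, 1]  # 1 = case, 0 = control (toy labels)

scores = [genetic_risk_score(g, weights) for g in genotypes]
print(f"AUC = {auc(scores, labels):.3f}")
```

The pairwise formulation is quadratic in sample size; it is used here only because it states the meaning of the AUC directly.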
DOI: 10.17116/profmed20232612173
2023
Evaluation of carrier frequency of pathogenic variants in genes associated with the development of autosomal and X-linked recessive diseases
DOI: 10.15829/ropniz-d95-2023
2023
Priority areas of scientific research in the interests of improving the provision of medical care in the "Therapy" profile, including "Preventive medicine"
DOI: 10.1101/363267
2018
Contribution of common and rare variants to bipolar disorder susceptibility in extended pedigrees from population isolates
Abstract Current evidence from case/control studies indicates that genetic risk for psychiatric disorders derives primarily from numerous common variants, each with a small phenotypic impact. The literature describing apparent segregation of bipolar disorder (BP) in numerous multigenerational pedigrees suggests that, in such families, large-effect inherited variants might play a greater role. To evaluate this hypothesis, we conducted genetic analyses in 26 Colombian (CO) and Costa Rican (CR) pedigrees ascertained for BP1, the most severe and heritable form of BP. In these pedigrees, we performed microarray SNP genotyping of 856 individuals and high-coverage whole-genome sequencing of 454 individuals. Compared to their unaffected relatives, BP1 individuals had higher polygenic risk scores estimated from SNPs associated with BP discovered in independent genome-wide association studies, and also displayed a higher burden of rare deleterious single nucleotide variants (SNVs) and rare copy number variants (CNVs) in genes likely to be relevant to BP1. Parametric and non-parametric linkage analyses identified 15 BP1 linkage peaks, encompassing about 100 genes, although we observed no significant segregation pattern for any particular rare SNVs and CNVs. These results suggest that even in extended pedigrees, genetic risk for BP appears to derive mainly from small to moderate effect rare and common variants.
DOI: 10.2139/ssrn.3406382
2019
MYB is an Essential Regulator of Primitive Human Hematopoiesis in Pluripotent Stem Cell Differentiation Cultures
MYB is a key regulator of definitive hematopoiesis that plays a critical role in the maintenance and multilineage differentiation of hematopoietic stem cells (HSCs). In vertebrate developmental models, MYB is thought to be dispensable for primitive hematopoiesis. To explore the role of MYB in human hematopoietic development, we have subjected human pluripotent stem cells (hPSCs) to mono- and bi-allelic gene targeting followed by hematopoietic differentiation in defined culture conditions. Here we show that MYB plays a central role in the development of human primitive blood cells. MYB expression was hematopoietic-specific and its induction coincided with emergence of the earliest primitive blood cells. Bi-allelic inactivation of MYB most severely affected the primitive erythroid progenitors of greater proliferative capacity and multilineage hematopoietic progenitors. The initial phase of the hematopoietic differentiation was not affected by the bi-allelic gene deficiency, but maturation of the primitive myeloid cells was found to be MYB-dependent. Rescuing MYB expression in MYB-null cells shows that the gene is required for both development and proliferation of primitive clonogenic progenitors. In addition, our findings suggest that human primitive hematopoiesis emerges in several developmental cohorts.
DOI: 10.21203/rs.3.rs-27287/v1
2020
ACE2 and TMPRSS2 variation in savanna monkeys (Chlorocebus spp.): Potential risk for zoonotic/anthroponotic transmission of SARS-CoV-2 and a potential model for functional studies
Abstract The COVID-19 pandemic, caused by the coronavirus SARS-CoV-2, has devastated health infrastructure around the world. Both ACE2 (an entry receptor) and TMPRSS2 (used by the virus for spike protein priming) are key proteins to SARS-CoV-2 cell entry, enabling progression to COVID-19 in humans. Comparative genomic research into critical ACE2 binding sites, associated with the spike receptor binding domain, has suggested that Old World primates may also be susceptible to disease from SARS-CoV-2 infection. Savanna monkeys (Chlorocebus spp.) are a widespread non-human primate with well-established potential as a bi-directional zoonotic/anthroponotic agent due to high levels of human interaction throughout their range in sub-Saharan Africa and the Caribbean. To characterize potential functional variation in savanna monkey ACE2 and TMPRSS2, we inspected recently published genomic data from 245 savanna monkeys, including 163 wild monkeys from Africa and the Caribbean and 82 captive monkeys from the Vervet Research Colony (VRC). We found several missense variants. One missense variant in ACE2 (X:14,077,550; Asp30Gly), common in Ch. sabaeus, causes a change in amino acid residue that has been inferred to reduce binding efficiency of SARS-CoV-2, suggesting potentially reduced susceptibility. The remaining populations appear as susceptible as humans, based on these criteria for receptor usage. All missense variants observed in wild Ch. sabaeus populations are also present in the VRC, along with two splice acceptor variants (at X:14,065,076) not observed in the wild sample that are potentially disruptive to ACE2 function. The presence of these variants in the VRC suggests a promising model for SARS-CoV-2 infection and vaccine and therapy development.
In keeping with a One Health approach, characterizing actual susceptibility and potential for bi-directional zoonotic/anthroponotic transfer in savanna monkey populations may be an important consideration for controlling COVID-19 epidemics in communities with frequent human/non-human primate interactions that, in many cases, may have limited health infrastructure.
DOI: 10.1101/095968
2016
CpG traffic lights are markers of regulatory regions in humans
Abstract DNA methylation is involved in regulation of gene expression. Although modern methods profile DNA methylation at single CpG sites, methylation levels are usually averaged over genomic regions in the downstream analyses. In this study we demonstrate that single CpG methylation can serve as a more accurate predictor of gene expression compared to average promoter/gene body methylation. CpG positions with significant correlation between methylation and expression of a gene nearby (named CpG traffic lights) are evolutionarily conserved and enriched for exact TSS positions and active enhancers. Among all promoter types, CpG traffic lights are especially enriched in poised promoters. Genes that harbor CpG traffic lights are associated with development and signal transduction. Methylation levels of individual CpG traffic lights vary dramatically between cell types, with an increased frequency of intermediate methylation levels, indicating cell population heterogeneity in CpG methylation levels. In line with the concept of inherited stochastic epigenetic variation, methylation of such CpG positions might contribute to transcriptional regulation. Alternatively, one can hypothesize that traffic lights are markers of absent gene expression resulting from inactivation of their regulatory elements. The CpG traffic lights provide a promising insight into mechanisms of enhancer activity and gene regulation linking methylation of single CpG to expression.
DOI: 10.1007/3-540-45727-5_6
2001
Cited 4 times
Bayesian Approach to DNA Segmentation into Regions with Different Average Nucleotide Composition
We present a new method of segmentation of nucleotide sequences into regions with different average composition. The sequence is modelled as a series of segments; within each segment the sequence is considered as a random sequence of independent and identically distributed variables. The partition algorithm includes two stages. In the first stage the optimal partition is found, which maximises the overall product of marginal likelihoods calculated for each segment. To prevent segmentation into short segments, the border insertion penalty may be introduced. In the next stage segments with close compositions are merged. Filtration is performed with the help of partition function calculated for all possible subsets of boundaries that belong to the optimal partition. The long sequences can be segmented by dividing sequences and segmenting those parts separately. The contextual effects of repeats, genes and other genomic elements are readily visualised.
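The first stage described above (finding the partition that maximises the product of per-segment marginal likelihoods, with a border insertion penalty) can be sketched with dynamic programming. This is an illustration under stated assumptions, not the authors' implementation: it assumes a uniform Dirichlet prior on segment composition, works in log space, and uses a hypothetical `penalty` value. The second stage (merging segments and filtering boundaries with the partition function) is not shown.

```python
import math

def log_marginal(counts):
    """Log marginal likelihood of an i.i.d. segment with the given letter counts.

    Assumes a uniform Dirichlet prior over letter frequencies (an assumption).
    """
    k, n = len(counts), sum(counts)
    return math.lgamma(k) - math.lgamma(k + n) + sum(math.lgamma(c + 1) for c in counts)

def segment(seq, alphabet="ACGT", penalty=3.0):
    """Return border positions maximising summed log marginal likelihood.

    Each inserted border costs `penalty` log units (hypothetical default).
    """
    n = len(seq)
    index = {ch: i for i, ch in enumerate(alphabet)}
    # prefix[j][a] = occurrences of letter a in seq[:j]
    prefix = [[0] * len(alphabet)]
    for ch in seq:
        row = prefix[-1][:]
        row[index[ch]] += 1
        prefix.append(row)

    best = [0.0] + [-math.inf] * n  # best[j] = best score for seq[:j]
    back = [0] * (n + 1)            # back[j] = start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            counts = [pj - pi for pj, pi in zip(prefix[j], prefix[i])]
            score = best[i] + log_marginal(counts) - (penalty if i > 0 else 0.0)
            if score > best[j]:
                best[j], back[j] = score, i

    # Walk the back-pointers to recover the internal borders.
    borders, j = [], n
    while j > 0:
        j = back[j]
        if j > 0:
            borders.append(j)
    return sorted(borders)

toy = "A" * 30 + "G" * 30  # two compositionally distinct toy domains
print(segment(toy))
```

On this toy input the sketch places a single border at position 30. The search is quadratic in sequence length, which is consistent with the abstract's remark that long sequences are divided and the parts segmented separately.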
DOI: 10.15829/1560-4071-2022-5232
2022
ANGPTL3, ANGPTL4, APOA5, APOB, APOC2, APOC3, LDLR, PCSK9, LPL gene variants and coronary artery disease risk
Aim. To study the contribution of rare and low-frequency variants of ANGPTL3, ANGPTL4, APOA5, APOB, APOC2, APOC3, LDLR, PCSK9, LPL genes in assessing the risk of coronary artery disease (CAD) in a cohort of Russian patients with various cardiovascular risks. Material and methods. The study was conducted on a sample of participants in cohort and epidemiological studies (n=2405). Targeted enrichment of coding sequences and exon-intron regions of nine genes (ANGPTL3, ANGPTL4, APOA5, APOB, APOC2, APOC3, LDLR, PCSK9, LPL) was performed. Genetic diagnostics was carried out by next generation sequencing. Results. CAD was confirmed in 267 patients (11%). After genetic diagnosis, all patients were divided into the following three groups: individuals with previously described genetic variants associated with elevated levels of low-density lipoprotein cholesterol (LDL-C) and/or triglycerides (TGs); individuals with genetic variants associated with reduced levels of LDL-C and/or TGs; individuals without genetic variants associated with LDL-C and/or TG levels, or with two or more variants with opposite effects on LDL-C and/or TG levels. The Kaplan-Meier method revealed that the groups differ significantly in cumulative risk of CAD (p<0.001 for the log-rank test); the maximum risk was in group 1, and the minimum risk in group 2. Cox regression showed that in persons from group 1 the hazard of CAD is 2.63 times higher (hazard ratio (HR)=2.63, 95% confidence interval (CI), 1.6-4.34; p<0.001), and in persons from group 2 it is 1.88 times lower (HR=0.53, 95% CI, 0.3-0.98; p=0.042), compared with persons from group 3, adjusted for other CAD risk factors: sex, age, smoking, LDL-C and hypertension. Conclusion. Genetic testing in young patients makes it possible to identify individuals with an increased genetic risk of CAD and to focus preventive and therapeutic measures primarily on this category of patients.
Keywords: coronary artery disease, cardiovascular diseases, low-density lipoprotein cholesterol, genetic testing. Relationships and Activities: none. 1 National Medical Research Center for Therapy and Preventive Medicine, Moscow; 2 Pirogov Russian National Research Medical University, Moscow; 3 E. I. Chazov National Medical Research Center of Cardiology, Moscow; 4 Lomonosov Moscow State University, Moscow; 5 Moscow Institute of Physics and Technology, Dolgoprudny, Russia.
DOI: 10.1080/10629360108033246
2001
Cited 3 times
Molecular Modelling of Disease-Causing Single-Nucleotide Polymorphisms in Collagen
The purpose of the work was to investigate at the molecular structural and energy levels the consequence of amino acid substitutions in collagen that cause systemic diseases. The data have been systematized on defects in human collagen III, and the patterns of single-nucleotide polymorphisms collected. Then molecular mechanics calculations were performed for native and mutant collagen molecule fragments. The observed energy components and structural alterations that accompany particular amino acid substitutions were used to propose an interpretation of negative consequences in terms of stability and hydration of the macromolecule.
DOI: 10.1101/464255
2018
Exome sequencing identifies high-impact trait-associated alleles enriched in Finns
ABSTRACT As yet undiscovered rare variants are hypothesized to substantially influence an individual’s risk for common diseases and traits, but sequencing studies aiming to identify such variants have generally been underpowered. In isolated populations that have expanded rapidly after a population bottleneck, deleterious alleles that passed through the bottleneck may be maintained at much higher frequencies than in other populations. In an exome sequencing study of nearly 20,000 cohort participants from northern and eastern Finnish populations that exemplify this phenomenon, most novel trait-associated deleterious variants are seen only in Finland or display frequencies more than 20 times higher than in other European populations. These enriched alleles underlie 34 novel associations with 21 disease-related quantitative traits and demonstrate a geographical clustering equivalent to that of Mendelian disease mutations characteristic of the Finnish population. Sequencing studies in populations without this unique history would require hundreds of thousands to millions of participants for comparable power for these variants.
DOI: 10.1101/092874
2016
Genetic variation and gene expression across multiple tissues and developmental stages in a non-human primate
By analyzing multi-tissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalogue of expression quantitative trait loci (eQTLs) in a non-human primate model. This catalogue contains more genome-wide significant eQTLs, per sample, than comparable human resources, and reveals sex and age-related expression patterns. Findings include a master regulatory locus that likely plays a role in immune function, and a locus regulating hippocampal long non-coding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders.
DOI: 10.1002/9780470015902.a0021735
2009
Positive Selection and Alternative Splicing in Humans
Abstract Alternative splicing is an important mechanism of generating protein diversity and accelerated genome evolution. The mode of the selection acting in constitutive, major alternative and minor alternative regions of human genes is different. Whereas constitutive and major alternative regions tend to evolve under negative (stabilizing) selection, alternatively spliced exons from minor isoforms experience lower selective pressure at the amino acid level accompanied by weak selection against synonymous sequence variation. The McDonald–Kreitman test uses the nucleotide variation for a gene or a set of genes between and within species to detect the positive Darwinian selection in the presence of negative selection. The results of the test suggest that alternatively spliced exons are also subject to positive selection, with up to 27% of amino acids fixed by positive selection. Key concepts Alternative splicing is an important mechanism of generating protein diversity and accelerated genome evolution. Alternatively spliced regions are often evolutionarily young. There is a difference in the selection mode in constitutive, major alternative, and minor alternative regions of human genes. Constitutive and major alternative regions evolve under negative (stabilizing) selection. Up to 27% of positions in minor alternative regions may be experiencing positive selection.
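The McDonald–Kreitman comparison mentioned above can be illustrated with a small sketch. It assumes a commonly used estimator of the fraction of amino acid substitutions fixed by positive selection, alpha = 1 - (Ds × Pn) / (Dn × Ps), where Dn and Ds are nonsynonymous and synonymous divergence counts and Pn and Ps the corresponding polymorphism counts. The abstract does not state which estimator was used, and the counts below are hypothetical, chosen only so that the result equals 0.27.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Estimated fraction of nonsynonymous substitutions fixed by positive selection."""
    return 1 - (ds * pn) / (dn * ps)

def neutrality_index(dn: int, ds: int, pn: int, ps: int) -> float:
    """Ratio (Pn/Ps) / (Dn/Ds); values below 1 suggest an excess of divergence."""
    return (pn / ps) / (dn / ds)

# Hypothetical counts for a pooled set of minor alternative exons.
dn, ds = 100, 200  # fixed differences between species (nonsynonymous, synonymous)
pn, ps = 73, 200   # polymorphisms within species (nonsynonymous, synonymous)

alpha = mk_alpha(dn, ds, pn, ps)
ni = neutrality_index(dn, ds, pn, ps)
print(f"alpha = {alpha:.2f}, neutrality index = {ni:.2f}")
```

With this estimator, alpha equals one minus the neutrality index, so the two functions report the same contrast in different forms.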
DOI: 10.3389/fcvm.2022.982607
2022
Phenotypic vs. genetic cascade screening for familial hypercholesterolemia: A case report
One of the most common autosomal dominant disorders is familial hypercholesterolemia (FH), causing premature atherosclerotic cardiovascular diseases and a high risk of death due to lifelong exposure to elevated low-density lipoprotein cholesterol (LDL-C) levels. FH has a proven arsenal of treatments and the opportunity for genetic diagnosis. Despite this, FH remains largely underdiagnosed worldwide. Cascade screening is a cost-effective method for the identification of new patients with FH and the prevention of cardiovascular diseases. It is usually based only on clinical data. We describe a 48-year-old index patient with a very high LDL-C level not controlled by guideline-based medication, premature atherosclerosis, and a rare variant in the low-density lipoprotein receptor (LDLR) gene. Phenotypic cascade screening identified three additional relatives with FH, namely the proband's daughter and two young grandsons. Genetic screening made it possible to rule out FH in the proband's younger grandson. This clinical case demonstrates that genetic cascade screening is the most effective way of identifying new FH cases. We also describe in detail, for the first time, the phenotype of patients with the likely pathogenic variant LDLR-p.K223_D227dup.
DOI: 10.1093/gigascience/giac086
2022
d-StructMAn: Containerized structural annotation on the scale from genetic variants to whole proteomes
Abstract Background Structural annotation of genetic variants in the context of intermolecular interactions and protein stability can shed light onto mechanisms of disease-related phenotypes. Three-dimensional structures of related proteins in complexes with other proteins, nucleic acids, or ligands enrich such functional interpretation, since intermolecular interactions are well conserved in evolution. Results We present d-StructMAn, a novel computational method that enables structural annotation of local genetic variants, such as single-nucleotide variants and in-frame indels, and implement it in a highly efficient and user-friendly tool provided as a Docker container. Using d-StructMAn, we annotated several very large sets of human genetic variants, including all variants from ClinVar and all amino acid positions in the human proteome. We were able to provide annotation for more than 46% of positions in the human proteome, representing over 60% of proteins. Conclusions d-StructMAn is the first of its kind and a highly efficient tool for structural annotation of protein-coding genetic variation in the context of observed and potential intermolecular interactions. d-StructMAn is readily applicable to proteome-scale datasets and can be instrumental in building machine-learning tools for predicting genotype-to-phenotype relationships.
DOI: 10.1038/s41588-018-0124-x
2018
Publisher Correction: Ancient hybridization and strong adaptation to viruses across African vervet monkey populations
In the version of this article published, in the Online Methods eight citations to supplementary material refer to the wrong supplementary items. See the correction notice for full details.
2019
Exome sequencing of Finnish isolates enhances rare-variant association power
DOI: 10.6084/m9.figshare.7664966.v1
2019
Additional file 1 of CpG traffic lights are markers of regulatory regions in human genome
DOI: 10.7490/f1000research.1117212.1
2019
The importance of being unbiased: why protein structure and training setup are important for predicting novel pathogenic genetic variants