Chunyu Geng

Here are all the papers by Chunyu Geng that you can download and read on OA.mg.

DOI: 10.1038/ng.919
2011
Cited 1,787 times
The genome of the mesopolyploid crop species Brassica rapa
We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.
DOI: 10.1038/ncomms2860
2013
Cited 223 times
Draft genome sequence of the Tibetan antelope
The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in genes associated with energy metabolism and oxygen transmission. Both the highland American pika and the Tibetan antelope have signals of positive selection for genes involved in DNA repair and the production of ATPase. Genes associated with hypoxia seem to have experienced convergent evolution. Thus, our study suggests that common genetic mechanisms might have been utilized to enable high-altitude adaptation.
DOI: 10.1093/gigascience/gix024
2017
Cited 210 times
A reference human genome dataset of the BGISEQ-500 sequencer
Background: BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinatorial probe-anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Findings: Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated the accuracy of the variations, and compared it to that of the variations identified from similar amounts of publicly available HiSeq2500 data. Conclusions: We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%), and better SNP detection accuracy than the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). But for insertions and deletions (indels), we found lower accuracy for BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50 respectively, sensitivity = 88.52% and 70.93%) than the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform.
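For readers unfamiliar with the two accuracy figures quoted in this abstract, the following is a minimal sketch of how sensitivity and false positive rate are conventionally computed from variant-call counts. It assumes the textbook definitions (sensitivity = TP / (TP + FN), FPR = FP / (FP + TN)); the paper's exact evaluation procedure is not given here, and the function names and counts are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real variants that were called (textbook definition)."""
    return true_pos / (true_pos + false_neg)


def false_positive_rate(false_pos: int, true_neg: int) -> float:
    """Fraction of non-variant sites wrongly called as variants."""
    return false_pos / (false_pos + true_neg)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not taken from the paper.
    sens = sensitivity(true_pos=962, false_neg=38)
    fpr = false_positive_rate(false_pos=2, true_neg=999_998)
    print(f"sensitivity = {sens:.2%}, FPR = {fpr:.5%}")
```

Because the number of non-variant sites in a genome is very large, FPR values of this kind are typically tiny even when the absolute number of false calls is not.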
DOI: 10.1186/s13148-016-0287-1
2016
Cited 123 times
cPAS-based sequencing on the BGISEQ-500 to explore small non-coding RNAs
We present the first sequencing data using the combinatorial probe-anchor synthesis (cPAS)-based BGISEQ-500 sequencer. Applying cPAS, we investigated the repertoire of human small non-coding RNAs and compared it to other techniques. Starting with repeated measurements of different specimens including solid tissues (brain and heart) and blood, we generated a median of 30.1 million reads per sample. Of these, 24.1 million mapped to the human genome and 23.3 million to miRBase. Among six technical replicates of brain samples, we observed a median correlation of 0.98. Comparing BGISEQ-500 to HiSeq, we calculated a correlation of 0.75. The comparability to microarrays was similar for both BGISEQ-500 and HiSeq, with the former showing a correlation of 0.58 and the latter a correlation of 0.6. As for a potential bias in the detected expression distribution in blood cells, 98.6% of HiSeq reads versus 93.1% of BGISEQ-500 reads match the 10 miRNAs with the highest read count. After using miRDeep2 and employing stringent selection criteria for predicting new miRNAs, we detected 74 highly likely candidates in the cPAS sequencing reads prevalent in solid tissues and 36 candidates prevalent in blood. While there is apparently no ideal platform for all challenges of miRNome analyses, cPAS shows high technical reproducibility and supplements the hitherto available platforms.
DOI: 10.1093/gigascience/gix049
2017
Cited 123 times
Comparative performance of the BGISEQ-500 vs Illumina HiSeq2500 sequencing platforms for palaeogenomic sequencing
Ancient DNA research has been revolutionized following development of next-generation sequencing platforms. Although a number of such platforms have been applied to ancient DNA samples, the Illumina series are the dominant choice today, mainly because of high production capacities and short read production. Recently a potentially attractive alternative platform for palaeogenomic data generation has been developed, the BGISEQ-500, whose sequence output is comparable with the Illumina series. In this study, we modified the standard BGISEQ-500 library preparation specifically for use on degraded DNA, then directly compared the sequencing performance and data quality of the BGISEQ-500 to the Illumina HiSeq2500 platform on DNA extracted from 8 historic and ancient dog and wolf samples. The data generated were largely comparable between sequencing platforms, with no statistically significant difference observed for parameters including level (P = 0.371) and average sequence length (P = 0.718) of endogenous nuclear DNA, sequence GC content (P = 0.311), double-stranded DNA damage rate (P = 0.309), and sequence clonality (P = 0.093). Small significant differences were found in single-strand DNA damage rate (δS; slightly lower for the BGISEQ-500, P = 0.011) and the background rate of difference from the reference genome (θ; slightly higher for BGISEQ-500, P = 0.012). This may result from the differences in amplification cycles used to polymerase chain reaction–amplify the libraries. A significant difference was also observed in the mitochondrial DNA percentages recovered (P = 0.018), although we believe this is likely a stochastic effect relating to the extremely low levels of mitochondria that were sequenced from 3 of the samples with overall very low levels of endogenous DNA.
Although we acknowledge that our analyses were limited to animal material, our observations suggest that the BGISEQ-500 holds the potential to represent a valid and potentially valuable alternative platform for palaeogenomic data generation that is worthy of future exploration by those interested in the sequencing and analysis of degraded DNA.
DOI: 10.1038/gim.2017.170
2018
Cited 53 times
Identification of balanced chromosomal rearrangements previously unknown among participants in the 1000 Genomes Project: implications for interpretation of structural variation in genomes and the future of clinical cytogenetics
Purpose: Recent studies demonstrate that whole-genome sequencing enables detection of cryptic rearrangements in apparently balanced chromosomal rearrangements (also known as balanced chromosomal abnormalities, BCAs) previously identified by conventional cytogenetic methods. We aimed to assess our analytical tool for detecting BCAs in the 1000 Genomes Project without knowing which bands were affected. Methods: The 1000 Genomes Project provides an unprecedented integrated map of structural variants in phenotypically normal subjects, but there is no information on potential inclusion of subjects with apparent BCAs akin to those traditionally detected in diagnostic cytogenetics laboratories. We applied our analytical tool to 1,166 genomes from the 1000 Genomes Project with sufficient physical coverage (8.25-fold). Results: With this approach, we detected four reciprocal balanced translocations and four inversions, ranging in size from 57.9 kb to 13.3 Mb, all of which were confirmed by cytogenetic methods and polymerase chain reaction studies. One of these DNAs has a subtle translocation that is not readily identified by chromosome analysis because of the similarity of the banding patterns and size of exchanged segments, and another results in disruption of all transcripts of an OMIM gene. Conclusion: Our study demonstrates the extension of utilizing low-pass whole-genome sequencing for unbiased detection of BCAs including translocations and inversions previously unknown in the 1000 Genomes Project.
DOI: 10.1371/journal.pone.0190264
2018
Cited 49 times
Germline and somatic variant identification using BGISEQ-500 and HiSeq X Ten whole genome sequencing
Technological innovation and increased affordability have contributed to the widespread adoption of genome sequencing technologies in biomedical research. In particular, large cancer research consortia have embraced next generation sequencing, and have used the technology to define the somatic mutation landscape of multiple cancer types. These studies have primarily utilised the Illumina HiSeq platforms. In this study we performed whole genome sequencing of three malignant pleural mesothelioma and matched normal samples using a new platform, the BGISEQ-500, and compared the results obtained with Illumina HiSeq X Ten. Germline and somatic single nucleotide variants and small insertions or deletions were independently identified from data aligned to the human genome reference. The BGISEQ-500 and HiSeq X Ten platforms showed high concordance for germline calls with genotypes from SNP arrays (>99%). The germline and somatic single nucleotide variants identified in both sequencing platforms were highly concordant (86% and 72% respectively). These results indicate the potential applicability of the BGISEQ-500 platform for the identification of somatic and germline single nucleotide variants by whole genome sequencing. The BGISEQ-500 datasets described here represent the first publicly-available cancer genome sequencing performed using this platform.
DOI: 10.1016/j.fuel.2022.126928
2023
Cited 7 times
Experimental and theoretical study of the adsorption of mixed low carbon alcohols and acids from Fischer Tropsch synthesis wastewater by activated carbon
Purifying and recycling the low-carbon alcohols and acids in wastewater by a green, highly efficient method is of great significance in chemical engineering processes such as indirect coal liquefaction (Fischer Tropsch Synthesis, FTS), which produces a large amount of alcohol- and acid-containing wastewater. Adsorption-desorption of low-carbon alcohols and acids in wastewater by activated carbon is a promising method for industrial application. The adsorption capacity of activated carbons for low-carbon alcohols and acids is closely related to their pore and surface properties. However, this has seldom been studied due to the complex properties of carbon materials with different pore size, distribution and volume, as well as surface area and hydrophobicity. In this work, four commercial activated carbon materials (AC1 ∼ 4) with similar hydrophobicity and different pore and surface structure were chosen to explore the possible correlation of the adsorption capacity of C1-5 alcohols, C2-4 acids and FTS modeling wastewater. The batch method was used to investigate the adsorption capacity of AC1 ∼ 4 for low-carbon alcohols and acids as well as FTS modeling wastewater, while density functional theory (DFT) calculation was performed to investigate the adsorption energies of low-carbon alcohols and acids in carbon nanotubes with different pore sizes. Both the experimental and theoretical results indicate that the adsorption capacities of methanol, ethanol and acetic acid are strongly dependent on the ultra-micropores (<0.9 nm) due to their small molecular weight; the adsorption capacity of pentanol is mainly determined by the surface area due to its relatively high molecular weight and low polarity; and the adsorption capacities of C3-C4 alcohols and acids are well correlated with the microporous volume and surface area. In addition, the adsorption capacities of acids were also related to the content of surface COOH species.
AC2 has the highest adsorption capacity for mixed low-carbon alcohols and acids (196.2 mg/g) in FTS modeling wastewater due to its abundant ultra-micropores and high specific surface area. This work sheds light on the development of adsorbents with high adsorption capacity for low-carbon alcohols and acids, and provides solid experimental and theoretical support for the further purification of FTS wastewater and recovery of valuable chemicals.
DOI: 10.1186/s12864-019-5965-x
2019
Cited 42 times
Impact of sequencing depth and technology on de novo RNA-Seq assembly
RNA-Seq data is inherently nonuniform for different transcripts because of differences in gene expression. This makes it challenging to decide how much data should be generated from each sample. How much should one spend to recover the less expressed transcripts? The sequencing technology used is another consideration, as there are inevitably always biases against certain sequences. To investigate these effects, we first looked at high-depth libraries from a set of well-annotated organisms to ascertain the impact of sequencing depth on de novo assembly. We then looked at libraries sequenced from the Universal Human Reference RNA (UHRR) to compare the performance of Illumina HiSeq and MGI DNBseq™ technologies. On the issue of sequencing depth, the amount of exomic sequence assembled plateaued using data sets of approximately 2 to 8 Gbp. However, the amount of genomic sequence assembled did not plateau for many of the analyzed organisms. Most of the unannotated genomic sequences are single-exon transcripts whose biological significance will be questionable for some users. On the issue of sequencing technology, both of the analyzed platforms recovered a similar number of full-length transcripts. The missing "gap" regions in the HiSeq assemblies were often attributed to higher GC contents, but this may be an artefact of library preparation and not of sequencing technology. Increasing sequencing depth beyond modest data sets of less than 10 Gbp recovers a plethora of single-exon transcripts undocumented in genome annotations. DNBseq™ is a viable alternative to HiSeq for de novo RNA-Seq assembly.
DOI: 10.3892/ijo.2016.3674
2016
Cited 38 times
Benzo[a]pyrene promotes gastric cancer cell proliferation and metastasis likely through the Aryl hydrocarbon receptor and ERK-dependent induction of MMP9 and c-myc
Gastric cancer (GC) is the fifth most common cancer worldwide and the third leading cause of global cancer-related death. Benzo[a]pyrene (BaP), a Group Ⅰ carcinogen categorized by the IARC, is a cumulative foodborne carcinogen and ubiquitous environmental pollutant with potent carcinogenic properties. However, the function and mechanism of BaP exposure in GC progression remain unclear. We investigated the role of BaP in human GC progression to identify potential mechanisms underlying its carcinogenic activity. After exposure to various concentrations of BaP, human GC cells SGC-7901 and MKN-45 showed an increased capability of proliferation, migration and invasion. Further study indicated that BaP promotes the expression of matrix metalloproteinase-9 (MMP9) and c-myc at the mRNA and protein levels, and activates the Aryl hydrocarbon receptor (AhR) and ERK pathways. Moreover, BaP-induced overexpression of MMP9 and c-myc was attenuated by the ERK inhibitor U0126 and the AhR inhibitor resveratrol, respectively. These data suggest that BaP promotes proliferation and metastasis of GC cells through upregulation of MMP9 and c-myc expression, and that this was likely mediated via the AhR and ERK signaling pathways.
DOI: 10.1371/journal.pone.0097507
2014
Cited 34 times
OTG-snpcaller: An Optimized Pipeline Based on TMAP and GATK for SNP Calling from Ion Torrent Data
Because the new Proton platform from Life Technologies produced markedly different data from those of the Illumina platform, the conventional Illumina data analysis pipeline could not be used directly. We developed an optimized SNP calling method using TMAP and GATK (OTG-snpcaller). This method combined our own optimized processes, Remove Duplicates According to AS Tag (RDAST) and Alignment Optimize Structure (AOS), together with TMAP and GATK, to call SNPs from Proton data. We sequenced four sets of exomes captured by Agilent SureSelect and NimbleGen SeqCap EZ Kit, using Life Technology’s Ion Proton sequencer. Then we applied OTG-snpcaller and compared our results with the results from Torrent Variants Caller. The results indicated that OTG-snpcaller can reduce both false positive and false negative rates. Moreover, we compared our results with Illumina results generated by GATK best practices, and we found that the results of these two platforms were comparable. The good performance in variant calling using GATK best practices can be primarily attributed to the high quality of the Illumina sequences.
DOI: 10.3389/fonc.2021.669270
2021
Cited 17 times
A Comprehensive RNA Study to Identify circRNA and miRNA Biomarkers for Docetaxel Resistance in Breast Cancer
To investigate the relationship between non-coding RNAs [especially circular RNAs (circRNAs)] and docetaxel resistance in breast cancer, and to find potential predictive biomarkers for taxane-containing therapies, we have performed transcriptome and microRNA (miRNA) sequencing for two established docetaxel-resistant breast cancer (DRBC) cell lines and their docetaxel-sensitive parental cell lines. Our analyses revealed differences between circRNA signatures in the docetaxel-resistant and -sensitive breast cancer cells, and discovered circRNAs generated by multidrug-resistance genes in taxane-resistant cancer cells. In DRBC cells, circABCB1 was identified and validated as a circRNA that is strongly up-regulated, whereas circEPHA3.1 and circEPHA3.2 are strongly down-regulated. Furthermore, we investigated the potential functions of these circRNAs by bioinformatics analysis, and miRNA analysis was performed to uncover potential interactions between circRNAs and miRNAs. Our data showed that circABCB1, circEPHA3.1 and circEPHA3.2 may sponge up eight significantly differentially expressed miRNAs that are associated with chemotherapy and contribute to docetaxel resistance via the PI3K-Akt and AGE-RAGE signaling pathways. We also integrated differential expression data of mRNA, long non-coding RNA, circRNA, and miRNA to gain a global profile of multi-level RNA changes in DRBC cells, and compared them with changes in DNA copy numbers in the same cell lines. We found that Chromosome 7 q21.12-q21.2 was a common region dominated by multi-level RNA overexpression and DNA amplification, indicating that overexpression of the RNA molecules transcribed from this region may result from DNA amplification during stepwise exposure to docetaxel. These findings may help to further our understanding of the mechanisms underlying docetaxel resistance in breast cancer.
DOI: 10.1212/nxg.0000000000000078
2016
Cited 23 times
Homozygous GNAL mutation associated with familial childhood-onset generalized dystonia
Heterozygous loss-of-function mutations in the GNAL gene encoding the α subunit of the heterotrimeric G protein Golf (Gαolf) are known to cause isolated dystonia [1,2]. Gαolf is enriched in the striatum where it couples D1 dopamine (D1R) and A2A adenosine (A2AR) receptors to the activation of adenylyl cyclase type 5 (AC5). Mutations in ADCY5, the gene encoding AC5, are also known to lead to chorea and dystonia [3,4]. Previous functional studies of mutated Gαolf variants have revealed deficiencies in activation after D1R stimulation [1,5]. Acknowledgment: The authors thank the patients and their family for their participation; without their support, this work would not have been possible.
2024
Clinical Implications of Interleukin-4 and C-reactive Protein in Atopic Dermatitis and Their Changes Before and after Dupilumab Treatment.
To analyze the clinical implications of C-reactive protein and interleukin-4 (IL-4) in atopic dermatitis and their correlations with the therapeutic effect of Dupilumab (DU). Seventy-four patients with atopic dermatitis (intervention group) admitted to Xingtai Third Hospital between May 2021 and January 2023 and 55 concurrent healthy controls (control group) were selected as research participants. Atopic dermatitis patients were treated with a DU injection of 600 mg for the first time after diagnosis. Peripheral blood IL-4 and C-reactive protein levels before and after treatment in the intervention group and their levels at admission in the control group were comparatively analyzed, and their predictive value for the occurrence, clinical efficacy, and adverse reactions of atopic dermatitis was determined. Additionally, alterations in C-reactive protein and IL-4 levels before and after treatment in the intervention group and their relationship with the Scoring Atopic Dermatitis (SCORAD) index were discussed. The intervention group exhibited higher C-reactive protein and IL-4 levels than the control group. The diagnostic sensitivity and specificity of C-reactive protein + IL-4 detection for atopic dermatitis were 74.32% and 94.55%, respectively (P < .05). The post-treatment C-reactive protein and IL-4 levels were lower in the intervention group, and the test results were positively correlated with SCORAD before and after treatment (P < .05). In addition, C-reactive protein + IL-4 detection showed excellent predictive effects on the therapeutic efficacy of DU and adverse reactions. IL-4 and C-reactive protein are closely related to atopic dermatitis and can be used as evaluation indexes for the disease development of atopic dermatitis and the therapeutic effects of DU in the future.
DOI: 10.1016/j.virol.2020.11.003
2021
Cited 11 times
Genetic signatures for lineage/sublineage classification of HPV16, 18, 52 and 58 variants
Increasing evidence indicates that high-risk HPV variants are heterogeneous in carcinogenicity and ethnic dispersion. In this work, we identified genetic signatures for convenient determination of lineage/sublineage of HPV16, 18, 52 and 58 variants. Using publicly available genomes, we found that E2 of HPV16, L2 of HPV18, L1 and LCR of HPV52, and L2, LCR and E1 of HPV58 contain the proper genetic signature for lineage/sublineage classification. Sets of hierarchical signature nucleotide positions were further confirmed for high accuracy (>95%) by classifying HPV genomes obtained from Chinese females, which included 117 HPV16 variants, 48 HPV18 variants, 117 HPV52 variants and 89 HPV58 variants. The circulation of HPV variants posing higher cancer risk in Eastern China, such as HPV16 A4 and HPV58 A3, calls for continuous surveillance in this region. The marker genes and signature nucleotide positions may facilitate cost-effective diagnostic detection of HPV variants in clinical settings.
DOI: 10.1002/ctm2.987
2022
Cited 7 times
Polyadenylation ligation‐mediated sequencing (PALM‐Seq) characterizes cell‐free coding and non‐coding RNAs in human biofluids
Cell-free messenger RNA (cf-mRNA) and long non-coding RNA (cf-lncRNA) are becoming increasingly important in liquid biopsy by providing biomarkers for disease prediction, diagnosis and prognosis, but the simultaneous characterization of coding and non-coding RNAs in human biofluids remains challenging. Here, we developed polyadenylation ligation-mediated sequencing (PALM-Seq), an RNA sequencing strategy employing treatment of RNA with T4 polynucleotide kinase to generate cell-free RNA (cfRNA) fragments with 5' phosphate and 3' hydroxyl and RNase H to deplete abundant RNAs, achieving simultaneous quantification and characterization of cfRNAs. Using PALM-Seq, we successfully identified well-known differentially abundant mRNA, lncRNA and microRNA in the blood plasma of pregnant women. We further characterized cfRNAs in blood plasma, saliva, urine, seminal plasma and amniotic fluid and found that the detected numbers of different RNA biotypes varied with body fluids. The profiles of cf-mRNA reflected the function of the tissues of origin, and immune cells significantly contributed RNA to blood plasma and saliva. Short fragments (<50 nt) of mRNA and lncRNA predominated in biofluids, whereas seminal plasma and amniotic fluid tended to retain long RNA. Body fluids showed distinct preferences of pyrimidine at the 3' end and adenine at the 5' end of cf-mRNA and cf-lncRNA, which were correlated with the proportions of short fragments. Together, PALM-Seq enables a simultaneous characterization of cf-mRNA and cf-lncRNA, contributing to elucidating the biology and promoting the application of cfRNAs.
DOI: 10.1016/j.jes.2024.02.031
2024
Effect of surface species on the alcohol-acid adsorption from FTS wastewater on graphene: A structure-capacity study
Fischer-Tropsch synthesis (FTS) wastewater contains low-carbon alcohols and acids, organic pollutants that are a limiting factor for FTS industrialization. In this work, the structure-capacity relationships between alcohol-acid adsorption and surface species on graphene were reported, shedding light on their intricate interactions. Graphene oxide (GO) and reduced graphene oxide (rGO) were synthesized via an improved Hummers method from flake graphite (G). The physicochemical properties of the samples were characterized via SEM, XRD, XPS, FT-IR, and Raman. The alcohol-acid adsorption behaviors and adsorption quantities on G, GO, and rGO were measured via theoretical and experimental methods. It was revealed that COOH, C=O and CO species on graphene occupy the adsorption sites and increase the interactions of water with graphene, which are unfavorable for alcohol-acid adsorption. The equilibrium adsorption quantities of alcohols and acids increase with carbon number. That monolayer adsorption occurs on graphene was verified via model fitting. rGO has the highest FTS modeling wastewater adsorption quantity (110 mg/g) due to the reduction of oxygen species. These novel findings provide a foundation for alcohol-acid wastewater treatment, as well as the design and development of high-performance carbon-based adsorbent materials.
DOI: 10.1021/acs.iecr.7b04758
2018
Cited 16 times
A Comparative Study of the Perturbed-Chain Statistical Associating Fluid Theory Equation of State and Activity Coefficient Models in Phase Equilibria Calculations for Mixtures Containing Associating and Polar Components
Vapor–liquid equilibria (VLE), liquid–liquid equilibria (LLE), and vapor–liquid–liquid equilibria (VLLE) for systems involving highly nonideal components, namely, water, alcohols, alkanes, ketones, aldehydes, esters, and ethers, were investigated to evaluate the perturbed-chain statistical associating fluid theory equation of state (PC-SAFT EOS) and two widely used activity coefficient models, that is, the universal quasichemical (UNIQUAC) and UNIQUAC functional-group activity coefficients (UNIFAC). Parameters used for the PC-SAFT EOS were taken from the literature or estimated in this work, while those for UNIQUAC and UNIFAC were from the commercial process simulator Aspen Plus 8.4. It was found that all three models yield reliable correlations/predictions for VLE calculations. However, UNIQUAC and UNIFAC were observed to be unreliable for LLE and VLLE calculations despite successful reproductions of experimental data in some cases. The calculated results deviate significantly from experimental data in many cases. Particularly, both models predict artificial liquid–liquid phase splitting for a number of miscible mixtures. Nonetheless, PC-SAFT EOS with the use of a single set of parameters reproduces experimental data quantitatively in most cases and provides reasonably accurate results in all other cases. This remarkable performance of PC-SAFT EOS potentially eliminates the need for various thermodynamic models and consequently the need for selecting a thermodynamic model when performing phase equilibria calculations using commercial software. This is important for practitioners, since (1) it remains unclear how to select an appropriate model from the available models of a process simulator or thermodynamic package for a given phase equilibria calculation despite the presence of some type of rule of thumb and (2) it is also likely that none of the existing models is sufficiently accurate.
In addition, it was shown that both pure-component parameters and binary interaction parameters for the PC-SAFT EOS are well-behaved for a homologous series, which allows for parametrization for weakly characterized components by interpolation or extrapolation, and consequently, facilitates the development of a practical tool for phase equilibria calculations.
DOI: 10.1371/journal.pone.0138824
2015
Cited 15 times
Correction: OTG-snpcaller: An Optimized Pipeline Based on TMAP and GATK for SNP Calling from Ion Torrent Data
The Data Accession section of this paper is incorrect. The section should say: The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are available from Figshare: http://dx.doi.org/10.6084/m9.figshare.1509777.
DOI: 10.1097/md.0000000000012230
2018
Cited 14 times
Cytokine-induced killer cell/dendritic cell–cytokine-induced killer cell immunotherapy for the postoperative treatment of gastric cancer
Immunotherapy is emerging as a new treatment strategy for gastric cancer (GC). However, the efficacy and safety of this technique remain unclear. This meta-analysis aimed to assess the effect of cytokine-induced killer cell (CIK)/dendritic cell-cytokine-induced killer cell (DC-CIK) treatment for GC after surgery. Hazard ratio (HR), overall survival (OS) rates, and disease-free survival (DFS) rates were calculated using a Mantel-Haenszel (M-H) fixed-effects model (FEM), and results were displayed using forest plots. Publication bias was assessed by Begg test, and data were presented using funnel plots. Data robustness was assessed by the trim and fill method. Descriptive analysis was performed on T lymphocytes and adverse effects. In total, 9 trials, including 1216 patients, were eligible for inclusion in this meta-analysis. Compared with the control group, the HR was 0.712 (95% confidence interval [CI] 0.594-0.854) for OS and 0.66 (95% CI 0.546-0.797) for DFS. The risk ratio (RR) of the 3- and 5-year OS rate was 1.29 (95% CI 1.15-1.46) and 1.73 (95% CI 1.36-2.19), respectively. The RR for the 3- and 5-year DFS rate was 1.40 (95% CI 1.19-1.65) and 2.10 (95% CI 1.53-2.87), respectively. The proportion of patients who were CD3+, CD4+, and CD4+/CD8+ increased in the cellular therapy groups. No fatal adverse reactions were noted. Chemotherapy combined with CIK/DC-CIK therapy after surgery resulted in a low HR and significantly increased OS rates, DFS rates, and T-lymphocyte responses in patients with GC.
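As an illustration of the pooling step this abstract mentions, here is a minimal sketch of the standard Mantel-Haenszel fixed-effect estimate of a pooled risk ratio. It is not the authors' analysis code: the `Trial` type, the function name, and the counts are hypothetical, and confidence intervals are omitted.

```python
from typing import NamedTuple, Sequence


class Trial(NamedTuple):
    """Per-trial counts (hypothetical structure for illustration)."""
    events_treated: int   # e.g. patients alive at 3 years, treatment arm
    n_treated: int        # total patients in the treatment arm
    events_control: int   # same outcome, control arm
    n_control: int        # total patients in the control arm


def mantel_haenszel_rr(trials: Sequence[Trial]) -> float:
    """Pooled risk ratio: sum(a_i * n0_i / N_i) / sum(c_i * n1_i / N_i)."""
    numerator = 0.0
    denominator = 0.0
    for t in trials:
        total = t.n_treated + t.n_control
        numerator += t.events_treated * t.n_control / total
        denominator += t.events_control * t.n_treated / total
    return numerator / denominator


if __name__ == "__main__":
    # Made-up counts, not data from the meta-analysis.
    example = [Trial(30, 100, 20, 100), Trial(45, 120, 33, 110)]
    print(f"pooled RR = {mantel_haenszel_rr(example):.3f}")
```

Each trial contributes in proportion to its size, so large trials dominate the pooled estimate; a fixed-effect model of this kind assumes the trials share one underlying effect.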
DOI: 10.1186/s12859-020-03859-x
2020
Cited 10 times
Performance of copy number variants detection based on whole-genome sequencing by DNBSEQ platforms
DNBSEQ™ platforms are new massively parallel sequencing (MPS) platforms that use DNA nanoball technology. Use of data generated from DNBSEQ™ platforms to detect single nucleotide variants (SNVs) and small insertions and deletions (indels) has proven to be quite effective, while the feasibility of copy number variants (CNVs) detection is unclear. Here, we first benchmarked different CNV detection tools based on Illumina whole-genome sequencing (WGS) data of NA12878 and then assessed these tools in CNV detection based on DNBSEQ™ sequencing data from the same sample. When the same tool was used, the CNVs detected based on DNBSEQ™ and Illumina data were similar in quantity, length and distribution, while great differences existed within results from different tools and even based on data from a single platform. We further estimated the CNV detection power based on available CNV benchmarks of NA12878 and found similar precision and sensitivity between the DNBSEQ™ and Illumina platforms. We also found higher precision of CNVs shorter than 1 kbp based on DNBSEQ™ platforms than those based on Illumina platforms by using Pindel, DELLY and LUMPY. We carefully compared these two available benchmarks and found a large proportion of specific CNVs between them. Thus, we constructed a more complete CNV benchmark of NA12878 containing 3512 CNV regions. We assessed and benchmarked CNV detections based on WGS with DNBSEQ™ platforms and provide guidelines for future studies.
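The precision and sensitivity comparison against a CNV benchmark, as described in the abstract above, can be sketched as follows. The abstract does not state how a call is matched to a benchmark region, so the 50% reciprocal-overlap rule used here is an assumption, as are the function names and the toy coordinates.

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if regions a and b overlap by at least min_frac of each region.

    Regions are (chrom, start, end) tuples with half-open coordinates.
    The 50% default is an assumed convention, not taken from the paper.
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return False
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (end_a - start_a)
            and overlap >= min_frac * (end_b - start_b))


def precision_sensitivity(calls, truth, min_frac=0.5):
    """Precision = matched calls / all calls; sensitivity = matched truth / all truth."""
    matched_calls = sum(
        any(reciprocal_overlap(c, t, min_frac) for t in truth) for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(t, c, min_frac) for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    sensitivity = matched_truth / len(truth) if truth else 0.0
    return precision, sensitivity


# Toy regions for illustration only; not NA12878 data.
toy_calls = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
toy_truth = [("chr1", 110, 210), ("chr1", 1000, 2000)]
toy_precision, toy_sensitivity = precision_sensitivity(toy_calls, toy_truth)
print(toy_precision, toy_sensitivity)
```

The pairwise comparison is quadratic in the number of regions, which is acceptable for a few thousand CNVs; an interval index would be the usual choice for larger sets.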
DOI: 10.1016/j.bsheal.2019.02.001
2019
Cited 9 times
Intra-host Ebola viral adaption during human infection
The onsite next generation sequencing (NGS) of Ebola virus (EBOV) genomes during the 2013–2016 Ebola epidemic in Western Africa provides an opportunity to trace the origin, transmission, and evolution of this virus. Herein, we have diagnosed a cohort of EBOV patients in Sierra Leone in 2015, during the late phase of the outbreak. The surviving EBOV patients had a recovery process characterized by decreasing viremia, fever, and biochemical parameters. EBOV genomes sequenced through the longitudinal blood samples of these patients showed dynamic intra-host substitutions of the virus during acute infection, including the previously described short stretches of 13 serial T>C mutations. Remarkably, within individual patients, samples collected during the early phase of infection possessed Ts at these nucleotide sites, whereas they were replaced by Cs in samples collected in the later phase, suggesting that these short stretches of T>C mutations could emerge independently. In addition, up to a total of 35 nucleotide sites spanning the EBOV genome were mutated coincidently. Our study showed the dynamic intra-host adaptation of EBOV during patient recovery and gave more insight into the complex EBOV-host interactions.
DOI: 10.21203/rs.3.rs-3102575/v1
2023
Optimizing Indoor Air Quality: CFD Simulation and Novel Air Cleaning Methods for Effective Aerosol Particles Inhibition in Public Spaces
The ventilation systems in most modern buildings do not supply enough clean outdoor air to remove viruses quickly, posing a significant health risk. To address this issue, this study utilizes computational fluid dynamics (CFD) simulations to investigate the effectiveness and speed of a locally uniform downward flow field in inhibiting the propagation of aerosol particles. The results indicate that such a flow field is particularly effective in areas with human movement as it facilitates the prompt settling of aerosol particles and significantly reduces their dispersion. By implementing this flow field, the risk of infection from the new coronavirus can be mitigated without increasing energy consumption, especially in high-turnover public spaces like supermarkets. Furthermore, we propose a novel air cleaning device that incorporates shelves and optimize its design using the PSO-SVR algorithm. This optimization achieves an optimal air distribution pattern that mimics the “air rain” effect. These findings offer valuable insights and practical applications for the prevention and control of respiratory diseases, particularly in post-epidemic scenarios.
DOI: 10.1093/gigascience/giy144
2018
Cited 8 times
Erratum to: A reference human genome dataset of the BGISEQ-500 sequencer
DOI: 10.1371/journal.pone.0046211
2012
Cited 7 times
Paired-End Sequencing of Long-Range DNA Fragments for De Novo Assembly of Large, Complex Mammalian Genomes by Direct Intra-Molecule Ligation
The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammal genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS. We found that the method performs stably for PE sequencing of 2- to 5-kb DNA fragments, and can be extended to 10-20 kb (and even in extremes, up to ∼35 kb). We also characterized the impact of low quality input DNA on the method, and developed a whole-genome amplification (WGA) based protocol using limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, into a scaffold N50 size of >2 Mb, which is over 100 times greater than the initial size produced with only small insert PE reads (17 kb). In addition, we mapped two 7- to 8-kb insertions in the YH genome using the larger insert sizes of the long-range PE data. In conclusion, we demonstrate here the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome using NGS short reads.
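The scaffold N50 quoted in the abstract above is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that statistic follows; the function name `n50` and the example lengths are illustrative and unrelated to the YH assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Illustrative lengths only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
print(example_n50)
```

Because N50 depends only on the length distribution, it measures contiguity rather than correctness, which is why the abstract separately describes the assembly as accurate.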
DOI: 10.3389/fgene.2021.663098
2021
Cited 5 times
Transcriptional Start Site Coverage Analysis in Plasma Cell-Free DNA Reveals Disease Severity and Tissue Specificity of COVID-19 Patients
Symptoms of coronavirus disease 2019 (COVID-19) range from asymptomatic to severe pneumonia and death. A deep understanding of the variation of biological characteristics in severe COVID-19 patients is crucial for the detection of individuals at high risk of critical condition for the clinical management of the disease. Herein, by profiling the gene expression spectrum deduced from DNA coverage in regions surrounding transcriptional start sites in plasma cell-free DNA (cfDNA) of COVID-19 patients, we deciphered the altered biological processes in the severe cases and demonstrated the feasibility of cfDNA in measuring COVID-19 progression. The up- and downregulated genes in the plasma of severe patients were found to be closely related to the biological processes and functions affected by COVID-19 progression. More importantly, with the analysis of transcriptome data of blood cells and lung cells from the control group and cases with severe acute respiratory syndrome-coronavirus 2 (SARS-CoV-2) infection, we revealed that the upregulated genes were predominantly involved in the viral and antiviral activity in blood cells, reflecting the intense viral replication and the active reaction of the immune system in the severe patients. Pathway analysis of downregulated genes in plasma DNA and lung cells also demonstrated the diminished adenosine triphosphate synthesis function in lung cells, which was evidenced to correlate with the severe COVID-19 symptoms, such as a cytokine storm and acute respiratory distress. Overall, this study revealed tissue involvement, provided insights into the mechanism of COVID-19 progression, and highlighted the utility of cfDNA as a noninvasive biomarker for disease severity inspections.
DOI: 10.1101/686055
2019
Cited 5 times
PALM-Seq: integrated sequencing of cell-free long RNA and small RNA
Cell-free RNA, including both long RNA and small RNA, has been considered important for its biological functions and potential clinical usage, but the major challenge is to effectively sequence them at the same time. Here we present PolyAdenylation Ligation Mediated-Seq (PALM-Seq), an integrated sequencing method for cell-free long and small RNA. Through terminal modification and addition of 3’ polyadenylation and 5’ adaptor, we could get mRNA, long non-coding RNA, microRNA, tRNA, piRNA and other RNAs in a single library. With target RNA depletion, all these RNAs could be sequenced with relatively low depth. Using PALM-Seq, we identified pregnancy-related mRNAs, long non-coding RNAs and microRNAs in female plasma. We also applied PALM-Seq to sequence RNA from amniotic fluids, leukocytes and placentas, and could find RNA signatures associated with specific sample types. PALM-Seq provides an integrated, cost-effective and simple method to characterize the landscape of cell-free RNA, and can stimulate further progress in cell-free RNA study and usage.
DOI: 10.1016/j.ygeno.2021.09.006
2021
Cited 3 times
Genetic characteristics of human papillomavirus type 16, 18, 52 and 58 in southern China
Persistent infections of high-risk human papillomaviruses (HPVs) are the leading cause of cervical cancers. We collected cervical exfoliated cell samples from females in Changsha city, Hunan Province and obtained 338 viral genomes of four major HPV types, including HPV 16 (n = 82), 18 (n = 35), 52 (n = 121) and 58 (n = 100). The lineage/sublineage distribution of the four HPVs confirmed previous epidemiological reports, with the predominant prevailing sublineage as A4 (50%), A1 (37%) and A3 (13%) for HPV16, A1 (83%) for HPV18, B2 (86%) for HPV52 and A1 (65%), A3 (19%) and A2 (12%) for HPV58. We also identified two potentially novel HPV18 sublineages, i.e. A6 and A7. Virus mutation analysis further revealed the presence of HPV16 and HPV58 sublineages associated with potentially high oncogenicity. These findings expanded our knowledge of the HPV genetic diversity in China, providing valuable evidence to facilitate HPV DNA screening, vaccine effectiveness evaluation and control strategy development.
DOI: 10.1007/s11356-023-30832-x
2023
Optimizing indoor air quality: CFD simulation and novel air cleaning methods for effective aerosol particle inhibition in public spaces
DOI: 10.1093/gigascience/giy151
2018
Erratum to: Comparative performance of the BGISEQ-500 vs Illumina HiSeq2500 sequencing platforms for palaeogenomic sequencing
DOI: 10.1101/2020.06.08.20124305
2020
Time-series plasma cell-free DNA analysis reveals disease severity of COVID-19 patients
Clinical symptoms of coronavirus disease 2019 (COVID-19) range from asymptomatic to severe pneumonia and death. Detection of individuals at high risk for critical condition is crucial for control of the disease. Herein, for the first time, we profiled and analyzed plasma cell-free DNA (cfDNA) of mild and severe COVID-19 patients. We found that in comparison between mild and severe COVID-19 patients, Interleukin-37 signaling was one of the most relevant pathways; top significantly altered genes included POTEH, FAM27C, SPATA48, which were mostly expressed in prostate and testis; adrenal glands, small intestines and liver were tissues presenting most differentially expressed genes. Our data thus revealed potential tissue involvement, provided insights into mechanism on COVID-19 progression, and highlighted utility of cfDNA as a noninvasive biomarker for disease severity inspections. One Sentence Summary: CfDNA analysis in COVID-19 patients reveals severity-related tissue damage.
DOI: 10.6084/m9.figshare.c.3612557_d2
2016
Additional file 4: Table S2. of cPAS-based sequencing on the BGISEQ-500 to explore small non-coding RNAs
List of novel miRNA candidates. (XLSX 6531 kb)
DOI: 10.6084/m9.figshare.c.3612557_d8
2016
Additional file 3: Table S1. of cPAS-based sequencing on the BGISEQ-500 to explore small non-coding RNAs
Comparison of BGISEQ-500 to Agilent. (XLSX 135 kb)
DOI: 10.6084/m9.figshare.c.3612557_d3
2016
Additional file 1: Table S3. of cPAS-based sequencing on the BGISEQ-500 to explore small non-coding RNAs
miRNA read count of the BGISEQ-500. (XLSX 250 kb)
DOI: 10.6084/m9.figshare.1509877
2015
OTG-snpcaller: An Optimized Pipeline Based on TMAP and GATK for SNP Calling from Ion Torrent Data
We've included here the scripts of OTG-snpcaller, which was published previously (DOI:10.1371/journal.pone.0097507).
DOI: 10.1109/geoinformatics.2010.5567551
2010
Establishment and application of the virtual stope/ dumping sites model of the strip mine
This paper proposes a method, based on Geographic Information System (GIS) technology, for constructing a virtual stope model that reflects the development status of mine engineering and satisfies geometric constraint relationships, in support of technical work in strip mines such as exploitation design, production planning and production management. Time-series virtual stope/dumping site models are used to establish a spatiotemporal database model of stopes and to describe dynamically the production planning process of a strip mine, including shelling, excavating and casting. This provides a full range of spatial information support for optimal pre-control and decision-making in the planning, design and production scheduling management of strip mines. Taking one strip mine as an example, the paper presents the method for establishing the virtual stope/dumping site models. Using the spatiotemporal data model of these virtual models together with GIS and Virtual Reality (VR) technology, it achieves playback of the exploitation process and advance demonstration of production plans, establishes a framework and platform of basic geographic-spatial information for digital open-pit construction, and provides decision support for mining.
DOI: 10.3760/cma.j.issn.1007-8118.2017.09.017
2017
Research updates on surgical treatments for portal hypertension
Portal hypertension is a common clinical syndrome in chronic liver diseases such as schistosomiasis, portal vein occlusion cirrhosis and others, and can be diagnosed when the hepatic venous pressure gradient (HVPG) is > 5 mmHg (1 mmHg = 0.133 kPa). It can lead to rupture of gastroesophageal varicose veins, ascites, spontaneous bacterial peritonitis, hepatorenal syndrome, hepatopulmonary syndrome, hepatic encephalopathy and other serious complications, and is the primary cause of death in cirrhosis and liver transplantation. Since the middle of the 19th century, the treatment of portal hypertension has passed through 4 phases: research on related theories and animal trials; preclinical tests and data accumulation; rapid development of devascularization and shunts; and development of new technologies such as interventional and endoscopic surgical treatment and liver transplantation. The surgical procedures have been modified, which has greatly reduced complications and improved quality of life after operation. But so far none of them can cure portal hypertension thoroughly. This paper introduces the pathophysiologic basis of surgical treatment and reviews the history of its development to summarize recent progress, which may facilitate surgical treatment. Key words: Portal hypertension; Surgical treatment; Shunt; Disconnection; Transjugular intrahepatic portosystemic shunt (TIPS); Liver transplantation
DOI: 10.2139/ssrn.4195411
2022
Experimental and Theoretical Study of the Adsorption of Mixed Low Carbon Alcohol in Fischer Tropsch Synthesis Wastewater by Activated Carbon
Purifying and recycling the low-carbon alcohols in wastewater by a green, highly efficient method is of great significance in chemical engineering processes such as indirect coal liquefaction, which produces a large amount of alcohol-containing wastewater. Adsorption-desorption of low-carbon alcohols in wastewater by activated carbon is a promising method for industrial application. The adsorption capacity of activated carbons for low-carbon alcohols is closely related to their pore and surface properties. However, this has seldom been studied because of the complex properties of carbon materials, which differ in pore size, distribution and volume, as well as in surface area and hydrophobicity. In this work, four commercial activated carbon materials (AC1-4) with similar hydrophobicity were chosen to explore the possible correlation of the adsorption capacity for C1-5 and mixed low-carbon alcohols with the pore and surface area. Both the experimental and theoretical results indicate that the adsorption capacities for methanol and ethanol are strongly dependent on the ultra-micropores (<0.9 nm) due to their small molecular weight; the adsorption capacity for pentanol is mainly determined by the surface area due to its relatively high molecular weight and low polarity; and the adsorption capacities for propanol and butanol are well correlated with the microporous volume and surface area. AC2 has the highest adsorption capacity for mixed low-carbon alcohols (165.6 mg/g) in Fischer Tropsch Synthesis (FTS) modeling wastewater due to its abundant ultra-micropores and high specific surface area. This work sheds light on the development of adsorbents with high adsorption capacity for low-carbon alcohols.
DOI: 10.1101/786962
2019
Detection and characterization of copy number variants based on whole-genome sequencing by DNBSEQ platforms
Background: Next-generation sequencing (NGS) has developed rapidly in recent years, making whole-genome sequencing (WGS) a more cost- and time-efficient choice for a wide range of biological research. WGS data are commonly used to detect variants such as single nucleotide polymorphisms (SNPs), insertions and deletions (indels) and copy number variants (CNVs), which play an important role in many human diseases. However, the feasibility of CNV detection based on WGS by DNBSEQ™ platforms was unclear. We systematically analysed the genome-wide CNV detection power of DNBSEQ™ platforms and Illumina platforms on NA12878 with five commonly used tools. Results: DNBSEQ™ platforms showed a stable ability to detect slightly more CNVs genome-wide (on average 1.24-fold the number detected on Illumina platforms). CNVs based on DNBSEQ™ platforms and Illumina platforms were then evaluated against two public benchmarks of NA12878. DNBSEQ™ and Illumina platforms showed similar sensitivities and precisions on both benchmarks. Further, the differences between CNV detection tools were analyzed, indicating that the choice of tool could affect CNV detection results such as count, distribution, sensitivity and precision. Conclusion: The major contribution of this paper is to provide, for the first time, a comprehensive guide for CNV detection based on WGS by DNBSEQ™ platforms.
DOI: 10.1101/2020.07.30.229112
2020
Genetic signatures for lineage/sublineage classification of HPV16, 18, 52 and 58 variants
Increasing evidence indicates that high-risk HPV variants are heterogeneous in carcinogenicity and ethnic dispersion. In this work, we identified genetic signatures for convenient determination of lineage/sublineage of HPV16, 18, 52 and 58 variants. Using publicly available genomes, we found that E2 of HPV16, L2 of HPV18, L1 and LCR of HPV52, and L2, LCR and E1 of HPV58 contain the proper genetic signature for lineage/sublineage classification. Sets of hierarchical signature nucleotide positions (SNPs) were further confirmed for high accuracy (>98%) by classifying HPV genomes obtained from Chinese females, which included 117 HPV16 variants, 48 HPV18 variants, 117 HPV52 variants and 89 HPV58 variants. The circulation of HPV variants posing higher cancer risk in Eastern China, such as HPV16 A4 and HPV58 A3, calls for continuous surveillance in this region. The marker genes and signature nucleotide positions may facilitate cost-effective diagnostic detection of HPV variants in clinical settings.
DOI: 10.1101/2020.12.04.406579
2020
Gut microbiome couples gut and brain during calorie restriction in treating obesity
Calorie restriction (CR) has been widely recognized for its effect in reducing body weight and alleviating diabetes in humans, as well as prolonging life span in animal studies. Gut microbiome shifts contribute to part of the effects of CR, but little is known regarding their influences except on metabolism and immunity. Here we monitored the gut microbiome using metagenomics and metatranscriptomics in obese individuals undergoing CR, and revealed microbial determinants that could contribute to successful weight loss. Microbiome changes are linked to changes in blood metabolome and hormones, which eventually correlate to brain functional changes as studied using functional magnetic resonance imaging (fMRI). Brain functional shifts indicate a response of the central nervous system (CNS) to CR, and the microbiome constitutes the keystone of the gut-brain axis. An animal experiment further reaffirms the gut microbiome changes and the metabolic and hormonal shifts of CR, while proteomic analysis of brain tissues suggests that epigenetic modifications of key proteins could explain responses of the CNS to CR. Our study establishes a linkage between CR, gut microbiome, metabolome/hormones and CNS function, and demonstrates that CR has multifaceted, coordinated effects on the host, of which many could contribute to weight loss and other beneficial effects.
DOI: 10.6084/m9.figshare.13225712
2020
Additional file 1 of Performance of copy number variants detection based on whole-genome sequencing by DNBSEQ platforms
Additional file 1. Table S1: Whole-genome sequencing data of NA12878 based on four sequencing platforms. Table S2: Number and length statistics of all 50 CNV sets. Table S3: Regional distribution of all 50 CNV sets across the genome. Table S4: List of the complete CNV benchmark of NA12878. Table S5: CNV detection tools using whole-genome sequencing data.
DOI: 10.1101/2021.04.27.438890
2021
Genetic characteristics of human papillomavirus type 16, 18, 52 and 58 in southern China
Persistent infections of high-risk human papillomaviruses (HPVs) are the leading cause of cervical cancers. We collected cervical exfoliated cell samples from females in Changsha city, Hunan Province and obtained 358 viral genomes of four major HPV types, including HPV 16 (n=82), 18 (n=35), 52 (n=121) and 58 (n=100). The lineage/sublineage distribution of the four HPVs confirmed previous epidemiological reports, with the predominant prevailing sublineage as A4 (50%), A1 (37%) and A3 (13%) for HPV16, A1 (83%) for HPV18, B2 (86%) for HPV52 and A1 (65%), A3 (19%) and A2 (12%) for HPV58. We also identified two potentially novel HPV18 sublineages, i.e. A6 and A7. Virus mutation analysis further revealed the presence of HPV16 and HPV58 strains associated with potentially high oncogenicity. These findings expanded our knowledge of the HPV genetic diversity in China, providing valuable evidence to facilitate HPV DNA screening, vaccine effectiveness evaluation and control strategy development.
DOI: 10.1101/2021.12.26.474219
2021
Influence of sequencing depth on the fidelity and sensitivity of 1%-5% low-frequency mutation detection and recommendation for standardization of sequencing depth
Sequencing depth has always played an important role in the accurate detection of low-frequency mutations. Increasing the sequencing depth and setting a reasonable threshold can maximize the probability of detecting true positive mutations, or sensitivity. Here, we found that when the threshold was set as a fixed number of positive mutated reads, the probability of both true and false-positive mutations increased with depth. However, when the number of positive mutated reads increased in equal proportion with depth (the threshold was transformed from a fixed number to a fixed percentage of mutated reads), the true positive probability still increased while the false positive probability decreased. Through binomial distribution simulation and experimental testing, we found that the “fidelity” of detected-VAFs is the cause of this phenomenon. First, we used the binomial distribution to construct a model that can easily calculate the relationship between sequencing depth and the probability of a true positive (or false positive), which can standardize the minimum sequencing depth for different low-frequency mutation detection tasks. Then, the effect of sequencing depth on the fidelity of NA12878 with 3% mutation frequency and circulating tumor DNA (ctDNA of 1%, 3% and 5%) showed that increasing the sequencing depth reduced the fluctuation range of detected-VAFs around the expected VAFs, that is, the fidelity was improved. Finally, based on our experimental results, the consistency of single-nucleotide variants (SNVs) between paired FF and FFPE samples of mice increased with increasing depth, suggesting that increasing depth can improve the precision and sensitivity of low-frequency mutation detection.
Highlights: The normalized relationship between sequencing depth and the probability of true positive mutation (sensitivity) is established based on the binomial distribution. The probability of a true positive increases and the probability of a false positive decreases when the threshold number of positive mutated reads increases in equal proportion with depth. Detected-VAFs fluctuate regularly around expected-VAFs. The amplitude of this fluctuation decreases with sequencing depth, and the “fidelity” increases. The increase in “fidelity” leads to a higher degree of differentiation between true and false positive mutations, which ultimately increases the true positive probability and decreases the false positive probability.
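The binomial relationship between depth, threshold and detection probability described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: the 3% variant allele frequency is taken from the abstract, while the 0.1% per-base error rate, the thresholds, the depths and the function names are assumed for the example.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed in log space for stability."""
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1.0 - p))
    return exp(log_pmf)


def prob_at_least(k, n, p):
    """P(X >= k): chance that at least k of n reads show the variant base."""
    if k <= 0:
        return 1.0
    return max(0.0, 1.0 - sum(binom_pmf(i, n, p) for i in range(k)))


TRUE_VAF = 0.03     # variant allele frequency of a real mutation (from the abstract)
ERROR_RATE = 0.001  # assumed per-base sequencing error rate (hypothetical)

for depth in (500, 2000):
    fixed_count = 5               # threshold as a fixed number of mutated reads
    fixed_percent = depth // 100  # threshold as 1% of the depth
    print(
        depth,
        prob_at_least(fixed_count, depth, TRUE_VAF),     # true positive, fixed count
        prob_at_least(fixed_count, depth, ERROR_RATE),   # false positive, fixed count
        prob_at_least(fixed_percent, depth, TRUE_VAF),   # true positive, fixed percent
        prob_at_least(fixed_percent, depth, ERROR_RATE), # false positive, fixed percent
    )
```

With these assumed values, a fixed read-count threshold lets both probabilities rise with depth, whereas a threshold that scales with depth raises the true positive probability and lowers the false positive probability, which is the pattern the abstract reports.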