J. Lynn Fink

Here are all the papers by J. Lynn Fink that you can download and read on OA.mg.

DOI: 10.1038/nature16965
2016
Cited 2,679 times
Genomic analyses identify molecular subtypes of pancreatic cancer
DOI: 10.1038/nature14169
2015
Cited 2,113 times
Whole genomes redefine the mutational landscape of pancreatic cancer
Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (variation in chromosomal structure) classified PDACs into 4 subtypes with potential clinical utility: the subtypes were termed stable, locally rearranged, scattered and unstable. A significant proportion harboured focal amplifications, many of which contained druggable oncogenes (ERBB2, MET, FGFR1, CDK6, PIK3R3 and PIK3CA), but at low individual patient prevalence. Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency. Of 8 patients who received platinum therapy, 4 of 5 individuals with these measures of defective DNA maintenance responded.
DOI: 10.1038/nature11547
2012
Cited 1,767 times
Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes
Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.
DOI: 10.1038/nature14410
2015
Cited 1,213 times
Whole–genome characterization of chemoresistant ovarian cancer
DOI: 10.1038/nature21063
2017
Cited 716 times
Whole-genome landscape of pancreatic neuroendocrine tumours
The diagnosis of pancreatic neuroendocrine tumours (PanNETs) is increasing owing to more sensitive detection methods, and this increase is creating challenges for clinical management. We performed whole-genome sequencing of 102 primary PanNETs and defined the genomic events that characterize their pathogenesis. Here we describe the mutational signatures they harbour, including a deficiency in G:C > T:A base excision repair due to inactivation of MUTYH, which encodes a DNA glycosylase. Clinically sporadic PanNETs contain a larger-than-expected proportion of germline mutations, including previously unreported mutations in the DNA repair genes MUTYH, CHEK2 and BRCA2. Together with mutations in MEN1 and VHL, these mutations occur in 17% of patients. Somatic mutations, including point mutations and gene fusions, were commonly found in genes involved in four main pathways: chromatin remodelling, DNA damage repair, activation of mTOR signalling (including previously undescribed EWSR1 gene fusions), and telomere maintenance. In addition, our gene expression analyses identified a subgroup of tumours associated with hypoxia and HIF signalling. The genomes of 102 primary pancreatic neuroendocrine tumours have been sequenced, revealing mutations in genes with functions such as chromatin remodelling, DNA damage repair, mTOR activation and telomere maintenance, and a greater-than-expected contribution from germline mutations. Pancreatic neuroendocrine tumours (PanNETs) are the second most common epithelial neoplasm of the pancreas. Aldo Scarpa, Sean Grimmond and colleagues report whole-genome sequencing of 102 primary PanNETs and present analysis of their mutational signatures as part of the International Cancer Genome Consortium. They find frequent mutations in genes with functions that include chromatin remodelling, DNA damage repair, activation of mTOR signalling, and telomere maintenance. They also identify mutational signatures, including one resulting from inactivation of the DNA repair gene MUTYH, and report a larger-than-expected germline contribution to PanNET development.
DOI: 10.1038/ng.375
2009
Cited 408 times
The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line
The FANTOM4 study identified transcriptional start sites active during proliferation arrest and differentiation of the human monocytic cell line THP-1. Systematic knockdown of 52 transcription factors provides support for their model in which a complex transcriptional network regulates the differentiation process. Using deep sequencing (deepCAGE), the FANTOM4 study measured the genome-wide dynamics of transcription-start-site usage in the human monocytic cell line THP-1 throughout a time course of growth arrest and differentiation. Modeling the expression dynamics in terms of predicted cis-regulatory sites, we identified the key transcription regulators, their time-dependent activities and target genes. Systematic siRNA knockdown of 52 transcription factors confirmed the roles of individual factors in the regulatory network. Our results indicate that cellular states are constrained by complex networks involving both positive and negative regulatory interactions among substantial numbers of transcription factors and that no single transcription factor is both necessary and sufficient to drive the differentiation process.
DOI: 10.1002/ijc.28765
2014
Cited 187 times
Genome‐wide DNA methylation patterns in pancreatic ductal adenocarcinoma reveal epigenetic deregulation of SLIT‐ROBO, ITGA2 and MET signaling
The importance of epigenetic modifications such as DNA methylation in tumorigenesis is increasingly being appreciated. To define the genome‐wide pattern of DNA methylation in pancreatic ductal adenocarcinomas (PDAC), we captured the methylation profiles of 167 untreated resected PDACs and compared them to a panel of 29 adjacent nontransformed pancreata using high‐density arrays. A total of 11,634 CpG sites associated with 3,522 genes were significantly differentially methylated (DM) in PDAC and were capable of segregating PDAC from non‐malignant pancreas, regardless of tumor cellularity. As expected, PDAC hypermethylation was most prevalent in the 5′ region of genes (including the proximal promoter, 5′UTR and CpG islands). Approximately 33% DM genes showed significant inverse correlation with mRNA expression levels. Pathway analysis revealed an enrichment of aberrantly methylated genes involved in key molecular mechanisms important to PDAC: TGF‐β, WNT, integrin signaling, cell adhesion, stellate cell activation and axon guidance. Given the recent discovery that SLIT‐ROBO mutations play a clinically important role in PDAC, the role of epigenetic perturbation of axon guidance was pursued in more detail. Bisulfite amplicon deep sequencing and qRT‐PCR expression analyses confirmed recurrent perturbation of axon guidance pathway genes SLIT2, SLIT3, ROBO1, ROBO3, ITGA2 and MET and suggests epigenetic suppression of SLIT‐ROBO signaling and up‐regulation of MET and ITGA2 expression. Hypomethylation of MET and ITGA2 correlated with high gene expression, which was associated with poor survival. These data suggest that aberrant methylation plays an important role in pancreatic carcinogenesis affecting core signaling pathways with potential implications for the disease pathophysiology and therapy.
DOI: 10.1053/j.gastro.2016.09.060
2017
Cited 173 times
Hypermutation In Pancreatic Cancer
Pancreatic cancer is molecularly diverse, with few effective therapies. Increased mutation burden and defective DNA repair are associated with response to immune checkpoint inhibitors in several other cancer types. We interrogated 385 pancreatic cancer genomes to define hypermutation and its causes. Mutational signatures inferring defects in DNA repair were enriched in those with the highest mutation burdens. Mismatch repair deficiency was identified in 1% of tumors harboring different mechanisms of somatic inactivation of MLH1 and MSH2. Defining mutation load in individual pancreatic cancers and the optimal assay for patient selection may inform clinical trial design for immunotherapy in pancreatic cancer.
Pancreatic ductal adenocarcinoma has a 5-year survival of <5%, with therapies offering only incremental benefit (ref. 1: Vogelzang N.J. et al. J Clin Oncol. 2012; 30: 88-109), potentially due to the diversity of its genomic landscape (ref. 2: Bailey P. et al. Nature. 2016; 531: 47-52; ref. 3: Biankin A.V. et al. Nature. 2012; 491: 399-405; ref. 4: Waddell N. et al. Nature. 2015; 518: 495-501). Recent reports link high mutation burden with response to immune checkpoint inhibitors in several cancer types (ref. 5: Le D.T. et al. N Engl J Med. 2015; 372: 2509-2520). Defining tumors that are hypermutated with an increased mutation burden and understanding the underlying mechanisms in pancreatic cancer has the potential to advance therapeutic development, particularly for immunotherapeutic strategies.
Whole genome sequencing (WGS, n = 180) and whole exome sequencing (n = 205) of 385 unselected predominantly sporadic pancreatic ductal adenocarcinoma (Supplementary Table 1) defined a mean mutation load of 1.8 and 1.1 mutations per megabase (Mb), respectively (Supplementary Table 2). Outlier analysis identified 20 tumors with the highest mutation burden (5.2%, 15 WGS and 5 exome) (Table 1 and Supplementary Figure 1A), 5 of which were considered extreme outliers and classified as hypermutated as they contained ≥12 somatic mutations/Mb, the defined threshold for hypermutation in colorectal cancer (ref. 6: Cancer Genome Atlas Network. Nature. 2012; 487: 330-337). Immunohistochemistry for mismatch repair (MMR) proteins (MSH2, MSH6, MLH1, and PMS2) identified 4 MMR-deficient tumors, all of which were hypermutated (n = 180, Figure 1).

Table 1. Clinical and Histologic Features and Proposed Etiology for Highly Mutated Pancreatic Ductal Adenocarcinoma Tumors (n = 20)

Sample ID | Personal and family history of malignancy | Histology | Mutation load, mutations/Mb | IHC result | MSIsensor score | KRAS mutation | Predominant mutation signature (mutations/Mb) | SV subtype (no. of events) | Proposed etiology
Hypermutation (extreme outliers)
ICGC_0076^a | None | Mixed signet ring, mucinous and papillary adenocarcinoma | 38.55 | Absent MLH1 and PMS2 | 28.3 | p.G12V | MMR (18.3) | Scattered (131) | MMR deficiency: >280 kb somatic homozygous deletion over MSH2.
ICGC_0297^a | None | Undifferentiated adenocarcinoma | 60.62 | Absent MSH2 and MSH6 | 27.33 | WT | MMR (33.4) | Scattered (75) | MMR deficiency: Somatic MLH1 promoter hypermethylation.
ICGC_0548^a | None | Ductal adenocarcinoma, moderately differentiated | 30.13 | Absent MSH2 and MSH6 | 17.47 | WT | MMR (16.6) | Stable (49) | MMR deficiency: >27 kb somatic inversion rearrangement disrupting MSH2.
ICGC_0328^a | None | Ductal adenocarcinoma | 16.63 | Normal | 3.2 | p.G12D | Unknown (11.9) | Scattered (110) | Cell line with signature: etiology unknown.
ICGC_0090 | 1 FDR, father CRC | Ductal adenocarcinoma, moderately differentiated | 12.9 | Absent MSH2 and MSH6 | 0.21 | p.G12C | NA | NA | MMR deficiency: somatic MSH2 splice site c.2006G>A.
Highly mutated tumors
ICGC_0054^a | None | Ductal adenocarcinoma, poorly differentiated | 6.52 | Normal | 0.01 | p.G12V | HR deficiency (1.3) | Unstable (310) | HR deficiency: no germline or somatic cause found.
ICGC_0290^a | None | Ductal adenocarcinoma, poorly differentiated | 6.54 | Not available | 0.07 | p.G12V | HR deficiency (3.1) | Unstable (558) | HR deficiency: Germline BRCA2 mutation c.7180A>T, p.A2394*. Somatic CN-LOH.
ICGC_0215^a | 2 FDR lung cancer, 2 FDR prostate cancer. Previous CRC and melanoma | Ductal adenocarcinoma, moderately differentiated | 6.27 | Normal | 0.01 | p.G12V | HR deficiency (1.9) | Scattered (111) | HR deficiency: Germline ATM mutation c.7539_7540delAT, p.Y2514*. Somatic CN-LOH.
ICGC_0324 | None | Ductal adenocarcinoma, moderately differentiated | 6.24 | Normal | 0 | p.G12D | NA | NA | Undefined
ICGC_0034^a | None | Ductal adenocarcinoma, poorly differentiated | 6.09 | Normal | 4.02 | p.G12D | HR deficiency (3.4) | Unstable (366) | HR deficiency: Germline BRCA2 mutation c.5237_5238insT, p.N1747*. Somatic CN-LOH.
ICGC_0131^a | Lung cancer after PC | Ductal adenocarcinoma, moderately differentiated | 5.63 | Normal | 0 | p.G12D | T>G at TT sites (3.0) | Focal (147) | T>G at TT sites signature: etiology potentially associated with DNA oxidation
ICGC_0006^a | 1 FDR, father lung cancer | Adenocarcinoma arising from IPMN, moderately differentiated | 5.29 | Normal | 0.01 | p.G12D | HR deficiency (1.2) | Unstable (211) | HR deficiency: Somatic BRCA2 c.5351dupA, p.N1784KfsTer3. Somatic CN-LOH.
ICGC_0321^a | 2 FDR, mother and cousin breast cancer | Ductal adenocarcinoma, poorly differentiated | 4.79 | Not available | 0 | p.G12D | HR deficiency (2.1) | Unstable (286) | HR deficiency: Germline BRCA2 c.6699delT, p.F2234LfsTer7. Somatic CN loss - 1 copy.
ICGC_0309^a | None | Adenocarcinoma arising from IPMN, moderately differentiated | 4.74 | Normal | 0.03 | p.G12V | T>G at TT sites (3.1) | Unstable (232) | T>G at TT sites signature: etiology potentially associated with DNA oxidation
ICGC_0005^a | 1 FDR, mother CRC | Ductal adenocarcinoma, poorly differentiated | 4.72 | Not available | 1 | p.G12V | HR deficiency (1.1) | Focal (95) | HR deficiency: No germline or somatic cause found.
ICGC_0016^a | None | Ductal adenocarcinoma, poorly differentiated | 4.61 | Normal | 3.03 | p.G12V | HR deficiency (1.7) | Unstable (447) | HR deficiency: potentially linked to Somatic RPA1 c.273G>T, p.R91S
ICGC_0046 | 1 FDR, brother PC | Ductal adenocarcinoma, poorly differentiated | 4.3 | Normal | 0 | p.Q61H | NA | NA | Undefined
GARV_0668^a | None | Ductal adenocarcinoma, poorly differentiated | 4.3 | Not available | 2.19 | p.G12V | HR deficiency (1.6) | Unstable (464) | HR deficiency: Germline BRCA2 c.7068_7069delTC, p.L2357VfsTer2. Somatic CN loss - 1 copy.
ICGC_0291 | None | Ductal adenocarcinoma, well differentiated | 3.84 | Not available | 0.03 | p.G12R | NA | NA | HR deficiency: Somatic BRCA2 c.7283T>A, p.L2428*.
ICGC_0256 | None | Ductal adenocarcinoma, poorly differentiated | 3.72 | Not available | 0.06 | p.G12D | NA | NA | Undefined

CRC, colorectal cancer; FDR, first-degree relative; IHC, immunohistochemistry; IPMN, intraductal papillary mucinous neoplasm; CN-LOH, copy neutral loss of heterozygosity; CN, copy number; PC, pancreatic cancer; NA, not applicable to exome data.
^a Sample sequenced by WGS, other samples by exome sequencing.

KRAS mutation status and histopathologic characteristics have been associated with MMR-deficient pancreatic tumors (ref. 7: Goggins M. et al. Am J Pathol. 1998; 152: 1501-1507). Of the 4 MMR-deficient tumors in our cohort, 2 were KRAS wild-type; 3 had undifferentiated to moderately differentiated histology and one had a signet-ring component.
These features were not predictive of MMR deficiency in our cohort, as 11 additional non-MMR-deficient tumors had a signet-ring cell component or colloid morphology, and 131 of 347 assessable tumors had poorly or undifferentiated histology. Mutational signature analysis can detect MMR deficiency indirectly based on the pattern of somatic mutations (ref. 8: Alexandrov L.B. et al. Nature. 2013; 500: 415-421). An MMR-deficient signature dominated the MMR-deficient tumors (with WGS), and was minimal in MMR-intact tumors (Supplementary Figure 1). In addition, microsatellite instability (MSI), a hallmark of MMR deficiency in colorectal cancer, was detected in all three MMR-deficient tumors with WGS using MSIsensor (ref. 9: Niu B., Ye K. et al. Bioinformatics. 2014; 30: 1015-1016) (Supplementary Table 2). MSI was not identified for the fourth MMR-deficient sample, potentially due to the reduced number of microsatellite loci in exome data. The underlying causes of MMR deficiency in the 4 cases were private somatic events. For 2 cases, MSH2 was disrupted by different structural rearrangements; 1 case contained a missense MSH2 mutation; and the last showed methylation of the MLH1 promoter (Figure 1). The missense mutation altered an MSH2 splice acceptor site; in germline studies, a mutation that alters the same nucleotide results in pathogenic skipping of exon 13 (ref. 10: Thompson B.A. et al. Nat Genet. 2014; 46: 107-115). Hypermethylation of the MLH1 promoter is the predominant mechanism of MSI in sporadic colon cancer (ref. 11: Boland C.R. et al. Gastroenterology. 2010; 138: 2073-2087.e3). The remaining hypermutated tumor contained an intact MMR pathway and was a cell line (ATCC, CRL-2551) with an unidentified mutational signature; therefore, the high mutation burden in this sample may be the result of long-term cell culture.
The 15 samples (11 WGS and 4 exome) identified in the outlier analysis with high mutation burden, but not hypermutated (∼4 to 12 mutations/Mb), contained no evidence of MMR deficiency. Mutational signature analysis of the WGS samples indicated homologous recombination (HR) repair deficiency as the most substantial (range, 1.0–3.4 mutations/Mb) contributor to the mutation burden for 8 WGS mutation load outlier tumors. In support of an HR defect (ref. 4: Waddell N. et al. Nature. 2015; 518: 495-501), 7 of these tumors contained high levels of genomic instability with >200 structural variants, and mutations in genes involved in HR were present for 6 of 8 cases (Supplementary Table 2). In addition, 1 case that had undergone exome sequencing had a somatic BRCA2 nonsense mutation that likely contributed to HR deficiency in this case. A mutational signature associated with T>G mutations at TT sites, previously described in other cancers, including esophageal cancer (ref. 12: Nones K., Waddell N., Wayte N. et al. Nat Commun. 2014; : 5), was the major contributor (>3 mutations/Mb) in 2 samples. For these 2 and the remaining 4 cases, no potential causative event could be identified. Although germline defects in MMR genes are well reported in pancreatic cancer (ref. 13: Grant R.C., Selander I. et al. Gastroenterology. 2015; 148: 556-564), in our cohort they did not contribute to MMR deficiency, even in those with familial pancreatic cancer or a personal or family history of Lynch-related tumors. A germline truncating variant was detected in PMS2 in 1 case, but this tumor did not have loss of the second allele, had normal immunohistochemistry staining and did not display an MMR mutational signature (Supplementary Table 2). MMR deficiency is important in the evolution of a small but meaningful proportion of pancreatic cancers, with a prevalence of 1% (4 of 385) in our cohort.
This is consistent with recent studies using the Bethesda polymerase chain reaction panel (ref. 14: Laghi L. et al. PLoS One. 2012; 7: e46002) and with previous estimates of MSI prevalence of 2%–3% (ref. 15: Nakata B. et al. Clin Cancer Res. 2002; 8: 2536-2540). However, in tumors with low epithelial content that underwent exome sequencing, the sensitivity of somatic mutation detection is reduced, which will affect mutation burden and signature analysis. While cognizant of small numbers, immunohistochemistry was the most accurate in defining MMR due to multiple genomic mechanisms of MMR gene inactivation. Multiple methods to define MMR deficiency may be required for clinical trials that aim to recruit MMR-deficient participants to assess the potential efficacy of checkpoint inhibitors or other therapies in pancreatic cancer. Homologous recombination-deficient tumors, and those with a novel signature seen in esophageal cancer, had an increased mutation burden and need further evaluation as potential patient selection markers for clinical trials of checkpoint inhibitor and other therapies that target tumors with a high mutation burden.
The authors would like to thank Cathy Axford, Deborah Gwynne, Mary-Anne Brancato, Clare Watson, Michelle Thomas, Gerard Hammond, and Doug Stetner for central coordination of the Australian Pancreatic Cancer Genome Initiative, data management, and quality control; Mona Martyn-Smith, Lisa Braatvedt, Henry Tang, Virginia Papangelis, and Maria Beilin for biospecimen acquisition; and Sonia Grimaldi and Giada Bonizzato of the ARC-Net Biobank for biospecimen acquisition. For a full list of contributors see Australian Pancreatic Cancer Genome Initiative: http://www.pancreaticcancer.net.au/apgi/collaborators.
The cohort consisted of 385 patients with histologically verified pancreatic exocrine carcinoma, prospectively recruited between 2006 and 2013 through the Australian Pancreatic Cancer Genome Initiative (www.pancreaticcancer.net.au) as part of the International Cancer Genome Consortium (ref. 1: Hudson T.J. et al. Nature. 2010; 464: 993-998). Ethical approval was granted at all treating institutions and individual patients provided informed consent upon entry to the study. The clinicopathologic information for the cohort is described in Supplementary Table 1, and the global mutation profile has previously been reported for some of these tumors (Supplementary Table 2).
Tumor and normal DNA were extracted after histologic review from fresh frozen tissue samples collected at the time of surgical resection or biopsy, as described previously (ref. 2: Biankin A.V. et al. Nature. 2012; 491: 399-405). Tumor cellularity was determined from single-nucleotide polymorphism array data using qpure (ref. 3: Song S. et al. PLoS One. 2012; 7: e45835). Tumors with epithelial content ≥40% underwent WGS; lower-cellularity tumors underwent whole exome sequencing. DNA from patient-derived pancreas cell lines and matched normal was also extracted. Exome and WGS were performed using paired 100-bp reads on the Illumina HiSeq 2000, as described previously (ref. 2; ref. 4: Waddell N. et al. Nature. 2015; 518: 495-501).
Regions of germline and somatic copy number change were detected using Illumina SNP BeadChips with GAP (ref. 5: Popova T. et al. Genome Biol. 2009; 10: R128). Somatic structural variants were identified from WGS reads using the qSV tool (ref. 4; ref. 6: Patch A.M. et al. Nature. 2015; 521: 489-494). Single nucleotide variants were called using 2 variant callers: qSNP (ref. 7: Kassahn K.S. et al. PLoS One. 2013; 8: e74380) and GATK (ref. 8: McKenna A. et al. Genome Res. 2010; 20: 1297-1303). Mutations identified by both callers, or those that were unique to one caller but verified by an orthogonal sequencing approach, were considered high confidence and used in all subsequent analyses. Small indels (<200 bp) were identified using Pindel (ref. 9: Ye K. et al. Bioinformatics. 2009; 25: 2865-2871) and each indel was visually inspected in the Integrative Genome Browser.
The distribution of the total number of small somatic mutations (coding and noncoding single nucleotide and indel variants) identified per megabase for exome and WGS sequence data were analyzed separately. The group of samples with high mutation load, at the top of each distribution, were defined as the upper distribution outliers for mutations per megabase, that is, ≥75th centile + (1.5 × interquartile range). The thresholds for detecting outliers in the exome and WGS groups were 3.4 and 4.2 mutations/Mb, respectively. From within the highly mutated set of tumors, hypermutated samples were identified as those with a mutation rate exceeding the thresholds for extreme distribution outliers (≥75th centile + [5 × interquartile range]) of 7.4 and 8.1 mutations/Mb for exome and WGS sequencing, respectively.
MSIsensor was used to detect microsatellite instability by directly comparing microsatellite repeat lengths between paired normal and tumor sequencing data (ref. 10: Niu B. et al. Bioinformatics. 2014; 30: 1015-1016). An MSIsensor score of >3.5% of somatic microsatellites with repeat length shifts was the detection threshold used to indicate microsatellite instability, as published for endometrial cancer (ref. 10). This correlated well with the 5 and 7 microsatellite panels recommended in the Bethesda guidelines (ref. 10; ref. 11: Umar A. et al. J Natl Cancer Inst. 2004; 96: 261-268).
Tissue microarrays were constructed using at least three 1-mm formalin-fixed, paraffin-embedded tumor cores. Immunohistochemistry for MSH6 and PMS2 proteins was performed on tissue microarray sections as a screen for MMR deficiency, because MMR proteins form heterodimers with concordant mismatch repair loss (ie, loss of MLH1 and PMS2 or loss of MSH2 and MSH6) (ref. 12: Hall G. et al. Pathology. 2010; 42: 409-413). Immunohistochemistry on full tumor sections for MSH2, MLH1, MSH6, and PMS2 was performed in those with abnormal staining in core sections. The immunohistochemistry was performed as described previously (ref. 12) and scored by a senior pathologist.
Somatic mutational signatures were extracted from the whole genome sequenced samples using the framework described previously (ref. 13: Alexandrov L.B. et al. Cell Rep. 2013; 3: 246-259). High confidence somatic substitutions were classified by the substitution change and sequence context, that is, the type of immediately neighboring bases to the variant.
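The classification of substitutions by change and neighboring bases can be illustrated with a minimal sketch. It assumes the widely used convention of reporting each change from the pyrimidine (C or T) strand with one flanking base on each side, which gives 6 × 16 = 96 classes; the function names and the `A[C>T]G` label format are hypothetical choices for this example and are not taken from the published framework.

```python
# Illustrative sketch only; names and label format are assumptions.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def classify_substitution(context: str, alt: str) -> str:
    """Label a substitution by its change and immediately neighboring bases.

    `context` is the reference trinucleotide centred on the variant and
    `alt` is the variant base. Purine-reference changes are flipped to the
    opposite strand so that every label has a C or T reference base.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a 1-base alt")
    if any(base not in COMPLEMENT for base in context + alt):
        raise ValueError("bases must be A, C, G or T")
    if context[1] == alt:
        raise ValueError("alt base equals the reference base")
    if context[1] in "AG":  # purine reference: report from the other strand
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


# A G>T change is reported as C>A on the opposite strand, and a T>G change
# at a TT site keeps its original orientation.
print(classify_substitution("TGA", "T"))
print(classify_substitution("TTA", "G"))
```

Counting these labels per sample gives the per-context mutation counts that a signature-extraction step takes as input.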
The framework processes the counts of somatic mutations at each context within each sample using non-negative matrix factorization to produce the different signature profiles that are present in the data. The profiles identified were matched against reported signatures from the Catalogue of Somatic Mutations in Cancer (http://cancer.sanger.ac.uk/cosmic/signatures). The major contributory signature, defined as the mutational signature with the highest number of contributing somatic substitution variants, is reported for highly mutated whole genome samples.
Bisulfite-converted whole-genome amplified DNA was hybridized to Infinium Human Methylation 450K Beadchips according to the manufacturer's protocol (Illumina). Methylation arrays were performed on DNA from 174 pancreatic ductal adenocarcinoma samples, which were compared to DNA from 29 adjacent nonmalignant pancreata. A subset of the methylation data has been published previously (ref. 14: Nones K. et al. Int J Cancer. 2014; 135: 1110-1118). We examined the data for evidence of tumor-specific hypermethylation of the promoter region of MLH1 and MSH2 genes. The methylation array data have been deposited into the International Cancer Genome Consortium data portal (dcc.icgc.org, project PACA-AU).
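As a rough illustration of the factorization step, the sketch below decomposes a small contexts × samples count matrix V into signature profiles W and per-sample exposures H so that V ≈ W × H, and then picks each sample's major contributory signature. It uses plain multiplicative updates on a made-up toy matrix; this is an assumption chosen for brevity and does not reproduce the published framework, which involves further steps not shown here. All names and the toy counts are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Sum of squared differences between V and the product W x H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorise non-negative V into W (contexts x rank) and H (rank x samples)."""
    rng = random.Random(seed)
    w = [[rng.random() + eps for _ in range(rank)] for _ in v]
    h = [[rng.random() + eps for _ in v[0]] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[hv * n / (d + eps) for hv, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[wv * n / (d + eps) for wv, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


def dominant_signature(w, h):
    """For each sample, index of the signature contributing most mutations."""
    totals = [sum(col) for col in zip(*w)]  # mutations per unit of exposure
    return [max(range(len(h)), key=lambda k: h[k][j] * totals[k])
            for j in range(len(h[0]))]


# Toy matrix (made up): 4 mutation contexts (rows) x 3 samples (columns).
counts = [[10, 0, 5], [20, 0, 10], [0, 8, 4], [0, 16, 8]]
w, h = nmf(counts, rank=2)
print(dominant_signature(w, h))
```

The number of signatures (`rank`) is fixed by hand here; in practice it has to be chosen from the data, which this sketch does not attempt.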
DOI: 10.1038/s41467-018-08205-7
2019
Cited 166 times
Deconvolution of single-cell multi-omics layers reveals regulatory heterogeneity
Integrative analysis of multi-omics layers at single cell level is critical for accurate dissection of cell-to-cell variation within certain cell populations. Here we report scCAT-seq, a technique for simultaneously assaying chromatin accessibility and the transcriptome within the same single cell. We show that the combined single cell signatures enable accurate construction of regulatory relationships between cis-regulatory elements and the target genes at single-cell resolution, providing a new dimension of features that helps direct discovery of regulatory patterns specific to distinct cell identities. Moreover, we generate the first single cell integrated map of chromatin accessibility and transcriptome in early embryos and demonstrate the robustness of scCAT-seq in the precise dissection of master transcription factors in cells of distinct states. The ability to obtain these two layers of omics data will help provide more accurate definitions of "single cell state" and enable the deconvolution of regulatory heterogeneity from complex cell populations.
DOI: 10.1038/nmeth.2562
2013
Cited 154 times
Computational approaches to identify functional genetic variants in cancer genomes
The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor but only a minority of these drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.
DOI: 10.1002/path.4583
2015
Cited 97 times
Integrated genomic and transcriptomic analysis of human brain metastases identifies alterations of potential clinical significance
Treatment options for patients with brain metastases (BMs) have limited efficacy and the mortality rate is virtually 100%. Targeted therapy is critically under-utilized, and our understanding of mechanisms underpinning metastatic outgrowth in the brain is limited. To address these deficiencies, we investigated the genomic and transcriptomic landscapes of 36 BMs from breast, lung, melanoma and oesophageal cancers, using DNA copy-number analysis and exome- and RNA-sequencing. The key findings were as follows. (a) Identification of novel candidates with possible roles in BM development, including the significantly mutated genes DSC2, ST7, PIK3R1 and SMC5, and the DNA repair, ERBB–HER signalling, axon guidance and protein kinase-A signalling pathways. (b) Mutational signature analysis was applied to successfully identify the primary cancer type for two BMs with unknown origins. (c) Actionable genomic alterations were identified in 31/36 BMs (86%); in one case we retrospectively identified ERBB2 amplification representing apparent HER2 status conversion, then confirmed progressive enrichment for HER2-positivity across four consecutive metastatic deposits by IHC and SISH, resulting in the deployment of HER2-targeted therapy for the patient. (d) In the ERBB/HER pathway, ERBB2 expression correlated with ERBB3 (r² = 0.496; p < 0.0001) and HER3 and HER4 were frequently activated in an independent cohort of 167 archival BM from seven primary cancer types: 57.6% and 52.6% of cases were phospho-HER3 Y1222 or phospho-HER4 Y1162 membrane-positive, respectively. The HER3 ligands NRG1/2 were barely detectable by RNAseq, with NRG1 (8p12) genomic loss in 63.6% breast cancer-BMs, suggesting a microenvironmental source of ligand. In summary, this is the first study to characterize the genomic landscapes of BM.
The data revealed novel candidates, potential clinical applications for genomic profiling of resectable BMs, and highlighted the possibility of therapeutically targeting HER3, which is broadly over-expressed and activated in BMs, independent of primary site and systemic therapy. Copyright © 2015 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
DOI: 10.1371/journal.pone.0045835
2012
Cited 93 times
qpure: A Tool to Estimate Tumor Cellularity from Genome-Wide Single-Nucleotide Polymorphism Profiles
Tumour cellularity, the relative proportion of tumour and normal cells in a sample, affects the sensitivity of mutation detection, copy number analysis, cancer gene expression and methylation profiling. Tumour cellularity is traditionally estimated by pathological review of sectioned specimens; however, this method is both subjective and prone to error due to heterogeneity within lesions and cellularity differences between the sample viewed during pathological review and tissue used for research purposes. In this paper we describe a statistical model to estimate tumour cellularity from SNP array profiles of paired tumour and normal samples using shifts in SNP allele frequency at regions of loss of heterozygosity (LOH) in the tumour. We also provide qpure, a software implementation of the method. Our experiments showed that there is a medium correlation of 0.42 (p-value = 0.0001) between tumour cellularity estimated by qpure and pathology review. Interestingly, there is a high correlation of 0.87 (p-value < 2.2e-16) between cellularity estimates by qpure and deep Ion Torrent sequencing of known somatic KRAS mutations, and a weaker correlation of 0.32 (p-value = 0.004) between Ion Torrent sequencing and pathology review. This suggests that qpure may be a more accurate predictor of tumour cellularity than pathology review. qpure can be downloaded from https://sourceforge.net/projects/qpure/.
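To make the idea of "shifts in SNP allele frequency at regions of LOH" concrete, the following Python sketch shows a simplified textbook relationship between the retained-allele fraction at a heterozygous SNP and the tumour fraction. It is not the qpure statistical model, which the abstract does not specify; the function name and the two idealised LOH scenarios (a diploid normal genome, and a tumour that is either hemizygous or copy-neutral at the locus) are assumptions made for illustration.

```python
def cellularity_from_loh(retained_allele_fraction, copy_neutral=False):
    """Estimate tumour fraction c from the retained-allele fraction f at an LOH region.

    Assumes normal cells carry one copy of each allele at the SNP.
    Hemizygous deletion (tumour keeps 1 of 2 copies): f = 1 / (2 - c), so c = 2 - 1/f.
    Copy-neutral LOH (tumour keeps 2 identical copies): f = (1 + c) / 2, so c = 2f - 1.
    """
    f = retained_allele_fraction
    if not 0.5 <= f <= 1.0:
        raise ValueError("retained-allele fraction must lie in [0.5, 1.0]")
    if copy_neutral:
        return 2.0 * f - 1.0
    return 2.0 - 1.0 / f

# A 60% tumour sample with a hemizygous deletion gives f = 1 / (2 - 0.6).
estimate = cellularity_from_loh(1 / 1.4)
```

In both scenarios a fraction of 0.5 (no shift) maps to zero tumour content and a fraction of 1.0 maps to a pure tumour sample.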
DOI: 10.1038/s41375-018-0206-x
2018
Cited 92 times
Subclonal evolution in disease progression from MGUS/SMM to multiple myeloma is characterised by clonal stability
Multiple myeloma (MM) is a largely incurable haematological malignancy defined by the clonal proliferation of malignant plasma cells (PCs) within the bone marrow. Clonal heterogeneity has recently been established as a feature in MM, however, the subclonal evolution associated with disease progression has not been described. Here, we performed whole-exome sequencing of serial samples from 10 patients, providing new insights into the progression from monoclonal gammopathy of undetermined significance (MGUS) and smouldering MM (SMM), to symptomatic MM. We confirm that intraclonal genetic heterogeneity is a common feature at diagnosis and that the driving events involved in disease progression are more subtle than previously reported. We reveal that MM evolution is mainly characterised by the phenomenon of clonal stability, where the transformed subclonal PC populations identified at MM are already present in the asymptomatic MGUS/SMM stages. Our findings highlight the possibility that PC extrinsic factors may play a role in subclonal evolution and MGUS/SMM to MM progression.
DOI: 10.1200/jco.18.02365
2019
Cited 84 times
Progression of Disease Within 24 Months in Follicular Lymphoma Is Associated With Reduced Intratumoral Immune Infiltration
Understanding the immunobiology of the 15% to 30% of patients with follicular lymphoma (FL) who experience progression of disease within 24 months (POD24) remains a priority. Solid tumors with low levels of intratumoral immune infiltration have inferior outcomes. It is unknown whether a similar relationship exists between POD24 in FL. Digital gene expression using a custom code set (five immune effector, six immune checkpoint, and one macrophage molecules) was applied to a discovery cohort of patients with early- and advanced-stage FL (n = 132). T-cell receptor repertoire analysis, flow cytometry, multispectral immunofluorescence, and next-generation sequencing were performed. The immune infiltration profile was validated in two independent cohorts of patients with advanced-stage FL requiring systemic treatment (n = 138, rituximab plus cyclophosphamide, vincristine, prednisone; n = 45, rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone), with the latter selected to permit comparison of patients experiencing a POD24 event with those having no progression at 5 years or more. Immune molecules showed distinct clustering, characterized by either high or low expression regardless of categorization as an immune effector, immune checkpoint, or macrophage molecule. Low programmed death-ligand 2 (PD-L2) was the most sensitive/specific marker to segregate patients with adverse outcomes; therefore, PD-L2 expression was chosen to distinguish immune infiltration^HI (ie, high PD-L2) FL biopsies from immune infiltration^LO (ie, low PD-L2) tumors. Immune infiltration^HI tissues were highly infiltrated with macrophages and expanded populations of T-cell clones.
Of note, the immune infiltration^LO subset of patients with FL was enriched for POD24 events (odds ratio [OR], 4.32; c-statistic, 0.81; P = .001), validated in the independent cohorts (rituximab plus cyclophosphamide, vincristine, prednisone: OR, 2.95; c-statistic, 0.75; P = .011; and rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone: OR, 7.09; c-statistic, 0.88; P = .011). Mutations were equally proportioned across tissues, which indicated that degree of immune infiltration is capturing aspects of FL biology distinct from its mutational profile. Assessment of immune infiltration by PD-L2 expression is a promising tool with which to help identify patients who are at risk for POD24.
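For readers unfamiliar with the odds ratios quoted above, this short Python sketch shows how an unadjusted odds ratio is computed from a 2x2 table. The abstract does not report the underlying counts or say how the ORs were estimated, so the counts below are invented purely for illustration and the function name is hypothetical.

```python
def odds_ratio(exposed_events, exposed_nonevents, unexposed_events, unexposed_nonevents):
    """Unadjusted odds ratio for a 2x2 table: (a / b) / (c / d) = (a * d) / (b * c)."""
    if exposed_nonevents == 0 or unexposed_events == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (exposed_events * unexposed_nonevents) / (exposed_nonevents * unexposed_events)

# Invented counts, NOT from the study:
# low-infiltration group: 20 with POD24, 30 without;
# high-infiltration group: 10 with POD24, 60 without.
example_or = odds_ratio(20, 30, 10, 60)
```

An odds ratio above 1 means the event is more likely in the first group; with these invented counts the odds of POD24 are four times higher in the low-infiltration group.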
DOI: 10.1038/s41467-020-14286-0
2020
Cited 68 times
Chromosome arm aneuploidies shape tumour evolution and drug response
Chromosome arm aneuploidies (CAAs) are pervasive in cancers. However, how they affect cancer development, prognosis and treatment remains largely unknown. Here, we analyse CAA profiles of 23,427 tumours, identifying aspects of tumour evolution including probable orders in which CAAs occur and CAAs predicting tissue-specific metastasis. Both haematological and solid cancers initially gain chromosome arms, while only solid cancers subsequently preferentially lose multiple arms. 72 CAAs and 88 synergistically co-occurring CAA pairs multivariately predict good or poor survival for 58% of 6977 patients, with negligible impact of whole-genome doubling. Additionally, machine learning identifies 31 CAAs that robustly alter response to 56 chemotherapeutic drugs across cell lines representing 17 cancer types. We also uncover 1024 potential synthetic lethal pharmacogenomic interactions. Notably, in predicting drug response, CAAs substantially outperform mutations and focal deletions/amplifications combined. Thus, CAAs predict cancer prognosis, shape tumour evolution, metastasis and drug response, and may advance precision oncology.
DOI: 10.1182/blood.2020008520
2021
Cited 62 times
EBV-associated primary CNS lymphoma occurring after immunosuppression is a distinct immunobiological entity
Primary central nervous system lymphoma (PCNSL) is confined to the brain, eyes, and cerebrospinal fluid without evidence of systemic spread. Rarely, PCNSL occurs in the context of immunosuppression (eg, posttransplant lymphoproliferative disorders or HIV [AIDS-related PCNSL]). These cases are poorly characterized, have dismal outcome, and are typically Epstein-Barr virus (EBV)-associated (ie, tissue-positive). We used targeted sequencing and digital multiplex gene expression to compare the genetic landscape and tumor microenvironment (TME) of 91 PCNSL tissues all with diffuse large B-cell lymphoma histology. Forty-seven were EBV tissue-negative: 45 EBV− HIV− PCNSL and 2 EBV− HIV+ PCNSL; and 44 were EBV tissue-positive: 23 EBV+ HIV+ PCNSL and 21 EBV+ HIV− PCNSL. As with prior studies, EBV− HIV− PCNSL had frequent MYD88, CD79B, and PIM1 mutations, and enrichment for the activated B-cell (ABC) cell-of-origin subtype. In contrast, these mutations were absent in all EBV tissue-positive cases and ABC frequency was low. Furthermore, copy number loss in HLA class I/II and antigen-presenting/processing genes were rarely observed, indicating retained antigen presentation. To counter this, EBV+ HIV− PCNSL had a tolerogenic TME with elevated macrophage and immune-checkpoint gene expression, whereas AIDS-related PCNSL had low CD4 gene counts. EBV-associated PCNSL in the immunosuppressed is immunobiologically distinct from EBV− HIV− PCNSL, and, despite expressing an immunogenic virus, retains the ability to present EBV antigens. Results provide a framework for targeted treatment.
DOI: 10.1093/nar/gkm950
2007
Cited 115 times
LOCATE: a mammalian protein subcellular localization database
LOCATE is a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of mouse and human proteins. Over the past 2 years, the data in LOCATE have grown substantially. The database now contains high-quality localization data for 20% of the mouse proteome and general localization annotation for nearly 36% of the mouse proteome. The proteome annotated in LOCATE is from the RIKEN FANTOM Consortium Isoform Protein Sequence sets which contains 58 128 mouse and 64 637 human protein isoforms. Other additions include computational subcellular localization predictions, automated computational classification of experimental localization image data, prediction of protein sorting signals and third party submission of literature data. Collectively, this database provides localization proteome for individual subcellular compartments that will underpin future systematic investigations of these regions. It is available at http://locate.imb.uq.edu.au/
DOI: 10.1371/journal.pone.0074380
2013
Cited 67 times
Somatic Point Mutation Calling in Low Cellularity Tumors
Somatic mutation calling from next-generation sequencing data remains a challenge due to the difficulties of distinguishing true somatic events from artifacts arising from PCR, sequencing errors or mis-mapping. Tumor cellularity or purity, sub-clonality and copy number changes also confound the identification of true somatic events against a background of germline variants. We have developed a heuristic strategy and software (http://www.qcmg.org/bioinformatics/qsnp/) for somatic mutation calling in samples with low tumor content and we show the superior sensitivity and precision of our approach using a previously sequenced cell line, a series of tumor/normal admixtures, and 3,253 putative somatic SNVs verified on an orthogonal platform.
DOI: 10.1093/nargab/lqaa034
2020
Cited 41 times
Comparative performance of the BGI and Illumina sequencing technology for single-cell RNA-sequencing
The libraries generated by high-throughput single cell RNA-sequencing (scRNA-seq) platforms such as the Chromium from 10× Genomics require considerable amounts of sequencing, typically due to the large number of cells. The ability to use these data to address biological questions is directly impacted by the quality of the sequence data. Here we have compared the performance of the Illumina NextSeq 500 and NovaSeq 6000 against the BGI MGISEQ-2000 platform using identical Single Cell 3′ libraries consisting of over 70 000 cells generated on the 10× Genomics Chromium platform. Our results demonstrate a highly comparable performance between the NovaSeq 6000 and MGISEQ-2000 in sequencing quality, and the detection of genes, cell barcodes and Unique Molecular Identifiers. The performance of the NextSeq 500 was also similarly comparable to the MGISEQ-2000 based on the same metrics. Data generated by both sequencing platforms yielded similar analytical outcomes for general single-cell analysis. The performance of the NextSeq 500 and MGISEQ-2000 were also comparable for the deconvolution of multiplexed cell pools via variant calling, and detection of guide RNA (gRNA) from a pooled CRISPR single-cell screen. Our study provides a benchmark for high-capacity sequencing platforms applied to high-throughput scRNA-seq libraries.
DOI: 10.1186/1471-2105-7-s5-s3
2006
Cited 57 times
Evaluation and comparison of mammalian subcellular localization prediction methods
Background: Determination of the subcellular location of a protein is essential to understanding its biochemical function. This information can provide insight into the function of hypothetical or novel proteins. These data are difficult to obtain experimentally but have become especially important since many whole genome sequencing projects have been finished and many resulting protein sequences are still lacking detailed functional information. In order to address this paucity of data, many computational prediction methods have been developed. However, these methods have varying levels of accuracy and perform differently based on the sequences that are presented to the underlying algorithm. It is therefore useful to compare these methods and monitor their performance. Results: In order to perform a comprehensive survey of prediction methods, we selected only methods that accepted large batches of protein sequences, were publicly available, and were able to predict localization to at least nine of the major subcellular locations (nucleus, cytosol, mitochondrion, extracellular region, plasma membrane, Golgi apparatus, endoplasmic reticulum (ER), peroxisome, and lysosome). The selected methods were CELLO, MultiLoc, Proteome Analyst, pTarget and WoLF PSORT. These methods were evaluated using 3763 mouse proteins from SwissProt that represent the source of the training sets used in development of the individual methods. In addition, an independent evaluation set of 2145 mouse proteins from LOCATE with a bias towards the subcellular localization underrepresented in SwissProt was used. The sensitivity and specificity were calculated for each method and compared to a theoretical value based on what might be observed by random chance. Conclusion: No individual method had a sufficient level of sensitivity across both evaluation sets that would enable reliable application to hypothetical proteins.
All methods showed lower performance on the LOCATE dataset and variable performance on individual subcellular localizations was observed. Proteins localized to the secretory pathway were the most difficult to predict, while nuclear and extracellular proteins were predicted with the highest sensitivity.
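The per-location sensitivity and specificity mentioned above can be sketched in Python as a one-versus-rest calculation. The abstract does not give the exact evaluation procedure or the random-chance baseline, so this is only a generic illustration; the function name and the six toy annotations are hypothetical.

```python
def sensitivity_specificity(true_labels, predicted_labels, location):
    """One-vs-rest sensitivity and specificity for a single subcellular location."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == location and predicted == location:
            tp += 1  # correctly predicted at this location
        elif truth == location:
            fn += 1  # at this location but predicted elsewhere
        elif predicted == location:
            fp += 1  # predicted here but actually elsewhere
        else:
            tn += 1  # correctly predicted as not at this location
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Toy annotations for six proteins (hypothetical, for illustration only).
truth = ["nucleus", "nucleus", "cytosol", "ER", "nucleus", "cytosol"]
predicted = ["nucleus", "cytosol", "cytosol", "nucleus", "nucleus", "cytosol"]
nuclear_sens, nuclear_spec = sensitivity_specificity(truth, predicted, "nucleus")
```

Repeating the call for each location and each prediction method would give the kind of per-location comparison the abstract describes.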
DOI: 10.1158/1541-7786.mcr-17-0569
2018
Cited 27 times
Targeted Next-Generation Sequencing for Detecting MLL Gene Fusions in Leukemia
Mixed lineage leukemia (MLL) gene rearrangements characterize approximately 70% of infant and 10% of adult and therapy-related leukemia. Conventional clinical diagnostics, including cytogenetics and fluorescence in situ hybridization (FISH), fail to detect MLL translocation partner genes (TPG) in many patients. Long-distance inverse (LDI)-PCR, the “gold standard” technique that is used to characterize MLL breakpoints, is laborious and requires a large input of genomic DNA (gDNA). To overcome the limitations of current techniques, a targeted next-generation sequencing (NGS) approach that requires low RNA input was tested. Anchored multiplex PCR-based enrichment (AMP-E) was used to rapidly identify a broad range of MLL fusions in patient specimens. Libraries generated using Archer FusionPlex Heme and Myeloid panels were sequenced using the Illumina platform. Diagnostic specimens (n = 39) from pediatric leukemia patients were tested with AMP-E and validated by LDI-PCR. In concordance with LDI-PCR, the AMP-E method successfully identified TPGs without prior knowledge. AMP-E identified 10 different MLL fusions in the 39 samples. Only two specimens were discordant; AMP-E successfully identified a MLL-MLLT1 fusion where LDI-PCR had failed to determine the breakpoint, whereas a MLL-MLLT3 fusion was not detected by AMP-E due to low expression of the fusion transcript. Sensitivity assays demonstrated that AMP-E can detect MLL-AFF1 in MV4-11 cell dilutions of 10⁻⁷ and transcripts down to 0.005 copies/ng. Implications: This study demonstrates a NGS methodology with improved sensitivity compared with current diagnostic methods for MLL-rearranged leukemia. Furthermore, this assay rapidly and reliably identifies MLL partner genes and patient-specific fusion sequences that could be used for monitoring minimal residual disease. Mol Cancer Res; 16(2); 279–85. ©2017 AACR.
DOI: 10.1371/journal.pgen.0020046
2006
Cited 40 times
Differential Use of Signal Peptides and Membrane Domains Is a Common Occurrence in the Protein Output of Transcriptional Units
Membrane organization describes the orientation of a protein with respect to the membrane and can be determined by the presence, or absence, and organization within the protein sequence of two features: endoplasmic reticulum signal peptides and alpha-helical transmembrane domains. These features allow protein sequences to be classified into one of five membrane organization categories: soluble intracellular proteins, soluble secreted proteins, type I membrane proteins, type II membrane proteins, and multi-spanning membrane proteins. Generation of protein isoforms with variable membrane organizations can change a protein's subcellular localization or association with the membrane. Application of MemO, a membrane organization annotation pipeline, to the FANTOM3 Isoform Protein Sequence mouse protein set revealed that within the 8,032 transcriptional units (TUs) with multiple protein isoforms, 573 had variation in their use of signal peptides, 1,527 had variation in their use of transmembrane domains, and 615 generated protein isoforms from distinct membrane organization classes. The mechanisms underlying these transcript variations were analyzed. While TUs were identified encoding all pairwise combinations of membrane organization categories, the most common was conversion of membrane proteins to soluble proteins. Observed within our high-confidence set were 156 TUs predicted to generate both extracellular soluble and membrane proteins, and 217 TUs generating both intracellular soluble and membrane proteins. The differential use of endoplasmic reticulum signal peptides and transmembrane domains is a common occurrence within the variable protein output of TUs. The generation of protein isoforms that are targeted to multiple subcellular locations represents a major functional consequence of transcript variation within the mouse transcriptome.
DOI: 10.1093/nar/gkn317
2008
Cited 32 times
BioLit: integrating biological literature with databases
BioLit is a web server which provides metadata describing the semantic content of all open access, peer-reviewed articles which describe research from the major life sciences literature archive, PubMed Central. Specifically, these metadata include database identifiers and ontology terms found within the full text of the article. BioLit delivers these metadata in the form of XML-based article files and as a custom web-based article viewer that provides context-specific functionality to the metadata. This resource aims to integrate the traditional scientific publication directly into existing biological databases, thus obviating the need for a user to search in multiple locations for information relating to a specific item of interest, for example published experimental results associated with a particular biological database entry. As an example of a possible use of BioLit, we also present an instance of the Protein Data Bank fully integrated with BioLit data. We expect that the community of life scientists in general will be the primary end-users of the web-based viewer, while biocurators will make use of the metadata-containing XML files and the BioLit database of article data. BioLit is available at http://biolit.ucsd.edu .
DOI: 10.1371/journal.pcbi.1000136
2008
Cited 30 times
Computational Biology Resources Lack Persistence and Usability
Innovation in computational biology research is predicated on the availability of published methods and computational resources. These resources facilitate the generation of new hypotheses and observations both on the part of the creators and the scientists who use them. These methods and resources include Web servers, databases, and software, both complex and simple, that implement a specific procedure or algorithm. Usually, a resource is maintained by the laboratory in which it was initially developed. We would assert that there is a growing level of frustration among scientists who attempt to use many of these resources and find that they no longer exist or are not properly maintained.
DOI: 10.1186/gb-2008-9-1-r15
2008
Cited 30 times
Towards defining the nuclear proteome
The nucleus is a complex cellular organelle and accurately defining its protein content is essential before any systematic characterization can be considered. We report direct evidence for 2,568 mammalian proteins within the nuclear proteome: the nuclear subcellular localization of 1,529 proteins based on a high-throughput subcellular localization protocol of full-length proteins and an additional 1,039 proteins for which clear experimental evidence is documented in published literature. This is direct evidence that the nuclear proteome consists of at least 14% of the entire proteome. This dataset was used to evaluate computational approaches designed to identify additional nuclear proteins. This represents direct experimental evidence that the nuclear proteome consists of at least 14% of the entire proteome. This high-quality nuclear proteome dataset was used to evaluate computational approaches designed to identify additional nuclear proteins. Based on this analysis, researchers can determine the stringency and types of lines of evidence they consider to infer the size and complement of the nuclear proteome.
DOI: 10.1093/gigascience/giy117
2018
Cited 18 times
Single-cell RNA-seq reveals dynamic transcriptome profiling in human early neural differentiation
Investigating cell fate decision and subpopulation specification in the context of the neural lineage is fundamental to understanding neurogenesis and neurodegenerative diseases. The differentiation process of neural-tube-like rosettes in vitro is representative of neural tube structures, which are composed of radially organized, columnar epithelial cells and give rise to functional neural cells. However, the underlying regulatory network of cell fate commitment during early neural differentiation remains elusive. In this study, we investigated the genome-wide transcriptome profile of single cells from six consecutive reprogramming and neural differentiation time points and identified cellular subpopulations present at each differentiation stage. Based on the inferred reconstructed trajectory and the characteristics of subpopulations contributing the most toward commitment to the central nervous system lineage at each stage during differentiation, we identified putative novel transcription factors in regulating neural differentiation. In addition, we dissected the dynamics of chromatin accessibility at the neural differentiation stages and revealed active cis-regulatory elements for transcription factors known to have a key role in neural differentiation as well as for those that we suggest are also involved. Further, communication network analysis demonstrated that cellular interactions most frequently occurred in the embryoid body stage and that each cell subpopulation possessed a distinctive spectrum of ligands and receptors associated with neural differentiation that could reflect the identity of each subpopulation. Our study provides a comprehensive and integrative study of the transcriptomics and epigenetics of human early neural differentiation, which paves the way for a deeper understanding of the regulatory mechanisms driving the differentiation of the neural lineage.
DOI: 10.1111/bjh.14649
2017
Cited 18 times
Cutting edge genomics reveal new insights into tumour development, disease progression and therapeutic impacts in multiple myeloma
Multiple Myeloma (MM) is a haematological malignancy characterised by the clonal expansion of plasma cells (PCs) within the bone marrow. Despite advances in therapy, MM remains a largely incurable disease with a median survival of 6 years. In almost all cases, the development of MM is preceded by the benign PC condition Monoclonal Gammopathy of Undetermined Significance (MGUS). Recent studies show that the transformation of MGUS to MM is associated with complex genetic changes. Understanding how these changes contribute to evolution will present targets for clinical intervention. We discuss three models of MM evolution; the linear, the expansionist and the intraclonal heterogeneity models. Of particular interest is the intraclonal heterogeneity model. Here, distinct populations of MM PCs carry differing combinations of genetic mutations. Acquisition of additional mutations can contribute to subclonal lineages where "driver" mutations may influence selective pressure and dominance, and "passenger" mutations are neutral in their effects. Furthermore, studies show that clinical intervention introduces additional selective pressure on tumour cells and can influence subclone survival, leading to therapy resistance. This review discusses how Next Generation Sequencing approaches are revealing critical insights into the genetics of MM development, disease progression and treatment. MM disease progression will illuminate possible mechanisms underlying the tumour.
DOI: 10.3390/cancers10010013
2018
Cited 17 times
Translocation Breakpoints Preferentially Occur in Euchromatin and Acrocentric Chromosomes
Chromosomal translocations drive the development of many hematological and some solid cancers. Several factors have been identified to explain the non-random occurrence of translocation breakpoints in the genome. These include chromatin density, gene density and CCCTC-binding factor (CTCF)/cohesin binding site density. However, such factors are at least partially interdependent. Using 13,844 and 1563 karyotypes from human blood and solid cancers, respectively, our multiple regression analysis only identified chromatin density as the primary statistically significant predictor. Specifically, translocation breakpoints preferentially occur in open chromatin. Also, blood and solid tumors show markedly distinct translocation signatures. Strikingly, translocation breakpoints occur significantly more frequently in acrocentric chromosomes than in non-acrocentric chromosomes. Thus, translocations are probably often generated around nucleoli in the inner nucleoplasm, away from the nuclear envelope. Importantly, our findings remain true both in multivariate analyses and after removal of highly recurrent translocations. Finally, we applied pairwise probabilistic co-occurrence modeling. In addition to well-known highly prevalent translocations, such as those resulting in BCR-ABL1 (BCR-ABL) and RUNX1-RUNX1T1 (AML1-ETO) fusion genes, we identified significantly underrepresented translocations with putative fusion genes, which are probably subject to strong negative selection during tumor evolution. Taken together, our findings provide novel insights into the generation and selection of translocations during cancer development.
DOI: 10.1111/j.1600-0854.2006.00407.x
2006
Cited 22 times
Subcellular Localization of Mammalian Type II Membrane Proteins
Application of a computational membrane organization prediction pipeline, MemO, identified putative type II membrane proteins as proteins predicted to encode a single alpha-helical transmembrane domain (TMD) and no signal peptides. MemO was applied to RIKEN's mouse isoform protein set to identify 1436 non-overlapping genomic regions or transcriptional units (TUs), which encode exclusively type II membrane proteins. Proteins with overlapping predicted InterPro and TMDs were reviewed to discard false positive predictions resulting in a dataset comprised of 1831 transcripts in 1408 TUs. This dataset was used to develop a systematic protocol to document subcellular localization of type II membrane proteins. This approach combines mining of published literature to identify subcellular localization data and a high-throughput, polymerase chain reaction (PCR)-based approach to experimentally characterize subcellular localization. These approaches have provided localization data for 244 and 169 proteins. Type II membrane proteins are localized to all major organelle compartments; however, some biases were observed towards the early secretory pathway and punctate structures. Collectively, this study reports the subcellular localization of 26% of the defined dataset. All reported localization data are presented in the LOCATE database (http://www.locate.imb.uq.edu.au).
DOI: 10.1186/1471-2105-11-103
2010
Cited 16 times
Word add-in for ontology recognition: semantic enrichment of scientific literature
In the current era of scientific research, efficient communication of information is paramount. As such, the nature of scholarly and scientific communication is changing; cyberinfrastructure is now absolutely necessary and new media are allowing information and knowledge to be more interactive and immediate. One approach to making knowledge more accessible is the addition of machine-readable semantic data to scholarly articles. The Word add-in presented here will assist authors in this effort by automatically recognizing and highlighting words or phrases that are likely information-rich, allowing authors to associate semantic data with those words or phrases, and to embed that data in the document as XML. The add-in and source code are publicly available at http://www.codeplex.com/UCSDBioLit. The Word add-in for ontology term recognition makes it possible for an author to add semantic data to a document as it is being written and it encodes these data using XML tags that are effectively a standard in life sciences literature. Allowing authors to mark-up their own work will help increase the amount and quality of machine-readable literature metadata.
DOI: 10.1200/jco.2023.41.16_suppl.e15079
2023
Olaparib in HR-deficient (HRD), metastatic triple-negative breast cancer (TNBC) and relapsed ovarian cancer (ROC) without germline mutations in BRCA1 or BRCA2: Phase 2 EMBRACE trial.
e15079 Background: Homologous recombination deficiency (HRD) caused by germline BRCA1/2 mutations (gBRCAm) sensitizes high grade serous ovarian cancers (HGSOC) and TNBC to PARP inhibitors (PARPi) such as olaparib. Loss of function of BRCA1 or RAD51C, due to promoter methylation or mutation of non-BRCA HR genes, may also confer sensitivity to PARPi. This investigator-initiated, proof-of-concept trial evaluated olaparib in platinum (Pt)-sensitive, relapsed HGSOC and metastatic TNBC, with HRD due to mechanisms other than gBRCAm. EMBRACE is the first trial evaluating PARPi in loss of function of BRCA1/RAD51C by promoter methylation. Methods: Single-arm, phase 2 trial of olaparib 300 mg orally twice daily in adults with metastatic TNBC or Pt-sensitive relapsed HGSOC and HRD due to mechanisms other than gBRCAm. 1 line of prior Pt-chemotherapy was allowed (HGSOC); additional lines of non-Pt therapy were allowed (TNBC). Tumor BRCA1/RAD51C promoter methylation was determined by methylation-sensitive high-resolution melting PCR and next generation sequencing (NGS, Pillar Biosciences BRCA1 & RAD51C Methylation test, high methylation threshold 70%). Unmethylated cases underwent NGS HR gene mutation testing (Pillar Biosciences HRD test, paired tumor/normal). The primary outcome was objective tumor response rate (OTRR) at 6 months (m). Secondary objectives: progression-free survival (PFS), OTRR according to HR gene mutation type, safety. Tertiary: associations between biomarkers and clinical outcomes. Results: 22 participants (15 HGSOC, 7 TNBC) enrolled from 208 screened (Sep 2018 - Mar 2022). Methylation was detected in 8/15 HGSOC (all BRCA1) and 5/7 TNBC (4 BRCA1, 1 RAD51C). Both cohorts included mutations in gBRIP1 (1), gPALB2 (1), gRAD51C (3), sBRCA1 (2), with 1 COSMIC mutational signature 3, and 1 positive HRD score in HGSOC. OTRR at 6m was 40% in HGSOC (6/15: 5 partial, 1 complete), and 0% in TNBC (0/7). OTRR was 38% (3/8) in methylated tumors vs 43% (4/7) with other HRD.
Stable disease at 6m was 47% (7) in HGSOC and 43% (3) in TNBC durations were 7m in HGSOC and 3m in TNBC. PFS was 53% at 6m and 25% at 12m in HGSOC, vs 17% at 6m and NE at 12m in TNBC. TNBC 2-3 prior lines of therapy. Conclusions: Olaparib demonstrated clinically relevant activity in HGSOC without gBRCAm. Objective responses were seen in HGSOC with methylated BRCA1. Olaparib had limited activity in heavily pre-treated TNBC. Further research is needed into the effects of non-Pt chemotherapy on methylation of BRCA1/RAD51C to improve selection of HGSOC and TNBC for PARPi. (ANZCTRN12617000855325) Acknowledgements: Funding from Cancer Australia and the NBCF(PS-15-048), provision of study drug plus an untied educational grant from AstraZeneca and its group of companies. EMBRACE was led by the NHMRC CTC, University of Sydney in collaboration with ANZGOG, BCT-ANZ, and GCCTI. Clinical trial information: ACTRN12617000855325 .
DOI: 10.1016/j.compbiolchem.2003.09.006
2003
Cited 21 times
Rival penalized competitive learning (RPCL): a topology-determining algorithm for analyzing gene expression data
DNA arrays have become the immediate choice in the analysis of large-scale expression measurements. Understanding the expression pattern of genes provides functional information on newly identified genes by computational approaches. Gene expression pattern is an indicator of the state of the cell, and abnormal cellular states can be inferred by comparing expression profiles. Since co-regulated genes, and genes involved in a particular pathway, tend to show similar expression patterns, clustering expression patterns has become the natural method of choice to differentiate groups. However, most methods based on cluster analysis suffer from two usual problems: (i) dead units, and (ii) determining the correct number of clusters (k) needed to classify the data. Selecting k has been an open problem of pattern recognition and statistics for decades. Since clustering reveals similar patterns present in the data, fixing this number strongly influences the quality of the result. While there is no theoretical solution to this problem, the number of clusters can be decided by a heuristic clustering algorithm called rival penalized competitive learning (RPCL). We present a novel implementation of RPCL that transforms the problem of finding the correct number of clusters into the tractable problem of clustering based on the degree of similarity. This is biologically significant since our implementation clusters functionally co-regulated genes and genes that present similar patterns of expression. This new approach reveals potential genes that are co-involved in a biological process. This implementation of the RPCL algorithm is useful in differentiating groups involved in concerted functional regulation and helps to progressively home in on patterns that are closely similar.
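As an illustration of the idea behind RPCL, the following is a minimal sketch of the commonly described form of the algorithm, not the novel implementation reported in this paper. For each data point, the winning unit moves toward the point while the runner-up (the rival) is pushed slightly away, and frequent winners are handicapped so that no unit stays dead. The function name, learning rates, and epoch count are assumptions chosen for illustration.

```python
import random


def rpcl(points, n_units, alpha_winner=0.05, alpha_rival=0.002, epochs=50, seed=0):
    """Classic-style RPCL sketch (hypothetical helper, illustrative defaults).

    points: list of equal-length numeric tuples (e.g. expression profiles).
    Returns the final unit positions and how often each unit won.
    """
    if not 2 <= n_units <= len(points):
        raise ValueError("n_units must be between 2 and len(points)")
    rng = random.Random(seed)
    # Initialise units on randomly chosen data points.
    units = [list(p) for p in rng.sample(points, n_units)]
    wins = [1] * n_units
    for _ in range(epochs):
        order = points[:]
        rng.shuffle(order)
        for x in order:
            total = sum(wins)
            # Squared distance weighted by relative winning frequency,
            # so units that win often are handicapped.
            scores = [
                (wins[j] / total) * sum((a - w) ** 2 for a, w in zip(x, units[j]))
                for j in range(n_units)
            ]
            ranked = sorted(range(n_units), key=scores.__getitem__)
            winner, rival = ranked[0], ranked[1]
            # Winner learns (moves toward x); rival is penalized (moves away).
            units[winner] = [w + alpha_winner * (a - w) for w, a in zip(units[winner], x)]
            units[rival] = [w - alpha_rival * (a - w) for w, a in zip(units[rival], x)]
            wins[winner] += 1
    return units, wins
```

Starting with more units than expected clusters, surplus units tend to be driven away from the data, which is how this family of methods sidesteps fixing k in advance.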
DOI: 10.1186/1471-2105-7-82
2006
Cited 19 times
PhosphoregDB: The tissue and sub-cellular distribution of mammalian protein kinases and phosphatases
Protein kinases and protein phosphatases are the fundamental components of phosphorylation dependent protein regulatory systems. We have created a database for the protein kinase-like and phosphatase-like loci of mouse (http://phosphoreg.imb.uq.edu.au) that integrates protein sequence, interaction, classification and pathway information with the results of a systematic screen of their sub-cellular localization and tissue specific expression data mined from the GNF tissue atlas of mouse. The database lets users query where a specific kinase or phosphatase is expressed at both the tissue and sub-cellular levels. Similarly the interface allows the user to query by tissue, pathway or sub-cellular localization, to reveal which components are co-expressed or co-localized. A review of their expression reveals 30% of these components are detected in all tissues tested while 70% show some level of tissue restriction. Hierarchical clustering of the expression data reveals that expression of these genes can be used to separate the samples into tissues of related lineage, including 3 larger clusters of nervous tissue, developing embryo and cells of the immune system. By overlaying the expression, sub-cellular localization and classification data we examine correlations between class, specificity and tissue restriction and show that tyrosine kinases are more generally expressed in fewer tissues than serine/threonine kinases. Together these data demonstrate that cell type specific systems exist to regulate protein phosphorylation and that for accurate modelling and for determination of enzyme substrate relationships the co-location of components needs to be considered.
DOI: 10.1371/journal.pcbi.1000037
2008
Cited 16 times
Open Access: Taking Full Advantage of the Content
This Journal and the Public Library of Science (PLoS) at large are standard bearers of the full potential offered through open access publication, but what of you, the reader? For most of you, open access may imply free access to read the journals, but nothing more. There is a far greater potential, but, up to now, little to point to that highlights its tangible benefits. We would argue that, as yet, the full promise of open access has not been realized. There are few persistent applications that collectively use the full on-line corpus, which for the biosciences at least is maintained in PubMed Central (http://www.pubmedcentral.nih.gov/). In short, there are no “killer apps.” Since this readership, beyond any other, would seem to have the ability to change this situation at least in the biosciences, we are issuing a call to action.
DOI: 10.1016/j.jmoldx.2023.01.008
2023
Minimizing Sample Failure Rates for Challenging Clinical Tumor Samples
Identification of somatic variants in cancer by high-throughput sequencing has become common clinical practice, largely because many of these variants may be predictive biomarkers for targeted therapies. However, there can be high sample quality control (QC) failure rates for some tests that prevent the return of results. Stem-loop inhibition mediated amplification (SLIMamp) is a patented technology that has been incorporated into commercially available cancer next-generation sequencing testing kits. The claimed advantage is that these kits can interrogate challenging formalin-fixed, paraffin-embedded tissue samples with low tumor purity, poor-quality DNA, and/or low-input DNA, resulting in a high sample QC pass rate. The study aimed to substantiate that claim using Pillar Biosciences oncoReveal Solid Tumor Panel. Forty-eight samples that had failed one or more preanalytical QC sample parameters for whole-exome sequencing from the Australian Translational Genomics Centre’s ISO15189-accredited diagnostic genomics laboratory were acquired. XING Genomic Services performed an exploratory data analysis to characterize the samples and then tested the samples in their ISO15189-accredited laboratory. Clinical reports could be generated for 37 (77%) samples, of which 29 (60%) contained clinically actionable or significant variants that would not otherwise have been identified. Eleven samples were deemed unreportable, and the sequencing data were likely dominated by artifacts. A novel postsequencing QC metric was developed that can discriminate between clinically reportable and unreportable samples.
Identification of somatic variants in cancer by high-throughput sequencing has become common clinical practice because many of these variants may be predictive biomarkers for targeted therapies or have diagnostic or prognostic relevance. However, there can be high sample quality control (QC) failure rates (up to approximately 45% for some tests) preventing the return of results, which may affect patient treatment decisions.1Mathieson W. Thomas G.A. Why formalin-fixed, paraffin-embedded biospecimens must be used in genomic medicine: an evidence-based review and conclusion.J Histochem Cytochem. 2020; 68: 543-552Crossref PubMed Scopus (18) Google Scholar, 2Robbe P. Popitsch N. Knight S.J.L.
Antoniou P. Becq J. He M. Kanapin A. Samsonova A. Vavoulis D.V. Ross M.T. Kingsbury Z. Cabes M. Ramos S.D.C. Page S. Dreau H. Ridout K. Jones L.J. Tuff-Lacey A. Henderson S. Mason J. Buffa F.M. Verrill C. Maldonado-Perez D. Roxanis I. Collantes E. Browning L. Dhar S. Damato S. Davies S. Caulfield M. Bentley D.R. Taylor J.C. Turnbull C. Schuh A. 100,000 Genomes ProjectClinical whole-genome sequencing from routine formalin-fixed, paraffin-embedded specimens: pilot study for the 100,000 Genomes Project.Genet Med. 2018; 20: 1196-1205Abstract Full Text Full Text PDF PubMed Scopus (96) Google Scholar, 3Hedegaard J. Thorsen K. Lund M.K. Hein A.-M.K. Hamilton-Dutoit S.J. Vang S. Nordentoft I. Birkenkamp-Demtröder K. Kruhøffer M. Hager H. Knudsen B. Andersen C.L. Sørensen K.D. Pedersen J.S. Ørntoft T.F. Dyrskjøt L. Next-generation sequencing of RNA and DNA isolated from paired fresh-frozen and formalin-fixed paraffin-embedded samples of human cancer and normal tissue.PLoS One. 2014; 9: e98187Crossref PubMed Scopus (257) Google Scholar, 4Hussain M. Corcoran C. Sibilla C. Fizazi K. Saad F. Shore N. Sandhu S. Mateo J. Olmos D. Mehra N. Kolinsky M.P. Roubaud G. Özgüroǧlu M. Matsubara N. Gedye C. Choi Y.D. Padua C. Kohlmann A. Huisden R. Elvin J.A. Kang J. Adelman C.A. Allen A. Poehlein C. Bono J. Tumor genomic testing for >4,000 men with metastatic castration-resistant prostate cancer in the Phase III trial PROfound (Olaparib)..Clin Cancer Res. 2022; 28: 1518-1530Crossref PubMed Scopus (26) Google Scholar, 5Al-Kateb H. Nguyen T.T. Steger-May K. Pfeifer J.D. Identification of major factors associated with failed clinical molecular oncology testing performed by next generation sequencing (NGS).Mol Oncol. 2015; 9: 1737-1743Crossref PubMed Scopus (43) Google Scholar, 6Goswami R.S. Luthra R. Singh R.R. Patel K.P. Routbort M.J. Aldape K.D. Yao H. Dang H.D. Barkoh B.A. Manekia J. Medeiros L.J. Roy-Chowdhuri S. Stewart J. Broaddus R.R. Chen H. 
Identification of factors affecting the success of next-generation sequencing testing in solid tumors.Am J Clin Pathol. 2016; 145: 222-237Crossref PubMed Google Scholar, 7Lee C. Bae J.S. Ryu G.H. Kim N.K.D. Park D. Chung J. Kyung S. Joung J.-G. Shin H.-T. Shin S.-H. Kim Y. Kim B.S. Lee H. Kim K.-M. Kim J.-S. Park W.-Y. Son D.-S. A method to evaluate the quality of clinical gene-panel sequencing data for single-nucleotide variant detection.J Mol Diagn. 2017; 19: 651-658Abstract Full Text Full Text PDF PubMed Scopus (16) Google Scholar, 8Hiemenz M.C. Graf R.P. Schiavone K. Harries L. Oxnard G.R. Ross J.S. Huang R.S.P. Real-world comprehensive genomic profiling success rates in tissue and liquid prostate carcinoma specimens.Oncologist. 2022; 27: e970-e972Crossref PubMed Scopus (1) Google Scholar Clinical samples, especially in the case of solid tumor testing, are usually formalin-fixed, paraffin-embedded tissue (FFPET) sections, and these are known to be challenging due to the damaging effect of formalin on nucleic acids, small biopsy specimen size, and/or low tumor cell content in the tested specimen.9Blow N. Tissue preparation: tissue issues.Nature. 2007; 448: 959-963Crossref PubMed Scopus (86) Google Scholar, 10Srinivasan M. Sedmak D. Jewell S. Effect of fixatives and tissue processing on the content and integrity of nucleic acids.Am J Pathol. 2002; 161: 1961-1971Abstract Full Text Full Text PDF PubMed Scopus (985) Google Scholar, 11Williams C. Pontén F. Moberg C. Söderkvist P. Uhlén M. Pontén J. Sitbon G. Lundeberg J. A high frequency of sequence alterations is due to formalin fixation of archival specimens.Am J Pathol. 1999; 155: 1467-1471Abstract Full Text Full Text PDF PubMed Scopus (427) Google Scholar, 12Tomlins S.A. Hovelson D.H. Suga J.M. Anderson D.M. Koh H.A. Dees E.C. et al.Real-world performance of a comprehensive genomic profiling test optimized for small tumor samples.JCO Precis Oncol. 
2021; 5: 1312-1324Crossref Google Scholar Although hybrid-capture–based next-generation sequencing (NGS) tests are valued as clinical tests because of their sensitivity and uniformity of coverage of targeted genomic regions, they require large amounts of high-quality DNA (usually ≥50 ng) as input to achieve a successful test result. These requirements result in higher sample QC failure rates for hybrid-capture methods.13Alborelli I. Jermann P.M. Preanalytical variables and sample quality control for clinical variant analysis.Methods Mol Biol. 2022; 2493: 331-351Crossref PubMed Scopus (1) Google Scholar, 14Bewicke-Copley F. Kumar E.A. Palladino G. Korfi K. Wang J. Applications and analysis of targeted genomic sequencing in cancer studies.Comput Struct Biotechnol J. 2019; 17: 1348-1359Abstract Full Text Full Text PDF PubMed Scopus (59) Google Scholar, 15Ionescu D.N. Stockley T.L. Banerji S. Couture C. Mather C.A. Xu Z. Blais N. Cheema P.K. Chu Q.S.-C. Melosky B. Leighl N.B. Consensus recommendations to optimize testing for new targetable alterations in non-small cell lung cancer.Curr Oncol. 2022; 29: 4981-4997Crossref PubMed Scopus (8) Google Scholar, 16Jennings L.J. Arcila M.E. Corless C. Kamel-Reid S. Lubin I.M. Pfeifer J. Temple-Smolkin R.L. Voelkerding K.V. Nikiforova M.N. Guidelines for validation of next-generation sequencing-based oncology panels: a joint consensus recommendation of the Association for Molecular Pathology and College of American Pathologists.J Mol Diagn. 2017; 19: 341-365Abstract Full Text Full Text PDF PubMed Scopus (420) Google Scholar, 17Samorodnitsky E. Jewell B.M. Hagopian R. Miya J. Wing M.R. Lyon E. Damodaran S. Bhatt D. Reeser J.W. Datta J. Roychowdhury S. Evaluation of hybridization capture versus amplicon-based methods for whole-exome sequencing.Hum Mutat. 
2015; 36: 903-914Crossref PubMed Scopus (164) Google Scholar Amplicon-based NGS tests are generally more successful at testing challenging clinical samples than hybrid-capture methods because of lower DNA input requirements. However, these tests are still vulnerable to poor-quality DNA. Pillar Biosciences (Natick, MA) has patented stem-loop inhibition mediated amplification (SLIMamp) technology and incorporated this into commercially available amplicon-based NGS cancer testing kits specifically to overcome the input DNA challenges. The claim is that these kits can successfully interrogate FFPET samples with poor-quality DNA and/or low-input DNA amounts, resulting in a higher sample QC pass rate than either hybrid-capture methods or conventional amplicon-based sequencing methods, as discussed previously.18Kalinava N. Apfel A. Cartmell R. Srinivasan S. Chien M.-S. Kim K.I. Golhar R. Bednarz K.E. Pant S. Szustakowski J. Chasalow S.D. Sasson A. Kirov S. Modeling performance of sample collection sites using whole exome sequencing metrics.Biotechniques. 2020; 69: 420-426Crossref PubMed Scopus (1) Google Scholar, 19Aggarwal C. Thompson J.C. Black T.A. Katz S.I. Fan R. Yee S.S. Chien A.L. Evans T.L. Bauml J.M. Alley E.W. Ciunci C.A. Berman A.T. Cohen R.B. Lieberman D.B. Majmundar K.S. Savitch S.L. Morrissette J.J.D. Hwang W.-T. Elenitoba-Johnson K.S.J. Langer C.J. Carpenter E.L. Clinical implications of plasma-based genotyping with the delivery of personalized therapy in metastatic non-small cell lung cancer.JAMA Oncol. 2019; 5: 173-180Crossref PubMed Scopus (276) Google Scholar, 20Hussain M. Mateo J. Fizazi K. Saad F. Shore N. Sandhu S. et al.PROfound Trial InvestigatorsSurvival with olaparib in metastatic castration-resistant prostate cancer.N Engl J Med. 2020; 383: 2345-2357Crossref PubMed Scopus (340) Google Scholar, 21Gilson C. Ingleby F. Gilbert D.C. Parry M. Atako N.B. Mason M.D. Malik Z. Langley R.E. Simmons A. Loehr A. Clarke N. James N. Parmar M.K.B. Sydes M.R. 
Attard G. Chowdhury S. Targeted next-generation sequencing (tNGS) of metastatic castrate-sensitive prostate cancer (M1 CSPC): a pilot molecular analysis in the STAMPEDE multi-center clinical trial.J Clin Oncol. 2019; 37: 5019Crossref Google Scholar In addition, SLIMamp enables the enrichment of target amplicons tiled across long genomic regions, not just hotspots, to allow sequencing of multiple entire gene coding regions in an automatable, highly multiplexed single reaction tube.22Schenk D. Song G. Ke Y. Wang Z. Amplification of overlapping DNA amplicons in a single-tube multiplex PCR for targeted next-generation sequencing of BRCA1 and BRCA2.PLoS One. 2017; 12: e0181062Crossref PubMed Scopus (17) Google Scholar This approach provides a promising alternative to hybrid-capture methods due to the ability to interrogate entire genes from difficult samples, a feature that conventional amplicon struggle to offer without amplicon dropout. This was originally shown by using 5 to 100 ng of input DNA from both clinical samples and reference standards for the coding sequences of BRCA1 and BRCA2; however, an exploration of the effectiveness of SLIMamp tests on truly challenging clinical samples has not yet been published. The aim of the current study was to verify the claim of Pillar Biosciences that SLIMamp technology can successfully test challenging samples using their amplicon-based NGS oncoReveal Solid Tumor Panel (STP) test. Forty-eight samples that had failed one or more preanalytical QC sample metrics for comprehensive genome profiling (CGP) by either whole-exome or Illumina TSO500 panel sequencing from the Australian Translational Genomics Centre (ATGC), an ISO15189-accredited diagnostic genomics laboratory, were identified and provided to XING Genomic Services (XGS; Sinnamon Park, Queensland, Australia). CGP testing of these patient samples had been requested by treating clinicians but was not performed due to poor quality of the samples. 
XGS performed an exploratory data analysis using preanalytical QC metrics specific to the STP test to further characterize the quality of samples and then tested all samples in their ISO15189-accredited laboratory using the STP test that had been previously analytically and clinically validated. Forty-eight extracted DNA samples were provided by ATGC for use in this study. Table 1 details the study population characteristics. Whole-exome sequencing testing had been requested for 44 samples, and TSO500 DNA testing had been requested for four samples. Forty-seven samples were derived from FFPET, and one sample was derived from blood. Samples were considered by ATGC to have failed CGP preanalytical QC metrics if they yielded <260 ng DNA and/or quality below DNA Integrity Number (DIN) 3.6 and average fragment size <3600 bp (as determined by using the TapeStation genomic DNA assay; Agilent Technologies, Mulgrave, VIC, Australia). Samples were only considered for inclusion in this study if the patient had previously consented to involvement in research studies.
Table 1. ATGC Study Population Characteristics (Samples, N = 48)
Characteristic                           n     %
Sex
  Male                                  32    67
  Female                                16    33
Diagnosis
  Colorectal cancer                      7    15
  Lung cancer                            7    15
  Prostate cancer                        6    13
  Gastric cancer                         3     6
  Neuroendocrine carcinoma               3     6
  Squamous cell carcinoma                3     6
  Breast cancer                          2     4
  Leiomyosarcoma                         2     4
  Osteosarcoma                           2     4
  Pancreatic cancer                      2     4
  Thyroid cancer                         2     4
  Adenoid cystic carcinoma (salivary)    1     2
  Adrenal cortical carcinoma             1     2
  Angiosarcoma                           1     2
  Leukemia                               1     2
  Liposarcoma                            1     2
  Neuroblastoma                          1     2
  Parathyroid carcinoma                  1     2
  Perineuroma                            1     2
  Renal cancer                           1     2
Diagnoses fit for the Solid Tumor Panel test are shown in bold. Even though all ATGC samples were tested with this panel, it is only designed to be relevant to a subset of samples in this study.
ATGC solid tumor samples were macrodissected from slides and deparaffinized by using xylene and ethanol. DNA lysate from FFPET was extracted, with an added uracil-N-glycosylase step from the GeneRead DNA FFPE Kit (catalog number 180134; Qiagen, Clayton, VIC, Australia) to reduce potential cytosine deamination artifacts, using an automated QIAsymphony DSP DNA Mini Kit (catalog number 937236; Qiagen) according to the TLC200 protocol. DNA was extracted from the blood sample on the automated QIAsymphony DSP DNA Mini Kit (catalog number 937236; Qiagen) according to the B200 protocol. DNA samples were then quantified by using the Qubit 1X dsDNA HS Assay Kit (catalog number Q33231, Thermo Fisher Scientific, Scoresby, VIC, Australia). FFPET-derived DNA was also qualified on the TapeStation genomic DNA assay (catalog numbers 5067-5365 and 5067-5366; Agilent) to determine DIN and average fragment size. ATGC provided extracted DNA to XGS. If enough DNA was available for both quantification and testing, according to the concentration and amount provided by ATGC, total DNA concentration was confirmed by XGS using the Qubit dsDNA assay. The QIAseq DNA QuantiMIZE kit (catalog number 333414; Qiagen) was used to quantify and qualify amplifiable DNA. This kit uses two qPCR assays that interrogate 40 genomic loci to determine the amounts of amplifiable DNA fragments in a sample. Briefly, samples or control genomic DNA were mixed with a qPCR master mix and QuantiMIZE primer pairs. Real-time PCR was performed according to the manufacturer’s instructions, and CT values were analyzed to determine concentration and absolute quantities of amplifiable DNA. Samples were tested in triplicate. 
The oncoReveal Solid Tumor Panel (STP) kit (catalog number HDA-HS-1005-24; Pillar Biosciences) is a multi-gene test that targets hotspot variants considered to be driving events in solid tumors; it is recommended for use with colorectal, melanoma, thyroid, non–small-cell lung, and pancreatic cancers, as well as gastrointestinal stromal tumors and gliomas. It covers 23,895 bases across regions of interest in 47 genes, including AKT1, ALK, ARAF, BRAF, CDKN2A, CTNNB1, CYSLTR2, DDR2, EGFR, EIF1AX, ERBB2, ERBB4, FBXW7, FGFR1, FGFR2, FGFR3, GNA11, GNAQ, GNAS, H3F3A, HIST1H3B, HRAS, IDH1, IDH2, KIT, KRAS, MAP2K1, MET, NRAS, NTRK1, PDGFRA, PIK3CA, PLCB4, POLD1, POLE, PTEN, PTPN11, RAC1, RAF1, RET, SF3B1, SMAD4, SRSF2, STK11, the TERT promoter, TP53, and TSHR. The STP kit was used to prepare amplicon libraries for all 48 ATGC samples. If total library input DNA was estimated to be <20 ng, libraries were prepared directly from the material in the provided tube. In two cases, a tube appeared to contain no liquid, and thus the library was prepared from 6.25 μL of water that was used to wash the inside of the tube to solubilize any DNA that may have been present. Libraries were prepared according to the manufacturer’s instructions. Briefly, each region of interest was amplified by using gene-specific primers in the first round of PCR, after which excess primers were digested, and the PCR products were purified via size selection. Next, index adaptors were added to each library for sample tracking. This mix was further amplified (ie, indexing-PCR) and purified. The final libraries were then quantified by using the Qubit assay and normalized for sequencing. Libraries were sequenced on a NextSeq 550 (Illumina, Melbourne, VIC, Australia). Because some of these samples were sequenced in batches with routine clinical samples, a uniform level of coverage across all samples was not achieved.
However, a minimum of 3500× mean coverage was required, as per test validation, for a sample to pass coverage-based QC. Short read alignment, coverage analysis, and variant calling were performed by using the PiVAT bioinformatics platform (version 2020.2.2) of Pillar Biosciences with default settings. Variants called by PiVAT were filtered to remove those with a variant allele frequency (VAF) of <3%. The validated limit of detection (LOD) of the STP test as implemented by XGS is 5% based on a minimum tumor content of 20%, although variants with VAFs between 3% and 5% may be reported if coverage depth and tumor content are sufficiently high. Variants with <200× coverage were also removed before variant annotation and interpretation. Samples were considered to pass postsequencing QC if they had a mean coverage of at least 3500× with a minimum 98% of targeted regions receiving at least 200× coverage. Samples were considered to be reportable using expert interpretation of the sequencing data in the context of associated sample QC metrics and diagnostic information. Because this study was an exploratory analysis, criteria that would discriminate between a reportable sample and an unreportable sample were not specified a priori. QuantiMIZE data previously generated by XGS from an additional 47 solid tumor specimens acquired during routine patient testing or clinical test validation were included with QuantiMIZE data from the ATGC samples to characterize the effect on amplifiable FFPET-derived DNA concentration on SLIMamp kit-based test results. A total of 46 of the 47 additional samples had been successfully tested with either the oncoReveal HRD or oncoReveal BRCA1 and RAD51C methylation, both of which are SLIMamp-based kits (catalog numbers HDA-HR-1003-96 and HDA-HR-1005-96; Pillar Biosciences), and one sample was considered to have failed testing. 
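The variant filters and the coverage-based postsequencing QC rule described above can be sketched as follows. This is only an illustration of the stated thresholds (VAF of at least 3%, at least 200× coverage per variant, mean coverage of at least 3500×, and at least 98% of targeted regions at 200× or more); the `Variant` class and function names are hypothetical and are not part of PiVAT or any Pillar Biosciences software.

```python
from dataclasses import dataclass

# Thresholds taken from the text above.
MIN_VAF_PCT = 3.0            # variants with VAF < 3% are removed
MIN_VARIANT_DEPTH = 200      # variants with < 200x coverage are removed
MIN_MEAN_COVERAGE = 3500     # sample-level mean coverage requirement
MIN_FRACTION_AT_200X = 0.98  # fraction of targeted regions with >= 200x


@dataclass
class Variant:
    """Hypothetical minimal variant record used only for this sketch."""
    gene: str
    vaf_pct: float
    depth: int


def filter_variants(variants):
    """Keep variants that meet both the VAF and the depth thresholds."""
    return [
        v for v in variants
        if v.vaf_pct >= MIN_VAF_PCT and v.depth >= MIN_VARIANT_DEPTH
    ]


def passes_postsequencing_qc(mean_coverage, fraction_targets_at_200x):
    """Apply the coverage-based postsequencing QC rule to one sample."""
    return (
        mean_coverage >= MIN_MEAN_COVERAGE
        and fraction_targets_at_200x >= MIN_FRACTION_AT_200X
    )
```

Note that the text allows variants with VAFs between 3% and 5% to be reported only when coverage depth and tumor content are sufficiently high; that judgment step is not modeled here.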
One sample was derived from a 43-year-old FFPE block, while the other FFPET samples were between 3 months and 11 years old, relative to the test date (Supplemental Figure S1). In addition, the Horizon Discovery Quantitative Multiplex Reference Standards for Formalin-Compromised DNA (mild, moderate, and severe) were tested with the QuantiMIZE assay to characterize the extent of formalin-related DNA degradation on amplifiable DNA concentration (catalog numbers HD798, HD799, and HD803; MetaGene, Pty. Ltd., Brisbane, QLD, Australia). Data were analyzed by using R version 4.1.2. (R Foundation for Statistical Computing, https://www.R-project.org, last accessed November 1, 2021). Single nucleotide variant base transition and transversion analysis was performed by using the maftools R package, version 2.10.0.23Mayakonda A. Lin D.-C. Assenov Y. Plass C. Koeffler H.P. Maftools: efficient and comprehensive analysis of somatic variants in cancer.Genome Res. 2018; 28: 1747-1756Crossref PubMed Scopus (1817) Google Scholar Feature selection analysis of attributes describing the samples was performed by using the Boruta R package, version 7.0.0.24Kursa M.B. Rudnicki W.R. Feature selection with the Boruta package.J Stat Softw. 2010; 36: 1-11Crossref Scopus (2229) Google Scholar ATGC preanalytical QC criteria were input DNA ≥260 ng, a DIN of at least 3.6, and average fragment size of >3600 bp. A total of 40 of the 48 samples that were provided for this study failed at least two of these criteria. Only two samples were considered to fail QC based on low-input DNA alone. Five samples had a DIN of at least 3.6 (passing QC) while the rest were qualified as having very fragmented DNA, likely due to either, or a combination of, formalin fixation and attributes of the specimen that make it challenging for clinical testing (eg, fibrous tissue, calcified tissue). 
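The ATGC preanalytical criteria listed above (input DNA ≥260 ng, DIN of at least 3.6, average fragment size >3600 bp) can be expressed as a small check that reports which criteria a sample fails. This is a sketch under the assumption that a sample passes only when all three criteria are met; the function name and labels are illustrative.

```python
def failed_preanalytical_criteria(input_dna_ng, din, avg_fragment_bp):
    """Return the names of the ATGC preanalytical QC criteria a sample fails.

    An empty list means the sample meets all three criteria as stated in the text.
    """
    failed = []
    if input_dna_ng < 260:        # requirement: input DNA >= 260 ng
        failed.append("input_dna")
    if din < 3.6:                 # requirement: DIN of at least 3.6
        failed.append("din")
    if avg_fragment_bp <= 3600:   # requirement: average fragment size > 3600 bp
        failed.append("fragment_size")
    return failed
```

Counting the length of the returned list reproduces tallies such as the number of samples failing at least two criteria.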
One sample passed all QC criteria and testing proceeded with the TSO500 test, but the sample did not generate a library of sufficient quality for sequencing and was hence included in this study. Supplemental Table S1 presents all QC metrics, where known, for the 48 ATGC samples tested here. According to the manufacturer’s instructions, the STP test requires 10 to 75 ng of input DNA, although the range that was previously used to validate this kit for clinical use was 5 to 20 ng. In this study, 20 ng of input DNA was used for library preparation, when possible. Twenty-one of the 48 ATGC samples tested contained <10 ng input DNA; 13 samples contained <5 ng or did not have enough DNA to estimate the DNA concentration or amount (Supplemental Table S1). For the latter cases, 6.25 μL of the tube contents was used or 6.25 μL of water that had been added to the tube if it appeared empty was used. After library preparation with the STP kit, the concentration of the library was measured and was considered to pass QC with a concentration ≥2 ng/μL. Twelve samples had a library concentration <2 ng/μL. Five samples had both <5 ng input DNA and a library concentration <2 ng/μL. Because there is no stated requirement for a specific level of DNA integrity for the SLIMamp kits, a study using the QuantiMIZE assay was performed in an attempt to determine a threshold of amplifiable DNA concentration that could serve as a potential preanalytical QC metric. Only 39 of the 48 ATGC samples tested here had sufficient DNA to permit QuantiMIZE testing; these were therefore supplemented with 47 additional HRD samples from XGS’s routine testing service, as well as three Horizon Discovery formalin-compromised DNA reference standards, totaling 89 samples. 
Figure 1 shows the distribution of amplifiable DNA concentrations in the combined ATGC and XGS samples that were considered to be either unreportable (n = 2) or reportable (n = 87) based on the postsequencing QC metric developed as part of this study (described in Postsequencing QC Metrics). The QuantiMIZE concentrations for the Horizon Discovery standards (mild, moderate, and severe formalin-compromised DNA) are shown for reference. Notably, almost 40% of the samples tested with the QuantiMIZE assay were estimated to be poorer quality than the severe formalin-compromised reference standard. After sequencing and primary and secondary bioinformatics analysis, a number of potential QC metrics were explored to determine if sample reportability could be determined after sequencing but before issuing a clinical report. Coverage metrics of the STP target regions were considered first, including mean coverage, the percentage of targeted regions covered by at least 200 reads, and the overall percentage of on-target reads. Two samples did not meet the minimum mean coverage of 3500× (although they were both >3000×); five samples did not have at least 98% of targeted regions covered by a minimum of 200 reads; and 15 samples had an on-target rate <95% (Supplemental Table S1). The number of variants called after filtering in the ATGC samples (as described in Materials and Methods) (Figure 2) was also considered because it seemed there was a very broad range of variant counts across all samples. Ten of the 48 ATGC samples had noticeably more variants than the other samples. Samples with a relatively high number of variants were more difficult to interpret. The VAF distribution was also very different across all samples, and samples with a high number of variants frequently had a large proportion of variants with low VAFs. The VAF distribution of the ATGC samples was quantitatively characterized by determining the kernel density estimation of all VAFs in a sample. 
Examples of VAF density from samples with either moderate or poor DNA quality are presented in Figure 3 as light gray areas; all ATGC samples are shown in Supplemental Figure S2, including XMP-0014-22, which represents good-quality DNA (refs. 25 and 26: Rosenblatt M. Remarks on some nonparametric estimates of a density function. Ann Math Stat. 1956; 27: 832-837; Parzen E. On estimation of a probability density function and mode. Ann Math Stat. 1962; 33: 1065-1076).
Figure 3. Variant allele frequency (VAF) density and the first derivative of the density curve [f′(VAF Density)]. Single ATGC samples representing moderate- and poor-quality DNA are shown here. The x axis is the percent VAF. The estimated VAF kernel density is shown in the light gray area with the range of y-values shown on the first y axis (left side). The f′(VAFD) is shown by the black line with the range of values on the second y axis (right side). The VAF limit of detection (5%) is shown by the vertical dotted gray line, which is also the threshold for sample reportability. The maximum value of that derivative, referred to here as max[f′(VAFD)], is shown by the vertical dash-dotted gray line. A: Moderate-quality DNA, represented by XMP-0100-21. This sample has 27 variants with 48% C>T. B: Poor-quality DNA, represented by XMP-0092-21. This sample has 142 variants with 80% C>T.
When comparing the VAF densities of the reportable and unreportable groups of samples, a pattern emerged whereby the unreportable samples had a peak density at a very low VAF, and the reportable samples had densities that were more evenly spread across the entire range of VAFs. To create a single value that could capture the nature of these different VAF density patterns and potentially serve as a QC metric with a threshold value, the first derivative of the density curve (f′(VAFD)) was calculated. 
The maximum value of that derivative, referred to here as max(f′(VAFD)), was then identified; examples of the f′(VAFD) are shown in Figure 3 as black lines. Nine ATGC samples, all unreportable, had a max(f′(VAFD)) that was below the validated LOD for the STP test (5% VAF), which means that the vast majority (but not all) of the variants in those samples were present at VAFs below the test LOD. The max(f′(VAFD)) values for the reportable ATGC samples occur at VAFs ranging from 16% to 78%; examples of the max(f′(VAFD)) are shown in Figure 3 as vertical dash-dotted gray lines. Because these samples had been preserved in formalin, and formalin fixation induces C-to-T transitions with a specific molecular signature (COSMIC Signature SBS30; ref. 27: Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S, Stratton MR. Clock-like mutational processes in human somatic cells. Nat Genet. 2015; 47: 1404-1407), the distribution of base changes in each sample was investigated to determine if this signature was closely associated with samples that were unreportable. Figure 4 pre
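The max(f′(VAFD)) metric described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R implementation: the function names, the grid step, the bandwidth rule, and the example VAFs are all assumptions, and only the 5% LOD threshold comes from the text. The sketch estimates a Gaussian kernel density of a sample's VAFs, approximates the first derivative by finite differences, and reports the VAF at which that derivative is largest.

```python
import math
import statistics


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def vaf_at_max_slope(vafs, step=0.5, bandwidth=None):
    """Return the percent VAF at which the first derivative of the density peaks."""
    if bandwidth is None:
        # Assumption: normal-reference rule of thumb; the source does not
        # state which bandwidth was used.
        bandwidth = 1.06 * statistics.stdev(vafs) * len(vafs) ** (-0.2)
    if bandwidth <= 0:
        raise ValueError("need at least two distinct VAF values")
    grid = [i * step for i in range(int(100 / step) + 1)]  # 0% to 100% VAF
    density = gaussian_kde(vafs, grid, bandwidth)
    # Forward differences approximate f'(VAFD) on each grid interval.
    slopes = [(density[i + 1] - density[i]) / step for i in range(len(grid) - 1)]
    i_max = max(range(len(slopes)), key=slopes.__getitem__)
    return grid[i_max] + step / 2.0  # midpoint of the steepest interval


def is_reportable(vafs, lod=5.0):
    """Treat a sample as reportable when the steepest density rise is at or above the LOD."""
    return vaf_at_max_slope(vafs) >= lod


# Hypothetical samples: one dominated by very low VAFs, one spread across the range.
poor = [1.0, 1.5, 2.0, 2.2, 2.5, 2.8, 3.0, 3.1, 1.8, 2.4] * 5 + [30.0, 45.0, 50.0]
spread = [48, 50, 52, 49, 51, 47, 53, 50, 99, 100, 98, 25, 33, 70]
print(is_reportable(poor), is_reportable(spread))
```

Because the derivative is evaluated only on the 0% to 100% grid, a sample whose density peaks very close to 0% VAF returns a value near the lower edge of the grid, which falls below the LOD.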
DOI: 10.1371/journal.pcbi.1000247
2008
Cited 14 times
I Am Not a Scientist, I Am a Number
We suspect many of our readers will be familiar with the cult TV show The Prisoner, in which actor Patrick McGoohan had his identity taken away by unknown assailants for unknown reasons, and his pleas of “I am not a number, I am a person” (http://www.youtube.com/watch?v=29JewlGsYxs&feature=related) were greeted with variants of “whatever you say, number six.” We would suggest that, as scientists, we are in a situation where the opposite will soon be true, at least for the purposes of scientific scholarship. Scientists will want to be assigned a number, or at least a unique identifier. Why? Imagine a time when you and your complete scholarly output (papers, grant applications, blog posts, etc.) could be identified online and in perpetuity and returned in a variety of easy-to-digest ways. While ego comes into it as a driver to make this happen, measuring scientific career advancement is something that lacks good metrics in a digital world. Unless one has a truly unique name, applying such a metric is not possible now. Even with a unique name, what is the guarantee that all of our scholarly output will be captured by one source of that information? In the end, we as individuals are the only ones who reliably track our scholarly output. This situation is beginning to change, and, as we shall see, new metrics have the promise of much more than simply returning references to our collective life's work as currently described by research papers, research proceedings, books, and book chapters. Even a complete and current resume generated on demand would be a big step, particularly if it could be returned in a variety of formats for a variety of purposes. These complete resumes are something many of us spend endless hours generating. The idea of having our scholarly output properly characterized is not out of reach, since the articles we write are already identified uniquely by a Digital Object Identifier (DOI; discussed further below). 
A book or journal is identified by an ISBN, and citations are identified by PubMed identifiers, and so on. The ideas discussed here simply take this identification process for individual publications and citations to the point of providing unique descriptors for each author and to uniquely identify all of each author's scholarly work.
DOI: 10.18632/oncotarget.27206
2019
Cited 10 times
PTEN deletion drives acute myeloid leukemia resistance to MEK inhibitors
Kinases such as MEK are attractive targets for novel therapy in cancer, including acute myeloid leukaemia (AML). Acquired and inherent resistance to kinase inhibitors, however, is becoming an increasingly important challenge for the clinical success of such therapeutics, and often arises from mutations in the drug-binding domain of the target kinase. To identify possible causes of resistance to MEK inhibition, we generated a model of resistance by long-term treatment of AML cells with AZD6244 (selumetinib). Remarkably, resistance to MEK inhibition was due to acquired PTEN haploinsufficiency, rather than mutation of MEK. Resistance via this mechanism was confirmed using CRISPR/Cas9 technology targeting exon 5 of PTEN. While PTEN loss has been previously implicated in resistance to a number of other therapeutic agents, this is the first time that it has been shown directly and in AML.
DOI: 10.1186/1471-2105-11-220
2010
Cited 7 times
Integration of open access literature into the RCSB Protein Data Bank using BioLit
Biological data have traditionally been stored and made publicly available through a variety of on-line databases, whereas biological knowledge has traditionally been found in the printed literature. With journals now on-line and providing an increasing amount of open access content, often free of copyright restriction, this distinction between database and literature is blurring. To exploit this opportunity we present the integration of open access literature with the RCSB Protein Data Bank (PDB). BioLit provides an enhanced view of articles with markup of semantic data and links to biological databases, based on the content of the article. For example, words matching existing biological ontologies are highlighted and database identifiers are linked to their database of origin. Among other functions, it identifies PDB IDs that are mentioned in the open access literature, by parsing the full text for all research articles in PubMed Central (PMC) and exposing the results as simple XML Web Services. Here, we integrate BioLit results with the RCSB PDB website by using these services to find PDB IDs that are mentioned in research articles and subsequently retrieving abstract, figures, and text excerpts for those articles. A new RCSB PDB literature view permits browsing through the figures and abstracts of the articles that mention a given structure. The BioLit Web Services that are providing the underlying data are publicly accessible. A Java client library is provided that supports querying these services. The integration between literature and websites, as demonstrated here with the RCSB PDB, provides a broader view for how a given structure has been analyzed and used. This approach detects the mention of a PDB structure even if it is not formally cited in the paper. Other structures related through the same literature references can also be identified, possibly providing new scientific insight. 
To our knowledge this is the first time that database and literature have been integrated in this way and it speaks to the opportunities afforded by open and free access to both database and literature content.
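The abstract does not say how BioLit recognizes PDB IDs in full text, so the following Python sketch is only a naive stand-in for the idea. It assumes the classic four-character PDB ID format (a digit from 1 to 9 followed by three alphanumeric characters) and additionally requires at least one letter, so that four-digit years are not reported. The function name is hypothetical, and a pattern this simple will still produce false positives on real articles.

```python
import re

# Candidate tokens: a digit 1-9 followed by three alphanumeric characters.
_CANDIDATE = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")


def find_pdb_ids(text):
    """Return candidate PDB IDs in order of first appearance, upper-cased."""
    found = []
    for token in _CANDIDATE.findall(text):
        pdb_id = token.upper()
        # Require at least one letter so that years such as 2008 are skipped.
        if any(ch.isalpha() for ch in pdb_id) and pdb_id not in found:
            found.append(pdb_id)
    return found


sentence = "The structure of hemoglobin (PDB ID 4HHB) was compared with 1tim in 2008."
print(find_pdb_ids(sentence))
```

A production system would likely need contextual cues (for example, a nearby mention of "PDB") or a lookup against the list of released entries to filter out spurious matches.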
DOI: 10.1101/552588
2019
Cited 5 times
Comparative performance of the BGI and Illumina sequencing technology for single-cell RNA-sequencing
Abstract The libraries generated by high-throughput single cell RNA-sequencing platforms such as the Chromium from 10X Genomics require considerable amounts of sequencing, typically due to the large number of cells. The ability to use this data to address biological questions is directly impacted by the quality of the sequence data. Here we have compared the performance of the Illumina NextSeq 500 and NovaSeq 6000 against the BGI MGISEQ-2000 platform using identical Single Cell 3’ libraries consisting of over 70,000 cells. Our results demonstrate a highly comparable performance between the NovaSeq 6000 and MGISEQ-2000 in sequencing quality, and cell, UMI, and gene detection. However, compared with the NextSeq 500, the MGISEQ-2000 platform performs consistently better, identifying more cells, genes, and UMIs at equalised read depth. We were able to call an additional 1,065,659 SNPs from sequence data generated by the BGI platform, enabling an additional 14% of cells to be assigned to the correct donor from a multiplexed library. However, both the NextSeq 500 and MGISEQ-2000 detected similar frequencies of gRNAs from a pooled CRISPR single cell screen. Our study provides a benchmark for high capacity sequencing platforms applied to high-throughput single cell RNA-seq libraries.
DOI: 10.1093/bioinformatics/btg169
2003
Cited 8 times
2HAPI: a microarray data analysis system
Abstract Summary: 2HAPI (version 2 of High density Array Pattern Interpreter) is a web-based, publicly-available analytical tool designed to aid researchers in microarray data analysis. 2HAPI includes tools for searching, manipulating, visualizing, and clustering the large sets of data generated by microarray experiments. Other features include association of genes with NCBI information and linkage to external data resources. Unique to 2HAPI is the ability to retrieve upstream sequences of co-regulated genes for promoter analysis using MEME (Multiple Expectation-maximization for Motif Elicitation). Availability: 2HAPI is freely available at http://array.sdsc.edu. Users can try 2HAPI anonymously with pre-loaded data or they can register as a 2HAPI user and upload their data. Contact: gribskov@sdsc.edu
DOI: 10.20944/preprints201812.0156.v1
2018
Cited 4 times
The Report of Marine Life Genomic Research
With the continuing development of sequencing technology, genomics has been applied in a variety of biological research areas. In particular, the application of genomics to marine species, which boast a high diversity, promises great scientific and industrial potential. Significant progress has been made in marine genomics especially over the past few years. Consequently, BGI, leveraging its prominent contributions in genomics research, established BGI-Qingdao, an institute specifically aimed at exploring marine genomics. In order to accelerate marine genomics research and related applications, BGI-Qingdao initiated the International Conference on Genomics of the Ocean (ICG-Ocean) to develop international collaborations and establish a focused and coherent global research plan. Last year, the first ICG-Ocean conference was held in Qingdao, China, during which 47 scientists in marine genomics from all over the world reported on their research progress to an audience of about 300 attendees. This year, we would like to build on that success, drafting a report on marine genomics to draw global attention to marine genomics. We summarized the recent progress, proposed future directions, and we would like to enable additional profound insights on marine genomics. Similar to the annual report on plant and fungal research by Kew Gardens, and the White Paper of ethical issues on experimental animals, we hope our first report on marine genomics can provide some useful insights for researchers, funding agencies as well as industry, and that future versions will expand upon the foundation established here in both breadth and depth of knowledge. This report summarizes the recent progress in marine genomics in six parts including: marine microorganisms, marine fungi, marine algae and plants, marine invertebrates, marine vertebrates and genomics-based applications.
DOI: 10.1007/s10689-023-00351-2
2023
Inherited BRCA1 and RNF43 pathogenic variants in a familial colorectal cancer type X family
Genetic susceptibility to familial colorectal cancer (CRC), including for individuals classified as Familial Colorectal Cancer Type X (FCCTX), remains poorly understood. We describe a multi-generation CRC-affected family segregating pathogenic variants in both BRCA1, a gene associated with breast and ovarian cancer, and RNF43, a gene associated with Serrated Polyposis Syndrome (SPS). A single family out of 105 families meeting the criteria for FCCTX (Amsterdam I family history criteria with mismatch repair (MMR)-proficient CRCs) recruited to the Australasian Colorectal Cancer Family Registry (ACCFR; 1998-2008) that underwent whole exome sequencing (WES) was selected for further testing. CRC and polyp tissue from four carriers were molecularly characterized, including a single CRC that underwent WES to determine tumor mutational signatures and loss of heterozygosity (LOH) events. Ten carriers of a germline pathogenic variant BRCA1:c.2681_2682delAA p.Lys894ThrfsTer8 and eight carriers of a germline pathogenic variant RNF43:c.988C>T p.Arg330Ter were identified in this family. Seven members carried both variants, four of whom developed CRC. A single carrier of the RNF43 variant met the 2019 World Health Organization (WHO2019) criteria for SPS, developing a BRAF p.V600 wildtype CRC. Loss of the wildtype allele for both BRCA1 and RNF43 variants was observed in three CRC tumors while a LOH event across chromosome 17q encompassing both genes was observed in a CRC. Tumor mutational signature analysis identified the homologous recombination deficiency (HRD)-associated COSMIC signatures SBS3 and ID6 in a CRC for a carrier of both variants. Our findings show digenic inheritance of pathogenic variants in BRCA1 and RNF43 segregating with CRC in an FCCTX family. LOH and evidence of BRCA1-associated HRD support the importance of both these tumor suppressor genes in CRC tumorigenesis.
DOI: 10.1002/(sici)1097-4644(19981001)71:1<1::aid-jcb1>3.0.co;2-
1998
Cited 6 times
Dual cytoplasmic and nuclear distribution of the novel arsenite-stimulated human ATPase (hASNA-I)
DOI: 10.18632/oncotarget.26390
2018
Using genomics to better define high-risk MGUS/SMM patients
DOI: 10.1182/blood.v130.suppl_1.728.728
2017
The Tumor Microenvironment Is Independently Prognostic of Conventional and Clinicogenetic Risk Models in Follicular Lymphoma
Abstract Follicular Lymphoma (FL) is the most common indolent Non-Hodgkin Lymphoma. Despite generally favorable survival outcomes, 20% of FL patients experience 'Progression of Disease within 24 months' (POD24) and subsequently have poor long-term overall survival (OS) (Casulo, JCO 2015). Unfortunately, POD24 has limited clinical value because it cannot guide up-front clinical decisions. Accurate pre-therapy prognosticators are vital for clinical trial design and are also increasingly being mandated by funding agencies for stratification of patients to emerging front-line treatments. The new 'state-of-the-art' prognosticators 'm7-FLIPI' and 'POD24-PI' (Pastore, Lancet Oncol 2015; Jurinovic, Blood 2016) supplement clinical parameters with genetic mutational status. However, their applicability to population based cohorts including early-stage and asymptomatic patients remains unknown. Furthermore, there is significant heterogeneity of outcome within these prognostic groupings. The established biological and prognostic importance of the tumor microenvironment (TME) in FL suggests that prognosis would be enhanced by incorporating information on host immunity (Scott, Nat Rev Can 2014). Forty-five pre-treatment FL biopsies were categorized into 'hot' or 'cold' immune nodes by multiplex immunofluorescent imaging and respectively characterized by concordant high or low expression of multiple immune effector and checkpoint-associated proteins (Fig 1A). Consistent with these findings, gene expression using the Nanostring platform showed that immune effectors (CD4/CD8/TNFa/CD137/CD56) positively correlated with immune checkpoints (PD-1/PD-L1/PD-L2/TIM3/LAG3/CD163/CD68) indicative of an adaptive immune response. Additionally, high-throughput unbiased TCRb sequencing showed the intratumoral TCR repertoire was more clonal in 'hot' compared to 'cold' FL samples (p=0.024), indicative of a skewed T-cell immune response (Fig 1B). 
We then applied these findings to an independent population-based cohort of 175 cases of FL from the rituximab era with long-term follow-up (median ~7 years), including advanced (n=137) and localized cases (n=38). The aims were to: a) identify new targetable immune parameters of prognostic importance in the rituximab-era; and b) compare and contrast these with published prognostic tools: FLIPI, FLIPI-2, m7-FLIPI, POD24-PI and 'immune survival score' ('ISS', Dave, NEJM 2004). OS was not only inferior in those experiencing POD24 (HR 4.88, p<0.0001, Fig 1C) but these patients had a >2-fold increase in 5-year patient health costs. Hence, POD24, as well as FFS and TT2T, were chosen as the primary outcome measures. M7 mutation frequencies were similar to those previously published (Pastore, Lancet Oncol 2015). However, the prognostic utility of the m7-FLIPI could not be demonstrated, whereas the FLIPI, FLIPI-2, and POD24-PI retained their prognostic value. The POD24-PI was most predictive of FFS (p<0.0001, HR=3.54) and was most specific in identifying cases that experience POD24 (Sp=68%). The prognostic utility of the TME was then tested. Notably both the ISS (p=0.024, HR=1.74) and multiple immune genes not represented in the ISS including PD-L2, TIM3, LAG3, CD137, TNF and CD4 predicted FFS. PD-L2 demonstrated the strongest association with FFS (p<0.0001, HR=3.74, Fig 1D). It not only out-performed the ISS but was independent of the FLIPI and POD24-PI. The prognostic significance of PD-L2 was validated in an independent population based cohort of uniformly R-CVP treated patients from an in-silico dataset with gene expression quantified using the Illumina DASL platform (Pastore, Lancet Oncol 2015). We have validated the TME in predicting outcome in a population based cohort of FL patients with long-term follow-up treated in the rituximab era. 
Furthermore, we describe the role of PD-L2 as well as several additional pertinent, clinically-actionable markers of the TME which predict survival to conventional therapies in FL. Low expression of PD-L2 appears to be a surrogate of a broadly co-ordinated downregulation of the intratumoural response. These immune scores are independent of and additive to the FLIPI and POD24-PI. Development of new prognostic models requires the incorporation of host immunity along with clinico-genetic features to further improve the specificity, and to accurately risk stratify FL patients. Disclosures No relevant conflicts of interest to declare.
DOI: 10.1182/blood.v130.suppl_1.2731.2731
2017
The impact of EBV upon the tumor microenvironment and mutational profile of primary CNS lymphoma in PTLD
Abstract Primary CNS Lymphoma (PCNSL) with diffuse large B-cell lymphoma (DLBCL) histology occurring after organ transplantation (EBV+ PCNSL-PTLD) is characterized by extremely poor outcome, almost universal EBV-positivity and late presentation relative to systemic PTLD (Fink, AJT 2012; Evens, AJT 2013). However, as the incidence is low and biopsy material limited, characterization of its immunobiology is minimal. Furthermore, there is no comparative data with EBV+ DLBCL occurring as systemic PTLD (EBV+ syDLBCL-PTLD), or with EBV-ve DLBCL in the non-PTLD setting occurring either as PCNSL (EBV-ve PCNSL) or systemically (EBV-ve syDLBCL). Here, we outline results of a detailed comparison of the genetic landscape and tumor microenvironment (TME) between PCNSL and systemic DLBCL (syDLBCL), stratified by EBV and PTLD status. A combination of targeted sequencing (involving NF-κB, immune response, cell cycle and epigenetic genes), CNV analysis, nanoString gene expression (for macrophage, immune checkpoint and effector molecules), and in-vitro assays was employed. 191 adult patients with DLBCL (13 EBV+ PCNSL-PTLD; 27 EBV-ve PCNSL; 11 EBV+ syDLBCL-PTLD; and 140 EBV-ve syDLBCL) were included, with 16 non-malignant lymph nodes as controls. EBV+ PCNSL-PTLD was typically viral latency III. In EBV-ve PCNSL, in broad agreement with previous studies (Gonzalez-Aguilar, CCR 2012; Chapuy, Blood 2015; Fukumura, Acta Neuropath 2016; Nakamura, Neuropath Appl Neurobiol 2016), common nonsynonymous mutations in known cancer drivers were PIM1 (76%), MYD88 (72%), CD79B (68%), TBL1XR1 (48%), KMT2D (44%), TOX (28%), PRDM1 (24%), EP300, CREBBP1 and CIITA (all 20%). Interestingly, the mutation rate in EBV-ve PCNSL was considerably higher than in EBV+ PCNSL-PTLD, and was also higher than EBV-ve syDLBCL and EBV+ syDLBCL-PTLD. In line with previous reports, there was high occurrence of CN loss at the HLA-class I/II loci (Riemersma, Blood 2000) in EBV-ve PCNSL. 
However, this was relatively infrequent in EBV+ PCNSL-PTLD and EBV+ syDLBCL-PTLD, and intermediate in EBV-ve syDLBCL. To investigate the TME gene expression profile, a selection of immune effector and checkpoint genes were quantified in combined PCNSL cases and compared to syDLBCL, i.e. irrespective of viral status. CD4 and CD8 levels were similar in PCNSL vs. syDLBCL, whereas the NK cell marker CD56 was 9-fold higher in PCNSL (p […]). Next, differences between the TME in EBV-ve PCNSL and EBV+ PCNSL-PTLD were examined. PD-1 levels were similar. However, consistent with our previous findings in EBV+/-ve Hodgkin Lymphoma, levels of CD163 (Jones, CCR 2012) and LAG3 (Gandhi, Blood 2006) genes were both 4-fold higher in EBV+ PCNSL-PTLD biopsies than EBV-ve PCNSL (both p≤0.005). Furthermore, PD-L1 and PD-L2 were 7-fold and 3-fold higher, respectively, in EBV+ PCNSL-PTLD than EBV-ve PCNSL (both p […]). Combined, these results indicate that ‘immune privilege’ occurring in the context of lymphoma within the CNS is more accurately described as an adaptive immune response in which the malignant B-cell actively utilizes a variety of mechanisms to evade immune surveillance (Fig 1). For EBV+ PCNSL-PTLD this involves up-regulation of PD-L1+/PD-L2+ M2 monocyte/macrophages and LAG3, whereas with EBV-ve PCNSL the emphasis is on genetically mediated immune evasion including loss of HLA-I/II loci. The findings will help guide the rational design of novel immunotherapeutic strategies for EBV+ PCNSL-PTLD. Disclosures No relevant conflicts of interest to declare.
DOI: 10.1182/blood.v130.suppl_1.391.391
2017
Whole Exome Sequencing of Paired MGUS/SMM to MM Patients Reveals Novel Subclonal Tumour Evolution Models in Disease Progression of Multiple Myeloma
Abstract Introduction: Next Generation Sequencing studies in Multiple Myeloma (MM) have demonstrated that genetic heterogeneity is characteristic of MM at presentation. However, while intraclonal heterogeneity is now an established feature of MM, the subclonal tumour evolution associated with disease progression is not well understood. Here we present the first whole exome sequencing (WES) analysis of 10 paired MGUS-MM or SMM-MM patient samples, providing new insights into the genomic complexity, key molecular mechanisms and subclonal tumour evolution underlying the progression from MGUS/SMM to symptomatic MM. Methods: Fluorescence-activated cell sorting was used to purify CD138+CD38++ plasma cells and matched CD138-38- normal cells from longitudinal MGUS-MM (n = 5) and SMM-MM (n = 5) patient samples, where bone marrow samples were isolated from patients when they were first diagnosed with MGUS/SMM, and then subsequently when they developed MM. Exome libraries were generated using the Nimblegen Hyper Library Prep kit followed by the Agilent SureSelectXT Clinical Research Exome capture kit before WES using the Illumina HiSeq4000. Bioinformatics analysis was performed using the GATK best practices MuTect2 pipeline to identify the somatic variants, a custom in-house package to identify copy number changes, and PhyloWGS to integrate SNVs/CNVs and infer the subclonal evolution associated with MM progression. Results: WES was performed to a minimum depth of 140x and identified a total of 4997 somatic non-synonymous single nucleotide variants (NS-SNVs) in the MGUS/SMM samples (range 230-796), with a median of 456 per patient. Interestingly, in the MM samples, we identified a total of 4127 somatic NS-SNVs (range 221-609), with a median of 344 per patient. We observe widespread copy number variations (CNVs), with a total of 82 genes gained or lost with progression across all patients. 
While we observe some previously identified known “drivers” of MM, we find that the driving events involved in progression are complex and not limited to the known SNVs or CNVs. The RAS/MAPK pathway was found to be the most frequently deregulated pathway, with KRAS and NRAS mutations observed in 40% of patients at MGUS/SMM and 70% of patients at MM. These findings highlight that “driver” mutations can either be acquired at the early stages of MM and maintained through transformation, or acquired only at the later symptomatic MM stage. Subclonal reconstruction of the tumour evolution process was carried out for 8 paired patients, to identify both the patterns of evolution and the key genetic changes that occur with MM progression. Our analysis revealed two models of subclonal evolution; firstly, a dominant model (3/8 patients) in which the outgrowth of subclones from MGUS/SMM to MM was observed, and secondly, a maintenance model (5/8 patients) in which subclones that were present at MGUS/SMM are retained at the MM stage. Notably, we observed a decrease in the average number of clones present in MGUS vs. SMM patients, indicative of a reduction in clonal complexity through dominant clonal outgrowth and/or extinction of indolent clones with advancement of disease. The survival of subclonal branches to MM was determined by their clonal fitness, either through their emergence with the acquisition of candidate “driver” gene mutations or outcompeting other subclones that were present. Finally, we describe potential candidate driving events of clonal progression in a range of loci, including ICAM5, DUSP27, HERPUD1, NOD2 and TOP2A. Conclusion: Our genomic analysis of longitudinal MGUS/SMM to MM samples has revealed new insights into the subclonal tumour evolution and identified candidate mutated genes associated with MM transformation. 
Our analysis has revealed two models of tumour evolution involved in MGUS/SMM to MM transformation; namely the dominant subclonal tumour evolution model, and the maintenance subclonal tumour evolution model. We identified the existence of multiple subclones at the MGUS/SMM stages that are associated with MM progression. Our study suggests that in both circumstances, the subclonal populations involved in MM transformation are already present at the stages of MGUS/SMM diagnosis. Defining potential candidate genes associated with MM disease progression will assist in treatment approaches to arrest MM at the asymptomatic stages. Disclosures No relevant conflicts of interest to declare.
DOI: 10.1002/(sici)1097-4644(19981001)71:1<1::aid-jcb1>3.0.co;2-#
1998
Cited 3 times
Dual cytoplasmic and nuclear distribution of the novel arsenite‐stimulated human ATPase (hASNA‐I)
The arsenite-stimulated human ATPase (hASNA-I) protein is a distinct human ATPase whose cDNA was cloned by sequence homology to the Escherichia coli ATPase arsA. Its subcellular localization in human malignant melanoma T289 cells was examined to gain insight into the role of hASNA-I in the physiology of human cells. Immunocytochemical staining using the specific anti-hASNA-I monoclonal antibody 5G8 showed a cytoplasmic, perinuclear, and nucleolar distribution. Subcellular fractionation indicated that the cytoplasmic hASNA-I was soluble and that the perinuclear distribution was due to association with the nuclear membrane rather than with the endoplasmic reticulum. Its presence in the nucleolus was confirmed by showing colocalization with an antibody of known nucleolar specificity. Further immunocytochemical analysis showed that the hASNA-I at the nuclear membrane was associated with invaginations into the nucleus in interphase cells. These results indicate that hASNA-I is a paralogue of the bacterial ArsA protein and suggest that it plays a role in the nucleocytoplasmic transport of a nucleolar component.
DOI: 10.1182/blood.v128.22.2908.2908
2016
Identification of Multiple, Patient-Specific MLL Fusion Transcript Isoforms in Childhood Leukemia Using Anchored Multiplex PCR-Based Enrichment (AMP-E)
Abstract: Chromosomal translocations involving 11q23, resulting in rearrangements of the mixed lineage leukemia gene (MLL, re-named KMT2A), are frequent events in childhood leukemia. MLL is highly promiscuous, with approximately 80 fusions now characterized. Although fluorescence in situ hybridization (FISH) has high specificity for detecting MLL-rearrangements (MLL-r), sensitivity is limited and the translocation partner gene (TPG) cannot always be identified. In contrast, long-distance inverse-PCR (LDI-PCR) permits sequence-specific characterization of MLL breakpoints and the resultant fusion gene, which can then be used for monitoring minimal residual disease (MRD). A limitation of LDI-PCR is the relatively large input of DNA (≈ 1 μg) required, with a blast cell percentage of > 20-30% to achieve sufficient sensitivity. Next-generation sequencing (NGS) approaches such as RNAseq and whole-genome sequencing (WGS) have the potential to identify multiple gene fusions; however, their ability to detect the full spectrum of MLL fusions is limited by coverage, read depth and thereby cost. Such limitations can potentially be overcome with targeted sequencing panels, although their performance against "gold standard" assays, such as LDI-PCR, is unknown. We therefore aimed to assess the ability of a novel, targeted NGS approach for characterizing patient-specific MLL gene rearrangements from low inputs of RNA. The Archer™ FusionPlex™ Heme and Myeloid panels utilize anchored multiplex PCR-based enrichment (AMP-E) to rapidly enrich a number of targets, including MLL, creating libraries for NGS. The NGS libraries are generated using rapid workflows and are compatible with nucleic acid inputs of ≈ 20-200 ng. Briefly, double stranded cDNA is generated from patient RNA and subjected to end repair, adenylation and ligation with unique, half-functional adaptors. 
Following two rounds of nested PCR with primers attached to common sequencing adaptors, the resulting target amplicons become functional and ready for clonal amplification and sequencing. Using AMP-E, we tested 23 paediatric MLL-r samples (15 ALL, 8 AML) that had previously been analyzed by LDI-PCR and were known to harbor 8 different MLL fusions, including MLL-AFF1 (n = 8), -MLLT3 (5), -MLLT10 (3), -ELL (2), -DCP1A (1), -MLLT1 (1), -AFF3 (1), and -TNRC18 (1). A patient sample known to express BCR-ABL1 was used as a positive control and a cytogenetically normal AML sample in remission was used as a negative control in each panel. The median blast count for samples analyzed was 86.1% (range 25%-97%). On average, 100 ng of RNA was used per sample, with RIN values ranging from 2.7 to 9.1. Libraries generated using either the Archer™ FusionPlex™ Heme or Myeloid kit were sequenced to sufficient read depths by Illumina MiSeq® and NextSeq®, respectively. Bioinformatic analyses were performed with the Archer™ Analysis 4.1 software. Results were then compared with fusions identified by LDI-PCR. There was high concordance between AMP-E and LDI-PCR, with all MLL fusion genes identified by LDI-PCR also detected by AMP-E. Of note, an ALL sample with t(11;19), unable to be characterized by LDI-PCR, was identified by AMP-E to express MLL-MLLT1. The control BCR-ABL1 fusion was identified in every run and there were no false-negative results. Furthermore, AMP-E identified multiple MLL-fusion transcripts in 56.5% of patients. Analysis of paired diagnosis-relapse samples from an AML patient with MLL-MLLT3 demonstrated that the two discrete transcripts present at diagnosis persisted at relapse, with emergence of a third transcript. In summary, detection of MLL gene fusions in acute leukemia using AMP-E is both sensitive and specific. 
The low RNA requirement, rapid workflow, compatibility with Illumina MiSeq® and cloud-based proprietary analysis software, together with the array of additional fusions and mutations detected by the Archer™ panels, show promise for translation into clinical diagnostic settings. The persistence of discrete transcript isoforms at relapse also highlights the potential for AMP-E to identify multiple, patient-specific MLL fusion transcripts which may have utility in refining prognostication, MRD monitoring and informing future functional studies of MLL-driven leukemogenesis. Disclosures: No relevant conflicts of interest to declare.
DOI: 10.2144/000114189
2014
A workflow to increase verification rate of chromosomal structural rearrangements using high-throughput next-generation sequencing
Somatic rearrangements, which are commonly found in human cancer genomes, contribute to the progression and maintenance of cancers. Conventionally, the verification of somatic rearrangements comprises many manual steps and Sanger sequencing. This is labor intensive when verifying a large number of rearrangements in a large cohort. To increase the verification throughput, we devised a high-throughput workflow that utilizes benchtop next-generation sequencing and in-house bioinformatics tools to link the laboratory processes. In the proposed workflow, primers are automatically designed. PCR and an optional gel electrophoresis step to confirm the somatic nature of the rearrangements are performed. PCR products of somatic events are pooled for Ion Torrent PGM and/or Illumina MiSeq sequencing; the resulting sequence reads are assembled into consensus contigs by a consensus assembler, and an automated BLAT is used to resolve the breakpoints to base level. We compared sequences and breakpoints of verified somatic rearrangements between the conventional and high-throughput workflows. The results showed that next-generation sequencing methods are comparable to conventional Sanger sequencing. The identified breakpoints obtained from next-generation sequencing methods were highly accurate and reproducible. Furthermore, the proposed workflow allows hundreds of events to be processed in a shorter time frame compared with the conventional workflow.
DOI: 10.1016/j.jid.2017.07.721
2017
524 Regional variation in epidermal susceptibility to ultraviolet induced carcinogenesis reflects proliferative activity of epidermal progenitors
Oncogenic mutations induced by UV can be found in normal skin, suggesting that accumulation of oncogenic mutations is necessary to overcome cell-intrinsic mechanisms as well as cell-of-origin restrictions towards tumour formation. A major determinant of a cell's capacity to accumulate mutations lies in its ability to persist long term and to give rise to a large clone of mutant cells. In this study, we used multicolour fate tracing (K14Cre/Er::Rainbow3 mice) to evaluate size changes in clones of epidermal cells in response to chronic suberythemal ultraviolet B radiation injury. Upon tamoxifen injection, basal keratinocytes were labelled randomly with one of five possible fluorescent protein combinations and the size of different clones could be evaluated at different time points. Our findings highlight a bimodal progression of epidermal clones. Epidermal clones expanded more if attached to hair follicles (HF) (P < 0.0001) compared to those not attached, which remained smaller despite months of UV irradiation. Although there was globally more epidermal proliferation in the presence of UVB irradiation, proliferating cells were concentrated within 60 µm of HF openings, and clones distant from HF harboured label-retaining cells, suggesting their relatively slow-cycling behaviour. Functionally, microdissection of clones attached or not to HF followed by whole exome sequencing did not reveal any difference in mutation load between proliferative and slow-cycling clones. However, in a UVB-inducible murine BCC model (K14Cre/ER::Ptch1lox/+ mice), although keratin17-expressing groups of epidermal cells reflecting hedgehog pathway activation through loss of the second ptch1 allele were evenly distributed across dorsal skin, they were larger in size if attached to HF. Invasive BCCs emanated from HF-attached clones. In conclusion, epidermal progenitors in proximity of HF give rise to larger clones more likely to be affected by a second mutation leading to epidermal carcinogenesis.
DOI: 10.1182/blood-2018-99-117759
2018
Aberrant Splicing of KMT2A As a Potential Molecular MRD Marker in Cytogenetically-Normal AML
Abstract: Cytogenetically normal acute myeloid leukaemia (CN-AML) accounts for approximately 25%-30% of paediatric AML cases and carries a high risk of relapse. Minimal residual disease (MRD) is an essential factor in predicting relapse in acute leukaemia but is difficult to track for many CN-AML patients, due to the lack of a distinct and stable molecular marker. Consequently, new biomarkers are urgently required for MRD monitoring of the disease. Splicing variants, products of another hallmark of human cancers, aberrant splicing, have been shown to be informative in predicting responses to cancer treatment. Therefore, we characterized splicing events according to different cytogenetic features by targeted RNA-seq and interrogated the use of splicing variants in MRD monitoring of CN-AML. A total of 29 AML samples, collected from 18 de novo paediatric AML patients (median age 5.66 years, range 0.67 - 16.38 years), were analysed for this study. Among the 29 samples, 52% harboured a chromosome translocation, 21% were cytogenetically normal, and 28% showed a complex karyotype (defined as having 3 or more cytogenetic features). 100 ng of total RNA, extracted from peripheral blood or bone marrow, was subjected to library preparation using the Archer™ FusionPlex™ Heme and Myeloid panels, then sequenced using Illumina MiSeq® or NextSeq®. Novel splicing events and genetic mutations were identified by the Archer Dx analysis software in conjunction with normalisations against the library size and probe numbers. Splicing variants were validated using splicing junction-specific probe assays. Our results revealed 3249 novel splicing events in 29 AML samples. These events were classified into 4 major types (65% intron retention, 10% exon skipping, 8% exon out of order and 8% intra-exon gap), and 9 minor events that were combinations of the major types (9%). 
The number of splicing events per sample was not associated with the disease status or the presence of mutations in the spliceosome-encoding genes SF3B1 or U2AF1. Instead, splicing variants were associated with cytogenetic features. Of note, an intron 13 retention of the KMT2A (MLL) gene was identified in all CN-AML samples, and was consistently expressed approximately 100 times higher in the CN-AML compared to other AML cases or remission samples. To assess whether KMT2A intron 13 retention could be a potential molecular MRD marker to monitor CN-AML, we measured its expression in samples from 5 independent CN-AML patients who had available samples for 3 time-points of disease progression. Our results demonstrated, with 95% detection power, that KMT2A intron 13 retention was differentially expressed at different time points (Figure 1). Moreover, the expression level of this splicing variant correlated with disease progression in every patient examined. In conclusion, these data suggest that intron 13 retention of KMT2A may be a novel molecular marker for MRD monitoring in CN-AML. Disclosures: No relevant conflicts of interest to declare.
DOI: 10.1182/blood-2018-99-117761
2018
Quantitative Analysis of MLL Fusion Transcripts By Droplet Digital PCR to Monitor Minimal Residual Disease in MLL-Rearranged Acute Myeloid Leukemia
Abstract: Rearrangements of the mixed lineage leukemia gene (MLL, re-named KMT2A) result in aggressive leukemia. Current risk stratification of MLL-rearranged (MLL-r) leukemia is directed by the fusion partner gene and, increasingly, by minimal residual disease (MRD) assessment after induction therapy. The clinical significance of quantifying fusion transcript levels in leukemia patients is firmly established in chronic myeloid leukemia and acute promyelocytic leukemia but is less well studied in MLL-r patients. Real-time quantitative PCR (RQ-PCR) is the standardized assay for molecular MRD monitoring in patients with MLL-rearranged leukemia. However, this method is less precise when few leukemic cells are present, thus limiting its application for highly sensitive MRD monitoring. Droplet digital PCR (ddPCR) allows for absolute quantification of fusion transcripts when multiple copies of fusion transcripts are present per cell. Therefore, we aimed to evaluate whether determining MLL fusion transcript levels by ddPCR could improve the sensitivity of MRD monitoring in MLL-r leukemia. A total of 44 diagnostic and follow-up samples obtained from paediatric MLL-r leukemia patients (26 ALL, 18 AML) were subjected to targeted next-generation sequencing to obtain patient-specific fusion sequences. MLL fusion transcripts were quantified by ddPCR in a total of 17 samples obtained from 4 paediatric AML patients with MLL fusions involving MLLT3 (n = 3) and MLLT10 (n = 1). Fusion-specific probe assays were designed from each of the patient-specific fusion sequences for MRD assessment by ddPCR. To determine the detection limit of this method in quantifying MLL fusion transcripts, two MLL-r AML cell lines (MV4-11 and THP-1), and one MLL-wt cell line (Kasumi-1) were used. The limit of detection of ddPCR for MLL fusion transcripts was determined by serially diluting MLL-r cDNA into MLL-wt cDNA (Kasumi-1). 
Using 20 ng of MLL-r cDNA in 200 ng diluent as the highest concentration, a 10-fold dilution series was performed to make concentrations ranging from 10⁻² to 10⁻⁷. Each ddPCR reaction mixture contained 11 µL of cDNA mix as template with 1X Supermix no dUTP (Bio-Rad), 500 nM of both F/R primers and 250 nM of 5'-FAM labelled probe (IDT). Droplets were generated using a QX200 Droplet Generator (Bio-Rad). A general thermal cycler protocol with annealing at 61°C for 1 minute was performed and positive fluorescence droplets were read using QX200 Droplet Reader (Bio-Rad). MRD of patient samples, derived from ddPCR, was then compared to MRD derived from DNA-based RQ-PCR, following the guidelines established by the EuroMRD group. Our ddPCR method showed high reliability and sensitivity, with the detection limit determined to be 10⁻⁵ for a cell line with low MLL fusion transcript expression (THP-1), and 10⁻⁶ for a cell line with high MLL fusion transcript expression (MV4-11). Comparison of results obtained by RQ-PCR and ddPCR in a total of 17 diagnostic and follow-up samples from 4 AML patients showed excellent/good concordance between methods for 13 samples with moderate MRD levels. The 4 samples with low levels of MRD (10⁻⁴ to 10⁻⁵) below the quantitative range as defined by EuroMRD for RQ-PCR were all detectable by ddPCR, highlighting that ddPCR could provide robust and highly sensitive MRD assays compared to the standardized RQ-PCR assays. In conclusion, ddPCR is a promising technique that can reproducibly and reliably quantify MLL-r transcripts for MRD monitoring of MLL-r leukemia. Highly sensitive and robust molecular MRD monitoring by ddPCR holds promise for improving response-based therapeutic stratification and prediction of molecular relapse before overt hematological relapse. Disclosures: No relevant conflicts of interest to declare.
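As a rough illustration of the quantification described in the abstract above, the sketch below builds the 10-fold dilution series (10⁻² to 10⁻⁷) and applies the standard Poisson correction that digital PCR uses to turn a positive-droplet fraction into copies per droplet. The function names, the default droplet volume of 0.85 nL, and any droplet counts passed in are assumptions for illustration; none of them are taken from the study.

```python
import math


def dilution_series(start_exp=-2, stop_exp=-7):
    """Return 10-fold dilution factors from 10**start_exp down to 10**stop_exp."""
    return [10.0 ** e for e in range(start_exp, stop_exp - 1, -1)]


def copies_per_droplet(positive, total):
    """Poisson-corrected mean number of target copies per droplet.

    A droplet can hold more than one copy, so the mean is estimated from
    the fraction of negative droplets: lambda = -ln(1 - positive / total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1.0 - positive / total)


def copies_per_microlitre(positive, total, droplet_volume_nl=0.85):
    """Convert copies per droplet to copies per microlitre of reaction.

    droplet_volume_nl is an assumed nominal droplet volume, not a value
    reported in the abstract.
    """
    return copies_per_droplet(positive, total) / (droplet_volume_nl * 1e-3)
```

The Poisson step matters mainly at high target concentrations, where many droplets carry more than one copy; at the low MRD levels discussed above, the corrected value is close to the raw positive fraction.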
DOI: 10.7490/f1000research.1117082.1
2019
Comparative performance of the BGI and Illumina sequencing technology for single-cell RNA-sequencing
The libraries generated by high-throughput single cell RNA-sequencing (scRNA-seq) platforms such as the Chromium from 10× Genomics require considerable amounts of sequencing, typically due to the large number of cells. The ability to use these data to address biological questions is directly impacted by the quality of the sequence data. Here we have compared the performance of the Illumina NextSeq 500 and NovaSeq 6000 against the BGI MGISEQ-2000 platform using identical Single Cell 3' libraries consisting of over 70 000 cells generated on the 10× Genomics Chromium platform. Our results demonstrate a highly comparable performance between the NovaSeq 6000 and MGISEQ-2000 in sequencing quality and in the detection of genes, cell barcodes and Unique Molecular Identifiers. The performance of the NextSeq 500 was also comparable to the MGISEQ-2000 based on the same metrics. Data generated by both sequencing platforms yielded similar analytical outcomes for general single-cell analysis. The performance of the NextSeq 500 and MGISEQ-2000 was also comparable for the deconvolution of multiplexed cell pools via variant calling, and detection of guide RNA (gRNA) from a pooled CRISPR single-cell screen. Our study provides a benchmark for high-capacity sequencing platforms applied to high-throughput scRNA-seq libraries.
DOI: 10.1016/j.jid.2019.07.507
2019
457 Cancer associated fibroblast profiling reveals endothelin signalling as a novel mediator of niche to tumour cross-talk in Basal Cell Carcinoma
Basal Cell Carcinoma (BCC) is known to rely heavily on its underlying mesenchyme but little is understood regarding how the various cell types interact. Histological and gene expression changes in the dermis surrounding the tumour have been reported in BCCs as compared to normal skin. Targeting the tumour-stroma interactions may therefore be a viable strategy in controlling BCC onset. We here performed RNA sequencing on sorted CD26+ BCC associated fibroblasts (BAFs) in the BCC niche to examine their impact on tumour development. Using a genetically inducible model of ptch1 deletion (K14Cre x ptch1-/-) compared to unaffected littermates (K14Cre x ptch1lox/lox), we report that gene expression changes in BAFs occur early and evolve over time during the tumour's development. Adhesion and metabolic alterations early in tumour development progress to immune system regulators becoming more pronounced in BAFs later in the process. Specifically, we have identified that Endothelin ligand from BCC epidermal tumour cells promotes Endothelin signalling in BAFs, and that this signalling is crucial for BCC growth. This work provides meaningful candidates for BCC therapeutic development and validates that BAFs facilitate BCC development from early in tumour development.
DOI: 10.1002/hon.91_2629
2019
EBV+ CNS LYMPHOMAS HAVE A DISTINCTIVE TUMOR MICROENVIRONMENT AND GENETIC PROFILE, WHICH IS AMENABLE TO COMBINATION 3RD PARTY EBV-SPECIFIC CTL AND IBRUTINIB THERAPY
Introduction: Primary CNS Lymphomas (PCNSL) with DLBCL histology in the immunosuppressed, e.g. after HIV (HIV+ PCNSL) or organ transplant (EBV+ PCNSL-PTLD: where anti-PD-1 is contraindicated), are characterized by dismal outcome and almost universal EBV positivity. However, incidence is low, biopsy material limited and immunogenetic characterization minimal. We provide detailed data (308 patients) comparing the genetic landscape and tumor microenvironment (TME) of EBV+ PCNSL-PTLD (22); HIV+ PCNSL (24); EBV- PCNSL (41); EBV- systemic (sy) DLBCL (199) and EBV+ syPTLD (22), which led to implementation of a rationally designed combination therapy. Methods: Targeted sequencing, CNV analysis, nanoString (for immune checkpoints and effectors, and macrophage gene expression) and in vitro assays were used. Based on the findings, a novel regimen was administered to patients refractory to or unsuitable for frontline therapy. Sequential imaging, pharmacokinetic (PK) and T cell assays were performed. Results: Mutational burden was much lower in EBV+ lymphomas than EBV- lymphomas (p<0.0001). Notably, genetic aberrations in BCR-NFκB were rarely observed in EBV+ PCNSL-PTLD and HIV+ PCNSL, and (unlike EBV- PCNSL) in these patients, HLAI/II copy number loss was largely absent. Mutations in CARD11 (which confers ibrutinib resistance) were also rare in EBV+ PCNSL-PTLD and HIV+ PCNSL. By nanoString analysis, EBV+ PCNSL-PTLD expressed higher levels of the immunosuppressive ‘M2’ macrophage marker CD163 and LAG3 and PD-L1/L2 (p<0.001), and high levels of EBV protein LMP1 (known to upregulate PD-L1/L2 and NFκB). In vitro co-incubation experiments showed a PCNSL line upregulated CD163+ PD-L1/L2+ M2 polarized monocyte/macrophages relative to a systemic DLBCL line (≥7-fold, p<0.001), and EBV infection of a PCNSL line enhanced NFκB activity (Fig 1A). 
Three immunosuppressed patients (2 frontline, 1 refractory, ages 30-70 yrs) with EBV+ lymphoma (2 PCNSL, 1 CNS + systemic) were treated with ibrutinib (starting dose 560 mg) and 3rd party partially HLA matched EBV specific CTL. PK confirmed therapeutic CSF levels, including a patient on haemodialysis. CD8 T cells specific for EBV antigens present in PCNSL were detectable post infusion and had an effector memory phenotype (Fig 1B-C). Microchimerism post infusion was confirmed by ultra-sensitive ddPCR. Toxicity was manageable and all 3 are alive (2CR, 1PR). Keywords: Epstein-Barr virus (EBV); post-transplant lymphoproliferative disorders (PTLDs); primary CNS lymphoma (PCNSL). Disclosures: Gandhi, M: Consultant Advisory Role: Celgene, Merck Sharp & Dohme, Janssen-Cilag, Gilead, Bristol Myers Squibb; Honoraria: Amgen, Janssen-Cilag, Gilead, Roche, Merck Sharp & Dohme, Takeda; Research Funding: Celgene, Bristol Myers Squibb, Janssen-Cilag, Gilead; Other Remuneration: Roche. Tobin, J: Honoraria: Amgen, Janssen – conference attendance 2017. Trappe, R: Consultant Advisory Role: Abbvie, Atara; Research Funding: Hoffmann-La Roche; Other Remuneration: Travel support from Celgene, Janssen, Gilead, Abbvie. Blyth, E: Consultant Advisory Role: Abbvie, MSD, Novartis. Wight, J: Honoraria: Janssen, Alexion. Keane, C: Consultant Advisory Role: Celgene, Gilead, MSD; Other Remuneration: Roche (Conference Travel).
DOI: 10.1200/jco.2022.40.16_suppl.e15034
2022
Reducing pre-analytical sample QC failure rates for cancer molecular genetic assays with SLIMamp technology.
e15034 Background: Identification of somatic variants in cancer by high-throughput sequencing has become common clinical practice largely because many of these variants may be predictive biomarkers for targeted therapies. However, there can be high sample QC failure rates for some assays (sometimes up to 40%), preventing the return of results that may affect patient treatment decisions. Pillar Biosciences has incorporated their patented SLIMamp technology into commercially available cancer NGS testing kits with the claim that these kits can successfully interrogate challenging formalin-fixed paraffin-embedded tissue (FFPET) samples with low tumor purity, poor DNA quality, and/or low input DNA, resulting in a high sample QC pass rate. The aim of this study was to substantiate that claim using Pillar’s amplicon-based oncoReveal Solid Tumor Panel. Methods: We acquired 48 tumor samples that had failed one or more pre-analytical QC sample parameters for whole exome sequencing (WES) from ATGC’s ISO15189 accredited diagnostic genomics laboratory. XING Genomic Services performed an exploratory data analysis using our pre-analytical QC assays to characterise the samples and then sequenced the samples in our ISO15189 accredited laboratory using the validated oncoReveal Solid Tumor Panel. Results: We were able to achieve high sequencing coverage (>3000X) for all 48 samples and explored the determinants of sample “success”. We were able to generate clinical reports for 45 samples (94%), of which 38 (79%) contained clinically actionable or significant variants that would not have otherwise been identified. Ten samples had a higher number of total variant calls with over-representation of C>T transitions representing stochastically-amplified, formalin-induced artefacts in samples with very low input template DNA. Of these, 7 cases had reportable variants and 3 were deemed unreportable. 
We demonstrated that DNA integrity is the major determinant of success even in samples with low input DNA or low tumor purity and were able to further refine pre-analytical and post-analytical QC metrics to better identify samples with poor quality DNA that can be sequenced reliably. Conclusions: In this study, we showed that the Pillar Biosciences oncoReveal Solid Tumor Panel, which uses SLIMamp technology, was able to generate reliable, interpretable results for 94% of samples that failed pre-analytical QC for WES, substantiating Pillar’s claim.
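The percentages reported in the abstract above can be re-derived from the stated sample counts. The short check below is only a sketch written for this listing; the variable and function names are chosen for illustration and do not come from the study.

```python
# Counts as stated in the abstract.
total_samples = 48      # samples that had failed pre-analytical QC for WES
reported = 45           # samples for which a clinical report was generated
actionable = 38         # reports with clinically actionable or significant variants
artefact_samples = 10   # samples with excess C>T calls
artefact_reportable = 7
artefact_unreportable = 3


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


report_rate = percent(reported, total_samples)
actionable_rate_of_all = percent(actionable, total_samples)
actionable_rate_of_reported = percent(actionable, reported)
```

Under this reading, the 79% figure is expressed relative to all 48 samples; relative to the 45 reported samples the same count corresponds to a higher proportion.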
DOI: 10.1016/j.jid.2021.02.093
2021
076 Subtype specific analyses reveal infiltrative basal cell carcinoma are highly interactive with their environment
Little is known regarding the molecular differences between BCC subtypes, despite clearly distinct phenotypes and clinical outcomes. In particular, infiltrative BCCs have poorer clinical outcomes in terms of response to therapy and propensity for dissemination. In this project we aimed to use exome sequencing and RNA sequencing to identify somatic mutations and molecular pathways leading to infiltrative BCCs. Using whole exome sequencing of 36 BCC samples (8 infiltrative) combined with previously reported exome data (58 samples), we determine that infiltrative BCC do not contain a distinct somatic variant profile and carry classical UV induced mutational signatures. RNA sequencing on both datasets revealed key differentially expressed genes such as POSTN and WISP1 suggesting increased integrin and Wnt signalling. Immunostaining for POSTN and WISP1 clearly distinguished infiltrative BCCs and nuclear beta-catenin staining patterns further validated the resulting increase in Wnt signalling in infiltrative BCCs. Of significant interest, in BCCs with mixed morphology, infiltrative areas expressed WISP1 while nodular areas did not, supporting a continuum between subtypes. In conclusion, infiltrative BCCs do not differ in their genomic alteration in terms of initiating mutations. They display a specific type of interaction with the extracellular matrix environment regulating Wnt signalling.
DOI: 10.1002/(sici)1097-4644(19981001)71:1<1::aid-jcb1>3.3.co;2-r
1998
Dual cytoplasmic and nuclear distribution of the novel arsenite‐stimulated human ATPase (hASNA‐I)