
Yoshihide Hayashizaki

Here are all the papers by Yoshihide Hayashizaki that you can download and read on OA.mg.

DOI: 10.1038/nature01262
2002
Cited 6,563 times
Initial sequencing and comparative analysis of the mouse genome
The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
DOI: 10.1038/nature11233
2012
Cited 4,479 times
Landscape of transcription in human cells
Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.
DOI: 10.1038/nature12787
2014
Cited 2,255 times
An atlas of active enhancers across human cell types and tissues
Enhancers control the correct temporal and cell-type-specific activation of gene expression in multicellular eukaryotes. Knowing their properties, regulatory activity and targets is crucial to understand the regulation of differentiation and homeostasis. Here we use the FANTOM5 panel of samples, covering the majority of human tissues and cell types, to produce an atlas of active, in vivo-transcribed enhancers. We show that enhancers share properties with CpG-poor messenger RNA promoters but produce bidirectional, exosome-sensitive, relatively short unspliced RNAs, the generation of which is strongly related to enhancer activity. The atlas is used to compare regulatory programs between different cells at unprecedented depth, to identify disease-associated regulatory single nucleotide polymorphisms, and to classify cell-type-specific and ubiquitous enhancers. We further explore the utility of enhancer redundancy, which explains gene expression strength rather than expression patterns. The online FANTOM5 enhancer atlas represents a unique resource for studies on cell-type-specific enhancers and gene regulation.
FANTOM5 (Functional Annotation of the Mammalian Genome 5) is the fifth stage of a major international collaboration that aims to dissect the transcriptional regulatory networks that define every human cell type. Two Articles in this issue of Nature present some of the project's latest results. The first paper uses the FANTOM5 panel of tissue and primary cell samples to define an atlas of active, in vivo bidirectionally transcribed enhancers across the human body. These authors show that bidirectional capped RNAs are a signature feature of active enhancers and identify more than 40,000 enhancer candidates from over 800 human cell and tissue samples. The enhancer atlas is used to compare regulatory programs between different cell types and identify disease-associated regulatory SNPs, and will be a resource for studies on cell-type-specific enhancers. In the second paper, single-molecule sequencing is used to map human and mouse transcription start sites and their usage in a panel of distinct human and mouse primary cells, cell lines and tissues to produce the most comprehensive mammalian gene expression atlas to date. The data provide a plethora of insights into open reading frames and promoters across different cell types in addition to valuable annotation of mammalian cell-type-specific transcriptomes.
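The enhancer atlas rests on the observation that active enhancers emit balanced, bidirectional capped RNA, whereas promoters are usually dominated by one strand. The sketch below scores a single candidate locus from CAGE 5′-end positions. It is illustrative only: the window size, the directionality score D = (F − R) / (F + R) and the |D| < 0.8 cut-off are assumptions for this example, not a restatement of the paper's exact pipeline, and `CageTag`, `directionality` and `is_bidirectional` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CageTag:
    """5' end of one aligned CAGE read: coordinate and strand ('+' or '-')."""
    pos: int
    strand: str


def directionality(tags, midpoint, window=200):
    """Return (F, R, D) for a candidate enhancer midpoint.

    F counts '+' tags in [midpoint, midpoint + window) (downstream),
    R counts '-' tags in [midpoint - window, midpoint) (upstream),
    and D = (F - R) / (F + R) measures strand imbalance (0 = balanced).
    """
    f = sum(1 for t in tags if t.strand == "+" and midpoint <= t.pos < midpoint + window)
    r = sum(1 for t in tags if t.strand == "-" and midpoint - window <= t.pos < midpoint)
    d = (f - r) / (f + r) if f + r else 0.0
    return f, r, d


def is_bidirectional(tags, midpoint, window=200, max_abs_d=0.8):
    """Enhancer-like call: both strands transcribed and roughly balanced."""
    f, r, d = directionality(tags, midpoint, window)
    return f > 0 and r > 0 and abs(d) < max_abs_d


# Hypothetical loci: a balanced divergent pair vs. a '+'-dominated promoter.
enhancer_like = [CageTag(1050, "+"), CageTag(1120, "+"),
                 CageTag(960, "-"), CageTag(900, "-"), CageTag(880, "-")]
promoter_like = [CageTag(1010, "+")] * 10 + [CageTag(990, "-")]
print(directionality(enhancer_like, 1000))    # (2, 3, -0.2)
print(is_bidirectional(promoter_like, 1000))  # False
```

Requiring signal on both strands as well as a small |D| keeps a single stray antisense tag from turning a strongly unidirectional promoter into an enhancer call.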
DOI: 10.1046/j.1365-313x.2002.01359.x
2002
Cited 1,832 times
Monitoring the expression profiles of 7000 Arabidopsis genes under drought, cold and high-salinity stresses using a full-length cDNA microarray
Full-length cDNAs are essential for functional analysis of plant genes in the post-sequencing era of the Arabidopsis genome. Recently, cDNA microarray analysis has been developed for the quantitative, global and simultaneous analysis of expression profiles. We have prepared a full-length cDNA microarray containing approximately 7000 independent, full-length cDNA groups to analyse the expression profiles of genes under drought, cold (low temperature) and high-salinity stress conditions over time. The transcripts of 53, 277 and 194 genes increased more than fivefold compared with the control genes after cold, drought and high-salinity treatments, respectively. We also identified many highly drought-, cold- or high-salinity-stress-inducible genes. However, we observed strong relationships in the expression of these stress-responsive genes based on Venn diagram analysis, and found 22 stress-inducible genes that responded to all three stresses. Several gene groups showing different expression profiles were identified by analysis of their expression patterns during stress-responsive gene induction. The cold-inducible genes were classified into at least two gene groups from their expression profiles. DREB1A was included in a group whose expression peaked at 2 h after cold treatment. Among the drought, cold or high-salinity stress-inducible genes identified, we found 40 transcription factor genes (corresponding to approximately 11% of all stress-inducible genes identified), suggesting that various transcriptional regulatory mechanisms function in the drought, cold or high-salinity stress signal transduction pathways.
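The screening logic in this abstract, a more-than-fivefold increase over control followed by a Venn-diagram comparison across the three stresses, can be sketched with plain Python sets. This is a simplified illustration, not the authors' analysis: it assumes each gene already has a single stress/control expression ratio (the study used time courses and control genes for normalization), and the gene IDs and ratios below are hypothetical.

```python
def induced(ratios, threshold=5.0):
    """Genes whose stress/control expression ratio exceeds `threshold`."""
    return {gene for gene, ratio in ratios.items() if ratio > threshold}


def venn_regions(gene_sets):
    """Group genes by the exact combination of conditions that induce them.

    Each key is a frozenset of condition names, i.e. one region of a Venn
    diagram; each value is the set of genes falling in that region.
    """
    regions = {}
    for gene in set().union(*gene_sets.values()):
        key = frozenset(name for name, genes in gene_sets.items() if gene in genes)
        regions.setdefault(key, set()).add(gene)
    return regions


# Hypothetical stress/control ratios for four genes.
drought = {"g1": 12.0, "g2": 6.5, "g3": 1.2, "g4": 8.0}
cold = {"g1": 7.1, "g2": 0.9, "g3": 5.5, "g4": 2.0}
salt = {"g1": 9.3, "g2": 5.2, "g3": 1.0, "g4": 4.9}
regions = venn_regions({"drought": induced(drought),
                        "cold": induced(cold),
                        "salt": induced(salt)})
print(regions[frozenset({"drought", "cold", "salt"})])  # {'g1'}
```

Keying regions by a frozenset of condition names gives one entry per Venn region, so the "responded to all three stresses" set is a single dictionary lookup.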
DOI: 10.1038/nature05260
2006
Cited 1,670 times
Insights into social insects from the genome of the honeybee Apis mellifera
Here we report the genome sequence of the honeybee Apis mellifera, a key model for social behaviour and essential to global ecology through pollination. Compared with other sequenced insect genomes, the A. mellifera genome has high A+T and CpG contents, lacks major transposon families, evolves more slowly, and is more similar to vertebrates for circadian rhythm, RNA interference and DNA methylation genes, among others. Furthermore, A. mellifera has fewer genes for innate immunity, detoxification enzymes, cuticle-forming proteins and gustatory receptors, more genes for odorant receptors, and novel genes for nectar and pollen utilization, consistent with its ecology and social organization. Compared to Drosophila, genes in early developmental pathways differ in Apis, whereas similarities exist for functions that differ markedly, such as sex determination, brain function and behaviour. Population genetics suggests a novel African origin for the species A. mellifera and provides insights into whether Africanized bees spread throughout the New World via hybridization or displacement.
DOI: 10.1126/science.1112009
2005
Cited 1,550 times
Antisense Transcription in the Mammalian Transcriptome
Antisense transcription (transcription from the opposite strand to a protein-coding or sense strand) has been ascribed roles in gene regulation involving degradation of the corresponding sense transcripts (RNA interference), as well as gene silencing at the chromatin level. Global transcriptome analysis provides evidence that a large proportion of the genome can produce transcripts from both strands, and that antisense transcripts commonly link neighboring "genes" in complex loci into chains of linked transcriptional units. Expression profiling reveals frequent concordant regulation of sense/antisense pairs. We present experimental evidence that perturbation of an antisense RNA can alter the expression of sense messenger RNAs, suggesting that antisense transcription contributes to control of transcriptional outputs in mammals.
DOI: 10.1038/ng1789
2006
Cited 1,259 times
Genome-wide analysis of mammalian promoter architecture and evolution
DOI: 10.1105/tpc.13.1.61
2001
Cited 1,051 times
Monitoring the Expression Pattern of 1300 Arabidopsis Genes under Drought and Cold Stresses by Using a Full-Length cDNA Microarray
Full-length cDNAs are essential for functional analysis of plant genes. Using the biotinylated CAP trapper method, we constructed full-length Arabidopsis cDNA libraries from plants in different conditions, such as drought-treated, cold-treated, or unstressed plants, and at various developmental stages from germination to mature seed. We prepared a cDNA microarray using approximately 1300 full-length Arabidopsis cDNAs to identify drought- and cold-inducible genes and target genes of DREB1A/CBF3, a transcription factor that controls stress-inducible gene expression. In total, 44 and 19 cDNAs for drought- and cold-inducible genes, respectively, were isolated, 30 and 10 of which were novel stress-inducible genes that have not been reported as drought- or cold-inducible genes previously. Twelve stress-inducible genes were identified as target stress-inducible genes of DREB1A, and six of them were novel. On the basis of RNA gel blot and microarray analyses, the six genes were identified as novel drought- and cold-inducible genes that are controlled by DREB1A. Eleven DREB1A target genes whose genomic sequences have been registered in the GenBank database contained the dehydration-responsive element (DRE) or DRE-related CCGAC core motif in their promoter regions. These results show that our full-length cDNA microarray is a useful material with which to analyze the expression pattern of Arabidopsis genes under drought and cold stresses, to identify target genes of stress-related transcription factors, and to identify potential cis-acting DNA elements by combining the expression data with the genomic sequence data.
DOI: 10.1126/science.1088305
2003
Cited 887 times
Empirical Analysis of Transcriptional Activity in the Arabidopsis Genome
Functional analysis of a genome requires accurate gene structure information and a complete gene inventory. A dual experimental strategy was used to verify and correct the initial genome sequence annotation of the reference plant Arabidopsis. Sequencing full-length cDNAs and hybridizations using RNA populations from various tissues to a set of high-density oligonucleotide arrays spanning the entire genome allowed the accurate annotation of thousands of gene structures. We identified 5817 novel transcription units, including a substantial amount of antisense gene transcription, and 40 genes within the genetically defined centromeres. This approach resulted in completion of approximately 30% of the Arabidopsis ORFeome as a resource for global functional experimentation of the plant proteome.
DOI: 10.1038/nature21374
2017
Cited 867 times
An atlas of human long non-coding RNAs with accurate 5′ ends
Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5′ ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome. Alistair Forrest, Piero Carninci and colleagues of the FANTOM Consortium provide a catalogue of human long non-coding RNA (lncRNA) genes and their expression profiles across samples from human primary cell types, tissues and cell lines. They used combined analyses of multiple data sets to identify 27,919 lncRNA genes with high-confidence 5′ ends, as well as a subset of 19,175 potentially functional lncRNA loci. The lncRNA catalogue and annotations are available through an open web resource.
DOI: 10.1126/science.1081288
2003
Cited 841 times
Collection, Mapping, and Annotation of Over 28,000 cDNA Clones from japonica Rice
We collected and completely sequenced 28,469 full-length complementary DNA clones from Oryza sativa L. ssp. japonica cv. Nipponbare. Through homology searches of publicly available sequence data, we assigned tentative protein functions to 21,596 clones (75.86%). Mapping of the cDNA clones to genomic DNA revealed that there are 19,000 to 20,500 transcription units in the rice genome. Protein informatics analysis against the InterPro database revealed the existence of proteins present in rice but not in Arabidopsis. Sixty-four percent of our cDNAs are homologous to Arabidopsis proteins.
DOI: 10.1038/ng.368
2009
Cited 714 times
The regulated retrotransposon transcriptome of mammalian cells
DOI: 10.1186/s13059-014-0560-6
2015
Cited 697 times
Gateways to the FANTOM5 promoter level mammalian expression atlas
The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). This resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.
DOI: 10.1126/science.1105776
2005
Cited 693 times
High-Throughput Mapping of a Dynamic Signaling Network in Mammalian Cells
Signaling pathways transmit information through protein interaction networks that are dynamically regulated by complex extracellular cues. We developed LUMIER (for luminescence-based mammalian interactome mapping), an automated high-throughput technology, to map protein-protein interaction networks systematically in mammalian cells and applied it to the transforming growth factor–β (TGFβ) pathway. Analysis using self-organizing maps and k-means clustering identified links of the TGFβ pathway to the p21-activated kinase (PAK) network, to the polarity complex, and to Occludin, a structural component of tight junctions. We show that Occludin regulates TGFβ type I receptor localization for efficient TGFβ-dependent dissolution of tight junctions during epithelial-to-mesenchymal transitions.
DOI: 10.1016/j.cell.2010.01.044
2010
Cited 692 times
An Atlas of Combinatorial Transcriptional Regulation in Mouse and Man
Combinatorial interactions among transcription factors are critical to directing tissue-specific gene expression. To build a global atlas of these combinations, we have screened for physical interactions among the majority of human and mouse DNA-binding transcription factors (TFs). The complete networks contain 762 human and 877 mouse interactions. Analysis of the networks reveals that highly connected TFs are broadly expressed across tissues, and that roughly half of the measured interactions are conserved between mouse and human. The data highlight the importance of TF combinations for determining cell fate, and they lead to the identification of a SMAD3/FLI1 complex expressed during development of immunity. The availability of large TF combinatorial networks in both human and mouse will provide many opportunities to study gene regulation, tissue differentiation, and mammalian evolution.
DOI: 10.1073/pnas.2136655100
2003
Cited 674 times
Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage
We introduce cap analysis gene expression (CAGE), which is based on preparation and sequencing of concatemers of DNA tags derived from the first 20 nucleotides at the 5′ ends of mRNAs. CAGE allows high-throughput gene expression analysis and the profiling of transcriptional start points (TSPs), including promoter usage analysis. By analyzing four libraries (brain, cortex, hippocampus, and cerebellum), we redefined more accurately the TSPs of 11–27% of the analyzed transcriptional units that were hit. The frequency of CAGE tags correlates well with results from other analyses, such as serial analysis of gene expression, and CAGE furthermore maps TSPs more accurately, including in tissue-specific cases. The high-throughput nature of this technology paves the way for understanding gene networks via correlation of promoter usage and transcription factor expression.
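Because each CAGE tag comes from the capped 5′ end of an mRNA, the number of tags starting at a genomic position doubles as an expression measure for that transcriptional start point. The sketch below shows this counting idea under stated assumptions: tags are already aligned and given as `(chrom, strand, five_prime_pos)` tuples; nearby start points are merged with a simple distance rule (`max_gap`, a hypothetical parameter); and counts are normalized to tags per million. These choices are common practice for illustration, not the procedure described in the paper.

```python
from collections import Counter
from typing import NamedTuple


class Cluster(NamedTuple):
    chrom: str
    strand: str
    start: int
    end: int
    tags: int
    peak: int  # position with the most tags (dominant start point)


def cluster_tsps(tags, max_gap=20):
    """Merge tag 5' positions on the same chrom/strand that lie within
    `max_gap` bp of the previous position into one start-point cluster."""
    counts = Counter(tags)
    clusters = []
    for chrom, strand, pos in sorted(counts):
        n = counts[(chrom, strand, pos)]
        prev = clusters[-1] if clusters else None
        if prev and (prev.chrom, prev.strand) == (chrom, strand) and pos - prev.end <= max_gap:
            # Extend the current cluster and update its dominant position.
            peak = pos if n > counts[(chrom, strand, prev.peak)] else prev.peak
            clusters[-1] = prev._replace(end=pos, tags=prev.tags + n, peak=peak)
        else:
            clusters.append(Cluster(chrom, strand, pos, pos, n, pos))
    return clusters


def tags_per_million(clusters):
    """Normalize cluster tag counts by library size (tags per million)."""
    total = sum(c.tags for c in clusters)
    return [c.tags * 1_000_000 / total for c in clusters]


# Hypothetical aligned tags: two nearby '+' start points, one distant one,
# and a single tag on the opposite strand.
tags = ([("chr1", "+", 100)] * 3 + [("chr1", "+", 110)] * 5
        + [("chr1", "+", 500)] * 2 + [("chr1", "-", 105)])
clusters = cluster_tsps(tags)
print(clusters[0])  # Cluster(chrom='chr1', strand='+', start=100, end=110, tags=8, peak=110)
```

Keeping the peak position alongside the cluster span reflects the point of CAGE: the span is a promoter region, while the peak is the most frequently used start point within it.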
DOI: 10.1038/35055500
2001
Cited 634 times
Functional annotation of a full-length mouse cDNA collection
The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.
DOI: 10.1126/science.1071006
2002
Cited 626 times
Functional Annotation of a Full-Length Arabidopsis cDNA Collection
Full-length complementary DNAs (cDNAs) are essential for the correct annotation of genomic sequences and for the functional analysis of genes and their products. We isolated 155,144 RIKEN Arabidopsis full-length (RAFL) cDNA clones. The 3′-end expressed sequence tags (ESTs) of 155,144 RAFL cDNAs were clustered into 14,668 nonredundant cDNA groups, about 60% of predicted genes. We also obtained 5′ ESTs from 14,034 nonredundant cDNA groups and constructed a promoter database. The sequence database of the RAFL cDNAs is useful for promoter analysis and correct annotation of predicted transcription units and gene products. Furthermore, the full-length cDNAs are useful resources for analyses of the expression profiles, functions, and structures of plant proteins.
DOI: 10.1101/gr.4200206
2005
Cited 480 times
Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome
Recent large-scale analyses of mainly full-length cDNA libraries generated from a variety of mouse tissues indicated that almost half of all representative cloned sequences did not contain an apparent protein-coding sequence, and were putatively derived from non-protein-coding RNA (ncRNA) genes. However, many of these clones were singletons and the majority were unspliced, raising the possibility that they may be derived from genomic DNA or unprocessed pre-mRNA contamination during library construction, or alternatively represent nonspecific "transcriptional noise." Here we show, using reverse transcriptase-dependent PCR, microarray, and Northern blot analyses, that many of these clones were derived from genuine transcripts of unknown function whose expression appears to be regulated. The ncRNA transcripts have larger exons and fewer introns than protein-coding transcripts. Analysis of the genomic landscape around these sequences indicates that some cDNA clones were produced not from terminal poly(A) tracts but internal priming sites within longer transcripts, only a minority of which is encompassed by known genes. A significant proportion of these transcripts exhibit tissue-specific expression patterns, as well as dynamic changes in their expression in macrophages following lipopolysaccharide stimulation. Taken together, the data provide strong support for the conclusion that ncRNAs are an important, regulated component of the mammalian transcriptome.
DOI: 10.1038/nrg2026
2007
Cited 468 times
Mammalian RNA polymerase II core promoters: insights from genome-wide studies
DOI: 10.1038/nbt.3947
2017
Cited 451 times
An integrated expression atlas of miRNAs and their promoters in human and mouse
An atlas of microRNA expression patterns and regulators is produced by deep sequencing of short RNAs in human and mouse cells. MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of the Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.
DOI: 10.1007/s10142-002-0070-6
2002
Cited 444 times
Monitoring the expression pattern of around 7,000 Arabidopsis genes under ABA treatments using a full-length cDNA microarray
DOI: 10.1038/nmeth0306-211
2006
Cited 397 times
CAGE: cap analysis of gene expression
DOI: 10.1371/journal.pbio.1000625
2011
Cited 392 times
The Reality of Pervasive Transcription
Current estimates indicate that only about 1.2% of the mammalian genome codes for amino acids in proteins. However, mounting evidence over the past decade has suggested that the vast majority of the genome is transcribed, well beyond the boundaries of known genes, a phenomenon known as pervasive transcription [1]. Challenging this view, an article published in PLoS Biology by van Bakel et al. concluded that “the genome is not as pervasively transcribed as previously reported” [2] and that the majority of the detected low-level transcription is due to technical artefacts and/or background biological noise. These conclusions attracted considerable publicity [3]–[6]. Here, we present an evaluation of the analysis and conclusions of van Bakel et al. compared to those of others and show that (1) the existence of pervasive transcription is supported by multiple independent techniques; (2) re-analysis of the van Bakel et al. tiling arrays shows that their results are atypical compared to those of ENCODE and lack independent validation; and (3) the RNA sequencing dataset used by van Bakel et al. suffered from insufficient sequencing depth and poor transcript assembly, compromising their ability to detect the less abundant transcripts outside of protein-coding genes. We conclude that the totality of the evidence strongly supports pervasive transcription of mammalian genomes, although the biological significance of many novel coding and noncoding transcripts remains to be explored.
DOI: 10.1261/rna.1528909
2009
Cited 392 times
Small RNAs derived from snoRNAs
Small nucleolar RNAs (snoRNAs) guide RNA modification and are localized in nucleoli and Cajal bodies in eukaryotic cells. Components of the RNA silencing pathway associate with these structures, and two recent reports have revealed that a human and a protozoan snoRNA can be processed into miRNA-like RNAs. Here we show that small RNAs with evolutionary conservation of size and position are derived from the vast majority of snoRNA loci in animals (human, mouse, chicken, fruit fly), Arabidopsis, and fission yeast. In animals, sno-derived RNAs (sdRNAs) from H/ACA snoRNAs are predominantly 20–24 nucleotides (nt) in length and originate from the 3′ end. Those derived from C/D snoRNAs show a bimodal size distribution at ∼17–19 nt and >27 nt and predominantly originate from the 5′ end. SdRNAs are associated with AGO7 in Arabidopsis and Ago1 in fission yeast with characteristic 5′ nucleotide biases and show altered expression patterns in fly loquacious and Dicer-2 and mouse Dicer1 and Dgcr8 mutants. These findings indicate that there is interplay between the RNA silencing and snoRNA-mediated RNA processing systems, and that sdRNAs comprise a novel and ancient class of small RNAs in eukaryotes.
DOI: 10.1038/nature08283
2009
Cited 351 times
An RNA-dependent RNA polymerase formed by TERT and the RMRP RNA
Constitutive expression of telomerase in human cells prevents the onset of senescence and crisis by maintaining telomere homeostasis. However, accumulating evidence suggests that the human telomerase reverse transcriptase catalytic subunit (TERT) contributes to cell physiology independently of its ability to elongate telomeres. Here we show that TERT interacts with the RNA component of mitochondrial RNA processing endoribonuclease (RMRP), a gene that is mutated in the inherited pleiotropic syndrome cartilage-hair hypoplasia. Human TERT and RMRP form a distinct ribonucleoprotein complex that has RNA-dependent RNA polymerase (RdRP) activity and produces double-stranded RNAs that can be processed into small interfering RNA in a Dicer (also known as DICER1)-dependent manner. These observations identify a mammalian RdRP composed of TERT in complex with RMRP.
DOI: 10.1073/pnas.0932694100
2003
Cited 340 times
Comparative genomics of Physcomitrella patens gametophytic transcriptome and Arabidopsis thaliana: Implication for land plant evolution
The mosses and flowering plants diverged >400 million years ago. The mosses have haploid-dominant life cycles, whereas the flowering plants are diploid-dominant. The common ancestors of land plants have been inferred to be haploid-dominant, suggesting that genes used in the diploid body of flowering plants were recruited from the genes used in the haploid body of the ancestors during the evolution of land plants. To assess this evolutionary hypothesis, we constructed an EST library of the moss Physcomitrella patens, and compared the moss transcriptome to the genome of Arabidopsis thaliana. We constructed full-length enriched cDNA libraries from auxin-treated, cytokinin-treated, and untreated gametophytes of P. patens, and sequenced both ends of >40,000 clones. These data, together with the mRNA sequences in the public databases, were assembled into 15,883 putative transcripts. Sequence comparisons of A. thaliana and P. patens showed that at least 66% of the A. thaliana genes had homologues in P. patens. Comparison of the P. patens putative transcripts with all known proteins revealed 9,907 putative transcripts with high levels of similarity to vascular plant genes, and 850 putative transcripts with high levels of similarity to other organisms. The haploid transcriptome of P. patens appears to be quite similar to the A. thaliana genome, supporting the evolutionary hypothesis. Our study also revealed that a number of genes are moss specific and were lost in the flowering plant lineage.
DOI: 10.1073/pnas.88.21.9523
1991
Cited 336 times
A genomic scanning method for higher organisms using restriction sites as landmarks.
We have developed a powerful genomic scanning method, termed "restriction landmark genomic scanning," that is useful for analysis of the genomic DNA of higher organisms using restriction sites as landmarks. Genomic DNA is radioactively labeled at cleavage sites specific for a rare cleaving restriction enzyme and then size-fractionated in one dimension. The fractionated DNA is further digested with another more frequently occurring enzyme and separated in the second dimension. This procedure gives a two-dimensional pattern with thousands of scattered spots corresponding to sites for the first enzyme, indicating that the genome of mammals can be scanned at approximately 1-megabase intervals. The position and intensity of a spot reflect its locus and the copy number of the corresponding restriction site, respectively, based on the nature of the end-labeling system. Therefore, this method is widely applicable to genome mapping or detection of alterations in a genome.
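The two-dimensional logic of restriction landmark genomic scanning can be illustrated in silico. The sketch below is a simplified model, not the published protocol. It assumes NotI (GCGGCCGC) as the rare-cutting landmark enzyme and EcoRV (GATATC) as the more frequent second enzyme, but the abstract does not name either enzyme, so both are placeholders. It also treats each enzyme as cutting at the first base of its site and the input as one linear sequence. Each labeled rare-site end becomes one spot, with coordinates (first-dimension fragment length, second-dimension length of the labeled piece).

```python
import re

# Assumed enzymes for illustration only; the abstract does not name them.
RARE_SITE = "GCGGCCGC"     # NotI recognition sequence (rare cutter)
FREQUENT_SITE = "GATATC"   # EcoRV recognition sequence (more frequent cutter)


def site_positions(seq, site):
    """Start coordinate of every occurrence of `site` in `seq`."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def rlgs_spots(seq, rare=RARE_SITE, frequent=FREQUENT_SITE):
    """Simulate RLGS spots as (first_dim_length, second_dim_length) pairs.

    Simplifications: the sequence is linear, and each enzyme is treated as
    cutting at the first base of its recognition site.
    """
    seq = seq.upper()
    rare_cuts = set(site_positions(seq, rare))
    frequent_cuts = site_positions(seq, frequent)
    bounds = sorted(rare_cuts | {0, len(seq)})
    spots = []
    for left, right in zip(bounds, bounds[1:]):
        # First dimension: whole fragment between rare-cutter ends.
        length = right - left
        # Second dimension: frequent-cutter sites inside this fragment.
        inner = [p for p in frequent_cuts if left < p < right]
        if left in rare_cuts:   # labeled left end: piece up to first inner cut
            spots.append((length, (inner[0] if inner else right) - left))
        if right in rare_cuts:  # labeled right end: piece back to last inner cut
            spots.append((length, right - (inner[-1] if inner else left)))
    return sorted(spots)


# Toy sequence with two landmark sites and one frequent-cutter site between them.
toy = "A" * 10 + "GCGGCCGC" + "T" * 20 + "GATATC" + "T" * 30 + "GCGGCCGC" + "A" * 5
print(rlgs_spots(toy))  # [(10, 10), (13, 13), (64, 28), (64, 36)]
```

Only ends created by the rare enzyme carry label, so the frequent enzyme adds resolution in the second dimension without adding spots. Dividing genome length by the number of spots gives the scan interval. Roughly 3,000 spots over a mammalian genome of about 3,000 megabases corresponds to the approximately 1-megabase spacing quoted above.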
DOI: 10.1038/ng.312
2009
Cited 332 times
Tiny RNAs associated with transcription start sites in animals
DOI: 10.1038/ng0595-77
1995
Cited 329 times
The reeler gene encodes a protein with an EGF–like motif expressed by pioneer neurons
DOI: 10.1101/gr.106054.110
2010
Cited 323 times
A comprehensive survey of 3′ animal miRNA modification events and a possible role for 3′ adenylation in modulating miRNA targeting effectiveness
Animal microRNA sequences are subject to 3' nucleotide addition. Through detailed analysis of deep-sequenced short RNA data sets, we show adenylation and uridylation of miRNA is globally present and conserved across Drosophila and vertebrates. To better understand 3' adenylation function, we deep-sequenced RNA after knockdown of nucleotidyltransferase enzymes. The PAPD4 nucleotidyltransferase adenylates a wide range of miRNA loci, but adenylation does not appear to affect miRNA stability on a genome-wide scale. Adenine addition appears to reduce effectiveness of miRNA targeting of mRNA transcripts while deep-sequencing of RNA bound to immunoprecipitated Argonaute (AGO) subfamily proteins EIF2C1-EIF2C3 revealed substantial reduction of adenine addition in miRNA associated with EIF2C2 and EIF2C3. Our findings show 3' addition events are widespread and conserved across animals, PAPD4 is a primary miRNA adenylating enzyme, and suggest a role for 3' adenine addition in modulating miRNA effectiveness, possibly through interfering with incorporation into the RNA-induced silencing complex (RISC), a regulatory role that would complement the role of miRNA uridylation in blocking DICER1 uptake.
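Detecting a 3′ addition means separating the part of a read that matches the genome from bases appended after transcription. The sketch below shows the basic comparison under stated assumptions. It assumes the read's 5′ end has already been located at a known offset in the precursor (hairpin) sequence, and it ignores sequencing errors. An added base that happens to match the next genomic base cannot be told apart from templated sequence, so it is counted as templated here. `split_templated` and `classify_tail` are hypothetical helpers, and the sequences are illustrative.

```python
def split_templated(read, precursor, start):
    """Split a read into (templated, tail) given where its 5' end maps on
    the precursor. The templated part is the longest prefix matching the
    precursor; everything after the first mismatch (or past the precursor
    end) is treated as a non-templated 3' addition."""
    read = read.upper().replace("T", "U")
    precursor = precursor.upper().replace("T", "U")
    n = 0
    while (n < len(read) and start + n < len(precursor)
           and read[n] == precursor[start + n]):
        n += 1
    return read[:n], read[n:]


def classify_tail(tail):
    """Label a non-templated 3' tail by its nucleotide composition."""
    if not tail:
        return "none"
    if set(tail) == {"A"}:
        return "adenylation"
    if set(tail) == {"U"}:
        return "uridylation"
    return "mixed"


# Illustrative mature miRNA at offset 0 of a hypothetical precursor.
mature = "UGAGGUAGUAGGUUGUAUAGUU"
precursor = mature + "UUAGGGUCACACC"
templated, tail = split_templated(mature + "AA", precursor, 0)
print(tail, classify_tail(tail))  # AA adenylation
```

The conservative rule errs toward calling extra bases templated. That undercounts uridylation when the genome continues with U, as in this precursor, but never reports an addition the genome could explain.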
DOI: 10.15252/msb.20156492
2015
Cited 314 times
Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation
Article, 28 December 2015, Open Access
Author Information
James Fraser (1,‡), Carmelo Ferrai (2,3,‡), Andrea M Chiariello (4,‡), Markus Schueler (2,‡), Tiago Rito (2,‡), Giovanni Laudanno (4,‡), Mariano Barbieri (2), Benjamin L Moore (5), Dorothee CA Kraemer (2), Stuart Aitken (5), Sheila Q Xie (3,9), Kelly J Morris (2,3), Masayoshi Itoh (6,7), Hideya Kawaji (6,7), Ines Jaeger (8,10), Yoshihide Hayashizaki (6), Piero Carninci (7), Alistair RR Forrest (7,11), The FANTOM Consortium, Colin A Semple (5,*), Josée Dostie (1,*), Ana Pombo (2,3,*) and Mario Nicodemi (4,*)
(1) Department of Biochemistry, Goodman Cancer Centre, McGill University, Montréal, QC, Canada
(2) Epigenetic Regulation and Chromatin Architecture Group, Berlin Institute for Medical Systems Biology, Max-Delbrück Centre for Molecular Medicine, Berlin-Buch, Germany
(3) Genome Function Group, MRC Clinical Sciences Centre, Imperial College London, Hammersmith Hospital Campus, London, UK
(4) Dipartimento di Fisica, Università di Napoli Federico II, INFN Napoli, CNR-SPIN, Complesso Universitario di Monte Sant'Angelo, Naples, Italy
(5) MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Edinburgh, UK
(6) RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, Japan
(7) Division of Genomic Technologies, RIKEN Center for Life Science Technologies, Yokohama, Kanagawa, Japan
(8) Stem Cell Neurogenesis Group, MRC Clinical Sciences Centre, Imperial College London, Hammersmith Hospital Campus, London, UK
(9) Present address: Single Molecule Imaging Group, MRC Clinical Sciences Centre, Imperial College London, Hammersmith Hospital Campus, London, UK
(10) Present address: Cardiff School of Biosciences, Cardiff, UK
(11) Present address: Systems Biology and Genomics, Harry Perkins Institute of Medical Research, Nedlands, WA, Australia
‡ These authors contributed equally to this work.
* Corresponding authors. Tel: +44 131 651 8614; Tel: +1 514 398 4975; Tel: +49 30 94061752; Tel: +39 081 676475
Molecular Systems Biology (2015) 11: 852, https://doi.org/10.15252/msb.20156492
Abstract
Mammalian chromosomes fold into arrays of megabase-sized topologically associating domains (TADs), which are arranged into compartments spanning multiple megabases of genomic DNA. TADs have internal substructures that are often cell type specific, but their higher-order organization remains elusive. Here, we investigate TAD higher-order interactions with Hi-C through neuronal differentiation and show that they form a hierarchy of domains-within-domains (metaTADs) extending across genomic scales up to the range of entire chromosomes. We find that TAD interactions are well captured by tree-like, hierarchical structures irrespective of cell type. metaTAD tree structures correlate with genetic, epigenomic and expression features, and structural tree rearrangements during differentiation are linked to transcriptional state changes. Using polymer modelling, we demonstrate that hierarchical folding promotes efficient chromatin packaging without the loss of contact specificity, highlighting a role far beyond the simple need for packing efficiency.
Synopsis
Genome-wide mapping of chromatin architecture reveals a hierarchical folding of chromatin that involves higher-order domain interactions across whole chromosomes, reflects epigenomic features and reorganizes upon differentiation-induced gene expression changes.
Chromatin architecture is mapped genome-wide using Hi-C and a neuronal differentiation model from mESC to post-mitotic neurons.
Mammalian chromosomes fold hierarchically in a manner that reflects epigenomic features and involves higher-order domains (metaTADs) up to the chromosome scale.
metaTAD topologies are relatively conserved through differentiation, and their reorganization is related to gene expression changes.
Polymer modelling shows that hierarchical chromatin folding promotes efficient packaging without the loss of contact specificity.
Introduction
The spatial organization of chromatin in cell nuclei has essential functional roles. In mammals, chromosomes occupy distinct territories and have preferred radial positions that depend on cell type and transcription activity (Lanctot et al, 2007; Misteli, 2007; Bickmore & van Steensel, 2013; Tanay & Cavalli, 2013). Within chromosomes, chromatin is organized in megabase-sized regions, known as topologically associating domains (TADs), characterized by enriched levels of interactions (Dixon et al, 2012; Nora et al, 2012). High-resolution analyses indicate that TADs contain inner substructures (Sexton et al, 2012; Phillips-Cremins et al, 2013). At a larger scale, TADs generally fall into either compartment A or compartment B, nuclear domains related to genomic function that are up to tens of Mb in size and enriched in active or repressed chromatin states, respectively (Lieberman-Aiden et al, 2009). Yet it remains unclear how specific TAD contacts are within compartments A/B, and how the different structural levels of chromatin folding integrate with nuclear functions, from the scale of individual genes up to that of whole chromosomes. In particular, we lack a comprehensive understanding of the higher-order organization of TADs, of the genomic scales over which TAD–TAD contacts extend, and of how these higher-order structures change upon cell differentiation. Here, we investigate higher-order TAD interactions in a neuronal differentiation model from mouse embryonic stem cells (ESC) via neural progenitor cells (NPC) to neurons.
Novel analyses of Hi-C data sets during differentiation reveal that TADs form a hierarchy of domains-within-domains that we name "metaTADs". The metaTAD hierarchy extends across genomic scales up to the size range of entire chromosomes. We show that the complex inter-TAD interactions can be understood as relatively simple tree-like hierarchical structures irrespective of cell type. By comparing our Hi-C data with a variety of other data sets, we find that metaTAD tree structures correlate with patterns of epigenomic and expression features. Furthermore, the dynamics of tree rearrangements during differentiation link nuclear organization to transcriptional changes, providing a new paradigm to study chromatin structure and function. Using polymer modelling, we also demonstrate that hierarchical folding promotes efficient chromatin packaging without the loss of contact specificity. Our work highlights the close relationship between chromosome structure and function in mammalian nuclei, suggesting a functional role for hierarchical chromatin organization beyond simple chromatin packing efficiency.
Results
To investigate higher-order chromatin folding during differentiation, we studied proliferating mouse embryonic stem cells (ESC), intermediate neuronal precursor cells (NPC) and post-mitotic neurons (Neurons; Fig 1A). ESC (46C cell line) were differentiated using a protocol optimized for large-scale production of functional murine neurons with a midbrain phenotype (Jaeger et al, 2011), and each time point showed homogeneous expression of stage-specific markers (Figs 1B and EV1A). Characteristic expression patterns for the cell types under study were also confirmed by genome-wide gene expression analyses by CAGE (cap analysis of gene expression) (Kodzius et al, 2006; Takahashi et al, 2012; Forrest et al, 2014) and Gene Ontology (Fig EV1B; Table EV1).
Incorporation of bromo-deoxyuridine (BrdU; 24 h) to mark cells undergoing DNA replication shows that while ESC and NPC are actively cycling, Neurons have ceased cell division (Fig 1C).
Figure 1. Chromatin contact maps (Hi-C) and matched gene expression (CAGE) data in a murine neuronal differentiation system
(A) Scheme of the murine differentiation system of our study, from ESC to NPC and post-mitotic neurons. (B) Cells express stage-specific markers as detected by immunofluorescence: ESC express Oct4, NPC the neuronal precursor marker nestin and Neurons Tubb3 (Tuj1 antibody). Scale bar, 100 μm. (C) ESC and NPC are actively cycling, whereas Neurons are negative for BrDNA after 24-h BrdU incorporation. Nuclei were counterstained with DAPI. (D) Examples of interaction patterns in Hi-C matrices across whole chromosomes show extensive higher-order dynamic contacts, which change during terminal neuronal differentiation. Hi-C interaction data are plotted in log scale. (E) Matched CAGE data sets were produced from total RNA extracted from ESC, NPC and Neurons. The expression levels confirm specific expression of stage-specific markers. CAGE expression is reported as a percentage of the highest expression value.
Figure EV1. Differentiation of mouse ESC (46C line) into midbrain-like neuronal cells
(A) Cell populations were tested for purity by immunofluorescence staining of stage-specific markers (pseudocoloured red), which showed efficient progression of ESC through the differentiation steps. Oct4 expression in ESC is lost upon differentiation, nestin is specifically expressed in NPC, and Tubb3 (detected using Tuj1 antibodies) is strongly expressed in Neurons. Nuclei were counterstained with DAPI (pseudocoloured blue). Scale bar represents 100 μm.
(B) Total RNA was extracted from ESC, NPC and Neurons, and directional CAGE data sets were produced in order to measure RNA transcription and define transcription start sites at each time point. Strand-specific CAGE reads are shown (+ and – strands) for the promoter regions of the stage-specific markers. CAGE signals for Oct4 (Pou5f1), nestin (Nes) and Tubb3 genes peak in ESC, NPC and Neurons, respectively.
We produced Hi-C libraries for ESC, NPC and Neurons (Fig 1D) using a modified Hi-C protocol (Appendix Fig S1) that increases the yield of chromatin interaction products. Normalized Hi-C matrices show the typical organization of chromatin into blocks of enriched interactions, reflecting the existence of compartments and TADs (Fig 1D). This organization is chromosome specific, and we observe extensive changes during differentiation in the landscape of higher-order contacts of each chromosome (Appendix Figs S2 and S3). These patterns of structural dynamics often extend across whole chromosomes and are accompanied by changes in genome-wide transcription activity in CAGE data produced from matched samples (examined in detail below). For instance, among the changes measured by CAGE, we find a rapid decrease in expression of the pluripotency transcription factors Oct4 and Rex1 after the ESC stage (Fig 1E). Similarly, nestin and Fgf5 are highly expressed in NPC, whereas the neuronal markers Neurog2 and Tubb3 are expressed in differentiated neurons (Fig 1E).
TAD–TAD contacts extend across genomic scales to define higher-order structures
To investigate the architecture of higher-order chromosome folding, we first identified TAD positions across chromosomes in Hi-C data sets for all time points using the directionality index (Dixon et al, 2012) (Fig 2A, Appendix Fig S4, see Appendix Supplementary Analyses for details).
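As a minimal sketch of the directionality index (DI) used above for TAD calling, the function below follows the definition of Dixon et al (2012): for each bin, A and B are the contacts it makes with bins upstream and downstream within a fixed window, E = (A + B)/2, and DI = sign(B − A) · [(A − E)²/E + (B − E)²/E]. The list-of-lists input, the `window` parameter and the choice of DI = 0 for empty or balanced bins are assumptions for illustration, not the settings used in this study (see its Appendix). Dixon et al then segment the DI track with a hidden Markov model, which is not reproduced here.

```python
from typing import List


def directionality_index(contacts: List[List[float]], window: int) -> List[float]:
    """Directionality index (Dixon et al, 2012) for each bin of a symmetric
    intra-chromosomal Hi-C matrix; `window` (in bins) is an assumed parameter."""
    n = len(contacts)
    di: List[float] = []
    for i in range(n):
        # A: contacts with up to `window` upstream bins; B: with downstream bins
        a = sum(contacts[i][j] for j in range(max(0, i - window), i))
        b = sum(contacts[i][j] for j in range(i + 1, min(n, i + window + 1)))
        e = (a + b) / 2.0  # expected count if there were no directional bias
        if e == 0 or a == b:
            di.append(0.0)  # no signal or no bias: DI defined as 0 (assumption)
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di
```

On a toy matrix made of two blocks of three strongly interacting bins, DI is strongly positive at the first bin of each block and negative at the last, which is the signature used to place domain boundaries.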
For comparison, we also analysed a published Hi-C data set from a different mouse ESC line (ESC-J1; Dixon et al, 2012). Average TAD size was ~0.5 Mb across all cell types (Appendix Fig S5), consistent with recent reports (Phillips-Cremins et al, 2013; Pope et al, 2014; Rao et al, 2014). The locations of TAD boundaries in our ESC-46C Hi-C data set and in the published ESC-J1 data set overlap by 83% (Appendix Fig S6), within the range typically reported between biological replicates (Dixon et al, 2012).
Figure 2. Chromosomes are organized in a hierarchy of higher-order domains (metaTADs)
(A) ESC Hi-C map of chromosome 2, 53–58 Mb. The directionality index (DI, bottom) was used to identify TADs, numbered 1–6. (B) metaTAD identification by single-linkage clustering. (C) Examples of TADs (1–6) and metaTADs (I–V) in the same region shown in (A). metaTADs are domains with enriched Hi-C contacts. (D) The ratio of the average interaction, I, between pairs of TADs or metaTADs to the background value, I_C, was calculated for ESC, NPC and Neurons as a function of the total number of TADs (n) included in the metaTAD. I/I_C remains 20% above control levels in randomized Hi-C matrices up to scales of the order of n = 80 TADs. (E) metaTAD size, d, is represented as a function of the number of TADs it contains, n, showing that 80 TADs correspond to an average genomic length of around 40 Mb. (F) The metaTAD tree organization in ESC versus Neurons largely coincides with stretches of compartment A (grey) or B (black). A/B compartments are represented in the two central bars and were defined based on an individual principal component (green line) derived from Hi-C data. The yellow line indicates a value of 0. (G) Boundaries of metaTADs larger than 10 Mb are more enriched for transitions between lamina-associated (blue) and lamina-detached (red) regions than TAD boundaries.
Heatmaps display the 900-kb regions flanking domain boundaries (dashed lines) for metaTADs (left heatmap) and TADs (right heatmap), with one row per boundary region. Transitions in lamina association are visible as abrupt changes in heatmap colour at boundaries (see Appendix Supplementary Methods). Both metaTAD and TAD boundaries coincide with transitions significantly more often than expected (P < 1 × 10⁻⁴; see Appendix Supplementary Methods). (H) The metaTAD tree of chromosome 19 in ESC (left: full; right: zoomed region). Interactions between metaTADs are not homogeneous, but instead occur through specific contacts involving specific TADs. Hi-C interaction data are plotted in log scale.
Although most chromatin contacts observed in Hi-C matrices are found within TADs, interaction signal is also detected locally between specific TADs (Fig 2A; Dixon et al, 2015) and extends to large genomic distances (Fig 1D). We explored higher-order contacts between TADs using Hi-C interaction matrices and found that, in 97% of cases, the most frequently interacting partner of a given TAD is one of its two flanking neighbours. This behaviour points to a scenario in which chromatin folds into larger domains containing multiple, preferentially interacting TADs. To uncover the higher-order domain structure of chromosomes within Hi-C matrices, we implemented a single-linkage clustering procedure of Hi-C contacts. For each chromosome, we iteratively select the two most frequently interacting neighbouring domains (initially TADs; Fig 2B) and merge them into a higher-order domain, or "metaTAD". This metaTAD is then added back to the list of domains, and the procedure is repeated until the entire chromosome arm is contained in a single domain that encompasses the hierarchy of all intervening lower-level metaTADs.
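The iterative merging of neighbouring domains described above can be sketched as a constrained single-linkage clustering. This is a hypothetical illustration, not the authors' code: it assumes a TAD-level contact matrix `t` (contact frequencies already aggregated between the TADs of one chromosome arm) and scores two domains by their strongest TAD-TAD contact, which is the single-linkage rule; the exact score used in the study is defined in its Appendix. Restricting merges to adjacent domains keeps every metaTAD a contiguous stretch of the chromosome.

```python
from typing import List, Tuple

Domain = Tuple[int, int]  # inclusive range of TAD indices (first, last)


def linkage(t: List[List[float]], a: Domain, b: Domain) -> float:
    """Single-linkage score: the strongest TAD-TAD contact between two domains."""
    return max(t[i][j] for i in range(a[0], a[1] + 1) for j in range(b[0], b[1] + 1))


def build_metatad_tree(t: List[List[float]]) -> List[Tuple[Domain, Domain, float]]:
    """Repeatedly merge the most strongly interacting pair of neighbouring domains
    until one domain spans all TADs; returns the merge history (one metaTAD per merge)."""
    domains: List[Domain] = [(i, i) for i in range(len(t))]  # start from single TADs
    merges: List[Tuple[Domain, Domain, float]] = []
    while len(domains) > 1:
        # Only adjacent domains are candidates, so every metaTAD stays contiguous
        k = max(range(len(domains) - 1),
                key=lambda idx: linkage(t, domains[idx], domains[idx + 1]))
        left, right = domains[k], domains[k + 1]
        merges.append((left, right, linkage(t, left, right)))
        domains[k:k + 2] = [(left[0], right[1])]  # the new metaTAD replaces both parts
    return merges
```

For four TADs (indexed 0 to 3) in which TADs 0 and 1, and TADs 2 and 3, interact strongly, the sketch first joins TADs 0 and 1, then TADs 2 and 3, and finally the two resulting metaTADs, giving a three-merge tree. Recomputing every adjacent score at each step is slow for long chromosomes; caching scores would be the obvious refinement.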
The hierarchy of TAD–TAD contacts can be intuitively represented as a tree, in which the joined domains are named sequentially as metaTAD-I, -II, -III and so on according to their position along the tree (detailed below). Overlaying the metaTAD structure onto Hi-C data (Fig 2C) visually confirms that the hierarchy of higher-order domains derived by single-linkage clustering matches the patterns in the Hi-C data: the most frequent inter-TAD contacts are captured at the first levels of the tree (I, II and III), and progressively lower Hi-C frequencies at the higher tree levels (IV and V). To quantify the statistical reliability of the identified metaTADs, we examined several measures of inter-TAD contacts across all data sets. First, we tested whether interactions between metaTAD pairs are significantly more frequent than background interactions. We measured the average interaction level, I, between pairs of domains that produce new metaTADs containing a total of n TADs. As a background reference, we used the average interaction, I_C, between regions of the same genomic size but randomly placed at the boundary of any other pair of neighbouring TADs. In ESC, NPC and Neurons, the normalized interaction ratio I/I_C remains significantly above control levels (measured in randomized Hi-C matrices) up to metaTADs containing several tens of TADs (Fig 2D). To give a sense of scale, we also plot I at increasing tree levels to show the extent of Hi-C interactions between metaTADs (Fig EV2). I/I_C remains 20% above the average values observed for randomized Hi-C up to metaTADs containing ~80 TADs (Fig 2D, Appendix Fig S7A–D), corresponding to genomic lengths of ~40 Mb (Fig 2E). We also found that the normalized chromatin interactions detected within whole metaTADs, J/J_C, remain above background levels up to roughly the same length scale (Appendix Fig S7E–H; Fig EV2).
As an additional control, we corrected the Hi-C data for 1D proximity effects (Appendix Supplementary Methods) and found the same most interacting TAD partners in 72% of cases, demonstrating that the observed metaTAD hierarchy is not merely a consequence of linear genomic distance. These analyses show that, in ESC, NPC and Neurons, chromosomes adopt hierarchical conformations built from metaTADs of increasing complexity, with prominent intra-TAD and inter-TAD contacts. These findings were fully confirmed using the original TAD calls for mouse ESC-J1 and human IMR90 and ESC-H1 Hi-C data (Dixon et al, 2012) (Appendix Fig S8, Appendix Supplementary Methods).
Figure EV2. Frequency of TAD–TAD interactions at each tree level
The average metaTAD inter-domain (I, blue curves) and intra-domain (J, magenta) interactions are shown as a function of the number of TADs, n, that a metaTAD includes, in our three cell types (see also Appendix Supplementary Methods). The larger the domains considered, the lower their average interactions, consistent with Hi-C results. As expected, intra-domain interactions, J, always stay above inter-domain interactions, I.
Taken together, our results show that a hierarchical architecture of domains-within-domains is a general feature of chromatin folding, found at all stages of differentiation examined and in both murine and human cells.
The metaTAD contact hierarchy bridges chromatin organization between TADs and nuclear compartments
To visualize the organization of chromatin in metaTADs at a genomic scale, we built a tree diagram for each chromosome, in which the "leaf" nodes represent TADs and the internal nodes correspond to metaTADs (Fig 2B). These tree diagrams correlate visually with A/B compartment domains, as tree sub-branches often coincide with transitions between compartments (Fig 2F).
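The nearest-neighbour test (the top partner of a TAD is a flanking TAD in 97% of cases) and the 1D proximity control (72% after correction) can be illustrated with the hypothetical sketch below. It works on an assumed TAD-level contact matrix and uses a simple observed/expected correction that divides each entry by the mean contact at the same separation, counted in TADs; the study corrects the Hi-C data as described in its Appendix Supplementary Methods, which may differ.

```python
from statistics import mean
from typing import List


def distance_normalize(t: List[List[float]]) -> List[List[float]]:
    """Observed/expected: divide each off-diagonal entry by the mean of its diagonal,
    i.e. by the average contact between TADs at the same separation."""
    n = len(t)
    expected = {d: mean(t[i][i + d] for i in range(n - d)) for d in range(1, n)}
    return [[t[i][j] / expected[abs(i - j)] if i != j and expected[abs(i - j)] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def neighbour_partner_fraction(t: List[List[float]]) -> float:
    """Fraction of TADs whose most frequent partner (excluding itself) is a flanking TAD."""
    n = len(t)
    hits = 0
    for i in range(n):
        partner = max((j for j in range(n) if j != i), key=lambda j: t[i][j])
        if abs(partner - i) == 1:
            hits += 1
    return hits / n
```

In a small test matrix, every TAD's top raw partner is a neighbour, but after the distance correction two of the four TADs prefer a partner two TADs away, so the fraction drops from 1.0 to 0.5. This is the kind of drop the 97% versus 72% comparison is designed to expose.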
To test how metaTADs compare with compartments A/B, we measured the frequency with which two TADs in a common metaTAD are present in the same compartment (Appendix Fig S9). As expected, TADs within a metaTAD frequently belong to the same compartment, particularly at the lower tree levels. Furthermore, this frequency is much higher than expected from the linear distance between TADs alone, suggesting that the preferential contacts captured in the metaTAD hierarchy reflect preferential contacts within the same compartment type. As an additional comparison between the metaTAD hierarchy of contacts and a second well-known (and independently measured) feature of chromatin organization, we studied the relationship b
DOI: 10.1073/pnas.0810777105
2008
Cited 302 times
Genome-wide analysis of cancer/testis gene expression
Cancer/Testis (CT) genes, normally expressed in germ line cells but also activated in a wide range of cancer types, often encode antigens that are immunogenic in cancer patients, and present potential for use as biomarkers and targets for immunotherapy. Using multiple in silico gene expression analysis technologies, including twice the number of expressed sequence tags used in previous studies, we have performed a comprehensive genome-wide survey of expression for a set of 153 previously described CT genes in normal and cancer expression libraries. We find that although they are generally highly expressed in testis, these genes exhibit heterogeneous gene expression profiles, allowing their classification into testis-restricted (39), testis/brain-restricted (14), and a testis-selective (85) group of genes that show additional expression in somatic tissues. The chromosomal distribution of these genes confirmed the previously observed dominance of X chromosome location, with CT-X genes being significantly more testis-restricted than non-X CT. Applying this core classification in a genome-wide survey we identified >30 CT candidate genes; 3 of them, PEPP-2, OTOA, and AKAP4, were confirmed as testis-restricted or testis-selective using RT-PCR, with variable expression frequencies observed in a panel of cancer cell lines. Our classification provides an objective ranking for potential CT genes, which is useful in guiding further identification and characterization of these potentially important diagnostic and therapeutic targets.
DOI: 10.1371/journal.pgen.0020047
2006
Cited 289 times
Complex Loci in Human and Mouse Genomes
Mammalian genomes harbor a larger than expected number of complex loci, in which multiple genes are coupled by shared transcribed regions in antisense orientation and/or by bidirectional core promoters. To determine the incidence, functional significance, and evolutionary context of mammalian complex loci, we identified and characterized 5,248 cis–antisense pairs, 1,638 bidirectional promoters, and 1,153 chains of multiple cis–antisense and/or bidirectionally promoted pairs from 36,606 mouse transcriptional units (TUs), along with 6,141 cis–antisense pairs, 2,113 bidirectional promoters, and 1,480 chains from 42,887 human TUs. In both human and mouse, 25% of TUs resided in cis–antisense pairs, only 17% of which were conserved between the two organisms, indicating frequent species specificity of antisense gene arrangements. A sampling approach indicated that over 40% of all TUs might actually be in cis–antisense pairs, and that only a minority of these arrangements are likely to be conserved between human and mouse. Bidirectional promoters were characterized by variable transcriptional start sites and an identifiable midpoint at which overall sequence composition changed strand and the direction of transcriptional initiation switched. In microarray data covering a wide range of mouse tissues, genes in cis–antisense and bidirectionally promoted arrangement showed a higher probability of being coordinately expressed than random pairs of genes. In a case study on homeotic loci, we observed extensive transcription of nonconserved sequences on the noncoding strand, implying that the presence rather than the sequence of these transcripts is of functional importance. Complex loci are ubiquitous, host numerous nonconserved gene structures and lineage-specific exonification events, and may have a cis-regulatory impact on the member genes.
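The two arrangements counted above can be made concrete with a small pair classifier. This is a hypothetical sketch rather than the study's pipeline: it treats opposite-strand transcriptional units (TUs) with overlapping transcribed regions as cis-antisense, and non-overlapping opposite-strand TUs arranged head-to-head with transcription start sites (TSSs) at most `max_gap` bases apart as sharing a bidirectional promoter. The 1,000-base default is an assumed threshold, a pair that both overlaps and is head-to-head is labelled cis-antisense here, and chains of linked pairs are not assembled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TU:
    """A transcriptional unit on one chromosome; coordinates are half-open [start, end)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_pair(a: TU, b: TU, max_gap: int = 1000) -> Optional[str]:
    """Label a pair as 'cis-antisense', 'bidirectional' or None (max_gap is assumed)."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return None
    if a.start < b.end and b.start < a.end:
        return "cis-antisense"  # opposite strands with overlapping transcribed regions
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # Head-to-head: the minus-strand TSS (its end) lies upstream of the plus-strand TSS
    if minus.end <= plus.start and plus.start - minus.end <= max_gap:
        return "bidirectional"
    return None
```

Tail-to-tail pairs (the minus-strand TU lying downstream of the plus-strand TU) and same-strand pairs fall through to None.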
DOI: 10.1073/pnas.1110156109
2012
Cited 269 times
Conservation and divergence in Toll-like receptor 4-regulated gene expression in primary human versus mouse macrophages
Evolutionary change in gene expression is generally considered to be a major driver of phenotypic differences between species. We investigated innate immune diversification by analyzing interspecies differences in the transcriptional responses of primary human and mouse macrophages to the Toll-like receptor (TLR)-4 agonist lipopolysaccharide (LPS). By using a custom platform permitting cross-species interrogation coupled with deep sequencing of mRNA 5' ends, we identified extensive divergence in LPS-regulated orthologous gene expression between humans and mice (24% of orthologues were identified as "divergently regulated"). We further demonstrate concordant regulation of human-specific LPS target genes in primary pig macrophages. Divergently regulated orthologues were enriched for genes encoding cellular "inputs" such as cell surface receptors (e.g., TLR6, IL-7Rα) and functional "outputs" such as inflammatory cytokines/chemokines (e.g., CCL20, CXCL13). Conversely, intracellular signaling components linking inputs to outputs were typically concordantly regulated. Functional consequences of divergent gene regulation were confirmed by showing LPS pretreatment boosts subsequent TLR6 responses in mouse but not human macrophages, in keeping with mouse-specific TLR6 induction. Divergently regulated genes were associated with a large dynamic range of gene expression, and specific promoter architectural features (TATA box enrichment, CpG island depletion). Surprisingly, regulatory divergence was also associated with enhanced interspecies promoter conservation. Thus, the genes controlled by complex, highly conserved promoters that facilitate dynamic regulation are also the most susceptible to evolutionary change.
DOI: 10.1186/1471-2164-9-157
2008
Cited 268 times
Hidden layers of human small RNAs
Background: Small RNA has attracted increasing interest following the discovery of RNA silencing and rapid progress in understanding these phenomena. Although recent studies suggest that yet-undiscovered types of small RNAs may exist in higher organisms, many small RNA profiling studies have focused on miRNA and/or siRNA rather than on exploring additional classes of RNAs. Results: Here, we explored human small RNAs by unbiased sequencing of RNAs 19–40 nt in size. We provide substantial evidence for the existence of independent classes of small RNAs. Our data show that well-characterized non-coding RNAs, such as tRNA, snoRNA, and snRNA, are cleaved at sites specific to each class of ncRNA. In particular, tRNA cleavage is regulated depending on tRNA type and tissue expression. We also found small RNAs mapped to genomic regions that are transcribed in both directions from bidirectional promoters, indicating that these small RNAs are products of dsRNA formation and subsequent cleavage. Their partial similarity to ribosomal RNAs (rRNAs) suggests unrevealed functions of ribosomal DNA or interstitial rRNA. Further examination revealed six novel miRNAs. Conclusion: Our results underscore the complexity of the small RNA world and of small RNA biogenesis.
DOI: 10.1038/ng.3487
2016
Cited 268 times
A predictive computational framework for direct reprogramming between human cell types
Transdifferentiation, the process of converting from one cell type to another without going through a pluripotent state, has great promise for regenerative medicine. The identification of key transcription factors for reprogramming is currently limited by the cost of exhaustive experimental testing of plausible sets of factors, an approach that is inefficient and unscalable. Here we present a predictive system (Mogrify) that combines gene expression data with regulatory network information to predict the reprogramming factors necessary to induce cell conversion. We have applied Mogrify to 173 human cell types and 134 tissues, defining an atlas of cellular reprogramming. Mogrify correctly predicts the transcription factors used in known transdifferentiations. Furthermore, we validated two new transdifferentiations predicted by Mogrify. We provide a practical and efficient mechanism for systematically implementing novel cell conversions, facilitating the generalization of reprogramming of human cells. Predictions are made available to help rapidly further the field of cell conversion.
DOI: 10.4161/rna.8.1.14300
2011
Cited 267 times
Deep-sequencing of human Argonaute-associated small RNAs provides insight into miRNA sorting and reveals Argonaute association with RNA fragments of diverse origin
While several studies have focused on the relationship between individual miRNA loci or classes of small RNA and human Argonaute (AGO) proteins, a comprehensive, global analysis of the RNA content associated with different AGO proteins has yet to be performed. We compared deep-sequenced RNA extracted from immunoprecipitation experiments with the AGO1, AGO2, and AGO3 proteins. Consistent with previous observations, sequence tags derived from miRNA loci globally associate in approximately equivalent amounts with AGO1, AGO2, and AGO3. Exceptions include miR-182, miR-222, and miR-223*, which could be coupled to processes targeting these loci for interaction with specific AGO proteins. A closer inspection of the data, however, supports the presence of an unusual sorting mechanism wherein a subset of miRNA loci give rise to distinct isomiRs that preferentially associate with distinct AGO proteins in a significantly differential manner. We also identify the complete set of short RNAs derived from non-miRNA sources, including tRNA, snRNA, snoRNA, vRNA, and mRNA, that associate with the AGO proteins, many of which are predicted to play roles in post-transcriptional gene silencing. We also observe enrichment of tags mapping to promoter regions of genes, suggesting that a fraction of the recently identified promoter-associated small RNAs in humans could function through interaction with AGO proteins. Finally, we observe that antisense miRNA transcripts are frequently present in low copy numbers across a range of diverse miRNA loci and that these transcripts appear to associate with AGO proteins.
DOI: 10.2337/db11-1508
2012
Cited 264 times
Adipose Tissue MicroRNAs as Regulators of CCL2 Production in Human Obesity
In obesity, white adipose tissue (WAT) inflammation is linked to insulin resistance. Increased adipocyte secretion of chemokine (C-C motif) ligand 2 (CCL2) may initiate adipose inflammation by attracting inflammatory cells into the tissue. Using an unbiased approach, we identified adipose microRNAs (miRNAs) that are dysregulated in human obesity and assessed their possible role in controlling CCL2 production. In subcutaneous WAT obtained from 56 subjects, 11 miRNAs were present in all subjects and downregulated in obesity. Of these, 10 affected adipocyte CCL2 secretion in vitro, and for 2 miRNAs (miR-126 and miR-193b), regulatory circuits were defined. While miR-126 bound directly to the 3'-untranslated region of CCL2 mRNA, miR-193b regulated CCL2 production indirectly through a network of transcription factors, many of which have been identified in other inflammatory conditions. In addition, overexpression of miR-193b and miR-126 in a human monocyte/macrophage cell line attenuated CCL2 production. The levels of the two miRNAs in subcutaneous WAT were significantly associated with CCL2 secretion (miR-193b) and with expression of integrin α-X, an inflammatory macrophage marker (miR-193b and miR-126). Taken together, our data suggest that miRNAs may be important regulators of adipose inflammation through their effects on CCL2 release from human adipocytes and macrophages.
DOI: 10.1038/emboj.2013.99
2013
Cited 240 times
Androgen-responsive long noncoding RNA CTBP1-AS promotes prostate cancer
High-throughput techniques have identified numerous antisense (AS) transcripts and long non-coding RNAs (ncRNAs). However, their significance in cancer biology remains largely unknown. Here, we report an androgen-responsive long ncRNA, CTBP1-AS, located in the AS region of C-terminal binding protein 1 (CTBP1), which is a corepressor for androgen receptor. CTBP1-AS is predominantly localized in the nucleus and its expression is generally upregulated in prostate cancer. CTBP1-AS promotes both hormone-dependent and castration-resistant tumour growth. Mechanistically, CTBP1-AS directly represses CTBP1 expression by recruiting the RNA-binding transcriptional repressor PSF together with histone deacetylases. CTBP1-AS also exhibits global androgen-dependent functions by inhibiting tumour-suppressor genes via the PSF-dependent mechanism thus promoting cell cycle progression. Our findings provide new insights into the functions of ncRNAs that directly contribute to prostate cancer progression.
DOI: 10.1371/journal.pone.0004219
2009
Cited 232 times
Direct Metagenomic Detection of Viral Pathogens in Nasal and Fecal Specimens Using an Unbiased High-Throughput Sequencing Approach
With the severe acute respiratory syndrome epidemic of 2003 and renewed attention on avian influenza viral pandemics, new surveillance systems are needed for the earlier detection of emerging infectious diseases. We applied a "next-generation" parallel sequencing platform for viral detection in nasopharyngeal and fecal samples collected during seasonal influenza virus (Flu) infections and norovirus outbreaks from 2005 to 2007 in Osaka, Japan. Random RT-PCR was performed to amplify RNA extracted from 0.1–0.25 ml of nasopharyngeal aspirates (N = 3) and fecal specimens (N = 5), and more than 10 μg of cDNA was synthesized. Unbiased high-throughput sequencing of these 8 samples yielded 15,298–32,335 (average 24,738) reads in a single 7.5 h run. In nasopharyngeal samples, whole-genome analysis was not possible because the majority (>90%) of reads were derived from the host genome, but 20–460 Flu reads were detected, which was sufficient for subtype identification. In fecal samples, bacteria and host cells were removed by centrifugation, yielding 484–15,260 reads of norovirus sequence (covering 78–98% of the whole genome), except for one specimen in which norovirus was near the detection limit of RT-PCR. These results suggest that our unbiased high-throughput sequencing approach is useful for directly detecting pathogenic viruses without prior genetic information. Although its cost and technological availability make it unlikely that this system will soon become the diagnostic standard worldwide, it could be useful for the earlier discovery of novel emerging viruses and bioterrorism agents, which are difficult to detect with conventional procedures.
DOI: 10.1007/s12015-016-9680-6
2016
Cited 205 times
Genomic Instability of iPSCs: Challenges Towards Their Clinical Applications
Induced pluripotent stem cells (iPSCs) are a type of pluripotent stem cells generated directly from mature cells through the introduction of key transcription factors. iPSCs can be propagated and differentiated into many cell types in the human body, holding enormous potential in the field of regenerative medicine. However, genomic instability of iPSCs has been reported with the advent of high-throughput technologies such as next-generation sequencing. The presence of genetic variations in iPSCs has raised serious safety concerns, hampering the advancement of iPSC-based novel therapies. Here we summarize our current knowledge on genomic instability of iPSCs, with a particular focus on types of genetic variations and their origins. Importantly, it remains elusive whether genetic variations in iPSCs can be an actual risk factor for adverse effects including malignant outgrowth. Furthermore, we discuss novel approaches to generate iPSCs with fewer genetic variations. Lastly, we outline the safety issues and monitoring strategies of iPSCs in clinical settings.
DOI: 10.1093/nar/gkv054
2015
Cited 193 times
CAGEr: precise TSS data retrieval and high-resolution promoterome mining for integrative analyses
Cap analysis of gene expression (CAGE) is a high-throughput method for transcriptome analysis that provides a single base-pair resolution map of transcription start sites (TSS) and their relative usage. Despite their high resolution and functional significance, published CAGE data are still underused in promoter analysis because of the absence of tools that enable their efficient manipulation and integration with other genome data types. Here we present CAGEr, an R implementation of novel methods for the analysis of differential TSS usage and promoter dynamics, integrated with CAGE data processing and promoterome mining into a first comprehensive CAGE toolbox on a common analysis platform. Crucially, we provide collections of TSSs derived from most published CAGE datasets, as well as direct access to the FANTOM5 resource of TSSs for numerous human and mouse cell/tissue types from within R, greatly increasing the accessibility of precise context-specific TSS data for integrative analyses. The CAGEr package is freely available from Bioconductor at http://www.bioconductor.org/packages/release/bioc/html/CAGEr.html.
DOI: 10.1182/blood-2013-02-483792
2014
Cited 177 times
Redefinition of the human mast cell transcriptome by deep-CAGE sequencing
Mast cells (MCs) mature exclusively in peripheral tissues, hampering research into their developmental and functional programs. Here, we employed deep cap analysis of gene expression on skin-derived MCs to generate the most comprehensive view of the human MC transcriptome reported to date. Because MCs were included in the FANTOM5 project, we could contrast their molecular signature against a multitude of human samples. We demonstrate that MCs possess a unique and surprising transcriptional landscape, combining hematopoietic genes with genes exclusively active in MCs and genes not previously reported as expressed by MCs (several of them markers of unrelated tissues). We also found functional bone morphogenetic protein receptors transducing activating signals in MCs. Conversely, several immune-related genes frequently studied in MCs were not expressed or were only weakly expressed. Comparing MCs ex vivo with cultured counterparts revealed profound changes in the MC transcriptome under in vitro conditions. We also determined the promoter usage of MC-expressed genes and identified associated motifs active in the lineage. Befitting their uniqueness, MCs had no close relative in the hematopoietic network (and were only distantly related to basophils). This rich data set reveals that our knowledge of human MCs is still limited, but with this resource, novel functional programs of MCs may soon be discovered.
DOI: 10.1093/nar/gky1099
2018
Cited 175 times
Update of the FANTOM web resource: expansion to provide additional transcriptome atlases
The FANTOM web resource (http://fantom.gsc.riken.jp/) was developed to provide easy access to the data produced by the FANTOM project. It contains the most complete and comprehensive sets of actively transcribed enhancers and promoters in the human and mouse genomes. We determined the transcription activities of these regulatory elements by CAGE (Cap Analysis of Gene Expression) for both steady and dynamic cellular states in all major and some rare cell types, consecutive stages of differentiation and responses to stimuli. We have expanded the resource by employing different assays, such as RNA-seq, short RNA-seq and a paired-end protocol for CAGE (CAGEscan), to provide new angles to study the transcriptome. That yielded additional atlases of long noncoding RNAs, miRNAs and their promoters. We have also expanded the CAGE analysis to cover rat, dog, chicken, and macaque species for a limited number of cell types. The CAGE data obtained from human and mouse were reprocessed to make them available on the latest genome assemblies. Here, we report the recent updates of both data and interfaces in the FANTOM web resource.
DOI: 10.1007/978-1-4939-0805-9_7
2014
Cited 174 times
Detecting Expressed Genes Using CAGE
Cap analysis of gene expression (CAGE) provides accurate high-throughput measurement of RNA expression. Through large-scale analysis of the 5′ ends of transcripts, the CAGE method enables not only determination of transcription start sites but also prediction of promoter regions. Here we provide a protocol for constructing no-amplification non-tagging CAGE libraries for Illumina next-generation sequencers (nAnT-iCAGE). We have eliminated the commonly used PCR amplification and restriction-enzyme cleavage steps to avoid potential biases. As a result, we achieved a simpler, less biased preparation process.
DOI: 10.1016/j.stem.2018.01.010
2018
Cited 148 times
CD157 Marks Tissue-Resident Endothelial Stem Cells with Homeostatic and Regenerative Properties
The generation of new blood vessels via angiogenesis is critical for meeting tissue oxygen demands. A role for adult stem cells in this process remains unclear. Here, we identified CD157 (bst1, bone marrow stromal antigen 1) as a marker of tissue-resident vascular endothelial stem cells (VESCs) in large arteries and veins of numerous mouse organs. Single CD157+ VESCs form colonies in vitro and generate donor-derived portal vein, sinusoids, and central vein endothelial cells upon transplantation in the liver. In response to injury, VESCs expand and regenerate entire vasculature structures, supporting the existence of an endothelial hierarchy within blood vessels. Genetic lineage tracing revealed that VESCs maintain large vessels and sinusoids in the normal liver for more than a year, and transplantation of VESCs rescued bleeding phenotypes in a mouse model of hemophilia. Our findings show that tissue-resident VESCs display self-renewal capacity and that vascular regeneration potential exists in peripheral blood vessels.
DOI: 10.1006/geno.1996.0567
1996
Cited 283 times
High-Efficiency Full-Length cDNA Cloning by Biotinylated CAP Trapper
We have devised a method for efficiently constructing high-content full-length cDNA libraries based on chemical introduction of a biotin group into the diol residue of the cap structure of eukaryotic mRNA, followed by RNase I treatment to select full-length cDNA. The selection occurs by trapping the biotin residue at the cap sites using streptavidin-coated magnetic beads, thus eliminating incompletely synthesized cDNAs. When this method was used to construct a mouse brain full-length cDNA library, our evaluation showed that more than 95% of the total clones were of full length, and recombinant clones could be produced with high efficiency (1.2 × 10⁷ per 10 μg of starting mRNA). The analysis of 120 randomly picked clones indicates an unbiased representation of the starting mRNA population.
DOI: 10.1101/gr.145100
2000
Cited 277 times
Normalization and Subtraction of Cap-Trapper-Selected cDNAs to Prepare Full-Length cDNA Libraries for Rapid Discovery of New Genes
In the effort to prepare the mouse full-length cDNA encyclopedia, we previously developed several techniques to prepare and select full-length cDNAs. To increase the number of different cDNAs, we introduce here a strategy to prepare normalized and subtracted cDNA libraries in a single step. The method is based on hybridization of the first-strand, full-length cDNA with several RNA drivers, including starting mRNA as the normalizing driver and run-off transcripts from minilibraries containing highly expressed genes, rearrayed clones, and previously sequenced cDNAs as subtracting drivers. Our method keeps the proportion of full-length cDNAs in the subtracted/normalized library high. Moreover, our method dramatically enhances the discovery of new genes as compared to results obtained by using standard, full-length cDNA libraries. This procedure can be extended to the preparation of full-length cDNA encyclopedias from other organisms.
DOI: 10.1073/pnas.95.17.10038
1998
Cited 255 times
The human GNAS1 gene is imprinted and encodes distinct paternally and biallelically expressed G proteins
The GNAS1 gene encodes the α subunit of the G protein Gs, which couples receptor binding by several hormones to activation of adenylate cyclase. Null mutations of GNAS1 cause pseudohypoparathyroidism (PHP) type Ia, in which hormone resistance occurs in association with a characteristic osteodystrophy. The observation that PHP Ia almost always is inherited maternally has led to the suggestion that GNAS1 may be an imprinted gene. Here, we show that, although Gsα expression (directed by the promoter upstream of exon 1) is biallelic, GNAS1 is indeed imprinted in a promoter-specific fashion. We used parthenogenetic lymphocyte DNA to screen by restriction landmark genomic scanning for loci showing differential methylation between paternal and maternal alleles. This screen identified a region that was found to be methylated exclusively on a maternal allele and was located ≈35 kb upstream of GNAS1 exon 1. This region contains three novel exons that are spliced into alternative GNAS1 mRNA species, including one exon that encodes the human homologue of the large G protein XLαs. Transcription of these novel mRNAs is exclusively from the paternal allele in all tissues examined. The differential imprinting of separate protein products of GNAS1 therefore may contribute to the anomalous inheritance of PHP Ia.
DOI: 10.1101/gr.6831208
2007
Cited 248 times
A code for transcription initiation in mammalian genomes
Genome-wide detection of transcription start sites (TSSs) has revealed that RNA Polymerase II transcription initiates at millions of positions in mammalian genomes. Most core promoters do not have a single TSS, but an array of closely located TSSs with different rates of initiation. As a rule, genes have more than one such core promoter; however, defining the boundaries between core promoters is not trivial. These discoveries prompt a re-evaluation of our models for transcription initiation. We describe a new framework for understanding the organization of transcription initiation. We show that initiation events are clustered on the chromosomes at multiple scales (clusters within clusters), indicating multiple regulatory processes. Within the smallest of such clusters, which can be interpreted as core promoters, the local DNA sequence predicts the relative transcription start usage of each nucleotide with a remarkable 91% accuracy, implying the existence of a DNA code that determines TSS selection. Conversely, the total expression strength of such clusters is only partially determined by the local DNA sequence. Thus, the overall control of transcription can be understood as a combination of large- and small-scale effects: the selection of transcription start sites is largely governed by the local DNA sequence, whereas the transcriptional activity of a locus is regulated at a different level, being affected by distal features or events such as enhancers and chromatin remodeling.
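The idea of initiation events clustering at multiple scales can be illustrated with a deliberately simple stand-in: single-linkage clustering of TSS positions on one strand, where sorted positions no more than `max_gap` bp apart fall into the same cluster. Running it with a small and a large gap yields fine clusters nested inside coarse ones. This is a swapped-in technique for illustration, not the clustering method used in the paper; the function names, the example tag counts and the 20 bp and 500 bp gaps are hypothetical.

```python
from typing import Dict, List, Tuple

# (first position, last position, total tag count)
Cluster = Tuple[int, int, int]


def cluster_tss(counts: Dict[int, int], max_gap: int) -> List[Cluster]:
    """Single-linkage clustering of TSS positions from one strand of one chromosome.

    counts maps position -> CAGE tag count; sorted positions no more than
    max_gap bp apart are placed in the same cluster.
    """
    clusters: List[Cluster] = []
    for pos in sorted(counts):
        if clusters and pos - clusters[-1][1] <= max_gap:
            first, _, total = clusters[-1]
            clusters[-1] = (first, pos, total + counts[pos])
        else:
            clusters.append((pos, pos, counts[pos]))
    return clusters


def relative_usage(counts: Dict[int, int], cluster: Cluster) -> Dict[int, float]:
    """Share of the cluster's tags contributed by each TSS inside it."""
    first, last, total = cluster
    return {p: c / total for p, c in counts.items() if first <= p <= last}


tags = {100: 5, 103: 20, 110: 5, 300: 8, 305: 2, 2000: 1}  # hypothetical counts
fine = cluster_tss(tags, max_gap=20)
coarse = cluster_tss(tags, max_gap=500)
print(fine)    # [(100, 110, 30), (300, 305, 10), (2000, 2000, 1)]
print(coarse)  # [(100, 305, 40), (2000, 2000, 1)]
print(relative_usage(tags, fine[0]))  # position 103 carries 2/3 of the tags
```

Because a larger gap can only merge adjacent fine clusters, every cluster found at 20 bp lies inside one found at 500 bp, giving the "clusters within clusters" structure. `relative_usage` then gives each TSS's share of its cluster's tags, the quantity the abstract reports as largely predictable from local sequence.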
DOI: 10.1101/gr.982903
2003
Cited 241 times
Antisense Transcripts With FANTOM2 Clone Set and Their Implications for Gene Regulation
We have used the FANTOM2 mouse cDNA set (60,770 clones), public mRNA data, and mouse genome sequence data to identify 2481 pairs of sense-antisense transcripts and 899 further pairs of nonantisense bidirectional transcription based upon genomic mapping. The analysis greatly expands the number of known examples of sense-antisense transcript and nonantisense bidirectional transcription pairs in mammals. The FANTOM2 cDNA set appears to contain substantial numbers of noncoding transcripts suitable for antisense transcript analysis. The average proportions of loci encoding sense-antisense transcript and nonantisense bidirectional transcription pairs on autosomes were 15.1% and 5.4%, respectively; those on the X chromosome were 6.3% and 4.2%, respectively. Sense-antisense transcript pairs, rather than nonantisense bidirectional transcription pairs, may be less prevalent on the X chromosome, possibly due to X chromosome inactivation. Sense and antisense transcripts tended to be isolated from the same libraries, whereas nonantisense bidirectional transcription pairs did not appear to be coregulated. The existence of large numbers of natural antisense transcripts implies that the regulation of gene expression by antisense transcripts is more common than previously recognized. A viewer showing the genomic mapping patterns of sense-antisense transcript pairs and nonantisense bidirectional transcription pairs, along with other related statistical data, is available on our Web site.
DOI: 10.1038/leu.2009.246
2009
Cited 227 times
Induction of microRNAs, mir-155, mir-222, mir-424 and mir-503, promotes monocytic differentiation through combinatorial regulation
Acute myeloid leukemia (AML) involves a block in terminal differentiation of the myeloid lineage and uncontrolled proliferation of a progenitor state. Using phorbol myristate acetate (PMA), it is possible to overcome this block in THP-1 cells (an M5-AML containing the MLL-MLLT3 fusion), resulting in differentiation to an adherent monocytic phenotype. As part of FANTOM4, we used microarrays to identify 23 microRNAs that are regulated by PMA. We identify four PMA-induced microRNAs (mir-155, mir-222, mir-424 and mir-503) that when overexpressed cause cell-cycle arrest and partial differentiation and when used in combination induce additional changes not seen by any individual microRNA. We further characterize these pro-differentiative microRNAs and show that mir-155 and mir-222 induce G2 arrest and apoptosis, respectively. We find mir-424 and mir-503 are derived from a polycistronic precursor mir-424-503 that is under repression by the MLL-MLLT3 leukemogenic fusion. Both of these microRNAs directly target cell-cycle regulators and induce G1 cell-cycle arrest when overexpressed in THP-1. We also find that the pro-differentiative mir-424 and mir-503 downregulate the anti-differentiative mir-9 by targeting a site in its primary transcript. Our study highlights the combinatorial effects of multiple microRNAs within cellular systems.
DOI: 10.1073/pnas.95.2.520
1998
Cited 223 times
Thermostabilization and thermoactivation of thermolabile enzymes by trehalose and its application for the synthesis of full length cDNA
The advent of thermostable enzymes has led to great advances in molecular biology, such as the development of PCR and ligase chain reaction. However, isolation of naturally thermostable enzymes has been restricted to those existing in thermophilic bacteria. Here, we show that the disaccharide trehalose enables enzymes to maintain their normal activity (thermostabilization) or even to increase activity at high temperatures (thermoactivation) at which they are normally inactive. We also demonstrate how enzyme thermoactivation can improve the reverse transcriptase reaction. In fact, thermoactivated reverse transcriptase, which displays full activity even at 60°C, was powerful enough to synthesize full-length cDNA without the early termination usually induced by stable secondary structures of mRNA.
DOI: 10.1093/bioinformatics/btp527
2009
Cited 220 times
TagDust—a program to eliminate artifacts from next generation sequencing data
Next-generation parallel sequencing technologies produce large quantities of short sequence reads. Due to experimental procedures, various types of artifacts are commonly sequenced alongside the targeted RNA or DNA sequences. Identification of such artifacts is important during the development of novel sequencing assays and for the downstream analysis of the sequenced libraries. Here we present TagDust, a program identifying artifactual sequences in large sequencing runs. Given a user-defined cutoff for the false discovery rate, TagDust identifies all reads explainable by combinations and partial matches to known sequences used during library preparation. We demonstrate the quality of our method on sequencing runs performed on Illumina's Genome Analyzer platform. Executables and documentation are available from http://genome.gsc.riken.jp/osc/english/software/. Contact: timolassmann@gmail.com.
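TagDust itself works to a user-defined false discovery rate cutoff, which is not reproduced here. As a much cruder, swapped-in heuristic for the same goal, the sketch below flags a read as a likely artifact when most of its bases are covered by k-mers taken from the known library sequences. This catches both partial matches and chimeric combinations such as a truncated adapter fused to a truncated primer. The function names, `k = 8`, the 0.8 coverage cutoff and the example sequences are hypothetical, and reverse complements are ignored for brevity.

```python
from typing import Iterable, List, Set


def library_kmers(seqs: Iterable[str], k: int) -> Set[str]:
    """All k-mers of the known library sequences (adapters, primers, linkers)."""
    kmers: Set[str] = set()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmers.add(s[i:i + k])
    return kmers


def artifact_coverage(read: str, kmers: Set[str], k: int) -> float:
    """Fraction of read bases covered by at least one library k-mer."""
    read = read.upper()
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in kmers:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(read) if read else 0.0


def flag_artifacts(reads: List[str], library: List[str],
                   k: int = 8, min_cover: float = 0.8) -> List[bool]:
    """True for reads that are mostly explained by library sequence fragments."""
    kmers = library_kmers(library, k)
    return [artifact_coverage(r, kmers, k) >= min_cover for r in reads]


ADAPTER_1 = "ACGTTGCAAGGT"  # hypothetical adapter
ADAPTER_2 = "TTAGGCATCCGA"  # hypothetical primer
chimera = ADAPTER_1[:10] + ADAPTER_2[2:]  # partial adapter fused to partial primer
genuine = "GATTACAGATTACACCCTTT"
print(flag_artifacts([chimera, genuine], [ADAPTER_1, ADAPTER_2]))  # [True, False]
```

A fixed coverage cutoff like this has no statistical calibration, which is what a false discovery rate setting such as TagDust's adds.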
DOI: 10.1073/pnas.82.7.1931
1985
Cited 218 times
Molecular cloning of the human cholecystokinin gene by use of a synthetic probe containing deoxyinosine.
A synthetic DNA based on the known amino acid sequence of the brain/gut peptide cholecystokinin (CCK) was synthesized. This DNA contained deoxyinosines at ambiguous codon positions and was used as a probe to isolate the CCK gene directly from a human genomic library. Nucleotide sequence analysis of the isolated gene revealed that human preprocholecystokinin consists of 115 amino acid residues, with 11 amino acids in common with the human gastrin precursor, another member of the gastrin-CCK family, and that the coding region is separated by a single, long intron. CCK appears to be encoded by a single-copy gene in the haploid human genome, as revealed by genomic Southern hybridization analysis, suggesting that the same gene is expressed both in gut and brain.
DOI: 10.1046/j.1365-313x.1998.00237.x
1998
Cited 215 times
High-efficiency cloning of Arabidopsis full-length cDNA by biotinylated CAP trapper
Full-length cDNAs are essential for functional analysis of plant genes. We constructed high-content, full-length cDNA libraries from Arabidopsis thaliana plants based on chemical introduction of a biotin group into the diol residue of the CAP structure of eukaryotic mRNA, followed by RNase I treatment, to select full-length cDNA. More than 90% of the total clones obtained were of full length; recombinant clones were obtained with high efficiency (2.2 × 10⁶ per 9 μg of starting mRNA). Sequence analysis of 111 randomly picked clones indicated that 32 isolated cDNA groups were derived from novel genes in the A. thaliana genome.
DOI: 10.1073/pnas.0409034102
2005
Cited 213 times
Transcriptome analysis of the aphid bacteriocyte, the symbiotic host cell that harbors an endocellular mutualistic bacterium, Buchnera
Aphids possess bacteriocytes, cells specifically differentiated to harbor obligatory mutualistic bacteria of the genus Buchnera, which have lost many genes that are essential for common bacterial functions. To understand the host's role in maintaining the symbiotic relationship, bacteriocytes were isolated from the pea aphid, Acyrthosiphon pisum, and the host transcriptome was investigated by using EST analysis and real-time quantitative RT-PCR. A number of genes were highly expressed specifically in the bacteriocyte, including (i) genes for amino acid metabolism, including those for biosynthesis of amino acids that Buchnera cannot produce, and those for utilization of amino acids that Buchnera can synthesize; (ii) genes related to transport, including genes for mitochondrial transporters and a gene encoding Rab, a G protein that regulates vesicular transport; and (iii) genes for putative lysozymes that degrade bacterial cell walls. Significant up-regulation of (i) clearly indicated that the bacteriocyte is involved in the exchange of amino acids between the host aphid and Buchnera, the key metabolic process in the symbiotic system. Conspicuously high expression of (ii) and (iii) shed light on previously unknown aspects of the host–Buchnera interactions in the symbiotic system.
DOI: 10.1038/ng0194-33
1994
Cited 211 times
Identification of an imprinted U2af binding protein related sequence on mouse chromosome 11 using the RLGS method
DOI: 10.1096/fj.05-5360com
2006
Cited 207 times
LPS regulates proinflammatory gene expression in macrophages by altering histone deacetylase expression
Bacterial LPS triggers dramatic changes in gene expression in macrophages. We show here that LPS regulated several members of the histone deacetylase (HDAC) family at the mRNA level in murine bone marrow-derived macrophages (BMM). LPS transiently repressed, then induced a number of HDACs (Hdac-4, 5, 7) in BMM, whereas Hdac-1 mRNA was induced more rapidly. Treatment of BMM with trichostatin A (TSA), an inhibitor of HDACs, enhanced LPS-induced expression of the Cox-2, Cxcl2, and Ifit2 genes. In the case of Cox-2, this effect was also apparent at the promoter level. Overexpression of Hdac-8 in RAW264 murine macrophages blocked the ability of LPS to induce Cox-2 mRNA. Another class of LPS-inducible genes, which included Ccl2, Ccl7, and Edn1, was suppressed by TSA, an effect most likely mediated by PU.1 degradation. Hence, HDACs act as potent and selective negative regulators of proinflammatory gene expression and act to prevent excessive inflammatory responses in macrophages.
DOI: 10.1073/pnas.041605498
2001
Cited 201 times
Delineating developmental and metabolic pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length enriched mouse cDNA arrays
We have systematically characterized gene expression patterns in 49 adult and embryonic mouse tissues by using cDNA microarrays with 18,816 mouse cDNAs. Cluster analysis defined sets of genes that were expressed ubiquitously or in similar groups of tissues such as digestive organs and muscle. Clustering of expression profiles was observed in embryonic brain, postnatal cerebellum, and adult olfactory bulb, reflecting similarities in neurogenesis and remodeling. Finally, clustering genes coding for known enzymes into 78 metabolic pathways revealed a surprising coordination of expression within each pathway among different tissues. On the other hand, a more detailed examination of glycolysis revealed tissue-specific differences in profiles of key regulatory enzymes. Thus, by surveying global gene expression by using microarrays with a large number of elements, we provide insights into the commonality and diversity of pathways responsible for the development and maintenance of the mammalian body plan.
DOI: 10.1371/journal.pgen.0020052
2006
Cited 198 times
The Abundance of Short Proteins in the Mammalian Proteome
Short proteins play key roles in cell signalling and other processes, but their abundance in the mammalian proteome is unknown. Current catalogues of mammalian proteins exhibit an artefactual discontinuity at a length of 100 aa, so that protein abundance peaks just above this length and falls off sharply below it. To clarify the abundance of short proteins, we identify proteins in the FANTOM collection of mouse cDNAs by analysing synonymous and non-synonymous substitutions with the computer program CRITICA. This analysis confirms that there is no real discontinuity at length 100. Roughly 10% of mouse proteins are shorter than 100 aa, although the majority of these are variants of proteins longer than 100 aa. We identify many novel short proteins, including a "dark matter" subset containing ones that lack detectable homology to other known proteins. Translation assays confirm that some of these novel proteins can be translated and localised to the secretory pathway.
DOI: 10.1016/0378-1119(87)90087-4
1987
Cited 198 times
Revision of consensus sequence of human Alu repeats — a review
Nucleotide sequences of 50 human Alu repeats and their flanking regions are presented together with the consensus sequence based on the literature and our findings. The results indicate the need for some revisions of the Alu consensus sequence published by Deininger et al. (1981). Most nucleotide substitutions among the Alu members are transitions, rather than transversions. The Alu sequence seems to consist of 'conserved' regions and 'variable' regions. The conserved regions consist of a 25-bp region between nt positions 23 and 47 and a 16-bp region between nt positions 245 and 260. The 16-bp region corresponds to the region of 7SL RNA that is claimed to fold and become paired with the internal promoter sequence. Two A-rich regions, one located at the right end of the first monomer and the other at the right end of the second monomer, are variable. No defining property was found for the direct repeats flanking the Alu repeats.
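The transition/transversion distinction used in this abstract can be made concrete with a small classifier. This is a minimal Python sketch, not the authors' procedure: it assumes two already-aligned, gap-free sequences of equal length, skips ambiguous symbols such as N, and the function name `count_substitutions` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq_a: str, seq_b: str) -> tuple:
    """Return (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Identical bases and ambiguous symbols are not counted.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # A transition swaps a purine for a purine or a pyrimidine for a pyrimidine.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions
```

For example, `count_substitutions("AGCT", "AACT")` returns `(1, 0)`, because the only difference (G to A) is a purine-to-purine transition.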
2000
Cited 198 times
Bacterial artificial chromosome libraries for mouse sequencing and functional analysis
Bacterial artificial chromosome (BAC) and P1-derived artificial chromosome (PAC) libraries providing a combined 33-fold representation of the murine genome have been constructed using two different restriction enzymes for genomic digestion. A large-insert PAC library was prepared from the 129S6/SvEvTac strain in a bacterial/mammalian shuttle vector to facilitate functional gene studies. For genome mapping and sequencing, we prepared BAC libraries from the 129S6/SvEvTac and the C57BL/6J strains. The average insert sizes for the three libraries range between 130 kb and 200 kb. Based on the numbers of clones and the observed average insert sizes, we estimate each library to have slightly in excess of 10-fold genome representation. The average number of clones found after hybridization screening with 28 probes was in the range of 9-14 clones per marker. To explore the fidelity of the genomic representation in the three libraries, we analyzed three contigs, each established after screening with a single unique marker. New markers were established from the end sequences and screened against all the contig members to determine if any of the BACs and PACs are chimeric or rearranged. Only one chimeric clone and six potential deletions have been observed after extensive analysis of 113 PAC and BAC clones. Seventy-one of the 113 clones were conclusively nonchimeric because both end markers or sequences were mapped to the other confirmed contig members. We could not exclude chimerism for the remaining 41 clones because one or both of the insert termini did not contain unique sequence to design markers. The low rate of chimerism, approximately 1%, and the low level of detected rearrangements support the anticipated usefulness of the BAC libraries for genome research.
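The "fold genome representation" above is the usual redundancy estimate from clone number and insert size. In this sketch, N is the number of clones, \bar{L} the average insert size and G the haploid genome size; these symbols are introduced here and are not taken from the paper. Assuming clones are distributed randomly and uniformly, the fold coverage c and the Clarke and Carbon probability P that a given single-copy locus is represented at least once are:

```latex
c = \frac{N \bar{L}}{G}, \qquad
P = 1 - \left(1 - \frac{\bar{L}}{G}\right)^{N} \approx 1 - e^{-c}
```

Under the same idealization, a single-copy marker is expected to be found in roughly c clones. This is the same order as the 9-14 clones per marker observed in hybridization screening. At c ≈ 10, the probability e^{-c} that a locus is missed is about 4.5 × 10^{-5}.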
DOI: 10.1046/j.1365-2443.2000.00351.x
2000
Cited 194 times
The paternal methylation imprint of the mouse H19 locus is acquired in the gonocyte stage during foetal testis development
Germline-specific differential DNA methylation that persists through fertilization and embryonic development is thought to be the 'imprint' distinguishing the parental alleles of imprinted genes. If such methylation is to work as the imprinting mechanism, however, it has to be reprogrammed following each passage through the germline. Previous studies on maternally methylated genes have shown that their methylation imprints are first erased in primordial germ cells (PGCs) and then re-established during oocyte growth. We have examined the timing of the reprogramming of the paternal methylation imprint of the mouse H19 gene during germ cell development. In both male and female PGCs, the paternal allele is partially methylated whereas the maternal allele is unmethylated. This partial methylation is completely erased in the female germline by entry into meiosis, establishing the oocyte methylation pattern. In the male germline, both alleles become methylated, mainly during the gonocyte stage, establishing the sperm methylation pattern. The paternal methylation imprint of H19 is established in the male germline and erased in the female germline at specific developmental stages. The identification of the timings of the methylation and demethylation should help to identify and characterize the biochemical basis of the reprogramming of imprinting.
DOI: 10.1101/gr.269102
2002
Cited 192 times
The Drosophila Gene Collection: Identification of Putative Full-Length cDNAs for 70% of D. melanogaster Genes
Collections of full-length nonredundant cDNA clones are critical reagents for functional genomics. The first step toward these resources is the generation and single-pass sequencing of cDNA libraries that contain a high proportion of full-length clones. The first release of the Drosophila Gene Collection (DGCr1) was produced from six libraries representing various tissues, developmental stages, and the cultured S2 cell line. Nearly 80,000 random 5' expressed sequence tags (ESTs) from these libraries were collapsed into a nonredundant set of 5849 cDNAs, corresponding to ~40% of the 13,474 predicted genes in Drosophila. To obtain cDNA clones representing the remaining genes, we have generated an additional 157,835 5' ESTs from two previously existing and three new libraries. One new library is derived from adult testis, a tissue we had not previously exploited for gene discovery; two new cap-trapped normalized libraries are derived from 0-22-h embryos and adult heads. Taking advantage of the annotated D. melanogaster genome sequence, we clustered the ESTs by aligning them to the genome. Clusters that overlap genes not already represented by cDNA clones in DGCr1 were analyzed further, and putative full-length clones were selected for inclusion in the new DGC. This second release of the DGC (DGCr2) contains 5061 additional clones, extending the collection to 10,910 cDNAs representing >70% of the predicted genes in Drosophila.
DOI: 10.1074/jbc.271.2.1043
1996
Cited 185 times
The Mouse Zic Gene Family
The mouse Zic gene, which encodes a zinc finger protein, is expressed in the developing or mature central nervous system in a highly restricted manner. We identified two novel Zic-related genes (Zic2, Zic3) through genomic and cDNA cloning. Both genes are highly similar to Zic (Zic1), especially in their zinc finger motif. A comparison of genomic organization among the three Zic genes showed that they share common exon-intron boundaries and belong to the same gene family. Zic1, Zic2, and Zic3 were mapped to mouse chromosomes 9, 14, and X, respectively, using an interspecific backcross panel. Northern blotting and ribonuclease protection showed that Zic2 and Zic3 are expressed in a restricted manner in the cerebellum at the adult stage. However, the temporal profile of mRNA expression in the developing cerebellum differs among the three Zic genes. Furthermore, we found that the Drosophila pair-rule gene odd-paired is highly homologous to the Zic gene family. The similarity is not limited to the zinc finger motif: the exon-intron boundaries are also the same as those of the mouse Zic gene family. These findings suggest that the Zic gene family and Drosophila odd-paired are derived from a common ancestral gene.
DOI: 10.1038/ng0996-106
1996
Cited 183 times
Identification of Grf1 on mouse chromosome 9 as an imprinted gene by RLGS–M
DOI: 10.1101/gr.1017303
2003
Cited 181 times
Impact of Alternative Initiation, Splicing, and Termination on the Diversity of the mRNA Transcripts Encoded by the Mouse Transcriptome
We analyzed the FANTOM2 clone set of 60,770 RIKEN full-length mouse cDNA sequences and 44,122 public mRNA sequences. We developed a new computational procedure to identify and classify the forms of splice variation evident in this data set and organized the results into a publicly accessible database that can be used for future expression array construction, structural genomics, and analyses of the mechanism and regulation of alternative splicing. Statistical analysis shows that at least 41% and possibly as much as 60% of multiexon genes in mouse have multiple splice forms. Of the transcription units with multiple splice forms, 49% contain transcripts in which the apparent use of an alternative transcription start (stop) is accompanied by alternative splicing of the initial (terminal) exon. This implies that alternative transcription may frequently induce alternative splicing. The fact that 73% of all exons with splice variation fall within the annotated coding region indicates that most splice variation is likely to affect the protein form. Finally, we compared the set of constitutive (present in all transcripts) exons with the set of cryptic (present only in some transcripts) exons and found statistically significant differences in their length distributions, the nucleotide distributions around their splice junctions, and the frequencies of occurrence of several short sequence motifs.
DOI: 10.1073/pnas.0607098103
2006
Cited 174 times
A molecular neuroethological approach for identifying and characterizing a cascade of behaviorally regulated genes
Songbirds have one of the most accessible neural systems for the study of brain mechanisms of behavior. However, neuroethological studies in songbirds have been limited by the lack of high-throughput molecular resources and gene-manipulation tools. To overcome these limitations, we constructed 21 regular, normalized, and subtracted full-length cDNA libraries from brains of zebra finches in 57 developmental and behavioral conditions in an attempt to clone as much of the brain transcriptome as possible. From these libraries, approximately 14,000 transcripts were isolated, representing an estimated 4,738 genes. With the cDNAs, we created a hierarchically organized transcriptome database and a large-scale songbird brain cDNA microarray. We used the arrays to reveal a set of 33 genes that are regulated in forebrain vocal nuclei by singing behavior. These genes clustered into four anatomical and six temporal expression patterns. Their functions spanned a large range of cellular and molecular categories, from signal transduction, trafficking, and structural, to synaptically released molecules. With the full-length cDNAs and a lentiviral vector system, we were able to overexpress, in vocal nuclei, proteins of representative singing-regulated genes in the absence of singing. This publicly accessible resource http://songbirdtranscriptome.net can now be used to study molecular neuroethological mechanisms of behavior.
DOI: 10.1016/s0076-6879(99)03004-9
1999
Cited 173 times
[2] High-efficiency full-length cDNA cloning
A full-length cDNA library is advantageous in that it allows cloning of a complete sequence in a single step. However, the representation of full-length cDNA clones has been low in cDNA libraries prepared using standard techniques. In the preparation of full-length cDNA libraries, two problems have arisen. The first is the difficulty of reaching the cap site during first-strand synthesis with reverse transcriptase (RT), the first enzyme involved in the preparation of a cDNA library. The second is that there have been no effective methods for selecting full-length cDNAs from incompletely extended cDNAs. This chapter describes protocols using the powerful “thermostabilized” RT that make it possible to efficiently prepare full-length cDNAs longer than 10 kb, combined with the selection of cDNA by biotinylated cap trapper to remove residual non-full-length cDNAs. Using these protocols, full-length cDNA libraries can be prepared at high yield without using the polymerase chain reaction (PCR), which introduces sequence bias and causes overrepresentation of short clones in a library.
DOI: 10.1101/gr.115469.110
2011
Cited 171 times
Unamplified cap analysis of gene expression on a single-molecule sequencer
We report the development of a simplified cap analysis of gene expression (CAGE) protocol adapted for single-molecule sequencers that avoids second strand synthesis, ligation, digestion, and PCR. HeliScopeCAGE directly sequences the 3' end of cap trapped first-strand cDNAs. As with previous versions of CAGE, we better define transcription start sites (TSS) than known models, identify novel regions of transcription and alternative promoters, and find two major classes of TSS signal, sharp peaks and broad regions. However, using this protocol, we observe reproducible evidence of regulation at the much finer level of individual TSS positions. The libraries are quantitative over 5 orders of magnitude and highly reproducible (Pearson's correlation coefficient of 0.987). We have also scaled down the sample requirement to 5 μg of total RNA for a standard HeliScopeCAGE library and 100 ng for a low-quantity version. When the same RNA was run as 5-μg and 100-ng versions, the 100-ng library still detected expression for ∼60% of the 13,468 loci detected by the 5-μg library using the same threshold, allowing comparative analysis of even rare cell populations. Testing the protocol for differential gene expression measurements on triplicate HeLa and THP-1 samples, we find that the log fold changes are highly correlated with Illumina microarray measurements (0.871). In addition, HeliScopeCAGE finds differential expression for thousands more loci including those with probes on the array. Finally, although the majority of tags are 5' associated, we also observe a low level of signal on exons that is useful for defining gene structures.
DOI: 10.1371/journal.pgen.0020062
2006
Cited 171 times
Transcript Annotation in FANTOM3: Mouse Gene Catalog Based on Physical cDNAs
The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.
DOI: 10.1101/gr.1119703
2003
Cited 165 times
Targeting a Complex Transcriptome: The Construction of the Mouse Full-Length cDNA Encyclopedia
We report the construction of the mouse full-length cDNA encyclopedia, the most extensive view of a complex transcriptome, on the basis of preparing and sequencing 246 libraries. Before cloning, cDNAs were enriched for full-length clones by Cap-Trapper and, in most cases, aggressively subtracted/normalized. We have produced 1,442,236 successful 3'-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5'-end reads, which clustered into 124,258 groups. Altogether, these cDNAs were further grouped into 70,000 transcriptional units (TUs), which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC), which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains discrepancies between the very large numbers of clusters (and TUs) of this project, which also include non-protein-coding RNAs, and the lower gene number estimation of genome annotations. Altogether, 5'-end clusters identify regions that are potential promoters for 8637 known genes, and 5'-end clusters suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high discovery rate suggests that the task of transcriptome discovery is not yet complete.
DOI: 10.1038/nmeth1007
2007
Cited 160 times
Rapid SNP diagnostics using asymmetric isothermal amplification and a new mismatch-suppression technology
DOI: 10.1002/elps.1150140145
1993
Cited 152 times
Restriction landmark genomic scanning method and its various applications
We have developed a new genome scanning method, restriction landmark genomic scanning (RLGS), based on the new concept of using restriction enzyme sites as landmarks. RLGS employs direct end labeling of genomic DNA digested with a restriction enzyme and high-resolution two-dimensional electrophoresis. Its advantages are:
(i) high-speed scanning ability, allowing simultaneous scanning of thousands of restriction landmarks;
(ii) extension of the scanning field by using different kinds of landmarks in an additional series of electrophoresis;
(iii) applicability to any type of organism, because restriction enzyme sites are labeled directly and no hybridization procedure is needed; and
(iv) reflection of the copy number of each restriction landmark in the spot intensity, which enables haploid and diploid genomic DNAs to be distinguished.
The RLGS method has various applications because it can be used to scan for physical states of genomic DNA, such as amplification, deletion and methylation. The copy number of the locus of a restriction landmark can be estimated from the spot intensity to find either an amplified or a deleted region. The methylation state of genomic DNA can also be determined by using methylation-sensitive restriction enzyme sites as restriction landmarks (restriction landmark genomic scanning for screening methylated sites, RLGS-M). This article introduces the basic principle of RLGS and its applications to the analysis of cancer, mouse mutant DNAs and tissue-specific methylation, showing the usefulness of RLGS for a variety of biological fields.
DOI: 10.1371/journal.pgen.0020037
2006
Cited 151 times
Clusters of Internally Primed Transcripts Reveal Novel Long Noncoding RNAs
Non-protein-coding RNAs (ncRNAs) are increasingly being recognized as having important regulatory roles. Although much recent attention has focused on tiny 22- to 25-nucleotide microRNAs, several functional ncRNAs are orders of magnitude larger in size. Examples of such macro ncRNAs include Xist and Air, which in mouse are 18 and 108 kilobases (Kb), respectively. We surveyed the 102,801 FANTOM3 mouse cDNA clones and found that Air and Xist were present not as single, full-length transcripts but as a cluster of multiple, shorter cDNAs, which were unspliced, had little coding potential, and were most likely primed from internal adenine-rich regions within longer parental transcripts. We therefore conducted a genome-wide search for regional clusters of such cDNAs to find novel macro ncRNA candidates. Sixty-six regions were identified, each of which mapped outside known protein-coding loci and which had a mean length of 92 Kb. We detected several known long ncRNAs within these regions, supporting the basic rationale of our approach. In silico analysis showed that many regions had evidence of imprinting and/or antisense transcription. These regions were significantly associated with microRNAs and transcripts from the central nervous system. We selected eight novel regions for experimental validation by northern blot and RT-PCR and found that the majority represent previously unrecognized noncoding transcripts that are at least 10 Kb in size and predominantly localized in the nucleus. Taken together, the data not only identify multiple new ncRNAs but also suggest the existence of many more macro ncRNAs like Xist and Air.
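The abstract does not give the study's clustering criteria. One way to picture a genome-wide search for "regional clusters" of unspliced, internally primed cDNAs is a gap-based merge of sorted alignments. In this minimal Python sketch, each cDNA is assumed to be a (chromosome, start, end) alignment and strand is ignored. The function name `regional_clusters` and its `max_gap` and `min_members` thresholds are hypothetical.

```python
from itertools import groupby


def regional_clusters(alignments, max_gap=10_000, min_members=3):
    """Group aligned cDNAs into regional clusters.

    alignments: iterable of (chrom, start, end) tuples.
    A cDNA joins the open cluster on its chromosome when its start lies at
    most max_gap bases past the cluster's current end. Returns a list of
    (chrom, start, end, member_count) for clusters with >= min_members cDNAs.
    """
    clusters = []
    for chrom, group in groupby(sorted(alignments), key=lambda a: a[0]):
        current = None  # [start, end, member_count] of the open cluster
        for _, start, end in group:
            if current is not None and start - current[1] <= max_gap:
                current[1] = max(current[1], end)  # extend the open cluster
                current[2] += 1
            else:
                # Close the previous cluster and keep it if it is large enough.
                if current is not None and current[2] >= min_members:
                    clusters.append((chrom, *current))
                current = [start, end, 1]
        if current is not None and current[2] >= min_members:
            clusters.append((chrom, *current))
    return clusters
```

A real search along these lines would also have to apply the additional filters the abstract mentions: clusters lying outside known protein-coding loci, and cDNAs that are unspliced with little coding potential.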
DOI: 10.1038/nbt.2840
2014
Cited 149 times
Interactive visualization and analysis of large-scale sequencing datasets using ZENBU
DOI: 10.1038/nmeth.1470
2010
Cited 148 times
Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan
Large-scale sequencing projects have revealed an unexpected complexity in the origins, structures and functions of mammalian transcripts. Many loci are known to produce overlapping coding and noncoding RNAs with capped 5' ends that vary in size. Methods to identify the 5' ends of transcripts will facilitate the discovery of new promoters and 5' ends derived from secondary capping events. Such methods often require high input amounts of RNA not obtainable from highly refined samples such as tissue microdissections and subcellular fractions. Therefore, we developed nano-cap analysis of gene expression (nanoCAGE), a method that captures the 5' ends of transcripts from as little as 10 ng of total RNA, and CAGEscan, a mate-pair adaptation of nanoCAGE that captures the transcript 5' ends linked to a downstream region. Both of these methods allow further annotation-agnostic studies of the complex human transcriptome.
DOI: 10.1182/blood-2013-02-484188
2014
Cited 147 times
Transcription and enhancer profiling in human monocyte subsets
Human blood monocytes comprise at least 3 subpopulations that differ in phenotype and function. Here, we present the first in-depth regulome analysis of human classical (CD14(++)CD16(-)), intermediate (CD14(+)CD16(+)), and nonclassical (CD14(dim)CD16(+)) monocytes. Cap analysis of gene expression adapted to Helicos single-molecule sequencing was used to map transcription start sites throughout the genome in all 3 subsets. In addition, global maps of H3K4me1 and H3K27ac deposition were generated for classical and nonclassical monocytes defining enhanceosomes of the 2 major subsets. We identified differential regulatory elements (including promoters and putative enhancers) that were associated with subset-specific motif signatures corresponding to different transcription factor activities, and as an example validated novel downstream enhancer elements at the CD14 locus. In addition to known subset-specific features, pathway analysis revealed marked differences in metabolic gene signatures. Whereas classical monocytes expressed higher levels of genes involved in carbohydrate metabolism, priming them for anaerobic energy production, nonclassical monocytes expressed higher levels of oxidative pathway components and showed a higher mitochondrial routine activity. Our findings describe promoter/enhancer landscapes and provide novel insights into the specific biology of human monocyte subsets.
DOI: 10.1093/bioinformatics/btq614
2010
Cited 147 times
SAMStat: monitoring biases in next generation sequencing data
The sequence alignment/map format (SAM) is a commonly used format to store the alignments between millions of short reads and a reference genome. Often certain positions within the reads are inherently more likely to contain errors due to the protocols used to prepare the samples. Such biases can have adverse effects on both mapping rate and accuracy. To understand the relationship between potential protocol biases and poor mapping, we wrote SAMStat, a simple C program plotting nucleotide overrepresentation and other statistics in mapped and unmapped reads in a concise HTML page. Collecting such statistics also makes it easy to highlight problems in the data processing and enables non-experts to track data quality over time. We demonstrate that studying sequence features in mapped data can be used to identify biases particular to one sequencing protocol. Once identified, such biases can be considered in the downstream analysis or even be removed by read trimming or filtering techniques. SAMStat is open source and freely available as a C program running on all Unix-compatible platforms. The source code is available from http://samstat.sourceforge.net. Contact: timolassmann@gmail.com.
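SAMStat itself is a C program. This Python sketch only illustrates the kind of statistic the abstract describes: nucleotide composition at each read position, compared between mapped and unmapped reads. It is not SAMStat's code. It assumes the reads have already been split into mapped and unmapped lists of sequence strings, so SAM parsing is left out. The function names are hypothetical.

```python
from collections import Counter


def base_composition(reads):
    """Per-position nucleotide frequencies across a list of read sequences."""
    counts = []  # counts[i] tallies the bases observed at read position i
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    freqs = []
    for position_counts in counts:
        total = sum(position_counts.values())
        freqs.append({b: n / total for b, n in position_counts.items()})
    return freqs


def overrepresentation(mapped, unmapped, base):
    """Per-position frequency of `base` in unmapped reads minus mapped reads."""
    m = base_composition(mapped)
    u = base_composition(unmapped)
    return [u[i].get(base, 0.0) - m[i].get(base, 0.0)
            for i in range(min(len(m), len(u)))]
```

A positive value at position i means the base is more frequent at that position among unmapped reads. A protocol-specific bias would show up as such a signal concentrated at particular read positions.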
DOI: 10.1002/j.1460-2075.1989.tb08355.x
1989
Cited 142 times
Thyroid-stimulating hormone (TSH) deficiency caused by a single base substitution in the CAGYC region of the beta-subunit
Congenital isolated thyroid-stimulating hormone (TSH) deficiency is an autosomal recessive disease that manifests as hypothyroidism (cretinism), causing severe mental and growth retardation. Patients were found to have a single base substitution in the codon for the 29th amino acid of the TSH beta subunit gene. The alteration is in the center of the so-called CAGYC region, which consists of an amino acid sequence conserved among all of the known glycoprotein hormone beta subunits. No other nucleotide substitutions have been found in the portions of the gene sequenced thus far. Microinjection of the mutated beta mRNAs into Xenopus laevis oocytes led to the formation of conformationally altered beta polypeptides that could not associate with alpha subunits. The mutation created a new recognition site for the enzyme MaeI. Southern blot hybridization of genomic DNA digested with MaeI showed that the patients were homozygous and their parents were heterozygous for the mutation. This test was also used to examine other family members for the disease.
DOI: 10.1016/j.mce.2009.12.012
2010
Cited 140 times
Molecular mechanisms of pituitary organogenesis: In search of novel regulatory genes
Defects in pituitary gland organogenesis are sometimes associated with congenital anomalies that affect head development. Lesions in transcription factors and signaling pathways explain some of these developmental syndromes. Basic research studies, including the characterization of genetically engineered mice, provide a mechanistic framework for understanding how mutations create the clinical characteristics observed in patients. Defects in BMP, WNT, Notch, and FGF signaling pathways affect induction and growth of the pituitary primordium and other organ systems partly by altering the balance between signaling pathways. The PITX and LHX transcription factor families influence pituitary and head development and are clinically relevant. A few later-acting transcription factors have pituitary-specific effects, including PROP1, POU1F1 (PIT1), and TPIT (TBX19), while others, such as NeuroD1 and NR5A1 (SF1), are syndromic, influencing development of other endocrine organs. We conducted a survey of genes transcribed in developing mouse pituitary to find candidates for cases of pituitary hormone deficiency of unknown etiology. We identified numerous transcription factors that are members of gene families with roles in syndromic or non-syndromic pituitary hormone deficiency. This collection is a rich source for future basic and clinical studies.
DOI: 10.1073/pnas.1317751111
2014
Cited 138 times
PAPD5-mediated 3′ adenylation and subsequent degradation of miR-21 is disrupted in proliferative disease
Next-generation sequencing experiments have shown that microRNAs (miRNAs) are expressed in many different isoforms (isomiRs), whose biological relevance is often unclear. We found that mature miR-21, the most widely researched miRNA because of its importance in human disease, is produced in two prevalent isomiR forms that differ by 1 nt at their 3' end, and moreover that the 3' end of miR-21 is posttranscriptionally adenylated by the noncanonical poly(A) polymerase PAPD5. PAPD5 knockdown caused an increase in the miR-21 expression level, suggesting that PAPD5-mediated adenylation of miR-21 leads to its degradation. Exoribonuclease knockdown experiments followed by small-RNA sequencing suggested that PARN degrades miR-21 in the 3'-to-5' direction. In accordance with this model, microarray expression profiling demonstrated that PAPD5 knockdown results in a down-regulation of miR-21 target mRNAs. We found that disruption of the miR-21 adenylation and degradation pathway is a general feature in tumors across a wide range of tissues, as evidenced by data from The Cancer Genome Atlas, as well as in the noncancerous proliferative disease psoriasis. We conclude that PAPD5 and PARN mediate degradation of oncogenic miRNA miR-21 through a tailing and trimming process, and that this pathway is disrupted in cancer and other proliferative diseases.
DOI: 10.1038/ncomms13295
2016
Cited 137 times
Genome sequence and analysis of the Japanese morning glory Ipomoea nil
Ipomoea is the largest genus in the family Convolvulaceae. Ipomoea nil (Japanese morning glory) has been utilized as a model plant to study the genetic basis of floricultural traits, with over 1,500 mutant lines. In the present study, we have utilized second- and third-generation sequencing platforms and report a draft genome of I. nil with a scaffold N50 of 2.88 Mb (contig N50 of 1.87 Mb), covering 98% of the 750 Mb genome. Scaffolds covering 91.42% of the assembly are anchored to 15 pseudo-chromosomes. The draft genome has enabled the identification and cataloguing of the Tpn1 family transposons, known as the major mutagen of I. nil, and the analysis of the dwarf gene CONTRACTED, located on the genetic map published in 1956. Comparative genomics suggests that a whole genome duplication in Convolvulaceae, distinct from the recent Solanaceae event, occurred after the divergence of the two sister families.
DOI: 10.1016/j.devcel.2009.10.011
2009
Cited 132 times
A Systems Approach Reveals that the Myogenesis Genome Network Is Regulated by the Transcriptional Repressor RP58
We created a whole-mount in situ hybridization (WISH) database, termed EMBRYS, containing expression data of 1520 transcription factors and cofactors expressed in E9.5, E10.5, and E11.5 mouse embryos—a highly dynamic stage of skeletal myogenesis. This approach implicated 43 genes in regulation of embryonic myogenesis, including a transcriptional repressor, the zinc-finger protein RP58 (also known as Zfp238). Knockout and knockdown approaches confirmed an essential role for RP58 in skeletal myogenesis. Cell-based high-throughput transfection screening revealed that RP58 is a direct MyoD target. Microarray analysis identified two inhibitors of skeletal myogenesis, Id2 and Id3, as targets for RP58-mediated repression. Consistently, MyoD-dependent activation of the myogenic program is impaired in RP58 null fibroblasts and downregulation of Id2 and Id3 rescues MyoD's ability to promote myogenesis in these cells. Our combined, multi-system approach reveals a MyoD-activated regulatory loop relying on RP58-mediated repression of muscle regulatory factor (MRF) inhibitors.
DOI: 10.1101/gr.084541.108
2008
Cited 132 times
Genome-wide detection and analysis of hippocampus core promoters using DeepCAGE
Finding and characterizing mRNAs, their transcription start sites (TSS), and their associated promoters is a major focus in post-genome biology. Mammalian cells have at least 5-10 magnitudes more TSS than previously believed, and deeper sequencing is necessary to detect all active promoters in a given tissue. Here, we present DeepCAGE, a new method for high-throughput sequencing of 5' cDNA tags that merges the Cap Analysis of Gene Expression method with ultra-high-throughput sequencing technology. We apply DeepCAGE to characterize 1.4 million sequenced TSS from mouse hippocampus and reveal a wealth of novel core promoters that are preferentially used in hippocampus; this is the most comprehensive promoter data set for any tissue to date. Using these data, we present evidence indicating a key role for the Arnt2 transcription factor in hippocampus gene regulation. DeepCAGE can also detect promoters used only in a small subset of cells within the complex tissue.
DOI: 10.1186/gb-2009-10-7-r79
2009
Cited 131 times
Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data
With the advent of ultra high-throughput sequencing technologies, increasingly researchers are turning to deep sequencing for gene expression studies. Here we present a set of rigorous methods for normalization, quantification of noise, and co-expression analysis of deep sequencing data. Using these methods on 122 cap analysis of gene expression (CAGE) samples of transcription start sites, we construct genome-wide 'promoteromes' in human and mouse consisting of a three-tiered hierarchy of transcription start sites, transcription start clusters, and transcription start regions.
DOI: 10.1073/pnas.0803640105
2008
Cited 128 times
Transcriptome-based systematic identification of extracellular matrix proteins
Extracellular matrix (ECM), which provides critical scaffolds for all adhesive cells, regulates proliferation, differentiation, and apoptosis. Different cell types employ customized ECMs, which are thought to play important roles in the generation of so-called niches that contribute to cell-specific functions. The molecular entities of these customized ECMs, however, have not been elucidated. Here, we describe a strategy for transcriptome-wide identification of ECM proteins based on computational screening of >60,000 full-length mouse cDNAs for secreted proteins, followed by in vitro functional assays. These assays screened the candidate proteins for ECM-assembling activities, interactions with other ECM molecules, modifications with glycosaminoglycans, and cell-adhesive activities, and were then complemented with immunohistochemical analysis. We identified 16 ECM proteins, of which seven were localized in basement membrane (BM) zones. The identification of these previously unknown BM proteins allowed us to construct a body map of BM proteins, which represents the comprehensive immunohistochemistry-based expression profiles of the tissue-specific customization of BMs.
DOI: 10.1038/pcan.2010.32
2010
Cited 127 times
miR-148a is an androgen-responsive microRNA that promotes LNCaP prostate cell growth by repressing its target CAND1 expression
DOI: 10.1101/gr.095273.109
2010
Cited 127 times
Cross-mapping and the identification of editing sites in mature microRNAs in high-throughput sequencing libraries
MicroRNAs (miRNAs) are short (20-23 nt) RNAs that are sequence-specific mediators of transcriptional and post-transcriptional regulation of gene expression. Modern high-throughput technologies enable deep sequencing of such RNA species on an unprecedented scale. We find that the analysis of small RNA deep-sequencing libraries can be affected by cross-mapping, in which RNA sequences originating from one locus are inadvertently mapped to another. Similar to cross-hybridization on microarrays, cross-mapping is prevalent among miRNAs because they tend to occur in families, are similar to or derived from repeat or structural RNAs, or are post-transcriptionally modified. Here, we develop a strategy to correct for cross-mapping and apply it to the analysis of RNA editing in mature miRNAs. In contrast to previous reports, our analysis suggests that RNA editing in mature miRNAs is rare in animals.
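The cross-mapping problem described above can be illustrated with a deliberately simplified check. This is a sketch under stated assumptions, not the correction strategy developed in the paper. The idea: an apparent single-nucleotide "edit" in a read against one miRNA locus is suspect if the same read matches another annotated sequence with fewer mismatches. This can happen among miRNA family members.

The function names and toy sequences are hypothetical. Comparisons are restricted to ungapped, equal-length sequences, which is a simplification.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def explained_by_cross_mapping(read: str, locus: str,
                               references: Dict[str, str]) -> bool:
    """Return True if the read mismatches `locus` but matches some other
    reference with fewer mismatches (hypothetical helper)."""
    own = hamming(read, references[locus])
    # Compare only against other references of the same length (no gaps).
    others = (hamming(read, seq) for name, seq in references.items()
              if name != locus and len(seq) == len(read))
    return own > 0 and any(d < own for d in others)


if __name__ == "__main__":
    # Toy "family" of two references differing at one position.
    refs = {"mirA": "ACGTACGTAC", "mirB": "ACGTGCGTAC"}
    # An A->G "edit" relative to mirA is a perfect match to mirB.
    print(explained_by_cross_mapping("ACGTGCGTAC", "mirA", refs))
```

A real analysis would also have to handle alignment gaps, read-length variation and reads from unannotated loci. The paper's strategy addresses cross-mapping more generally than this check does.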
DOI: 10.1158/0008-5472.can-16-1645
2017
Cited 126 times
Bone Marrow Adipocytes Facilitate Fatty Acid Oxidation Activating AMPK and a Transcriptional Network Supporting Survival of Acute Monocytic Leukemia Cells
Leukemia cells in the bone marrow must meet the biochemical demands of increased cell proliferation and also survive by continually adapting to fluctuations in nutrient and oxygen availability. Thus, targeting metabolic abnormalities in leukemia cells located in the bone marrow is a novel therapeutic approach. In this study, we investigated the metabolic role of bone marrow adipocytes in supporting the growth of leukemic blasts. Prevention of nutrient starvation-induced apoptosis of leukemic cells by bone marrow adipocytes, as well as the metabolic and molecular mechanisms involved in this process, was investigated using various analytic techniques. In acute monocytic leukemia (AMoL) cells, the prevention of spontaneous apoptosis by bone marrow adipocytes was associated with an increase in fatty acid β-oxidation (FAO) along with the upregulation of PPARγ, FABP4, CD36, and BCL2 genes. In AMoL cells, bone marrow adipocyte coculture increased adiponectin receptor gene expression and its downstream target stress response kinase AMPK, p38 MAPK with autophagy activation, and upregulated antiapoptotic chaperone HSPs. Inhibition of FAO disrupted metabolic homeostasis, increased reactive oxygen species production, and induced the integrated stress response mediator ATF4 and apoptosis in AMoL cells cocultured with bone marrow adipocytes. Our results suggest that bone marrow adipocytes support AMoL cell survival by regulating their metabolic energy balance and that the disruption of FAO in bone marrow adipocytes may be an alternative, novel therapeutic strategy for AMoL. Cancer Res; 77(6); 1453-64. ©2017 AACR.
DOI: 10.1073/pnas.1312717110
2014
Cited 110 times
Differential roles of epigenetic changes and Foxp3 expression in regulatory T cell-specific transcriptional regulation
Naturally occurring regulatory T (Treg) cells, which specifically express the transcription factor forkhead box P3 (Foxp3), are engaged in the maintenance of immunological self-tolerance and homeostasis. By transcriptional start site cluster analysis, we assessed here how genome-wide patterns of DNA methylation or Foxp3 binding sites were associated with Treg-specific gene expression. We found that Treg-specific DNA hypomethylated regions were closely associated with Treg up-regulated transcriptional start site clusters, whereas Foxp3 binding regions had no significant correlation with either up- or down-regulated clusters in nonactivated Treg cells. However, in activated Treg cells, Foxp3 binding regions showed a strong correlation with down-regulated clusters. In accordance with these findings, the above two features of activation-dependent gene regulation in Treg cells tend to occur at different locations in the genome. The results collectively indicate that Treg-specific DNA hypomethylation is instrumental in gene up-regulation in steady state Treg cells, whereas Foxp3 down-regulates the expression of its target genes in activated Treg cells. Thus, the two events seem to play distinct but complementary roles in Treg-specific gene expression.
DOI: 10.1101/gr.156232.113
2014
Cited 108 times
Comparison of CAGE and RNA-seq transcriptome profiling using clonally amplified and single-molecule next-generation sequencing
CAGE (cap analysis gene expression) and RNA-seq are two major technologies used to identify transcript abundances as well as structures. They measure expression by sequencing from either the 5′ end of capped molecules (CAGE) or tags randomly distributed along the length of a transcript (RNA-seq). Library protocols for clonally amplified (Illumina, SOLiD, 454 Life Sciences [Roche], Ion Torrent), second-generation sequencing platforms typically employ PCR preamplification prior to clonal amplification, while third-generation, single-molecule sequencers can sequence unamplified libraries. Although these transcriptome profiling platforms have been demonstrated to be individually reproducible, no systematic comparison has been carried out between them. Here we compare CAGE, using both second- and third-generation sequencers, and RNA-seq, using a second-generation sequencer based on a panel of RNA mixtures from two human cell lines to examine power in the discrimination of biological states, detection of differentially expressed genes, linearity of measurements, and quantification reproducibility. We found that the quantified levels of gene expression are largely comparable across platforms and conclude that CAGE and RNA-seq are complementary technologies that can be used to improve incomplete gene models. We also found systematic bias in the second- and third-generation platforms, which is likely due to steps such as linker ligation, cleavage by restriction enzymes, and PCR amplification. This study provides a perspective on the performance of these platforms, which will be a baseline in the design of further experiments to tackle complex transcriptomes uncovered in a wide range of cell types.
DOI: 10.1371/journal.pgen.1006641
2017
Cited 105 times
Analysis of the human monocyte-derived macrophage transcriptome and response to lipopolysaccharide provides new insights into genetic aetiology of inflammatory bowel disease
The FANTOM5 consortium utilised cap analysis of gene expression (CAGE) to provide an unprecedented insight into transcriptional regulation in human cells and tissues. In the current study, we have used CAGE-based transcriptional profiling on an extended dense time course of the response of human monocyte-derived macrophages grown in macrophage colony-stimulating factor (CSF1) to bacterial lipopolysaccharide (LPS). We propose that this system provides a model for the differentiation and adaptation of monocytes entering the intestinal lamina propria. The response to LPS is shown to be a cascade of successive waves of transient gene expression extending over at least 48 hours, with hundreds of positive and negative regulatory loops. Promoter analysis using motif activity response analysis (MARA) identified some of the transcription factors likely to be responsible for the temporal profile of transcriptional activation. Each LPS-inducible locus was associated with multiple inducible enhancers, and in each case, transient eRNA transcription at multiple sites detected by CAGE preceded the appearance of promoter-associated transcripts. LPS-inducible long non-coding RNAs were commonly associated with clusters of inducible enhancers. We used these data to re-examine the hundreds of loci associated with susceptibility to inflammatory bowel disease (IBD) in genome-wide association studies. Loci associated with IBD were strongly and specifically (relative to rheumatoid arthritis and unrelated traits) enriched for promoters that were regulated in monocyte differentiation or activation. Amongst previously identified IBD susceptibility loci, the vast majority contained at least one promoter that was regulated in CSF1-dependent monocyte-macrophage transitions and/or in response to LPS. On this basis, we concluded that IBD loci are strongly enriched for monocyte-specific genes, and identified at least 134 additional candidate genes associated with IBD susceptibility from reanalysis of published GWA studies. We propose that dysregulation of monocyte adaptation to the environment of the gastrointestinal mucosa is the key process leading to inflammatory bowel disease.
DOI: 10.1093/nar/gkw995
2016
Cited 100 times
Update of the FANTOM web resource: high resolution transcriptome of diverse cell types in mammals
Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, the primary data and analysis results have been expanded, and the functionality of the database systems has been enhanced. We believe that our data and web systems are invaluable resources and that the scientific community will benefit from this recent update to deepen its understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.
DOI: 10.1093/nar/gkv608
2015
Cited 98 times
Complementing tissue characterization by integrating transcriptome profiling from the Human Protein Atlas and from the FANTOM5 consortium
Understanding the normal state of human tissue transcriptome profiles is essential for recognizing tissue disease states and identifying disease markers. Recently, the Human Protein Atlas and the FANTOM5 consortium have each published extensive transcriptome data for human samples using Illumina-sequenced RNA-Seq and Heliscope-sequenced CAGE. Here, we report on the first large-scale complex tissue transcriptome comparison between full-length versus 5'-capped mRNA sequencing data. Overall gene expression correlation was high between the 22 corresponding tissues analyzed (R > 0.8). For genes ubiquitously expressed across all tissues, the two data sets showed high genome-wide correlation (91% agreement), with differences observed for a small number of individual genes indicating the need to update their gene models. Among the identified single-tissue enriched genes, up to 75% showed consensus of 7-fold enrichment in the same tissue in both methods, while another 17% exhibited multiple tissue enrichment and/or high expression variety in the other data set, likely dependent on the cell type proportions included in each tissue sample. Our results show that RNA-Seq and CAGE tissue transcriptome data sets are highly complementary for improving gene model annotations and highlight biological complexities within tissue transcriptomes. Furthermore, integration with image-based protein expression data is highly advantageous for understanding expression specificities for many genes.
DOI: 10.1038/ncomms9219
2015
Cited 94 times
TET2 repression by androgen hormone regulates global hydroxymethylation status and prostate cancer progression
Modulation of epigenetic patterns has promising efficacy for treating cancer. 5-Hydroxymethylated cytosine (5-hmC) is an epigenetic mark potentially important in cancer. Here we report that 5-hmC is an epigenetic hallmark of prostate cancer (PCa) progression. TET2, a member of the ten-eleven translocation (TET) proteins that catalyse the oxidation of methylated cytosine (5-mC) to 5-hmC, is repressed by androgens in PCa. Androgen receptor (AR)-mediated induction of the miR-29 family, which targets TET2, is markedly enhanced in hormone-refractory PCa (HRPC), and high miR-29 expression predicts poor outcome in PCa patients. Furthermore, decreased expression of miR-29b results in reduced tumour growth and increased TET2 expression in an animal model of HRPC. Interestingly, global 5-hmC modification regulated by miR-29b represses FOXA1 activity. A reduction in 5-hmC activates key PCa-related pathways such as mTOR and AR. Thus, DNA modification directly links the TET2-dependent epigenetic pathway regulated by AR to 5-hmC-mediated tumour progression.
DOI: 10.1016/j.neulet.2016.10.042
2017
Cited 90 times
Next-generation sequencing-based small RNA profiling of cerebrospinal fluid exosomes
MicroRNAs (miRNAs), particularly those found in human body fluids, have been suggested as potential biomarkers. Among various body fluids, the cerebrospinal fluid (CSF) shows promise as a profiling target for diagnosis and monitoring of neurological diseases. However, relevant genome-scale studies are limited and no studies have profiled exosomal miRNAs in CSF. Therefore, we conducted a next-generation sequencing-based genome-wide survey of small RNAs in the exosomal and non-exosomal (supernatant) fractions of healthy human CSF as well as serum from each donor. We observed miRNA enrichment in the exosomal fractions relative to the supernatant fractions of both CSF and serum. We also observed substantial differences in exosomal miRNA profiles between CSF and serum. Half of the reported brain miRNAs were found in CSF exosomal fractions. In particular, miR-1911-5p, specifically expressed in brain tissue, was detected in CSF but not in serum, as confirmed by digital PCR in three additional donors. Our data suggest that the brain is a major source of CSF exosomal miRNAs. Here we provide important evidence that exosomal miRNAs in CSF may reflect brain pathophysiology.
DOI: 10.1016/j.tig.2015.11.004
2016
Cited 83 times
Enhanced Identification of Transcriptional Enhancers Provides Mechanistic Insights into Diseases
Enhancers are distal cis-regulatory DNA elements that increase the expression of target genes. Various experimental and computational approaches, including chromatin signature profiling, have been developed to predict enhancers on a genome-wide scale, although each method has its advantages and disadvantages. Here we review an emerging method to identify transcribed enhancers at exceedingly high nucleotide resolution based on enhancer RNA transcripts captured by Cap Analysis of Gene Expression (CAGE) technology. We further argue that disease-causative regulatory mutations at enhancers are increasingly recognized, emphasizing the importance of enhancer identification in functional and clinical genomics including, but not limited to, genome-wide association studies (GWASs) and cancer genomics studies.