
Eric D. Ross

Here are all the papers by Eric D. Ross that you can download and read on OA.mg.

DOI: 10.1038/nature11922
2013
Cited 1,253 times
Mutations in prion-like domains in hnRNPA2B1 and hnRNPA1 cause multisystem proteinopathy and ALS
Algorithms designed to identify canonical yeast prions predict that around 250 human proteins, including several RNA-binding proteins associated with neurodegenerative disease, harbour a distinctive prion-like domain (PrLD) enriched in uncharged polar amino acids and glycine. PrLDs in RNA-binding proteins are essential for the assembly of ribonucleoprotein granules. However, the interplay between human PrLD function and disease is not understood. Here we define pathogenic mutations in PrLDs of heterogeneous nuclear ribonucleoproteins (hnRNPs) A2B1 and A1 in families with inherited degeneration affecting muscle, brain, motor neuron and bone, and in one case of familial amyotrophic lateral sclerosis. Wild-type hnRNPA2 (the most abundant isoform of hnRNPA2B1) and hnRNPA1 show an intrinsic tendency to assemble into self-seeding fibrils, which is exacerbated by the disease mutations. Indeed, the pathogenic mutations strengthen a 'steric zipper' motif in the PrLD, which accelerates the formation of self-seeding fibrils that cross-seed polymerization of wild-type hnRNP. Notably, the disease mutations promote excess incorporation of hnRNPA2 and hnRNPA1 into stress granules and drive the formation of cytoplasmic inclusions in animal models that recapitulate the human pathology. Thus, dysregulated polymerization caused by a potent mutant steric zipper motif in a PrLD can initiate degenerative disease. Related proteins with PrLDs should therefore be considered candidates for initiating and perhaps propagating proteinopathies of muscle, brain, motor neuron and bone.
DOI: 10.1096/fj.202001351
2020
Cited 103 times
A proposed role for the SARS‐CoV‐2 nucleocapsid protein in the formation and regulation of biomolecular condensates
The FASEB Journal, Volume 34, Issue 8, p. 9832-9842. Hypotheses. Open Access.
Sean M. Cascarina, Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO, USA
Eric D. Ross (Corresponding Author), Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO, USA
Correspondence: Eric D. Ross, Department of Biochemistry and Molecular Biology, Colorado State University, 1870 Campus Delivery, Fort Collins, CO 80523, USA. Email: [email protected]
First published: 20 June 2020. https://doi.org/10.1096/fj.202001351
This article was fast-tracked under a recently instituted interim policy in which the editors may, at their discretion, accept coronavirus-related manuscripts submitted for the Review, Perspectives and Hypotheses categories without additional review. [Correction added on July 2, 2020, after first online publication: The SARS-CoV 9192 under introduction has been corrected to "The nucleocapsid ("N") protein of the closely related SARS-CoV".]
Abstract
To date, the recently discovered SARS-CoV-2 virus has afflicted >6.9 million people worldwide and disrupted the global economy. Development of effective vaccines or treatments for SARS-CoV-2 infection will be aided by a molecular-level understanding of SARS-CoV-2 proteins and their interactions with host cell proteins. The SARS-CoV-2 nucleocapsid (N) protein is highly homologous to the N protein of SARS-CoV, which is essential for viral RNA replication and packaging into new virions. Emerging models indicate that nucleocapsid proteins of other viruses can form biomolecular condensates to spatiotemporally regulate N protein localization and function. Our bioinformatic analyses, in combination with pre-existing experimental evidence, suggest that the SARS-CoV-2 N protein is capable of forming or regulating biomolecular condensates in vivo by interaction with RNA and key host cell proteins. We discuss multiple models, whereby the N protein of SARS-CoV-2 may harness this activity to regulate viral life cycle and host cell response to viral infection.
Abbreviations
ERGIC: ER-Golgi intermediate compartment
LCD: low-complexity domain
LLPS: liquid-liquid phase separation
N: nucleocapsid
RNP: ribonucleoprotein
SR-domain: serine/arginine domain
UPR: unfolded protein response

1 INTRODUCTION
SARS-CoV-2 has recently emerged as the seventh coronavirus known to infect humans.1 Since its discovery, SARS-CoV-2 has resulted in >6.9 million documented human infections worldwide and >400 000 deaths reported to date, according to the World Health Organization (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports; accessed on 6/9/20). Given the magnitude of this ongoing pandemic, a molecular-level understanding of SARS-CoV-2 infection and host interaction is of paramount importance for rational drug development. The nucleocapsid ("N") protein of the closely related SARS-CoV (refs 2, 3) is essential for the formation of new virions and is the most common antigen of host-produced antibodies during infection by the closely related SARS-CoV virus,4 and the SARS-CoV-2 N protein is considered a promising molecular target for effective drug treatments and vaccines.5, 6 We hypothesize that the N protein of SARS-CoV-2 is capable of forming and altering biomolecular condensates, which may fulfill multiple roles in regulating viral replication and host cell response during infection. The N protein of SARS-CoV-2 contains all of the hallmark features of proteins known to form biomolecular condensates in vivo, including multiple domains capable of binding RNA, multiple low-complexity regions, an oligomerization domain that mediates N-N homotypic interactions, a high predicted phase separation propensity, and direct physical interactions with multiple stress granule components. These features may be involved in the regulation of host cell biomolecular condensates (namely, stress granules) as well as the formation of ribonucleoprotein (RNP) condensates during viral RNA genome packaging into nascent virions.
In the ensuing sections, we discuss the evidence for and implications of this hypothesis.

2 THE SARS-CoV-2 N PROTEIN CONTAINS A MODULAR DOMAIN ARCHITECTURE CHARACTERISTIC OF PROTEINS RECRUITED TO BIOMOLECULAR CONDENSATES
The N protein of SARS-CoV-2 is a putative RNA-binding protein responsible for assembling the viral RNA genome into compact RNP complexes for encapsulation in the viral membrane. Sequence alignment of human coronavirus N proteins indicates that the SARS-CoV-2 N protein closely resembles the SARS-CoV N protein (and, to a lesser extent, the MERS-CoV N protein) but not other human coronavirus N proteins (Figure 1A). The high level of sequence alignment between the N proteins of SARS-CoV-2 and SARS-CoV suggests that these proteins exhibit substantial overlap in overall domain organization, structure, and function. Therefore, many of the features governing N protein activity in SARS-CoV may extrapolate to SARS-CoV-2 N protein.

FIGURE 1. The N protein of SARS-CoV-2 resembles the SARS-CoV N protein and contains multiple LCDs. A, Multiple sequence alignment of N proteins from the seven coronavirus strains known to infect humans. B, Composition scan depicting the local S, R, and K content of the N protein sequence from SARS-CoV-2.
The SARS-CoV-2 N protein sequence was scanned with a 20-amino-acid window and the percent composition of each amino acid was calculated at each position, similar to previous studies.90, 91 A region was considered an LCD if any single amino acid constituted ≥40% of the window sequence.

The SARS-CoV N protein consists of two structured domains, referred to as NTD and CTD, separated by a disordered linker and flanked on both termini by disordered tails.4 The NTD is primarily responsible for RNA-binding, although the middle linker and CTD are also capable of binding RNA, and all three disordered regions enhance the affinity for RNA of both the NTD and the CTD.4 The N protein also contains two distinct LCDs: an S-rich domain (containing a secondary bias for R) within the middle "linker" region and a K-rich domain within the C-terminal disordered tail (Figure 1B). This combination of protein features (multiple RNA-binding domains, an oligomerization domain, and multiple LCDs) matches the prototypical architecture of proteins recruited to stress granules and other membraneless organelles. These organelles are often described as "biomolecular condensates" that form through liquid-liquid phase separation (LLPS). Based on current models, this process is mediated by weak, multivalent interactions through a combination of (1) homotypic ("self") interactions involving LCDs and oligomerization domains, and (2) heterotypic protein-protein and protein-RNA interactions involving RNA-binding domains, LCDs, and/or specific protein-protein interaction domains.7 By two leading phase separation prediction methods,8, 9 the N protein achieves the highest overall score for the SARS-CoV-2 proteome (Figure 2A). In both cases, the peak phase separation score corresponds to the S/R-rich LCD ("SR-domain"), with a secondary (albeit lower) score corresponding to the RNA-binding NTD (Figure 2B,C).
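The window-based composition scan described in the Figure 1 caption (a 20-residue window, with a region called an LCD when any single amino acid reaches 40% of the window) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the choice to index windows by their start position, and the toy sequence are all assumptions made for illustration only.

```python
from collections import Counter

WINDOW = 20       # window length in residues, as stated in the Figure 1 caption
THRESHOLD = 0.40  # a window is flagged if one residue makes up >= 40% of it


def composition_scan(seq, window=WINDOW):
    """Return one dict per window start: amino acid -> fraction of that window."""
    profiles = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        profiles.append({aa: n / window for aa, n in counts.items()})
    return profiles


def lcd_windows(seq, window=WINDOW, threshold=THRESHOLD):
    """Return (start, residue, fraction) for windows dominated by a single residue."""
    hits = []
    for start, profile in enumerate(composition_scan(seq, window)):
        aa, frac = max(profile.items(), key=lambda kv: kv[1])
        if frac >= threshold:
            hits.append((start, aa, frac))
    return hits


# Hypothetical toy sequence (NOT the N protein): an S-rich stretch between
# two compositionally diverse flanks.
toy = "MADEFGHIKL" + "SRSSSRSSGSSRSSSRSSSA" + "MNPQTVWYAC"
for start, aa, frac in lcd_windows(toy):
    print(start, aa, round(frac, 2))
```

Overlapping flagged windows would still need to be merged into contiguous regions to delimit an LCD; that merging step is left out here because the text does not specify how it was done.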
Additionally, the SR-domain passes the previously defined confidence threshold for the PSP method (Figure 2B), suggesting that the phase separation score is sufficiently high to be considered a likely LLPS protein. catGRANULE does not provide a prediction threshold, though the cumulative density function value for the N protein suggests that it scores well above average relative to eukaryotic proteins (Table 1). The phase separation scores for the N protein are consistent with the lower range of phase separation scores for human proteins with RNA-binding and prion-like domains that have characterized phase separation ability (Table 1; 10). Collectively, the reasonably high-scoring SR-domain, in combination with the RNA-binding domains and oligomerization domain, supports a role for the SARS-CoV-2 protein in recruitment to and/or regulation of biomolecular condensates.

FIGURE 2. The SARS-CoV-2 N protein has an above-average predicted phase separation propensity, with the SR-domain being the highest-scoring region. A, Phase separation scores for all SARS-CoV-2 proteins with both PSP9 and catGRANULE.8 A consensus sequence was built for each SARS-CoV-2 N protein by calculating the most frequent amino acid at each position from multiple sequence alignment of ~1900 sequences available on the NCBI Virus database (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/; downloaded on 5/6/2020). ORF1ab sequences were parsed into separate sequences for the 16 nsp proteins in SARS-CoV-2 according to the cleavage sites in Chan et al92 and separately aligned. B, Phase separation score profile for the N protein from SARS-CoV-2 using PSP. C, Phase separation score profile for the N protein from SARS-CoV-2 using catGRANULE.

TABLE 1.
Phase separation prediction of the SARS-CoV-2 N protein and known human LLPS proteins

Protein                   Pscore     catGRANULE
FUS (NM_004960)           13.96      5.75
hnRNPA1 (NM_031157)       13.89      4.89
EWSR1 (NM_013986)         9.41       3.34
hnRNPA2B1 (NM_031243)     14.06      4.66
hnRNPDL (NM_031372)       16.83      2.09
TAF15 (NM_139215)         18.48      6.29
TDP43 (NM_007375)         5.95       2.04
TIA1 (NM_022173)          5.57       0.97
N Protein (SARS-CoV-2)    2.49 (a)   1.65 (b)

Note: The Pscore and catGRANULE score for RNA-binding proteins commonly linked to stress granules and LLPS were calculated using PSP9 and catGRANULE,8 respectively.
(a) In the 86th percentile for all proteins in the human proteome (Uniprot proteome UP000005640).
(b) catGRANULE currently does not support whole-proteome analyses. However, another human protein, TRA2A, has a score of 2.14 and ranks 188th out of 20 190 human proteins, placing it in the 99th percentile.11

3 MULTIPLE DISTINCT LINES OF EVIDENCE CONNECT THE N PROTEIN WITH STRESS GRANULES
Many viruses can influence host cell stress granule formation by either inhibiting or enhancing stress granule formation.12 Additionally, some viruses can induce the formation of stress granule-like compartments containing canonical stress granule markers but with otherwise distinct molecular compositions.13-16 To our knowledge, the effect of SARS-CoV or SARS-CoV-2 on host cell stress granule formation has not yet been examined experimentally. However, multiple observations converge to suggest biologically meaningful connections between SARS-CoV/SARS-CoV-2 and stress granules.
First, SARS-CoV infection activates both PKR and PERK,17 which are cytoplasmic and ER kinases, respectively, that phosphorylate eIF2α and induce the formation of stress granules.18 PKR senses double-stranded RNA (dsRNA) by direct binding,19 and SARS-CoV infection produces an abundance of dsRNA in infected cells.17 PERK is activated in response to the ER-associated unfolded protein response (UPR) pathway,19 and the UPR is also induced by SARS-CoV infection.20 While both PERK and PKR are activated during SARS-CoV infection, PERK appears to be predominantly responsible for subsequent eIF2α phosphorylation, and the typical antiviral activity associated with PKR activation is suppressed during infection.17 Importantly, activation of PKR and PERK, as well as phosphorylation of eIF2α do not impair viral replication. This indicates that SARS-CoV possesses mechanisms to induce host cell stress but evade counteractive host cell responses by inhibiting downstream events in the PERK and PKR pathways. Second, host cell translation is inhibited by multiple mechanisms in SARS-CoV–infected cells,21-24 and translation inhibition is commonly associated with stress granule formation.18 The SARS-CoV nonstructural protein 1 (nsp1) directly inhibits translation by binding to the host cell ribosomal components21, 24; Lokugamage et al speculated that this could lead to the shuttling of abortive transcripts to stress granules or processing bodies. 
nsp1 also induces degradation of host cell mRNA,22, 23 including the mRNA of the antiviral factor IFN-β,23 while sparing its own viral genomic RNA.25 Additionally, stress granules were shown experimentally to form in response to translation inhibition and eIF2α phosphorylation (albeit at different stages of infection) in transmissible gastroenteritis virus (TGEV) and murine hepatitis virus (MHV), two other members of the Coronaviridae family.26, 27 The parallels observed between translation inhibition and eIF2α phosphorylation suggest that stress granule formation may also occur in SARS-CoV and SARS-CoV-2. It is worth noting that MERS, another member of the Coronaviridae family, has been shown to inhibit stress granule formation,28, 29 although this occurs via inhibition of PKR,29 which is contrary to the PKR activation observed during SARS-CoV infection. Third, the N protein of SARS-CoV is recruited to stress granules via its SR-domain and can be phosphorylated at multiple sites within the SR-domain in vitro by SRPK1,30 the mammalian homolog of a yeast SR-kinase that regulates stress granules.31 The SR-domain is also phosphorylated by the host cell kinase GSK-332 which is dependent on phospho-serine within its recognition motif,33 suggesting that multi-site phosphorylation of the SR-domain by SRPK1 may also initiate subsequent phosphorylation by GSK-3. The N protein colocalizes with the stress granule markers PABP1 and TIA-1 under sodium arsenite stress, but this colocalization is suppressed when SRPK1 is overexpressed,30 which is reminiscent of the role of yeast Sky1 in stress granule disassembly.31 The SR-domain of the SARS-CoV N protein has a high affinity for the prion-like domain of human hnRNPA1,34 which is another known component of stress granules and plays important roles in RNA metabolism and disease.35 Finally, the N protein undergoes oligomerization and can suppress translation in vitro, yet both activities are inhibited upon SR-domain phosphorylation. 
While further experiments are required to definitively resolve the relationship between SR-domain phosphorylation and N protein oligomerization,30, 36 the evidence suggests that modulation of the electrostatic properties of the SR-domain by phosphorylation can act as a switch to spatiotemporally regulate N protein activity and function during different stages of SARS-CoV infection cycle. Fourth, two independent studies report comprehensive sets of protein-protein interactions between SARS-CoV-2 proteins and human proteins. In both cases, the core stress granule components G3BP1 and G3BP2, as well as other components, co-precipitate with the N protein [37 and Li et al (doi: https://doi.org/10.1101/2020.03.31.019216); at the time of this writing, the study by Li et al has not formally completed peer review]. G3BP1 and G3BP2 are essential stress granule components, and G3BP1 was recently shown by three independent groups to undergo LLPS and drive the formation of biomolecular condensates (namely, stress granules) in mammalian cells.38-40 Interaction between the SARS-CoV-2 N protein and G3BP1/G3BP2 could either enhance stress granule induction or inhibit stress granule formation by sequestering G3BP1/G3BP2, as observed for multiple flavivirus nucleocapsid proteins.41 In summary, (1) the activation of stress granule-inducing kinases PKR and PERK, (2) the concurrent phosphorylation of eIF2α, (3) the observed host translation shutoff, (4) the suppression of host cell responses often related to stress granule formation, (5) the demonstrated ability of N protein to join stress granules in a regulated manner, (6) the above-average predicted phase separation propensity, (7) the multivalent domain architecture of the N protein (including domains involved in both RNA-binding and oligomerization), and (8) physical interaction with multiple stress granule components collectively suggest that the N protein may aid in modulating stress granule formation or function. 
Based on this evidence, we hypothesize that the multiple known interactions between N protein and stress granule components (including N protein homotypic interactions) will have biologically relevant effects on stress granule formation, regulation, or function during SARS-CoV-2 infection (Figure 3A). We propose three possible models for SARS-CoV-2 N protein participation in stress granules (Figure 3B). In the first model, N protein may be recruited to canonical host cell stress granules and either become a "passive observer" (exerting little to no effect on stress granule nucleation, morphology, or function) or play an active role in either nucleating stress granules or regulating stress granule function. In this model, N protein could be contributing to the translation suppression observed in SARS-CoV–infected cells,30 while apparently maintaining translation of its own proteins and sustaining viral replication.17 In the second model, SARS-CoV-2 proteins may induce the formation of unique types of granules, either independently of stress granules or via initial recruitment to but subsequent separation from stress granules. Support for this type of model exists for unrelated viruses, whereby viral proteins actually induce stress granules that differ from canonical stress granules in their molecular composition or co-opt stress granule components for viral replication or viral mRNA translation.13-16, 41, 42 In the third model, N protein may inhibit the formation of stress granules by physical interaction and sequestration of key stress granule components, including G3BP1, G3BP2, and hnRNPA1. 
While phosphorylation of eIF2α and translation suppression typically coincide with stress granule formation, stress granules do not form during Zika virus infection despite observed eIF2α phosphorylation, PKR activation, translation suppression, and induction of the UPR.41 Additionally, the N protein of influenza A virus is also apparently capable of inhibiting stress granules in spite of elevated eIF2α phosphorylation,43 although the N protein of influenza A exhibits virtually no sequence similarity with SARS-CoV-2 N protein (~11% sequence identity in a pairwise alignment with UniprotID P03466 using the EMBOSS Needle server; https://www.ebi.ac.uk/Tools/psa/emboss_needle/). Therefore, although this mode of action on stress granules is unusual, it is not without precedent.

FIGURE 3. Proposed models for the influence of the SARS-CoV-2 N protein on the formation and regulation of biomolecular condensates. A, Putative or directly observed interactions between the SARS-CoV-2 N protein and stress granule (SG) components. B, Three possible models for how the N protein of SARS-CoV-2 could affect stress granules in host cells: (1) N protein could be recruited to canonical host cell stress granules, which could have subtle or no effect on stress granules ("passive observer"), or could alter stress granule function by contributing to translation suppression, altering stress granule interactions, or remodeling stress granules; (2) N protein could recruit specific stress granule components to form unique, SARS-specific stress granules that could serve as sites of viral translation or replication; or (3) N protein could inhibit the formation of canonical host cell stress granules by sequestering critical stress granule components. SARS-CoV-2 may inhibit the typical stress granule-associated antiviral responses in host cells by suppressing downstream events that activate IFN-β expression.
Additionally, the features of the SARS-CoV-2 N protein that facilitate interaction with stress granules may also facilitate biomolecular condensation of N protein and genomic RNA during nascent virion formation at the ER-Golgi intermediate compartment (ERGIC).

4 POTENTIAL MODULATION OF STRESS GRANULE-ASSOCIATED INNATE IMMUNE RESPONSES BY SARS-CoV-2
Both PKR and RIG-I have been identified in virus-induced stress granules,44-46 and stress granules can exert antiviral effects by influencing or directly mediating host cell innate immune responses.44, 47-54 While it is possible that SARS-CoV-2 blocks stress granule formation despite hallmark stress granule indicators (model three above), SARS-CoV suppresses host cell antiviral responses by multiple mechanisms, some of which occur downstream of stress granule formation. In addition to inducing IFN-β mRNA degradation,23 nsp1 also suppresses IFN-β expression via cytosolic sequestration,55 inhibition,55, 56 ubiquitin-dependent degradation,57 and cleavage58 of its transcription factor, IRF3. The N protein of both SARS-CoV and SARS-CoV-2 can act as suppressors of the host cell RNAi response to infection.59, 60 Independently of PKR, the host cell proteins RIG-I, MDA5, and OAS-RNase L are also capable of sensing dsRNA resulting from viral infection,61 but these antiviral innate immunity pathways are all inhibited at various stages by SARS-CoV proteins62, 63 or other coronavirus proteins.64-66 Finally, two recent independent studies indicate that SARS-CoV-2 upregulates the expression of pro-inflammatory cytokines and interferon-stimulated genes (ISGs), but fails to induce IFN-I and IFN-III interferons, including IFN-β.67, 68 Therefore, like SARS-CoV, SARS-CoV-2 is capable of suppressing the type-I IFN innate immune pathway.
Collectively, blockage of the innate immune pathways downstream of stress granules by SARS-CoV-2 proteins may enable the formation of stress granules, while suppressing specific stress granule-associated host cell responses. It is important to note that a recent study reported inhibition of SARS-CoV-2 replication in cell culture models treated with multiple translation inhibitors, including the eIF4A inhibitor zotatifin.37 Inhibition of eIF4A is typically associated with increased stress granule formation69, 70 and can block replication of certain viruses.71 However, the mechanism by which translation inhibitors suppress SARS-CoV-2 replication was not examined, so it is unclear whether stress granule induction or global suppression of translation (presumably including the production of SARS-CoV-2 proteins) is predominantly responsible for the observed antiviral effect.

5 A POSSIBLE ROLE FOR BIOMOLECULAR CONDENSATION ACTIVITY OF N PROTEIN IN VIRAL RNP NUCLEOCAPSID ASSEMBLY AND VIRAL REPLICATION CENTERS
Although the SARS-CoV N protein is capable of fulfilling a variety of roles, one of the best-characterized functions of the N protein for many viruses is the packaging of the viral RNA genome into nascent virions.4 Interestingly, the capsid protein of human cytomegalovirus, pAP, which performs the same function as N protein in SARS viruses, has a high predicted phase separation score (3.8), and was shown to phase separate at high temperature in vitro.9 The N protein of the measles virus undergoes LLPS with its partner P protein in vitro, which mediates RNP condensation and the assembly of capsid-like particles.72 Liquid-like molecular assemblies containing the N protein were also observed specifically at sites along the ER during influenza A infection.73 Although the function of these compartments was not determined, SARS-CoV forms new viral nucleocapsids at the ER-Golgi intermediate compartment (ERGIC) membrane, so it is possible that the formation of similar liquid-like assemblies by the N protein mediates nucleocapsid formation during SARS-CoV and SARS-CoV-2 infections (Figure 3B). Quite recently, purified nucleocapsid proteins from numerous retroviruses were shown to phase separate in vitro, and the HIV-1 nucleocapsid protein forms Zn2+-dependent, reversible foci in mammalian cells consistent with biomolecular condensation.74 Finally, a variety of unrelated viruses are capable of forming liquid-like compartments that specifically contain (or are even dependent upon) the N protein homolog from each of the viruses in host cells as sites of viral replication.75-77 In addition to its function in nucleocapsid assembly, the SARS-CoV N protein is also thought to play a role in SARS-CoV replication,4 so it is possible that the N protein joins viral replication centers during SARS infections. It is worth noting that some evidence suggests that not all viral replication centers with liquid-like properties are entirely consistent with the LLPS model,78 so the precise nature of these membraneless organelles requires additional study. Cryo-EM79 and crystallographic80 models have suggested a filamentous packing of SARS-CoV genomic RNA on the inner membrane of new virions, mediated by ordered polymerization of the N protein. While a highly ordered, filamentous arrangement of N protein may appear to be at odds with an LLPS model of RNP formation, LLPS has actually been shown to enhance the nucleation of actin81 and microtubule82 filaments by concentrating monomeric components and assembly factors in biomolecular condensates. Additionally, filamentous actin bundles can also form dense assemblies with liquid-like properties,83 suggesting a role for LLPS in the macroscopic behavior and local concentration of ordered filaments.
Therefore, the proposed LLPS activity of the N protein may actually be critical for the initiation and arrangement of RNPs during virion formation regardless of the degree of order in the final N protein-RNA assemblies. More broadly, these observations suggest that the formation of liquid-like biomolecular condensates may be a common activity for viral nucleocapsid proteins (Figure 3B) in spite of considerable sequence differences between nucleocapsid proteins. While this is currently a speculative model, the high predicted phase separation propensity and demonstrated phase separation ability of N proteins from other viruses suggest that for SARS-CoV-2, as well as other viruses, RNP formation and packaging could occur via biomolecular condensation involving the multivalent N protein.

6 INDIRECT TARGETING OF N PROTEIN REGULATION VIA INHIBITION OF HOST CELL KINASES AS AN ANTIVIRAL TREATMENT STRATEGY?
The participation of the N protein in multiple processes vital for SARS-CoV replication raises an intriguing question: how can the N protein fulfill so many distinct roles throughout the viral infection cycle, and how are these roles balanced and regulated? The conservation of an SR-domain within the N protein across viruses of the Coronaviridae family suggests that the SR-domain is of functional importance to the virus. Site-specific phosphorylation of the SR-domain by defined host cell kinases, along with observed functional consequences of these phosphorylation events, suggests that the N protein hijacks host cell kinases for its spatio-temporal regulation during the viral lifecycle. While rational drug or vaccine design targeting viral proteins is an important strategy, targeting host cell factors that appear to be advantageous to the virus may be equally effective.
Interestingly, SRPK1 appears to play an important role in viral replication for numerous distinct viruses, as either inhibition or activation of SRPK1 can be advantageous to viruses.84-88 In the case of Hepatitis B virus, this effect is exerted through modulation of RNA packaging in the nucleocapsid, despite a radically different SR-domain compared to the SR-domain of the SARS-CoV-2 N protein.85 The substrate-kinase balance and full phosphorylation/dephosphorylation cycles involving SRPK1 have been proposed to be important for at least a subset of these viruses,85, 86 suggesting that both inhibition and overactivation of SRPK1 may independently be effective antiviral strategies. Furthermore, inhibition of SRPK1/2 has been shown to dampen viral replication in cell culture models.86-88 Similarly, inhibition of GSK-3 has been shown to reduce viral replication of SARS-CoV.32 SRPK1/2 inhibitors continue to be developed as potential cancer therapeutics, and one FDA-approved kinase inhibitor, Alectinib, potently cross-reacts with and inhibits SRPK1.89 Re-purposing of SRPK1/2 or GSK-3 inhibitors, or development of new inhibitors, could be viable strategies for disrupting SARS-CoV-2 viral replication and spread, although it must be emphasized that this is currently speculative and requires extensive testing to determine the safety and efficacy in humans.

7 CONCLUSION
Collectively, sequence analyses in combination with emerging and pre-existing experimental evidence suggest that the formation and regulation of biomolecular condensates could be vital activ
DOI: 10.1016/j.jbc.2022.101677
2022
Cited 52 times
Phase separation by the SARS-CoV-2 nucleocapsid protein: Consensus and open questions
In response to the recent SARS-CoV-2 pandemic, a number of labs across the world have reallocated their time and resources to better our understanding of the virus. For some viruses, including SARS-CoV-2, viral proteins can undergo phase separation: a biophysical process often related to the partitioning of protein and RNA into membraneless organelles in vivo. In this review, we discuss emerging observations of phase separation by the SARS-CoV-2 nucleocapsid (N) protein—an essential viral protein required for viral replication—and the possible in vivo functions that have been proposed for N-protein phase separation, including viral replication, viral genomic RNA packaging, and modulation of host-cell response to infection. Additionally, since a relatively large number of studies examining SARS-CoV-2 N-protein phase separation have been published in a short span of time, we take advantage of this situation to compare results from similar experiments across studies. Our evaluation highlights potential strengths and pitfalls of drawing conclusions from a single set of experiments, as well as the value of publishing overlapping scientific observations performed simultaneously by multiple labs. In response to the recent SARS-CoV-2 pandemic, a number of labs across the world have reallocated their time and resources to better our understanding of the virus. For some viruses, including SARS-CoV-2, viral proteins can undergo phase separation: a biophysical process often related to the partitioning of protein and RNA into membraneless organelles in vivo. In this review, we discuss emerging observations of phase separation by the SARS-CoV-2 nucleocapsid (N) protein—an essential viral protein required for viral replication—and the possible in vivo functions that have been proposed for N-protein phase separation, including viral replication, viral genomic RNA packaging, and modulation of host-cell response to infection. 
SARS-CoV-2, the virus responsible for COVID-19 and the ongoing pandemic, has exacted an enormous toll on human health, with >5.6 million deaths and >350 million infections currently attributed to the virus (according to World Health Organization data, https://covid19.who.int/, accessed on 1/26/22). The pandemic had already led to an estimated $16 trillion in global economic costs by October 2020 (1. Cutler D.M., Summers L.H. The COVID-19 pandemic and the $16 trillion virus. JAMA. 2020; 324: 1495-1496) and disrupted nearly every economic sector, including science. Over the past year and a half, extraordinary progress has been made in improving our understanding of this novel virus. Emerging experimental results implicate the SARS-CoV-2 nucleocapsid (N) protein as a critical viral factor mediating viral replication, viral genomic RNA (gRNA) packaging, and modulation of host-cell response to infection. Intriguingly, the N protein has the ability to undergo phase separation (PS), which is now considered a pervasive phenomenon organizing a broad diversity of biological processes in cells. In this review, we discuss emerging models, experimental results, and possible in vivo functions related to PS by the SARS-CoV-2 N protein.
Additionally, given the remarkable number of related publications on this topic within the span of ~1 year, we leverage this unusual situation to evaluate how overlapping work, performed and published in parallel by independent groups, may shape resulting conclusions. In practice, science is often performed sequentially: one published discovery typically precedes, informs, and directs subsequent experimentation. The incentive structure in science, which rewards novelty, promotes this sequential model, and disincentivizes studies focused on replication and validation. One limitation of this sequential model is that subtle differences in experimental design can sometimes have a significant impact on experimental results and thus influence the direction of subsequent experiments. However, the sequential publication model was punctuated by the SARS-CoV-2 pandemic. Many labs with historically little or no prior experience in virology applied their respective areas of expertise to questions related to SARS-CoV-2. This abrupt reallocation of resources to the same topic of study by many labs organically created a scientific question of its own: what happens when multiple labs perform and publish closely related experiments in parallel? Here, we compare a set of recent and related studies reporting PS by the SARS-CoV-2 N protein to examine this question.
The prototypical function for coronaviral N proteins is to condense and organize gRNA in nascent virions (2. Chang C., Hou M.-H., Chang C.-F., Hsiao C.-D., Huang T.-H. The SARS coronavirus nucleocapsid protein - forms and functions. Antiviral Res. 2014; 103: 39-50). Virion formation occurs via the accumulation of the SARS-CoV-2 structural proteins [the spike (S), envelope (E), membrane (M), and N proteins] and gRNA at the ER-Golgi intermediate compartment (ERGIC) membrane.
Multiple studies suggest that a single strand of SARS-CoV-2 gRNA forms dense, locally ordered ribonucleoprotein (RNP) regions consisting predominantly of N protein associated with the gRNA strand (3. Klein S., Cortese M., Winter S.L., Wachsmuth-Melm M., Neufeldt C.J., Cerikan B., Stanifer M.L., Boulant S., Bartenschlager R., Chlanda P. SARS-CoV-2 structure and replication characterized by in situ cryo-electron tomography. Nat. Commun. 2020; 11: 5885; 4. Yao H., Song Y., Chen Y., Wu N., Xu J., Sun C., Zhang J., Weng T., Zhang Z., Wu Z., Cheng L., Shi D., Lu X., Lei J., Crispin M., et al. Molecular architecture of the SARS-CoV-2 virus. Cell. 2020; 183: 730-738.e13; 5. Cao C., Cai Z., Xiao X., Rao J., Chen J., Hu N., Yang M., Xing X., Wang Y., Li M., Zhou B., Wang X., Wang J., Xue Y. The architecture of the SARS-CoV-2 RNA genome inside virion. Nat. Commun. 2021; 12: 3917). Locally ordered RNPs may be further organized into more complex arrangements via clustering of RNPs in particular stoichiometries and geometries (3, 4), although other evidence and prior models of the SARS-CoV N protein suggest a more linear, helical RNP arrangement (2, 5; 6. Filho H.V.R., Jara G.E., Batista F.A.H., Schleder G.R., Tonoli C.C., Soprano A.S., Guimarães S.L., Borges A.C., Cassago A., Bajgelman M.C., Marques R.E., Trivella D.B.B., Franchini K.G., Figueira A.C.M., Benedetti C.E., et al. Structural dynamics of SARS-CoV-2 nucleocapsid protein induced by RNA binding. bioRxiv. 2021; [preprint] https://doi.org/10.1101/2021.08.27.457964). These RNPs preferentially accumulate on curved membranes (3), indicating either that RNP association aids in membrane curvature during nascent virion formation, or that curved membranes are the preferred recruitment surface for RNPs. The N protein also interacts with a luminal domain (i.e., in the interior of virions) of the M protein, which was proposed as a possible mechanism for mediating recruitment of N-containing RNPs to the ERGIC membrane (7. Lu S., Ye Q., Singh D., Cao Y., Diedrich J.K., Yates J.R., Villa E., Cleveland D.W., Corbett K.D. The SARS-CoV-2 nucleocapsid phosphoprotein forms mutually exclusive condensates with RNA and the membrane-associated M protein. Nat. Commun. 2021; 12: 502). Some evidence suggests that the N protein of both SARS-CoV and SARS-CoV-2 may also interact with the E protein (8. Tseng Y.-T., Wang S.-M., Huang K.-J., Wang C.-T. SARS-CoV envelope protein palmitoylation or nucleocapid association is not required for promoting virus-like particle production. J. Biomed. Sci. 2014; 21: 34; 9. Li J., Guo M., Tian X., Wang X., Yang X., Wu P., Liu C., Xiao Z., Qu Y., Yin Y., Wang C., Zhang Y., Zhu Z., Liu Z., Peng C., et al. Virus-host interactome and proteomic survey reveal potential virulence factors influencing SARS-CoV-2 pathogenesis. Med. (N. Y.). 2021; 2: 99-112.e7).
While precise detail regarding the interactions and arrangements of individual molecules within intact virions is still forthcoming, the N protein clearly plays a central role in gRNA compaction and organization in SARS-CoV-2 virions.
PS by a protein involves the formation of two distinct yet coexisting phases from a well-mixed protein solution: a dense phase of high protein concentration and a dilute phase of low protein concentration (10. Banani S.F., Lee H.O., Hyman A.A., Rosen M.K. Biomolecular condensates: Organizers of cellular biochemistry. Nat. Rev. Mol. Cell Biol. 2017; 18: 285-298). Subsequent to initial PS, the dense phase may also undergo additional phase transitions that change its material properties (11. Patel A., Lee H.O., Jawerth L., Maharana S., Jahnel M., Hein M.Y., Stoynov S., Mahamid J., Saha S., Franzmann T.M., Pozniakovski A., Poser I., Maghelli N., Royer L.A., Weigert M., et al. A liquid-to-solid phase transition of the ALS protein FUS accelerated by disease mutation. Cell. 2015; 162: 1066-1077; 12. Molliex A., Temirov J., Lee J., Coughlin M., Kanagaraj A.P., Kim H.J., Mittag T., Taylor J.P. Phase separation by low complexity domains promotes stress granule assembly and drives pathological fibrillization. Cell. 2015; 163: 123-133). Consequently, condensates can exhibit material properties consistent with liquids, gels, or solids (13. Boeynaems S., Alberti S., Fawzi N.L., Mittag T., Polymenidou M., Rousseau F., Schymkowitz J., Shorter J., Wolozin B., Van Den Bosch L., Tompa P., Fuxreiter M. Protein phase separation: A new phase in cell biology. Trends Cell Biol. 2018; 28: 420-435), and these properties can be influenced by many factors including protein sequence, protein concentration, the presence and concentrations of other molecules, and physical and chemical environment. One of the key features of proteins associated with PS is “multivalency,” which describes proteins with multiple binding sites for partner molecules.
PS can occur in either single-component or multicomponent systems (14. Dignon G.L., Best R.B., Mittal J. Biomolecular phase separation: From molecular driving forces to macroscopic properties. Annu. Rev. Phys. Chem. 2020; 71: 53-75; 15. Ruff K.M., Dar F., Pappu R.V. Polyphasic linkage and the impact of ligand binding on the regulation of biomolecular condensates. Biophys. Rev. 2021; 2: 021302). In a single-component system, PS is driven by homotypic interactions (i.e., between two identical biopolymers), whereas co-PS in a multicomponent system is driven by heterotypic interactions (i.e., between different biopolymers) or a combination of homotypic and heterotypic interactions. While no single type of domain is required in multivalent proteins to observe PS, certain domains appear to be more common among proteins known to phase separate. For example, a number of phase separating proteins contain RNA-binding domains, intrinsically disordered regions (IDRs), oligomerization domains, and low-complexity domains. PS has gained recent attention in biology due to its connection with “biomolecular condensates” (10, 13; 16. Hyman A.A., Weber C.A., Jülicher F. Liquid-liquid phase separation in biology. Annu. Rev. Cell Dev. Biol. 2014; 30: 39-58), which are membraneless organelles that are typically enriched in certain proteins and nucleic acids.
Much like the dense phase observed in vitro, biomolecular condensates consist of a network of interactions between multivalent proteins and partner molecules (often nucleic acids, proteins, or other biopolymers). Many types of biomolecular condensates have been described, including (but not limited to) stress granules, P-bodies, nucleoli, nuclear speckles, germ granules, and Cajal bodies (10). Each type of biomolecular condensate is associated with distinct sets of constituent molecules, material properties, biological functions, stability, and regulation. Regardless of these differences, biomolecular condensation represents an elegant biological solution for organizing and concentrating groups of molecules in a regulatable and sensitive fashion. Given the prevalence, diversity, and importance of biomolecular condensates in eukaryotes, it is perhaps no surprise that some viruses are able to interact with and manipulate endogenous condensates or trigger the formation of entirely new viral condensates in host cells (17. Gaete-Argel A., Márquez C.L., Barriga G.P., Soto-Rifo R., Valiente-Echeverría F. Strategies for success. Viral infections and membraneless organelles. Front. Cell. Infect. Microbiol. 2019; 9: 336; 18. Etibor T.A., Yamauchi Y., Amorim M.J. Liquid biomolecular condensates and viral lifecycles: Review and perspectives. Viruses. 2021; 13: 366). Shortly after the emergence of SARS-CoV-2, we proposed that the SARS-CoV-2 N protein would undergo PS in vitro, and that similar biophysical behavior in vivo might mediate the formation of RNA–protein condensates during viral RNA packaging into new virions, or modulate host-cell condensates (namely, stress granules) via direct physical interaction (19. Cascarina S.M., Ross E.D. 
A proposed role for the SARS-CoV-2 nucleocapsid protein in the formation and regulation of biomolecular condensates. FASEB J. 2020; 34: 9832-9842). In the ensuing months, many studies examining various aspects of the PS behavior of the SARS-CoV-2 N protein, including its role in viral RNA packaging, stress granule modulation, regulation of host-cell innate immune pathways, and regulation by host-cell kinases, were formally published (7; 20. Zhao M., Yu Y., Sun L.-M., Xing J.-Q., Li T., Zhu Y., Wang M., Yu Y., Xue W., Xia T., Cai H., Han Q.-Y., Yin X., Li W.-H., Li A.-L., et al. GCG inhibits SARS-CoV-2 replication by disrupting the liquid phase condensation of its nucleocapsid protein. Nat. Commun. 2021; 12: 2114; 21. Iserman C., Roden C.A., Boerneke M.A., Sealfon R.S.G., McLaughlin G.A., Jungreis I., Fritch E.J., Hou Y.J., Ekena J., Weidmann C.A., Theesfeld C.L., Kellis M., Troyanskaya O.G., Baric R.S., Sheahan T.P., et al. Genomic RNA elements drive phase separation of the SARS-CoV-2 nucleocapsid. Mol. Cell. 2020; 80: 1078-1091.e6; 22. Cubuk J., Alston J.J., Incicco J.J., Singh S., Stuchell-Brereton M.D., Ward M.D., Zimmerman M.I., Vithani N., Griffith D., Wagoner J.A., Bowman G.R., Hall K.B., Soranno A., Holehouse A.S. The SARS-CoV-2 nucleocapsid protein is dynamic, disordered, and phase separates with RNA. Nat. Commun. 2021; 12: 1936; 23. Dang M., Li Y., Song J. ATP biphasically modulates LLPS of SARS-CoV-2 nucleocapsid protein and specifically binds its RNA-binding domain. Biochem. Biophys. Res. Commun. 2021; 541: 50-55; 24. Zhao H., Wu D., Nguyen A., Li Y., Adão R.C., Valkov E., Patterson G.H., Piszczek G., Schuck P. 
Energetic and structural features of SARS-CoV-2 N-protein co-assemblies with nucleic acids. iScience. 2021; 24: 102523; 25. Wang S., Dai T., Qin Z., Pan T., Chu F., Lou L., Zhang L., Yang B., Huang H., Lu H., Zhou F. Targeting liquid–liquid phase separation of SARS-CoV-2 nucleocapsid protein promotes innate antiviral immunity by elevating MAVS activity. Nat. Cell Biol. 2021; 23: 718-732; 26. Huang W., Ju X., Tian M., Li X., Yu Y., Sun Q., Ding Q., Jia D. Molecular determinants for regulation of G3BP1/2 phase separation by the SARS-CoV-2 nucleocapsid protein. Cell Discov. 2021; 7: 69; 27. Prakash Somasekharan S., Gleave M. SARS-CoV-2 nucleocapsid protein interacts with immunoregulators and stress granules and phase separates to form liquid droplets. FEBS Lett. 2021; 595: 2872-2896; 28. Jack A., Ferro L.S., Trnka M.J., Wehri E., Nadgir A., Nguyenla X., Fox D., Costa K., Stanley S., Schaletzky J., Yildiz A. SARS-CoV-2 nucleocapsid protein forms condensates with viral genomic RNA. PLoS Biol. 2021; 19: e3001425; 29. Savastano A., Ibáñez de Opakua A., Rankovic M., Zweckstetter M. Nucleocapsid protein of SARS-CoV-2 phase separates into RNA-rich polymerase-containing condensates. Nat. Commun. 2020; 11: 6041; 30. Carlson C.R., Asfaha J.B., Ghent C.M., Howard C.J., Hartooni N., Safari M., Frankel A.D., Morgan D.O. Phosphoregulation of phase separation by the SARS-CoV-2 N protein suggests a biophysical basis for its dual functions. Mol. Cell. 2020; 80: 1092-1103.e4; 31. Luo L., Li Z., Zhao T., Ju X., Ma P., Jin B., Zhou Y., He S., Huang J., Xu X., Zou Y., Li P., Liang A., Liu J., Chi T., et al. SARS-CoV-2 nucleocapsid protein phase separates with G3BPs to disassemble stress granules and facilitate viral production. Sci. Bull. 2021; 66: 1194-1204; 32. Perdikari T.M., Murthy A.C., Ryan V.H., Watters S., Naik M.T., Fawzi N.L. SARS-CoV-2 nucleocapsid protein phase-separates with RNA and with human hnRNPs. EMBO J. 2020; 39: e106478; 33. Wang J., 
Shi C., Xu Q., Yin H. SARS-CoV-2 nucleocapsid protein undergoes liquid–liquid phase separation into stress granules through its N-terminal intrinsically disordered region. Cell Discov. 2021; 7: 5; 34. Wu Y., Ma L., Cai S., Zhuang Z., Zhao Z., Jin S., Xie W., Zhou L., Zhang L., Zhao J., Cui J. RNA-induced liquid phase separation of SARS-CoV-2 nucleocapsid protein facilitates NF-κB hyper-activation and inflammation. Signal Transduct. Target. Ther. 2021; 6: 167; 35. Chen H., Cui Y., Han X., Hu W., Sun M., Zhang Y., Wang P.H., Song G., Chen W., Lou J. Liquid–liquid phase separation by SARS-CoV-2 nucleocapsid protein and RNA. Cell Res. 2020; 30: 1143-1145; 36. Zhao D., Xu W., Zhang X., Wang X., Ge Y., Yuan E., Xiong Y., Wu S., Li S., Wu N., Tian T., Feng X., Shu H., Lang P., Li J., et al. Understanding the phase separation characteristics of nucleocapsid protein provides a new therapeutic opportunity against SARS-CoV-2. Protein Cell. 2021; 12: 734-740). Figure 1 highlights the factors affecting N-protein PS in vitro and the proposed functions of N-protein PS in vivo, each of which is discussed in the ensuing sections. Additionally, we compare the N-protein domains purported to be critical for PS and, more broadly, what can be learned from a “consensus” view resulting from many related studies published in a short timeframe. We would like to note that while we have done our best to faithfully interpret the available data, not all studies present rigorous quantification of PS and quantification methods differed between studies; therefore, our conclusions are based at least to some degree on our subjective interpretation.
PS has often been associated with RNA-binding proteins containing prion-like or other low-complexity domains (37. Fomicheva A., Ross E.D. From prions to stress granules: Defining the compositional features of prion-like domains that promote different types of assemblies. Int. J. Mol. Sci. 2021; 22: 1251; 38. March Z.M., King O.D., 
Shorter J. Prion-like domains as epigenetic regulators, scaffolds for subcellular organization, and drivers of neurodegenerative disease. Brain Res. 2016; 1647: 9-18; 39. Harrison A.F., Shorter J. RNA-binding proteins with prion-like domains in health and disease. Biochem. J. 2017; 474: 1417-1438). RNA itself is capable of undergoing PS (40. Van Treeck B., Protter D.S.W., Matheny T., Khong A., Link C.D., Parker R. RNA self-assembly contributes to stress granule formation and defining the stress granule transcriptome. Proc. Natl. Acad. Sci. U. S. A. 2018; 115: 2734-2739), can often induce PS of specific proteins at lower protein concentrations (41. Lin Y., Protter D.S.W., Rosen M.K., Parker R. Formation and maturation of phase-separated liquid droplets by RNA-binding proteins. Mol. Cell. 2015; 60: 208-219), and can regulate the material properties of condensates in a variety of ways [reviewed in (42. Roden C., Gladfelter A.S. RNA contributions to the form and function of biomolecular condensates. Nat. Rev. Mol. Cell Biol. 2020; 22: 183-195)]. The SARS-CoV-2 N protein contains two structured domains capable of binding RNA (43. Zhou R., Zeng R., von Brunn A., Lei J. Structural characterization of the C-terminal domain of SARS-CoV-2 nucleocapsid protein. Mol. Biomed. 2020; 1: 2; 44. Yang M., He S., Chen X., Huang Z., Zhou Z., Zhou Z., Chen Q., Chen S., Kang S. Structural insight into the SARS-CoV-2 nucleocapsid protein C-terminal domain reveals a novel recognition mechanism for viral transcriptional regulatory sequences. Front. Chem. 2021; 8: 624765; 45. Peng Y., Du N., Lei Y., Dorje S., Qi J., Luo T., Gao G.F., Song H. Structures of the SARS-CoV-2 nucleocapsid and their perspectives for drug design. EMBO J. 2020; 39: e105938; 46. Kang S., Yang M., Hong Z., Zhang L., Huang Z., Chen X., He S., Zhou Z., Zhou Z., Chen Q., Yan Y., Zhang C., Shan H., Chen S. 
Crystal structure of SARS-CoV-2 nucleocapsid protein RNA binding domain reveals potential unique drug targeting sites. Acta Pharm. Sin. B. 2020; 10: 1228-1238; 47. Wu C., Qavi A.J., Hachim A., Kavian N., Cole A.R., Moyle A.B., Wagner N.D., Sweeney-Gibbons J., Rohrs H.W., Gross M.L., Peiris J.S.M., Basler C.F., Farnsworth C.W., Valkenburg S.A., Amarasinghe G.K., et al. Characterization of SARS-CoV-2 nucleocapsid protein reveals multiple functional consequences of the C-terminal domain. iScience. 2021; 24: 102681), as well as multiple flanking IDRs that enhance RNA binding (44, 47). While N protein alone typically exhibited weak or undetectable PS in the majority of studies (7, 20, 21, 27, 29, 30, 31, 32, 33), RNA almost universally induced PS of the SARS-CoV-2 N protein across all studies evaluated in depth (Fig. 1Ai). RNAs of varying lengths and sequences can induce N-protein PS to varying degrees, suggesting that this process is somewhat nonspecific in vitro (though the formation of N+RNA condensates in vivo may exhibit greater sequence specificity, as discussed in a later section).
For studies that tested a wide range of RNA concentrations, exceedingly high amounts of RNA tended to inhibit PS (7, 22, 28, 32), which is consistent with re-entrant phase behavior due to an imbalance in the stoichiometries of constituent molecules. Electrostatic forces were consistently implicated across studies in mediating or regulating RNA-dependent PS (Fig. 1Aii). PS of proteins is often sensitive to salt concentrations and types of salts used (48. Alberti S., Gladfelter A., Mittag T. Considerations and challenges in studying liquid-liquid phase separation and biomolecular condensates. Cell. 2019; 176: 419-434), which is generally presumed to reflect electrostatic driving forces for PS. Lower salt concentrations were typically associated with enhanced N-protein PS (20, 25, 28, 29, 32; 33. Wang J., Shi C., Xu Q., Yin H. SARS-CoV-2 nucleocapsid protein undergoes liquid–liquid phase separation into stress granules throu
DOI: 10.1074/jbc.273.6.3679
1998
Cited 338 times
Hop Modulates hsp70/hsp90 Interactions in Protein Folding
Hop is a 60-kDa protein characterized by its ability to bind the two chaperones, hsp70 and hsp90. We have tested the function of Hop using an assay for the refolding of denatured firefly luciferase. We show that Hop is involved in the process of refolding thermally denatured firefly luciferase in rabbit reticulocyte lysate. Hop also stimulates refolding by hsp70 and Ydj-1 in a purified refolding system. Hsp90 can also stimulate refolding, and optimal refolding is observed in the presence of both Hop and hsp90. Similar stimulation was observed when Hop was replaced by its yeast homolog Sti1. In assays of the binding of Hop to hsp70 and hsp90, Hop preferentially forms a complex with ADP-bound hsp70, and this process is unaffected by the presence of hsp90. Hop does not alter the ATPase activity or the rate of ADP dissociation of hsp70. Hop also appears to bind to the ADP-bound form of hsp90, blocking the ATP-dependent conversion of hsp90 to a form capable of interacting with p23. Conversely, once p23 is bound to hsp90, Hop binding is diminished. These results confirm that Hop provides a physical link between hsp70 and hsp90 and also indicate that Hop modulates the activities of both of these chaperone proteins.
The abbreviations used are: hsp, heat shock protein; Hop, hsp organizing protein; Hip, hsp70 interacting protein; DTT, dithiothreitol; CMLA, carboxymethylated α-lactalbumin; PAGE, polyacrylamide gel electrophoresis; FPLC, fast protein liquid chromatography; Tricine, N-[2-hydroxy-1,1-bis(hydroxymethyl)ethyl]glycine.
The molecular chaperones hsp70 and hsp90 are two of the most prominent heat shock proteins in the eukaryotic cytosol. Of the two, hsp70 is the more widely studied and has been extensively characterized in bacteria (DnaK), in yeast (Ssa and Ssb), and in several compartments of higher eukaryotes. hsp70s are composed of two domains as follows: a 45-kDa, amino-terminal nucleotide-binding domain and a 25-kDa, carboxyl-terminal peptide-binding domain. Through a cycle of ATP binding, hydrolysis, and nucleotide exchange, denatured proteins are alternately bound and released to effect protein folding (1. Hartl F.U. Nature. 1996; 381: 571-580; 2. Rassow J., von Ahsen O., Bomer U., Pfanner N. Trends Cell Biol. 1997; 7: 129-132). Substrates bind transiently to the ATP-bound form of hsp70, but when ATP is hydrolyzed the binding is stabilized (3. Greene L.E., Zinners R., Naficy S., Eisenberg E. J. Biol. Chem. 1995; 270: 2967-2973). Release of the substrate occurs when ADP is exchanged for ATP (4. Palleros D.R., Reid K.L., Shi L., Welch W.J., Fink A.L. Nature. 
1993; 365: 664-666). hsp70 function is modulated by members of the hsp40 family in higher eukaryotes, Ydj1 in yeast, and DnaJ in bacteria. These proteins are known to stimulate the ATPase activity of their respective hsp70s (5. Cyr D.M. Lu X. Douglas M.G. J. Biol. Chem. 1992; 267: 20927-20931; 6. Liberek K. Marszalek J. Ang D. Georgopoulos C. Zylicz M. Proc. Natl. Acad. Sci. U. S. A. 1991; 88: 2874-2878) and are required for hsp70-mediated refolding of denatured substrate proteins (7. Freeman B.C. Morimoto R.I. EMBO J. 1996; 15: 2969-2979; 8. Schumacher R.J. Hansen W.J. Freeman B.C. Alnemri E. Litwack G. Toft D.O. Biochemistry. 1996; 35: 14889-14898). In addition, some studies suggest that hsp40s may act independently to recognize and bind unfolded polypeptides to prevent aggregation and target them to hsp70 (9. Cyr D.M. FEBS Lett. 1995; 359: 129-132; 10. Schroder H. Langer T. Hartl F.U. Bukau B. EMBO J. 1993; 12: 4137-4144; 11. Szabo A. Korszun R. Hartl F.U. Flanagan J. EMBO J. 1996; 15: 408-417). In bacteria, DnaK function is also dependent on a nucleotide exchange factor, GrpE, but no eukaryotic cytoplasmic homolog has been identified (6. Liberek K. Marszalek J. Ang D. Georgopoulos C. Zylicz M. Proc. Natl. Acad. Sci. U. S. A. 1991; 88: 2874-2878; 12. Szabo A. Langer T. Schroder H. Flanagan J. Bukau B. Hartl F.U. Proc. Natl. Acad. Sci. U. S. A. 1994; 91: 10345-10349).

hsp90 has also been shown to play a role in protein folding and in the functional maturation of a number of kinases and receptors.
hsp90 is often studied as a chaperone necessary for the maturation of progesterone and glucocorticoid receptors, and in these systems an intermediate and a mature state during complex assembly have been described (13. Pratt W.B. Toft D.O. Endocr. Rev. 18: 306-360; 14. Smith D.F. Whitesell L. Nair S.C. Chen S. Prapapanich V. Rimerman R.A. Mol. Cell. Biol. 1995; 15: 6804-6812). The intermediate complex contains hsp70 and Hop (also called p60), the human homolog of the yeast stress-induced protein STI1, plus some hsp90 and Hip (also called p48) (15. Chen S. Prapapanich V. Rimerman R.A. Honore B. Smith D.F. Mol. Endocrinol. 1996; 10: 682-693; 16. Smith D.F. Sullivan W.P. Marion T.N. Zaitsu K. Madden B. McCormick D.J. Toft D.O. Mol. Cell. Biol. 1993; 13: 869-876). This progresses to a complex containing less hsp70 and Hop, more hsp90, and the hsp90-binding proteins p23 and one of three large immunophilins (FKBP51, FKBP52, or cyclophilin 40). The appearance of this complex correlates with the final step of receptor maturation (17. Johnson J.L. Corbiser R. Stensgard B. Toft D.O. J. Steroid Biochem. Mol. Biol. 1996; 56: 31-37; 18. Johnson J.L. Toft D.O. J. Biol. Chem. 1994; 269: 24989-24993; 19. Johnson J.L. Toft D.O. Mol. Endocrinol. 1995; 9: 670-678). The mechanistic details of hsp90 function are still unclear. However, it does appear that similar hsp90 complexes are utilized in the maturation of a number of different target proteins (20. Chang H.J. Nathan D.F. Lindquist S. Mol. Cell. Biol. 1997; 17: 318-325; 21. Nair S.C. Toran E.J. Rimerman R.A. Hjermstad S. Smithgall T.E. Smith D.F. Cell Stress & Chaperones. 1996; 1: 237-250).
The functions of the accessory proteins that bind to hsp90 are only beginning to be understood. Hop is an abundant, stress-induced protein (22. Honoré B. Leffers H. Madsen P. Rasmussen H.H. Vandekerckhove J. Celis J.E. J. Biol. Chem. 1992; 267: 8485-8491) associated with hsp90 and hsp70 in dynamic complexes in reticulocyte lysate and found as a component of intermediate steroid receptor complexes (15. Chen S. Prapapanich V. Rimerman R.A. Honore B. Smith D.F. Mol. Endocrinol. 1996; 10: 682-693; 16. Smith D.F. Sullivan W.P. Marion T.N. Zaitsu K. Madden B. McCormick D.J. Toft D.O. Mol. Cell. Biol. 1993; 13: 869-876). Hop has been shown to be necessary for the in vitro assembly of steroid receptors with hsp90 (15. Chen S. Prapapanich V. Rimerman R.A. Honore B. Smith D.F. Mol. Endocrinol. 1996; 10: 682-693; 23. Dittmar K.D. Hutchison K.A. Owens-Grillo J.K. Pratt W.B. J. Biol. Chem. 1996; 271: 12833-12839). Its yeast homolog, STI1 (24. Nicolet C.M. Craig E.A. Mol. Cell. Biol. 1989; 9: 3638-3646), is almost entirely complexed with hsp90 (25. Chang H.C. Lindquist S. J. Biol. Chem. 1994; 269: 24983-24988). Therefore, it seems likely that this protein functions along with hsp90. Since hsp70 and hsp90 are not associated with one another in its absence, yet each associates individually or in a ternary complex with Hop, it appears that Hop is responsible for the organization of this complex (15. Chen S. Prapapanich V. Rimerman R.A. Honore B. Smith D.F. Mol. Endocrinol. 1996; 10: 682-693; 17. Johnson J.L. Corbiser R. Stensgard B. Toft D.O. J. Steroid Biochem. Mol. Biol. 1996; 56: 31-37; 26. Lassle M. Blatch G.L. Kundra V.
Takatori T. Zetter B.R. J. Biol. Chem. 1997; 272: 1876-1884). Hop has recently been reported to stimulate the ATPase activity of hsp70 and to increase hsp70's affinity for ATP, thus serving the role of nucleotide exchange factor for hsp70, analogous to the role of GrpE in prokaryotic systems (27. Gross M. Hessefort S. J. Biol. Chem. 1996; 271: 16833-16841). In assays of passive chaperoning activity, Hop does not have any productive interactions with the denatured substrates citrate synthase or β-galactosidase (28. Bose S. Weikl W. Bugl H. Buchner J. Science. 1996; 274: 1715-1717; 29. Freeman B.C. Toft D.O. Morimoto R.I. Science. 1996; 274: 1718-1720). However, one recent study suggests that Hop is associated with luciferase during the refolding process in reticulocyte lysate (30. Thulasiraman V. Matts R.L. Biochemistry. 1996; 35: 13443-13450).

An assay for the chaperone-mediated refolding of thermally denatured firefly luciferase has been described previously. In this system, the chaperones hsp70 and Ydj-1 are absolute requirements for the refolding process, and hsp90 can enhance refolding under some conditions (8. Schumacher R.J. Hansen W.J. Freeman B.C. Alnemri E. Litwack G. Toft D.O. Biochemistry. 1996; 35: 14889-14898). We used this assay to investigate the role of Hop in protein refolding. Hop is involved in refolding in rabbit reticulocyte lysate and can significantly stimulate refolding using purified chaperones. Hop binds to hsp70 and hsp90 in a nucleotide-dependent manner and may also modulate the functions of these two chaperones.

Mouse monoclonal antibody F5 was prepared against avian p60/Hop as described previously (16. Smith D.F. Sullivan W.P. Marion T.N. Zaitsu K.
Madden B. McCormick D.J. Toft D.O. Mol. Cell. Biol. 1993; 13: 869-876). F5 antibody cross-reacts with human Hop. Mouse monoclonal antibody 4F3 was prepared against chicken hsp90 and does not cross-react with rabbit hsp90 (31. Smith D.F. Mol. Endocrinol. 1993; 7: 1418-1429). Mouse monoclonal antibody JJ3 was prepared against human p23 as described previously (32. Johnson J.L. Beito T.G. Krco C.J. Toft D.O. Mol. Cell. Biol. 1994; 14: 1956-1963). Mouse monoclonal antibody BB70 was prepared against avian hsp70 complexed with hsp90 as described previously (16. Smith D.F. Sullivan W.P. Marion T.N. Zaitsu K. Madden B. McCormick D.J. Toft D.O. Mol. Cell. Biol. 1993; 13: 869-876). BB70 cross-reacts with both free and complexed human hsp70. A rabbit antiserum against Hdj-1 was supplied by W. J. Hansen and W. J. Welch and has been described previously (8. Schumacher R.J. Hansen W.J. Freeman B.C. Alnemri E. Litwack G. Toft D.O. Biochemistry. 1996; 35: 14889-14898). This antiserum cross-reacts with several J proteins important to the refolding process. Antibody ST2 is a mouse monoclonal IgG prepared against STI1. Purified STI1 (see below) was used as the antigen and for screening by enzyme-linked immunosorbent assay and Western blotting. Balb/c mice were injected subcutaneously with 100 μg of antigen in Freund's incomplete adjuvant. Splenocytes were fused with myeloma cell line Sp2/0-Ag 14, and hybridomas were selected and screened by conventional methods.

hsp90 was prepared by the overexpression of human hsp90β in Sf9 cells using the system of Alnemri and Litwack (33. Alnemri E.S. Litwack G. Biochemistry. 1993; 32: 5387-5393). The purification was as described previously (34. Sullivan W. Stensgard B. Caucutt C. Bartha B. McMahon N.
Alnemri E.S. Litwack G. Toft D. J. Biol. Chem. 1997; 272: 8007-8012). Cell lysates were fractionated by DEAE-cellulose column chromatography, followed by heparin-agarose column chromatography, and then Mono Q FPLC. The preparation was greater than 99% pure as assessed by densitometry of SDS-PAGE gels. Western blot analysis using rabbit antiserum against Hdj-1 showed no contamination by J proteins. Protein concentration was determined by amino acid analysis.

hsp70 was prepared by the overexpression of human hsp70 in Sf9 cells using the system of Alnemri and Litwack (33. Alnemri E.S. Litwack G. Biochemistry. 1993; 32: 5387-5393). The purification was as described previously for avian hsp70 (8. Schumacher R.J. Hansen W.J. Freeman B.C. Alnemri E. Litwack G. Toft D.O. Biochemistry. 1996; 35: 14889-14898). Cell lysates were fractionated by DEAE-cellulose column chromatography followed by ATP-agarose column chromatography. This was precipitated using ammonium sulfate (75% saturation), and the redissolved hsp70 was fractionated by 16/60 Superdex 200 FPLC. Only the monomer peak of hsp70 was used. The preparation was approximately 97% pure as assessed by densitometry of SDS-PAGE gels. Western blot analysis using rabbit antiserum against Hdj-1 showed no contamination by J proteins. Protein concentration was determined by amino acid analysis.

Human Hop expressed in bacteria was prepared essentially as described previously (35. Schumacher R.J. Hurst R. Sullivan W.P. McMahon N.J. Toft D.O. Matts R.L. J. Biol. Chem. 1994; 269: 9493-9499). Bacterial lysates were fractionated by DEAE-cellulose chromatography followed by hydroxylapatite column chromatography. Additional purification was achieved by fractionating the pool from hydroxylapatite on a Mono Q FPLC column (10/10, Pharmacia Biotech Inc.)
that was eluted with a linear gradient of 0–0.5 M KCl. The fractions containing Hop were pooled, dialyzed into 10 mM Tris-HCl, 1 mM DTT, and 1 mM EDTA, pH 7.5, and stored at −70 °C. The preparation was approximately 94% pure as assessed by densitometry of SDS-PAGE gels. Western blot analysis using rabbit antiserum against Hdj-1 showed no contamination by J proteins. Protein concentration was determined by amino acid analysis.

A bacterial expression system for Ydj1p was generously supplied by Dr. Avrom Caplan and has been described previously (36. Caplan A.J. Tsai J. Casey P.J. Douglas M.G. J. Biol. Chem. 1992; 267: 18890-18895). Bacterial lysates were fractionated by DEAE-cellulose column chromatography followed by hydroxylapatite column chromatography. The preparation was approximately 80% pure as assessed by densitometry of SDS-PAGE gels. Protein concentration was determined by amino acid analysis.

The cDNA for STI1 of Saccharomyces cerevisiae was obtained from Elizabeth Craig (24. Nicolet C.M. Craig E.A. Mol. Cell. Biol. 1989; 9: 3638-3646). This was placed in a pET-11 vector and expressed in Escherichia coli. Cell lysates were prepared by sonication in 3 volumes of 10 mM Tris-HCl, 1 mM EDTA, 10 mM monothioglycerol, pH 7.5, and the protease inhibitors pepstatin, 2 μg/ml, leupeptin, 2 μg/ml, and 4-(2-aminoethyl)benzenesulfonyl fluoride, 1 mM. STI1 was extracted as a soluble protein, and after centrifugation, it was loaded on a DEAE-cellulose column that was washed with 10 mM Tris-HCl, 1 mM EDTA, 10 mM thioglycerol, pH 7.5, and eluted with a 0–0.4 M KCl gradient. Fractions containing STI1 were pooled and loaded on a hydroxylapatite column that was washed with 10 mM potassium phosphate, 10 mM thioglycerol, and 0.1 M KCl, pH 7.4, and eluted with a gradient of 10–400 mM potassium phosphate.
Fractions containing STI1 were pooled and loaded onto a 16/60 Superdex 200 sizing column and eluted with 10 mM Tris-HCl, 1 mM DTT, and 0.25 M KCl, pH 7.5. The fractions containing STI1 were pooled and dialyzed into 10 mM Tris-HCl, 1 mM DTT, and 1 mM EDTA, pH 7.5, and stored at −70 °C. The preparation was greater than 98% pure as assessed by densitometry of SDS-PAGE gels. Protein concentration was determined by amino acid analysis.

The bacterial expression and purification of human p23 has been described (18. Johnson J.L. Toft D.O. J. Biol. Chem. 1994; 269: 24989-24993). The soluble fraction of bacterial lysate was fractionated by DEAE-cellulose column chromatography followed by phenyl-Sepharose (hp1660) FPLC, dialyzed into 10 mM Tris-HCl, 1 mM DTT, and 1 mM EDTA, pH 7.5, and stored at −70 °C. The preparation was greater than 99% pure as assessed by densitometry of SDS-PAGE gels. Protein concentration was determined by amino acid analysis.

Tris buffer (TB) was as follows: 10 mM Tris-HCl, pH 7.5, 3 mM MgCl2, 50 mM KCl, and 2 mM DTT. Stability buffer (SB) was as follows: 25 mM Tricine-HCl, pH 7.8, 8 mM MgSO4, 0.1 mM EDTA, 10 mg/ml bovine serum albumin, 10% glycerol, and 0.25% Triton X-100.

Luciferase refolding assays were performed as described previously (8. Schumacher R.J. Hansen W.J. Freeman B.C. Alnemri E. Litwack G. Toft D.O. Biochemistry. 1996; 35: 14889-14898). Firefly luciferase, 100 nM in SB, was heat-denatured at 40 °C to ∼0.2% of its original activity. This was diluted 10-fold into reticulocyte lysate or a refolding mixture containing purified chaperone proteins, 2 mM ATP, and an ATP-regenerating system in TB. The refolding mixture was incubated at 25 °C to promote folding, and at the indicated times following addition of denatured luciferase, aliquots were removed, and luciferase activity was measured in a luminometer.
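The dilution step and the normalization of luminometer readings against a non-denatured control can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code; the function names and the example readings are hypothetical.

```python
def diluted_concentration(stock_nM: float, fold_dilution: float) -> float:
    """Concentration (nM) after diluting a stock by the given fold."""
    if fold_dilution <= 0:
        raise ValueError("fold_dilution must be positive")
    return stock_nM / fold_dilution


def percent_of_control(sample_signal: float, control_signal: float) -> float:
    """Luminometer reading expressed as a percent of a non-denatured control."""
    if control_signal <= 0:
        raise ValueError("control_signal must be positive")
    return 100.0 * sample_signal / control_signal


# 100 nM luciferase diluted 10-fold into the refolding mixture (values from the text).
final_nM = diluted_concentration(100.0, 10.0)

# Hypothetical readings in arbitrary light units, chosen only for illustration.
residual = percent_of_control(sample_signal=2.0, control_signal=1000.0)

print(f"luciferase in refolding mixture: {final_nM:.0f} nM")
print(f"residual activity after denaturation: {residual:.1f}% of control")
```

Normalizing every time point to a control of the same luciferase concentration keeps readings comparable across runs whose absolute light output differs.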
The luciferase activities were expressed as a percent of control samples of the same luciferase concentration that were not denatured.

The hsp70-catalyzed hydrolysis of ATP was measured essentially as described previously (38. Sadis S. Hightower L.E. Biochemistry. 1992; 31: 9406-9412). Assays containing 2 μM hsp70, 20 μM ATP, and 5 μCi of [γ-32P]ATP in 50 μl were incubated at 37 °C for 30 min. A 10-μl aliquot was removed before the start of each incubation, and additional aliquots were removed after 10, 20, and 30 min of incubation and added to 500 μl of an acidified charcoal suspension. The charcoal, along with the nucleotides, was pelleted, and the amount of free phosphate in the supernatant was measured by liquid scintillation counting.

The binding of Hop or STI1 to hsp70 and hsp90 was measured by combining the appropriate proteins (∼10 μg each) in 200 μl of buffer containing 10 mM Tris-HCl, 5 mM MgCl2, 50 mM KCl, and 1 mM DTT, pH 7.5. In some cases, ATP or ADP was included (see figure legends). The samples were incubated for 30 min at 30 °C, chilled on ice, and combined with 25-μl pellets of protein A-Sepharose containing either antibody F5 against Hop or ST2 against STI1. After incubation for 1 h on ice, the resin pellets were washed, extracted into SDS sample buffer, and proteins were resolved by SDS-PAGE as described previously (37. Laemmli U.K. Nature. 1970; 227: 680-685).

Samples containing 1 μM hsp70, ∼0.2 μCi of [α-32P]ATP, and 1 μM total ATP in 25 μl of TB were incubated with or without 250 nM Hop at 37 °C for 30 min to allow hsp70 to bind and hydrolyze the ATP. These were cooled on ice and then diluted 2-fold into TB containing some combination of 1 mM free ATP and/or 0.1 mg/ml bovine carboxymethylated α-lactalbumin (CMLA) and incubated again at 37 °C to allow release of the labeled nucleotide from hsp70.
A 9-μl aliquot was removed prior to the start of each incubation, and additional aliquots were removed after 1, 2, 4, and 8 min of incubation and filtered through nitrocellulose filters. Each filter was washed twice with 1 ml of TB, and the amount of bound nucleotide on each filter was determined by liquid scintillation counting.

The abilities of Hop and p23 to bind to hsp90 were measured by combining 220 nM hsp90 with 10 μM ATP or 10 μM ADP in 200 μl of 20 mM Na2MoO4, 0.01% Nonidet P-40, 5 mM MgCl2, and 10 mM Tris-HCl, pH 7.5. Samples were incubated for 30 min at 30 °C and then chilled and incubated an additional 10 min on ice. Hop and/or p23 were added to the samples either before or after the 30 °C incubation. Samples were combined with 25-μl pellets of protein A-Sepharose conjugated with either antibody JJ3 against p23 or F5 against Hop and incubated for 90 min on ice. The resin pellets were washed 4 times with TB, extracted into SDS sample buffer, and proteins were resolved by SDS-PAGE as described previously (37. Laemmli U.K. Nature. 1970; 227: 680-685).

To determine whether Hop is involved in the process of refolding firefly luciferase in rabbit reticulocyte lysate, we added thermally denatured luciferase to a refolding mixture containing reticulocyte lysate dialyzed in TB, with ATP, and an ATP-regenerating system and incubated the reaction at 25 °C to promote refolding. We inhibited Hop's function in reticulocyte lysate using monoclonal antibody F5 (15. Chen S. Prapapanich V. Rimerman R.A. Honore B. Smith D.F. Mol. Endocrinol. 1996; 10: 682-693), then used this lysate in refolding experiments (not shown). As increasing amounts of F5 antibody are added to the lysate used to refold denatured firefly luciferase, the initial rate of the refolding reaction (measured after 15 min) is progressively reduced by as much as 25%.
As the time course of refolding proceeds, the inhibition caused by F5 antibody gradually decreases until there is about a 10% difference in the extent of refolding after 60 min. We also used lysates treated with high salt (0.4 M KCl) followed by immune depletion of Hop for luciferase refolding (Fig. 1, A and B). High salt treatment of reticulocyte lysate followed by buffer exchange decreases the initial rate of refolding by about 40%. Immune depletion of Hop from high salt-treated lysate further reduces the initial rate of luciferase refolding compared with control preparations by up to 25% (Fig. 1, A and B). After two rounds of depletion, Hop was undetectable by Western blot analysis in the reticulocyte lysate preparations. Supplementing Hop-depleted lysate with purified human Hop at 0.1, 0.3, or 1 μM restores the refolding ability of the lysate (Fig. 1 A), but addition of purified Hop to control lysate that has not been depleted of Hop causes a small loss in refolding ability (Fig. 1 B), most likely the result of exceeding the optimal concentration of Hop. After 80 min of refolding time, the extent of refolding in control preparations is not significantly different from that in untreated lysate. However, considerably less luciferase is refolded during 80 min in Hop-depleted preparations (Fig. 1 B). When Hop is depleted from reticulocyte lysate at normal salt concentrations, the rate of refolding is reduced by about one-third (not shown). In this case, the addition of purified Hop only partially restores the lost activity. The most probable explanation for this result is that other chaperones such as hsp70 and hsp90, which are known to associate with Hop and play a role in refolding, are being coprecipitated and are no longer available for refolding. These results show that the refolding of luciferase in reticulocyte lysate is influenced by Hop, but Hop is clearly not an essential component of the refolding machinery.
We used purified chaperone proteins in the luciferase refolding assay to characterize further Hop's role in refolding. Thermally denatured luciferase was added to a refolding mixture containing purified chaperone proteins, ATP, and an ATP-regenerating system and allowed to refold at 25 °C; the results are shown in Fig. 2 A. In the absence of either hsp70 or Ydj, no individual chaperone or combination of chaperones is able to mediate luciferase refolding. When hsp70 and Ydj are both present in the refolding reaction, a significant amount of luciferase refolding occurs, as described previously (8. Schumacher R.J. Hansen W.J. Freeman B.C. Alnemri E. Litwack G. Toft D.O. Biochemistry. 1996; 35: 14889-14898), and this can be further stimulated by the addition of either Hop or hsp90. We see a dramatic stimulation in the rate and extent of luciferase refolding when both Hop and hsp90 are added to hsp70 and Ydj. This effect is most pronounced during the early stages of refolding (30 min) where greater than 10-fold stimulation is observed compared with a 2–3-fold stimulation for Hop or hsp90 alone. Identical results are obtained when STI1 is substituted for Hop in this assay (Fig. 2 B).

The optimal concentration of Hop for a particular refolding reaction is in large part determined by hsp90. In the absence of hsp90, the optimal concentration of Hop in refolding reactions is greater than 100 nM (Fig. 3 A). When hsp90 is present, additional stimulation of refolding occurs along with a decrease in the optimal concentration for Hop to a range of 10–100 nM. This 10-fold decrease in the most effective concentration makes Hop optimum for refolding at a concentration similar to that of the substrate luciferase (10 nM).
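The fold-stimulation figures quoted above are ratios of refolding with an added factor to refolding with hsp70 and Ydj alone. The sketch below shows that calculation in Python; the percent-refolded values are hypothetical placeholders chosen to fall in the ranges the text describes, not data from the paper, and the function name is an assumption.

```python
def fold_stimulation(with_factor: float, baseline: float) -> float:
    """Ratio of refolding with an added factor to refolding at baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return with_factor / baseline


# Hypothetical percent-refolded values at an early time point (illustration only).
baseline_hsp70_ydj = 2.0
conditions = {
    "+ Hop": 5.0,
    "+ hsp90": 6.0,
    "+ Hop + hsp90": 25.0,
}

for label, refolded in conditions.items():
    ratio = fold_stimulation(refolded, baseline_hsp70_ydj)
    print(f"{label}: {ratio:.1f}-fold over hsp70 + Ydj")
```

Because the baseline sits in the denominator, small baselines at early time points magnify the ratio, which is one reason stimulation appears most pronounced early in the time course.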
STI1 showed a very similar pattern of concentration dependence with a higher concentration (1 μM) being most effective in the absence of hsp90 and low to moderate concentrations (10–100 nM) being more effective when hsp90 is present (not shown).

Hop and hsp90 also have effects on the concentration of hsp70 required for the refolding process as shown in Fig. 3 B. The optimal concentration range for hsp70 is quite narrow since excess hsp70 is very inhibitory. hsp70 is most effective at a concentration of approximately 350 nM for refolding in the presence of 80 nM Ydj. Addition of either Hop or hsp90 causes a slight increase in the requirement for hsp70 in addition to providing a stimulation to the refolding process at most hsp70 concentrations. When Hop and hsp90 are added together to the refolding mixture, the effective concentration range for hsp70 is shifted to a higher level and dramatically broadened, yielding a highly effective refolding mixture at hsp70 concentrations from ∼350 nM up to ∼4 μM.

We were interested in determining Hop's mechanism of action in the luciferase refolding assay, particularly those assays in which hsp90 was absent. Gross and Hessefort (27. Gross M. Hessefort S. J. Biol. Chem. 1996; 271: 16833-16841) have recently isolated and characterized a Hop homolog from rabbit reticulocyte lysate which they call RF-hsp70. They found that this protein stimulated the ATPase activity and nucleotide dissociation rate of hsp70. We wanted to confirm these activities using highly purified human Hop. Fig. 4 shows the results of an ATPase assay in which the hydrolysis of ATP by hsp70 is determined by measuring free phosphate release over a time course in the presence or absence of Hop. Ydj, which is known to stimulate the ATPase activity of hsp70 (5. Cyr D.M. Lu X. Douglas M.G. J. Biol. Chem. 1992; 267: 20927-20931; 6. Liberek K. Marszalek J. Ang D.
Georgopoulos C. Zylicz M. Proc. Natl. Acad. Sci. U. S. A. 1991; 88: 2874-2878), was used as the positive control. ATP hydrolysis is clearly enhanced in the presence of either 25 or 250 nM Ydj, but the presence of 0.22 or 2.2 μM Hop did not alter hsp70's ATPase activity either in the absence or presence of Ydj. In the absence of hsp70, no significant hydrolysis of ATP was catalyzed by Ydj or Hop. Consistent with this result, we have found that Hop binds preferentially to the ADP-bound form of hsp70 (see below). A factor that stimulates ATP hydrolysis would be expected to associate with the ATP-bound form of the protein to then effect hydrolysis.
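A time course like the one in the ATPase assay is commonly reduced to a single hydrolysis rate by fitting a straight line through the phosphate released at each sampling time. The Python sketch below does this with an ordinary least-squares slope. It is an assumption about how such data could be analyzed, not the authors' stated procedure; the phosphate values are hypothetical, and only the assay volume, hsp70 concentration, and sampling times come from the text.

```python
def picomoles(conc_uM: float, volume_ul: float) -> float:
    """Amount in pmol: (umol/L) x (uL) = 1e-12 mol."""
    return conc_uM * volume_ul


def least_squares_slope(times: list[float], values: list[float]) -> float:
    """Slope of the ordinary least-squares line through (time, value) pairs."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


# Assay conditions from the text: 2 uM hsp70 in a 50-ul reaction.
hsp70_pmol = picomoles(2.0, 50.0)

# Sampling times from the text; phosphate values are hypothetical.
times_min = [0.0, 10.0, 20.0, 30.0]
phosphate_pmol = [0.0, 2.0, 4.0, 6.0]

rate = least_squares_slope(times_min, phosphate_pmol)  # pmol phosphate per min
turnover = rate / hsp70_pmol                           # pmol per pmol hsp70 per min
print(f"hydrolysis rate: {rate:.3f} pmol/min")
print(f"turnover: {turnover:.4f} per min per hsp70")
```

Comparing slopes fitted this way for hsp70 alone, hsp70 with Ydj, and hsp70 with Hop would express the stimulation (or its absence) as a ratio of rates.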
DOI: 10.1074/jbc.r500022200
2006
Cited 241 times
Intrinsic Protein Disorder, Amino Acid Composition, and Histone Terminal Domains
Core and linker histones are the most abundant protein components of chromatin. Even though they lack intrinsic structure, the N-terminal “tail” domains (NTDs) of the core histones and the C-terminal tail domain (CTD) of linker histones bind to many different macromolecular partners while functioning in chromatin. Here we discuss the underlying physicochemical basis for how the histone terminal domains can be disordered and yet specifically recognize and interact with different macromolecules. The relationship between intrinsic disorder and amino acid composition is emphasized. We also discuss the potential structural consequences of acetylation and methylation of lysine residues embedded in intrinsically disordered histone tail domains.

The abbreviations used are: NTD, N-terminal “tail” domain; CTD, C-terminal tail domain; IDP, intrinsically disordered protein.

The core (H2A, H2B, H3, H4) and linker (H1 family) histones make up the fundamental protein components of chromatin fibers (1. Wolffe A. Chromatin: Structure and Function. 3rd Ed. Academic Press, San Diego, 1998; 2. Hansen J.C. Annu. Rev. Biophys. Biomol. Struct. 2002; 31: 361-392). The N-terminal “tail” domains (NTDs) of the core histones and the C-terminal tail domain (CTD) of linker histones are intrinsically disordered, yet they are able to bind to many different macromolecular partners in chromatin. For example, the histone H3 and H4 NTDs interact with sites on other nucleosomes during chromatin condensation (3. Dorigo B. Schalch T. Bystricky K. Richmond T.J. J. Mol. Biol. 2003; 327: 85-96; 4. Gordon F. Luger K. Hansen J.C. J. Biol. Chem. 2005; 280: 33701-33706) and bind to proteins such as Sir3p (5. Hecht A. Laroche T. Strahl-Bolsinger S. Gasser S.M. Grunstein M. Cell. 1995; 80: 583-592) and p300 (6. An W. Roeder R.G. J. Biol. Chem. 2003; 278: 1504-1510). The H1 CTD interacts with linker DNA in a chromatin fiber (1. Wolffe A. Chromatin: Structure and Function. 3rd Ed. Academic Press, San Diego, 1998; 2. Hansen J.C. Annu. Rev. Biophys. Biomol. Struct. 2002; 31: 361-392) and also binds to proteins such as DFF/40CAD (7. Widlak P. Kalinowska M. Parseghian M.H. Lu X. Hansen J.C. Garrard W.T. Biochemistry. 2005; 44: 7871-7878). This article focuses on the roles of intrinsic protein disorder in histone function. We highlight recent findings indicating that amino acid composition is the key determinant of molecular recognition by the histone tail domains and other intrinsically disordered protein regions. We also discuss how acetylation and methylation of lysine residues may modulate macromolecular interactions by altering the local physicochemical properties of intrinsically disordered histone domains.
Proteins (or sizeable regions of proteins) that lack a well defined conformation under native conditions are referred to as “intrinsically disordered.” Many intrinsically disordered proteins (IDPs) are functional, adopting a well defined conformation upon interacting with a target molecule. Thus, the principle that protein function requires a well defined conformation must be modified; an isolated protein need not have a unique conformation, but the protein-target complex must. A corollary is that if the protein in question interacts with more than one target, it may adopt a corresponding number of different conformations. One of the more surprising aspects of IDPs is their ubiquity, especially in eukaryotes. The most rigorous analysis to date predicts that long IDP regions are found in an average of 33% of eukaryotic proteins but in only 2% of archaeal and 4% of eubacterial proteins (8. Ward J.J. Sodhi J.S. McGuffin L.J. Buxton B.F. Jones D.T. J. Mol. Biol. 2004; 337: 635-645). Conformational adaptability is generally considered to be one of the driving forces for the evolution of IDPs.

IDPs have several features that distinguish them from classical globular proteins. Experimentally, IDPs are recognized by a far-UV CD spectrum characteristic of unordered proteins, sharp peaks in NMR, low dispersion of chemical shifts, negative 1H-15N nuclear Overhauser effects, a radius of gyration or hydrodynamic radius comparable with that of the protein in concentrated urea or guanidinium chloride, and a marked susceptibility to proteases (9. Dunker A.K. Lawson J.D. Brown C.J. Williams R.M. Romero P. Oh J.S. Oldfield C.J. Campen A.M. Ratliff C.M. Hipps K.W. Ausio J. Nissen M.S. Reeves R. Kang C. Kissinger C.R. Bailey R.W. Griswold M.D. Chiu W. Garner E.C. Obradovic Z. J. Mol. Graph. Model. 2001; 19: 26-59; 10. Uversky V.N. Protein Sci. 2002; 11: 739-756).
The absence of order in a crystal structure often is indicative of an intrinsically disordered domain as well.

IDPs can be predicted from amino acid sequence data with good accuracy. In one study of more than 900 nonhomologous proteins, predictions of disordered regions more than 40 residues in length gave less than 6% false positives (11. Dunker A.K. Brown C.J. Lawson J.D. Iakoucheva L.M. Obradovic Z. Biochemistry. 2002; 41: 6573-6582). Predictive algorithms score sequences according to the flexibility, hydropathy, charge, and other physicochemical properties of the amino acid residues (12. Uversky V.N. Gillespie J.R. Fink A.L. Proteins Struct. Funct. Genet. 2000; 41: 415-427; 13. Romero P. Obradovic Z. Li X. Garner E.C. Brown C.J. Dunker A.K. Proteins Struct. Funct. Genet. 2001; 42: 38-48; 14. Vucetic S. Brown C.J. Dunker A.K. Obradovic Z. Proteins Struct. Funct. Genet. 2003; 52: 573-584; 15. Jones D.T. Ward J.J. Proteins Struct. Funct. Genet. 2003; 53: 573-578).

Compositional bias is a common feature of IDPs (9. Dunker A.K. Lawson J.D. Brown C.J. Williams R.M. Romero P. Oh J.S. Oldfield C.J. Campen A.M. Ratliff C.M. Hipps K.W. Ausio J. Nissen M.S. Reeves R. Kang C. Kissinger C.R. Bailey R.W. Griswold M.D. Chiu W. Garner E.C. Obradovic Z. J. Mol. Graph. Model. 2001; 19: 26-59; 14. Vucetic S. Brown C.J. Dunker A.K. Obradovic Z. Proteins Struct. Funct. Genet. 2003; 52: 573-584). Some amino acids are substantially more abundant in IDPs than in the average folded protein, whereas others are rare or absent. The bias generally favors hydrophilic amino acids and discriminates against hydrophobic residues. Thus, IDPs are generally rich in Arg, Gln, Glu, Lys, Pro, and Ser.
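As a rough illustration of compositional bias, the fraction of a sequence made up of the six residues named above as enriched in IDPs (Arg, Gln, Glu, Lys, Pro, Ser) can be computed directly. The Python sketch below is a crude composition count, not one of the cited prediction algorithms; the function name is hypothetical, and the example sequence is assumed to approximate the N-terminal tail of histone H4 and should be checked against a sequence database before any real use.

```python
# One-letter codes for the residues the text lists as enriched in IDPs:
# Arg (R), Gln (Q), Glu (E), Lys (K), Pro (P), Ser (S).
IDP_ENRICHED = frozenset("RQEKPS")


def enriched_fraction(sequence: str) -> float:
    """Fraction of residues that belong to the IDP-enriched set."""
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    hits = sum(1 for aa in residues if aa in IDP_ENRICHED)
    return hits / len(residues)


# Assumed approximation of the histone H4 N-terminal tail (illustration only).
h4_tail = "SGRGKGGKGLGKGGAKRHRKVLRDN"
print(f"IDP-enriched residues: {enriched_fraction(h4_tail):.2f} of the sequence")
```

A single fraction like this ignores sequence order and the residues whose abundance varies between IDPs, so it is only a first-pass descriptor and says nothing definitive about disorder.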
They are deficient in Cys, Ile, Leu, Phe, Trp, Tyr, and Val. The other amino acids are present at levels comparable with those in the average folded protein (Met, Thr) or are enriched in some IDPs and depleted in others (Ala, Asn, Asp, Gly, His). As will be discussed below, there is now compelling evidence indicating that the relationship between amino acid composition and IDPs is far more complex than the simple trends described above. The compositional bias of IDPs accounts for their inability to fold; the paucity of hydrophobic groups precludes the formation of a hydrophobic core about which the chain can fold. Further, many IDPs have a large excess of basic or acidic amino acids and hence are highly charged at neutral pH. The charge on such proteins acts to destabilize a compact structure. Interaction with target proteins or nucleic acids overcomes these problems, allowing the IDP to undergo a concerted folding-binding process. Hydrophobic groups of the IDP are buried in the IDP-target interface, interacting with exposed hydrophobic groups of the target. The target usually has a charge opposite to that of the IDP, at least locally, leading to a lower charge density in the complex. What advantages of IDPs account for their abundance, especially in eukaryotes? 1) The coupling of folding and binding provides enhanced specificity at the expense of binding affinity (16Spolar R.S. Record Jr., M.T. Science. 1994; 263: 777-784Crossref PubMed Scopus (1373) Google Scholar, 17Kriwacki R.W. Hengst L. Tennant L. Reed S.I. Wright P.E. Proc. Natl. Acad. Sci. U. S. A. 1996; 93: 11504-11509Crossref PubMed Scopus (490) Google Scholar). The negative ΔS of folding is paid for by a large negative ΔH of binding. 2) The binding energy for an IDP is more favorable than for a compact protein with the same number of residues because the area of the interface is substantially larger for the IDP (18Gunasekaran K. Tsai C.J. Kumar S. Zanuy D. Nussinov R. Trends Biochem. Sci. 
2003; 28: 81-85Abstract Full Text Full Text PDF PubMed Scopus (288) Google Scholar). 3) The flexibility of an IDP allows it to bind a number of different targets. 4) Flexibility also permits more rapid binding to the target through the mechanism of “flycasting” (19Shoemaker B.A. Portman J.J. Wolynes P.G. Proc. Natl. Acad. U. S. A. 2000; 97: 8868-8873Crossref PubMed Scopus (850) Google Scholar). An IDP is extended and thus presents multiple points of attachment for the bimolecular step of encounter complex formation. The encounter complex can then undergo rapid unimolecular steps to the stable complex. 5) Flexibility permits ready access of a side chain to modifying enzymes and to targets that recognize the modification. 6) A flexible, extended structure can be rapidly degraded by intracellular proteases, providing a facile pathway for down-regulation. Susceptibility to proteases is also a potential disadvantage for IDPs. Survival in the cell implies that IDPs must be complexed to targets most of the time. The coupled process by which an IDP folds and binds to its target bears some resemblance to the induced fit concept of enzyme-substrate binding and allostery. However, in induced fit, ligand binding perturbs an equilibrium between two compact, well defined protein conformations, whereas binding of an IDP to a target involves a disorder → order transition of the IDP concomitant with formation of a macromolecular complex. The target may have a compact, well defined conformation, or it may be an IDP itself. In the final state, the IDP-target complex has a compact structure with classical protein folds (at least as a core), possibly with one or more flexible appendages. Most of the functions of IDPs are related to molecular recognition of DNA, RNA, and other proteins. Fully or partially disordered proteins are especially common in processes such as transcription, cell cycle regulation, signal transduction, and chaperoning the folding of proteins and RNA (20Tompa P. 
FEBS Lett. 2005; 579: 3346-3354Crossref PubMed Scopus (595) Google Scholar, 21Dyson H.J. Wright P.E. Nat. Rev. Mol. Cell Biol. 2005; 6: 197-208Crossref PubMed Scopus (3038) Google Scholar, 22Fink A.L. Curr. Opin. Struct. Biol. 2005; 15: 35-41Crossref PubMed Scopus (606) Google Scholar). Partially disordered regions are commonly found at the amino and carboxyl ends of proteins but can be present at internal sites as well. IDPs have been grouped into two main categories based on function: mediators of macromolecular interactions and entropic connectors/springs (20Tompa P. FEBS Lett. 2005; 579: 3346-3354Crossref PubMed Scopus (595) Google Scholar). Because of their ubiquity, other functions are likely to be identified as well. Linker histones comprise a family of nucleosome-binding proteins that stabilize condensed chromatin and regulate genome function (1Wolffe A. Chromatin: Structure and Function. 3rd Ed. Academic Press, San Diego1998Google Scholar, 2Hansen J.C. Annu. Rev. Biophys. Biomol. Struct. 2002; 31: 361-392Crossref PubMed Scopus (419) Google Scholar, 23Bustin M. Catez F. Lim J.H. Mol. Cell. 2005; 17: 617-620Abstract Full Text Full Text PDF PubMed Scopus (188) Google Scholar). The linker histones of most eukaryotes have a very simple domain organization, consisting of a central winged helix fold, a short N-terminal extension, and a long basic C-terminal domain (Fig. 1). Little is known about the NTD region. The winged helix domain interacts with nucleosomes (1Wolffe A. Chromatin: Structure and Function. 3rd Ed. Academic Press, San Diego1998Google Scholar, 2Hansen J.C. Annu. Rev. Biophys. Biomol. Struct. 2002; 31: 361-392Crossref PubMed Scopus (419) Google Scholar). The CTD is ∼100 residues in length, enriched in Lys, Ala, and Pro, and unstructured in aqueous solution (24van Holde K.E. Chromatin. Springer-Verlag, New York1988Google Scholar). The determinants required to stabilize chromatin fibers in highly condensed conformations lie in the CTD (25Allan J. 
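Mitchell T. Harborne N. Bohm L. Crane-Robinson C. J. Mol. Biol. 1986; 187: 591-601, 26 Lu X. Hansen J.C. J. Biol. Chem. 2004; 279: 8701-8707). There are six somatic linker histone isoforms in most higher eukaryotes. Although the primary sequence of the isoform CTDs has diverged (24 van Holde K.E. Chromatin. Springer-Verlag, New York 1988), the amino acid composition of the CTDs is surprisingly similar (Table 1). Each of the CTDs consists of ∼40% Lys, ∼20–35% Ala, and ∼15% Pro. Ser, Thr, Gly, and Val are present in all isoform CTDs in smaller, variable amounts. His, Tyr, Trp, Met, and Cys are never found in any of the isoform CTDs, and the other seven amino acids are sporadically present once or twice in a particular CTD. Val is the only hydrophobic amino acid found in all CTDs. The characteristic amino acid composition of the linker histone CTDs suggests that this domain functions as an IDP region. Recent experimental evidence supports this idea and has focused attention on the relationship between intrinsic disorder and amino acid composition.

TABLE 1. Amino acid composition of linker histone CTDs, core histone NTDs, macroH2A connector region, and yeast prion domains

| IDP region | Residues (a) | Lys (c) | Pro | Gly | Arg | Asn | Gln | Ser | Glu | Asp | Met | Ala | Thr | Val | His | Phe | Ile | Leu | Cys | Trp | Tyr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Class (b) | | D | D | D | D | D | D | D | D | D | D | N | N | O | O | O | O | O | O | O | O |
| Mouse H1° CTD | 97 | 41.2 | 12.4 | 1.0 | 2.1 | 0 | 0 | 7.2 | 2.1 | 1.0 | 0 | 17.5 | 5.2 | 9.3 | 0 | 1.0 | 0 | 0 | 0 | 0 | 0 |
| Human H1° CTD | 97 | 42.3 | 12.4 | 1.0 | 1.0 | 0 | 0 | 7.2 | 2.1 | 1.0 | 0 | 17.5 | 6.2 | 6.2 | 0 | 1.0 | 1.0 | 1.0 | 0 | 0 | 0 |
| Human H1-1 CTD | 98 | 40.8 | 13.3 | 5.1 | 0 | 0 | 0 | 3.1 | 0 | 0 | 0 | 24.5 | 5.1 | 8.2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Human H1-2 CTD | 105 | 39.0 | 14.3 | 5.7 | 1 | 0 | 1 | 3.8 | 0 | 0 | 0 | 23.8 | 5.7 | 4.8 | 0 | 0 | 1.0 | 0 | 0 | 0 | 0 |
| Human H1-3 CTD | 108 | 41.7 | 12.0 | 3.7 | 0 | 0 | 0 | 1.9 | 0 | 0 | 0 | 34.3 | 2.8 | 3.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Human H1-4 CTD | 104 | 41.3 | 13.5 | 3.8 | 0 | 0 | 0 | 2.9 | 0 | 0 | 0 | 33.7 | 3.8 | 1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Human H1-a CTD | 97 | 38.1 | 11.3 | 3.1 | 2.1 | 1.0 | 0 | 7.2 | 0 | 0 | 0 | 19.6 | 10.3 | 6.2 | 0 | 0 | 0 | 1.0 | 0 | 0 | 0 |
| Human macroH2A connector | 38 | 34.2 | 13.2 | 7.9 | 2.6 | 0 | 2.6 | 10.5 | 2.6 | 0 | 0 | 13.2 | 2.6 | 2.6 | 0 | 0 | 5.3 | 2.6 | 0 | 0 | 0 |
| Human H2B NTD | 27 | 33.3 | 14.8 | 7.4 | 0 | 0 | 3.7 | 7.4 | 3.7 | 3.7 | 0 | 18.5 | 3.7 | 3.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Xenopus H2B NTD | 27 | 33.3 | 14.8 | 7.4 | 0 | 0 | 3.7 | 7.4 | 3.7 | 3.7 | 0 | 14.8 | 7.4 | 3.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Human H3 NTD | 37 | 24.3 | 5.4 | 10.8 | 8.1 | 0 | 5.4 | 2.7 | 0 | 0 | 0 | 18.9 | 13.5 | 5.4 | 0 | 0 | 0 | 2.7 | 2.7 | 0 | 0 |
| Xenopus H3 NTD | 37 | 21.6 | 5.4 | 10.8 | 10.8 | 0 | 5.4 | 5.4 | 0 | 0 | 0 | 21.6 | 13.5 | 2.7 | 0 | 0 | 0 | 2.7 | 0 | 0 | 0 |
| Xenopus H4 NTD | 27 | 18.5 | 0 | 29.6 | 14.8 | 3.7 | 3.7 | 3.7 | 0 | 3.7 | 0 | 3.7 | 0 | 3.7 | 3.7 | 0 | 3.7 | 7.4 | 0 | 0 | 0 |
| Xenopus H2A NTD | 13 | 23.1 | 0 | 30.8 | 15.4 | 0 | 7.7 | 7.7 | 0 | 0 | 0 | 7.7 | 7.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Yeast Ure2p PD (d) | 88 | 1.1 | 0 | 5.7 | 4.5 | 37.5 | 11.4 | 11.4 | 3.4 | 2.3 | 1.1 | 1.1 | 5.7 | 4.5 | 1.1 | 2.3 | 3.4 | 3.4 | 0 | 0 | 0 |
| Yeast Sup35p PD | 113 | 0.9 | 4.4 | 16.8 | 1.8 | 17.7 | 28.3 | 3.5 | 0 | 1.8 | 0 | 4.4 | 0 | 0 | 0 | 1.8 | 0 | 0.9 | 0 | 0 | 17.7 |

(a) The start methionines have been excluded.
(b) D, disorder-producing; N, neutral; O, order-producing. See Ref. 43 Weathers E.A. Paulaitis M.E. Woolf T.B. Hoh J.H. FEBS Lett. 2004; 576: 348-352.
(c) Percentage of total amino acids.
(d) PD, prion domain.

CTD truncation mutants were used to define the location of the amino acid residues involved in mouse H1° CTD function during chromatin condensation (26 Lu X. Hansen J.C. J. Biol. Chem. 2004; 279: 8701-8707). The determinants for both linker DNA binding and chromatin fiber stabilization were localized to two distinct, separated regions of the CTD (Fig. 1). The functional regions are somewhat enriched in Val, but otherwise the amino acid composition of all CTD regions examined was similar. 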
43.Pro DGly DArg DAsn DGln DSer DGlu DAsp DMet DAla NThr NVal OHis OPhe OIle OLeu OCys OTrp OTyr OMouse H1° CTD9741.2cPercentage of total amino acids.12.41.02.1007.22.11.0017.55.29.301.000000Human H1° CTD9742.312.41.01.0007.22.11.0017.56.26.201.01.01.0000Human H1-1 CTD9840.813.35.10003.100024.55.18.20000000Human H1-2 CTD10539.014.35.71013.800023.85.74.8001.00000Human H1-3 CTD10841.712.03.70001.900034.32.83.70000000Human H1-4 CTD10441.313.53.80002.900033.73.81.00000000Human H1-a CTD9738.111.33.12.11.007.200019.610.36.20001.0000Human macroH2A connector3834.213.27.92.602.610.52.60013.22.62.6005.32.6000Human H2B NTD2733.314.87.4003.77.43.73.7018.53.73.70000000Xenopus H2B NTD2733.314.87.4003.77.43.73.7014.87.43.70000000Human H3 NTD3724.35.410.88.105.42.700018.913.55.40002.72.700Xenopus H3 NTD3721.65.410.810.805.45.400021.613.52.70002.7000Xenopus H4 NTD2718.5029.614.83.73.73.703.703.703.73.703.77.4000Xenopus H2A NTD1323.1030.815.407.77.70007.77.700000000Yeast Ure2p PDdpD, prion domain.881.105.74.537.511.411.43.42.31.11.15.74.51.12.33.43.4000Yeast Sup35p PD1130.94.416.81.817.728.33.501.804.40001.800.90017.7a The start methionines have been excluded.b D, disorder-producing; N, neutral; O, order-producing. See Ref. 43Weathers E.A. Paulaitis M.E. Woolf T.B. Hoh J.H. FEBS Lett. 2004; 576: 348-352Crossref PubMed Scopus (109) Google Scholar.c Percentage of total amino acids.d pD, prion domain. Open table in a new tab CTD truncation mutants were used to define the location of the amino acid residues involved in mouse H1° CTD function during chromatin condensation (26Lu X. Hansen J.C. J. Biol. Chem. 2004; 279: 8701-8707Abstract Full Text Full Text PDF PubMed Scopus (118) Google Scholar). The determinants for both linker DNA binding and chromatin fiber stabilization were localized to two distinct, separated regions of the CTD (Fig. 1). The functional regions are somewhat enriched in Val, but otherwise the amino acid composition of all CTD regions examined was similar. 
The two functional CTD regions can be interchanged (X. Lu and J. C. Hansen, unpublished data), even though their primary sequences are different. This suggests that the key properties involved in DNA binding and chromatin condensation are amino acid composition and location of the CTD region relative to the winged helix domain, not primary sequence. The H1 CTD also has been shown to mediate the protein-protein interactions involved in H1-dependent activation of the apoptotic nuclease, DFF40/CAD (7 Widlak P. Kalinowska M. Parseghian M.H. Lu X. Hansen J.C. Garrard W.T. Biochemistry. 2005; 44: 7871-7878). The CTD region that binds to the enzyme is large and partially overlaps with the two CTD regions that bind linker DNA and stabilize condensed chromatin (Fig. 1). Interestingly, all somatic linker histone isoforms activated the enzyme identically in vitro. Moreover, all free CTD peptides that were at least 47 residues in length could bind to and activate the enzyme, regardless of their primary sequence and original location in the intact CTD. Thus, amino acid composition and location of the CTD region relative to the winged helix domain also appear to be the determinants of CTD-protein interactions. Together, the studies of linker histone CTD involvement in chromatin condensation and DFF40/CAD activation demonstrate that the H1 CTD is an IDP region capable of interacting with both DNA and proteins and suggest that CTD function is linked to a distinctive amino acid composition. The functions of the core histone NTDs have been investigated extensively (2 Hansen J.C. Annu. Rev. Biophys. Biomol. Struct. 2002; 31: 361-392, 3 Dorigo B. Schalch T. Bystricky K. Richmond T.J. J. Mol. Biol. 2003; 327: 85-96, 4 Gordon F. Luger K. Hansen J.C. J. Biol. Chem. 
2005; 280: 33701-33706Abstract Full Text Full Text PDF PubMed Scopus (113) Google Scholar, 27Hansen J.C. Tse C. Wolffe A.P. Biochemistry. 1998; 37: 17637-17641Crossref PubMed Scopus (216) Google Scholar). These domains currently are of particular interest because specific patterns of NTD acetylation and methylation regulate gene expression and other nuclear processes (28Fischle W. Wang Y. Allis C.D. Curr. Opin. Cell Biol. 2003; 15: 172-183Crossref PubMed Scopus (986) Google Scholar, 29Kurdistani S.K. Grunstein M. Nat. Rev. Mol. Cell Biol. 2003; 4: 276-284Crossref PubMed Scopus (552) Google Scholar, 30Henikoff S. Proc. Natl. Acad. Sci. U. S. A. 2005; 102: 5308-5309Crossref PubMed Scopus (76) Google Scholar). The NTDs are not observed in the crystal structures of the nucleosome (31Luger K. Curr. Opin. Genet. Dev. 2003; 13: 127-135Crossref PubMed Scopus (232) Google Scholar). Free NTD peptides are disordered (see Ref. 27Hansen J.C. Tse C. Wolffe A.P. Biochemistry. 1998; 37: 17637-17641Crossref PubMed Scopus (216) Google Scholar). In nucleosomes, the NTDs adopt increased α-helical content when bound to DNA (32Baneres J.L. Martin A. Parelló J. J. Mol. Biol. 1997; 273: 503-508Crossref PubMed Scopus (76) Google Scholar, 33Wang X. Moore S.C. Laszckzak M. Ausió J. J. Biol. Chem. 2000; 275: 35013-35020Abstract Full Text Full Text PDF PubMed Scopus (147) Google Scholar). All four of the core histone NTDs participate in the internucleosomal interactions that drive chromatin fiber condensation (3Dorigo B. Schalch T. Bystricky K. Richmond T.J. J. Mol. Biol. 2003; 327: 85-96Crossref PubMed Scopus (420) Google Scholar, 4Gordon F. Luger K. Hansen J.C. J. Biol. Chem. 2005; 280: 33701-33706Abstract Full Text Full Text PDF PubMed Scopus (113) Google Scholar). In addition, the H3 and H4 NTDs also bind to proteins such as Sir3p and p300 (5Hecht A. Laroche T. Strahl-Bolsinger S. Gasser S.M. Grunstein M. Cell. 
1995; 80: 583-592, 6 An W. Roeder R.G. J. Biol. Chem. 2003; 278: 1504-1510). The amino acid composition of the core histone NTDs is shown in Table 1. The NTDs have a low percentage of hydrophobic residues and are highly enriched in Lys, Gly, and Arg residues. By all available criteria, the core histone NTDs also possess the characteristics of an IDP region. Unlike the linker histone CTDs, their primary sequences are highly conserved. A closer examination of the amino acid composition of the core histone NTDs reveals several interesting trends (Table 1). The composition of the H2A and H4 NTDs is very similar but differs significantly from that of the linker histone CTDs. Specifically, the H2A and H4 NTDs have no Pro, more Gly and Arg, and less Ala than the linker histone CTDs. On the other hand, the amino acid composition of the H2B and H3 NTDs is surprisingly similar to that of the linker histone CTDs (Table 1). Based on amino acid composition, at least two different types of IDP regions are involved in histone function. It is of note that the characteristic amino acid composition of the H1 CTDs also is found in other proteins. A region of 38 residues in the core histone variant macroH2A has an amino acid composition very similar to that of the linker histone CTDs (Table 1). However, in this case, the IDP region is located internally and connects two structured domains (Fig. 1). The amino acid composition of the macroH2A connector domain suggests that it is an IDP region. The broader implication is that the linker histone CTD actually is a specific type of IDP region that is found in different locations within different proteins. Further support for the existence of specific types of IDP regions comes from studies of yeast prions (infectious proteins). 
The yeast prion proteins Ure2p and Sup35p each contain an N-terminal prion domain that is sufficient for prion formation but dispensable for the normal function of the protein (34Wickner R.B. Edskes H.K. Roberts B.T. Baxa U. Pierce M.M. Ross E.D. Brachmann A. Genes Dev. 2004; 18: 470-485Crossref PubMed Scopus (64) Google Scholar). In both cases the prion domains are intrinsically disordered, but upon conversion to the prion conformation, they self-associate to form self-propagating amyloid-like fibrils (35Chien P. Weissman J.S. DePace A.H. Annu. Rev. Biochem. 2004; 73: 617-656Crossref PubMed Scopus (288) Google Scholar). The prion conformation is a folded β-domain structure, as with other amyloid-forming proteins. Randomizing the primary sequence of the Sup35p and Ure2p prion domains while keeping the amino acid composition constant does not inhibit prion formation (36Ross E.D. Edskes H.K. Terry M.J. Wickner R.B. Proc. Natl. Acad. Sci. U. S. A. 2005; 102: 12825-12830Crossref PubMed Scopus (173) Google Scholar, 37Ross E.D. Baxa U. Wickner R.B. Mol. Cell. Biol. 2004; 24: 7206-7213Crossref PubMed Scopus (148) Google Scholar), indicating that amino acid composition is the fundamental determinant of amyloid formation in these systems and not the primary sequence. The amino acid composition of the prion domains is consistent with that of an IDP but differs significantly from that of the H1 CTDs (Table 1). In particular, Ure2p and Sup35p are highly enriched in Asn and Gln rather than Lys, Ala, and Pro. An intriguing possibility is that amino acid composition determines which type of secondary structure is formed when an IDP region binds to a target, e.g. Asn/Gln-rich IDP regions may form β-domains, whereas Lys/Ala/Pro IDP regions may form α-helices. (Although Pro is generally considered to be a strong helix breaker, it is also a strong helix initiator, frequently occurring as the N-terminal residue of α-helical segments (38Richardson J.S. Richardson D.C. Science. 
1988; 240: 1648-1652Crossref PubMed Scopus (1298) Google Scholar).) In summary, the relationship between amino acid composition and IDP function is complex. The same is true for the relationship between amino acid composition and primary sequence. Using amino acid composition as a criterion, it appears that there are many different types of functional IDP regions just as there are many types of different functional protein folds. Lysine residues within the intrinsically disordered core histone NTDs are modified through addition of methyl or acetyl groups. Specific patterns of NTD acetylation and methylation are involved in the regulation of transcription, replication, and other nuclear processes (28Fischle W. Wang Y. Allis C.D. Curr. Opin. Cell Biol. 2003; 15: 172-183Crossref PubMed Scopus (986) Google Scholar, 29Kurdistani S.K. Grunstein M. Nat. Rev. Mol. Cell Biol. 2003; 4: 276-284Crossref PubMed Scopus (552) Google Scholar, 30Henikoff S. Proc. Natl. Acad. Sci. U. S. A. 2005; 102: 5308-5309Crossref PubMed Scopus (76) Google Scholar). These patterns of modifications often function by establishing or disrupting specific binding surfaces for other proteins. For example, the proteins HP1 (39Jacobs S.A. Khorisanizadeh S. Science. 2002; 295: 2080-2083Crossref PubMed Scopus (649) Google Scholar, 40Nielsen P.R. Nietlispach D. Mott H.R. Callaghan J. Bannister A. Kouzarides T. Murzin A.G. Murzina N.V. Laue E.D. Nature. 2002; 416: 103-107Crossref PubMed Scopus (496) Google Scholar) and polycomb (41Fischle W. Wang Y. Jacobs S.A. Kim Y. Allis C.D. Khorisanizadeh S. Genes Dev. 2003; 17: 1870-1881Crossref PubMed Scopus (793) Google Scholar) can only bind to an H3 NTD peptide if the peptide is di- or trimethylated on Lys-9. A question that has remained largely unanswered at the molecular level is how acetylation and methylation influence the ability of the core histone NTDs to participate in specific protein-protein interactions. 
Acetylation and methylation both affect the charge density, size, and hydrophobicity of the Lys side chain. Hydrophobicity may be particularly important because there are very few hydrophobic amino acids in IDP regions. Acetylation of Lys makes formation of secondary structures more favorable by decreasing the positive charge density and enhancing hydrophobic character. The free charged NH group is converted into a neutral amide linkage capped with a hydrophobic methyl group. Methylation of Lys leaves the positive charge density unaltered but replaces up to three polar NH groups capable of hydrogen bonding with hydrophobic methyl groups. Acetylation and methylation of Lys ultimately create unique amino acids with unusual properties. Hence we do not believe that acetylation and methylation simply create patterns of “marks” that are recognized by other proteins. Rather, we feel that acetylation and methylation alter the fundamental IDP properties of the NTDs as a prerequisite for coupled NTD folding and target binding. This view is supported by the finding that nonspecific hyperacetylation of the core histone NTDs increases their average α-helical content (33Wang X. Moore S.C. Laszckzak M. Ausió J. J. Biol. Chem. 2000; 275: 35013-35020Abstract Full Text Full Text PDF PubMed Scopus (147) Google Scholar). Moreover, Dion et al. (42Dion M.F. Altschuler S.J. Wu L.F. Rando O.J. Proc. Natl. Acad. Sci. U. S. A. 2005; 102: 5501-5506Crossref PubMed Scopus (310) Google Scholar) have shown that H4 acetylation at Lys-5, -8, and -12 functions in vivo through a nonspecific, cumulative mechanism. X-ray and NMR studies of complexes between methylated H3 peptides and the HP-1 and polycomb chromodomains have provided insight at the molecular level (39Jacobs S.A. Khorisanizadeh S. Science. 2002; 295: 2080-2083Crossref PubMed Scopus (649) Google Scholar, 40Nielsen P.R. Nietlispach D. Mott H.R. Callaghan J. Bannister A. Kouzarides T. Murzin A.G. Murzina N.V. Laue E.D. Nature. 
2002; 416: 103-107Crossref PubMed Scopus (496) Google Scholar, 41Fischle W. Wang Y. Jacobs S.A. Kim Y. Allis C.D. Khorisanizadeh S. Genes Dev. 2003; 17: 1870-1881Crossref PubMed Scopus (793) Google Scholar). In all such complexes, the disordered methylated H3 peptide adopts an extended chain structure, actually serving to fill in a β-sheet in several cases. The extended chain structure optimizes interactions of both the backbone and side chain groups of the peptide with those of the protein. Importantly, modified Lys residues are recognized by specific features. For example, chromodomains have three aromatic side chains that form a hydrophobic cage that interacts with the methyl group(s) of the methylated Lys side chain. Without the hydrophobicity imparted by the methyl groups, binding would not be possible and the NTD peptide would not assume the extended chain secondary structure. The biological need for the core histone NTDs and linker histone CTD to interact with many modifying enzymes and recognition modules with widely varying structures can be readily accommodated if these domains are intrinsically disordered. We envision that the histone terminal domains interact with their targets through several different modes. In many cases, they bind as extended chains to sites that recognize the local sequence properties as in the case of the recognition motifs discussed in the previous section. In other cases, these IDP regions can fold into α-helical segments, β-hairpins, or other simple motifs, burying hydrophobic groups introduced by modifications and/or in combination with hydrophobic groups on the target. If binding depends primarily on the stability of the secondary structure formed, it may be more important to conserve amino acid composition rather than primary sequence. In this regard, the sequence conservation of the core histone NTDs may be related to maintaining unique post-translational modification sites more so than the primary sequence per se. 
The IDP regions of linker histones and yeast amyloid proteins challenge the paradigm that the primary amino acid sequence and corresponding main and side chain interactions dictate formation of a unique local polypeptide conformation with the lowest free energy state. In these systems, a specific amino acid composition is conserved and correlated with protein function, whereas the primary sequence varies. A recent study of 718 IDP sequences using support vector machine analysis (43Weathers E.A. Paulaitis M.E. Woolf T.B. Hoh J.H. FEBS Lett. 2004; 576: 348-352Crossref PubMed Scopus (109) Google Scholar) concluded that amino acid composition is the only parameter needed to accurately recognize IDPs and that IDP regions are defined by physical properties of a short stretch of amino acids rather than the interactions dictated by the primary sequence of amino acids. Evidence is mounting that in many cases there is a direct correlation between amino acid composition, intrinsic disorder, and protein function. Even in situations where the primary sequence is conserved (such as the core histone NTDs), local amino acid composition may be the key property required for molecular recognition. Examination of the relationships between amino acid composition and IDP function in a wide range of biological systems is likely to reveal new principles of protein structure and molecular recognition.
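The kind of compositional comparison discussed above can be sketched in a few lines of Python. This is only an illustration, not the authors' method: the D/N/O grouping is copied from the Table 1 header (D, disorder-producing; N, neutral; O, order-producing), the function names are invented here, and the 27-residue example string is an assumption (a histone H4 N-terminal tail written from memory, start methionine excluded) that should be checked against a sequence database before being relied on.

```python
from collections import Counter

# Residue classes as listed in the Table 1 header of the text:
# D = disorder-producing, N = neutral, O = order-producing.
CLASS_OF = {}
for residues, label in (("KPGRNQSEDM", "D"), ("AT", "N"), ("VHFILCWY", "O")):
    for aa in residues:
        CLASS_OF[aa] = label


def percent_composition(seq):
    """Return {one-letter residue: percent of total residues} for seq."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def class_fractions(seq):
    """Return the percent of residues falling in each of the D, N, O classes.

    Raises KeyError for characters outside the 20 standard residues.
    """
    totals = {"D": 0.0, "N": 0.0, "O": 0.0}
    for aa, pct in percent_composition(seq).items():
        totals[CLASS_OF[aa]] += pct
    return totals


# Assumed example sequence (hypothetical input; verify before use).
H4_NTD = "SGRGKGGKGLGKGGAKRHRKVLRDNIQ"

comp = percent_composition(H4_NTD)
classes = class_fractions(H4_NTD)

for aa in sorted(comp, key=comp.get, reverse=True):
    print(f"{aa}: {comp[aa]:.1f}%")
print({label: round(pct, 1) for label, pct in classes.items()})
```

Because the calculation ignores residue order entirely, any shuffled version of the same sequence gives identical output, which is the sense in which composition and primary sequence are treated as separate properties in the text.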
DOI: 10.1073/pnas.0506136102
2005
Cited 198 times
Primary sequence independence for prion formation
Many proteins can adopt self-propagating β-sheet-rich structures, termed amyloid fibrils. The [URE3] and [PSI+] prions of Saccharomyces cerevisiae are infectious amyloid forms of the proteins Ure2p and Sup35p, respectively. Ure2p forms prions primarily as a result of its sequence composition, as versions of Ure2p with the prion domain amino acids shuffled are still able to form prions. Here we show that prion induction by both Ure2p and Ure2-21p, one of the scrambled versions of Ure2p, is clearly dependent on the length of the inducing fragment. For Ure2-21p, no single sequence is found in all of the inducing fragments, highlighting the sequence independence of prion formation. Furthermore, the sequence of the Sup35p prion domain can also be randomized without blocking prion formation. Indeed, a single shuffled sequence could give rise to several prion variants. These results suggest that [PSI+] formation is driven primarily by the amino acid composition of the Sup35p prion domain, and that the Sup35p oligopeptide repeats are not required for prion maintenance.
DOI: 10.1128/mcb.24.16.7206-7213.2004
2004
Cited 176 times
Scrambled Prion Domains Form Prions and Amyloid
The [URE3] prion of Saccharomyces cerevisiae is a self-propagating amyloid form of Ure2p. The amino-terminal prion domain of Ure2p is necessary and sufficient for prion formation and has a high glutamine (Q) and asparagine (N) content. Such Q/N-rich domains are found in two other yeast prion proteins, Sup35p and Rnq1p, although none of the many other yeast Q/N-rich domain proteins have yet been found to be prions. To examine the role of amino acid sequence composition in prion formation, we used Ure2p as a model system and generated five Ure2p variants in which the order of the amino acids in the prion domain was randomly shuffled while keeping the amino acid composition and C-terminal domain unchanged. Surprisingly, all five formed prions in vivo, with a range of frequencies and stabilities, and the prion domains of all five readily formed amyloid fibers in vitro. Although it is unclear whether other amyloid-forming proteins would be equally resistant to scrambling, this result demonstrates that [URE3] formation is driven primarily by amino acid composition, largely independent of primary sequence.
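The scrambling design described in this abstract (randomize residue order, keep composition fixed) can be illustrated with a short Python sketch. It is a toy under stated assumptions: the Q/N-rich input string is made up for illustration and is not the Ure2p prion domain, the function name is hypothetical, and the seeds are arbitrary. The abstract does not say how the published variants were generated.

```python
import random
from collections import Counter


def shuffle_preserving_composition(seq, seed=None):
    """Return seq with its residues in random order.

    Shuffling only permutes positions, so the amino acid composition
    (the count of each residue) is unchanged by construction.
    """
    rng = random.Random(seed)  # a local generator keeps runs reproducible
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


# Hypothetical Q/N-rich toy domain, for illustration only.
toy_domain = "NNQNSGNQNNYQNGSNQNNRQNSNQQ"

# Five scrambled variants, mirroring the number used in the study.
variants = [shuffle_preserving_composition(toy_domain, seed=s) for s in range(1, 6)]

for v in variants:
    # Same residue counts as the parent, different primary sequence order.
    print(v, Counter(v) == Counter(toy_domain))
```

Comparing `Counter` objects is a direct check that only primary sequence, not composition, differs between the parent and each variant.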
DOI: 10.1128/mcb.01140-09
2010
Cited 146 times
Compositional Determinants of Prion Formation in Yeast
Numerous prions (infectious proteins) have been identified in yeast that result from the conversion of soluble proteins into beta-sheet-rich amyloid-like protein aggregates. Yeast prion formation is driven primarily by amino acid composition. However, yeast prion domains are generally lacking in the bulky hydrophobic residues most strongly associated with amyloid formation and are instead enriched in glutamines and asparagines. Glutamine/asparagine-rich domains are thought to be involved in both disease-related and beneficial amyloid formation. These domains are overrepresented in eukaryotic genomes, but predictive methods have not yet been developed to efficiently distinguish between prion and nonprion glutamine/asparagine-rich domains. We have developed a novel in vivo assay to quantitatively assess how composition affects prion formation. Using our results, we have defined the compositional features that promote prion formation, allowing us to accurately distinguish between glutamine/asparagine-rich domains that can form prion-like aggregates and those that cannot. Additionally, our results explain why traditional amyloid prediction algorithms fail to accurately predict amyloid formation by the glutamine/asparagine-rich yeast prion domains.
DOI: 10.1073/pnas.1119366109
2012
Cited 132 times
De novo design of synthetic prion domains
Prions are important disease agents and epigenetic regulatory elements. Prion formation involves the structural conversion of proteins from a soluble form into an insoluble amyloid form. In many cases, this structural conversion is driven by a glutamine/asparagine (Q/N)-rich prion-forming domain. However, our understanding of the sequence requirements for prion formation and propagation by Q/N-rich domains has been insufficient for accurate prion propensity prediction or prion domain design. By focusing exclusively on amino acid composition, we have developed a prion aggregation prediction algorithm (PAPA), specifically designed to predict prion propensity of Q/N-rich proteins. Here, we show not only that this algorithm is far more effective than traditional amyloid prediction algorithms at predicting prion propensity of Q/N-rich proteins, but remarkably, also that PAPA is capable of rationally designing protein domains that function as prions in vivo.
DOI: 10.1016/j.neuron.2018.04.032
2018
Cited 100 times
RNP-Granule Assembly via Ataxin-2 Disordered Domains Is Required for Long-Term Memory and Neurodegeneration
Human Ataxin-2 is implicated in the cause and progression of amyotrophic lateral sclerosis (ALS) and type 2 spinocerebellar ataxia (SCA-2). In Drosophila, a conserved atx2 gene is essential for animal survival as well as for normal RNP-granule assembly, translational control, and long-term habituation. Like its human homolog, Drosophila Ataxin-2 (Atx2) contains polyQ repeats and additional intrinsically disordered regions (IDRs). We demonstrate that Atx2 IDRs, which are capable of mediating liquid-liquid phase transitions in vitro, are essential for efficient formation of neuronal mRNP assemblies in vivo. Remarkably, ΔIDR mutants that lack neuronal RNP granules show normal animal development, survival, and fertility. However, they show defects in long-term memory formation/consolidation as well as in C9ORF72 dipeptide repeat or FUS-induced neurodegeneration. Together, our findings demonstrate (1) that higher-order mRNP assemblies contribute to long-term neuronal plasticity and memory, and (2) that a targeted reduction in RNP-granule formation efficiency can alleviate specific forms of neurodegeneration.
DOI: 10.1038/ncb1105-1039
2005
Cited 122 times
Prion domains: sequences, structures and interactions
DOI: 10.1021/bi800726m
2008
Cited 79 times
Fitting Yeast and Mammalian Prion Aggregation Kinetic Data with the Finke−Watzky Two-Step Model of Nucleation and Autocatalytic Growth
Recently, we reported 14 amyloid protein aggregation kinetic data sets that were fit using the "Ockham's razor"/minimalistic Finke-Watzky (F-W) two-step model of slow nucleation (A → B, rate constant k1) and fast autocatalytic growth (A + B → 2B, rate constant k2), yielding quantitative (average) rate constants for nucleation (k1) and growth (k2), where A is the monomeric protein and B is the polymeric protein [Morris, A. M., et al. (2008) Biochemistry 47, 2413-2427]. Herein, we apply the F-W model to 27 representative prion aggregation kinetic data sets obtained from the literature. Each prion data set was successfully fit with the F-W model, including three different yeast prion proteins (Sup35p, Ure2p, and Rnq1p) as well as mouse and human prions. These fits yield the first quantitative rate constants for the steps of nucleation and growth in prion aggregation. Examination of a Sup35p system shows that the same rate constants are obtained for nucleation and for growth within experimental error, regardless of which of six physical methods was used, a unique set of important control experiments in the protein aggregation literature. Also provided herein are analyses of several factors influencing the aggregation of prions such as glutamine/asparagine rich regions and the number of oligopeptide repeats in the prion domain. Where possible, verification or refutation of previous correlations to glutamine/asparagine regions, or the number of repeat sequences, in literature aggregation kinetics is given in light of the quantitative rate constants obtained herein for nucleation and growth during prion aggregation. The F-W model is then contrasted to four literature mechanisms that address the molecular picture of prion transmission and propagation. Key limitations of the F-W model are listed to prevent overinterpretation of the data being analyzed, limitations that derive ultimately from the model's simplicity. 
Finally, possible avenues of future research are suggested.
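The two-step scheme in this abstract (A → B with rate constant k1, then A + B → 2B with rate constant k2) has a closed-form solution for the remaining monomer, [A]t = (k1/k2 + [A]0) / (1 + (k1/(k2[A]0)) exp((k1 + k2[A]0)t)). The following is a minimal Python sketch of that expression, not the authors' fitting code. The function name `fw_monomer` and the rate-constant values in the usage lines are illustrative assumptions.

```python
import math


def fw_monomer(t, k1, k2, a0):
    """Monomer concentration [A] at time t for the two-step scheme
    A -> B (k1) and A + B -> 2B (k2), starting from [A]0 = a0, [B]0 = 0.

    Numerator and denominator of the textbook expression are both
    multiplied by exp(-(k1 + k2*a0)*t) so that large t cannot overflow.
    """
    ratio = k1 / k2
    decay = math.exp(-(k1 + k2 * a0) * t)
    return (ratio + a0) * decay / (decay + ratio / a0)


if __name__ == "__main__":
    # Hypothetical rate constants, chosen only to show the sigmoidal shape.
    k1, k2, a0 = 0.01, 1.0, 1.0
    for t in (0, 2, 4, 6, 8, 10):
        a_t = fw_monomer(t, k1, k2, a0)
        print(f"t={t:>2}  [A]={a_t:.4f}  [B]={a0 - a_t:.4f}")
```

A small k1 relative to k2·[A]0 produces the long lag phase followed by rapid conversion that is typical of aggregation curves. In practice the two rate constants would be estimated by nonlinear least squares against measured data.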
DOI: 10.1007/s00018-013-1543-6
2014
Cited 45 times
Yeast prions and human prion-like proteins: sequence features and prediction methods
Prions are self-propagating infectious protein isoforms. A growing number of prions have been identified in yeast, each resulting from the conversion of soluble proteins into an insoluble amyloid form. These yeast prions have served as a powerful model system for studying the causes and consequences of prion aggregation. Remarkably, a number of human proteins containing prion-like domains, defined as domains with compositional similarity to yeast prion domains, have recently been linked to various human degenerative diseases, including amyotrophic lateral sclerosis. This suggests that the lessons learned from yeast prions may help in understanding these human diseases. In this review, we examine what has been learned about the amino acid sequence basis for prion aggregation in yeast, and how this information has been used to develop methods to predict aggregation propensity. We then discuss how this information is being applied to understand human disease, and the challenges involved in applying yeast prediction methods to higher organisms.
DOI: 10.1038/s41467-019-11550-w
2019
Cited 37 times
The prion-like protein kinase Sky1 is required for efficient stress granule disassembly
Stress granules are membraneless protein- and mRNA-rich organelles that form in response to perturbations in environmental conditions. Stress granule formation is reversible, and persistent stress granules have been implicated in a variety of neurodegenerative disorders, including amyotrophic lateral sclerosis. However, characterization of the factors involved in dissolving stress granules is incomplete. Many stress granule proteins contain prion-like domains (PrLDs), some of which have been linked to stress granule formation. Here, we demonstrate that the PrLD-containing yeast protein kinase Sky1 is a stress granule component. Sky1 is recruited to stress granules in part via its PrLD, and Sky1's kinase activity regulates timely stress granule disassembly during stress recovery. This effect is mediated by phosphorylation of the stress granule component Npl3. Sky1 can compensate for defects in chaperone-mediated stress granule disassembly and vice-versa, demonstrating that cells have multiple overlapping mechanisms for re-solubilizing stress granule components.
DOI: 10.1073/pnas.1912723117
2020
Cited 33 times
Composition-based prediction and rational manipulation of prion-like domain recruitment to stress granules
Significance Many RNA-binding proteins contain aggregation-prone prion-like domains (PrLDs), and mutations in several of these have been linked to degenerative diseases. Additionally, many of these proteins are associated with stress granules, which are membraneless organelles that form under stress, in part through reversible protein assembly. Although the mechanisms of stress granule assembly are unclear, PrLDs can play a role in this process. In order to further understand how PrLDs respond to stress, we analyzed the assembly propensities of a selection of different PrLDs from yeast. We find that many PrLDs are efficiently recruited to stress-induced assemblies and that this recruitment is predominantly dependent on the amino acid composition of the PrLDs.
DOI: 10.3390/ijms22031251
2021
Cited 25 times
From Prions to Stress Granules: Defining the Compositional Features of Prion-Like Domains That Promote Different Types of Assemblies
Stress granules are ribonucleoprotein assemblies that form in response to cellular stress. Many of the RNA-binding proteins found in stress granule proteomes contain prion-like domains (PrLDs), which are low-complexity sequences that compositionally resemble yeast prion domains. Mutations in some of these PrLDs have been implicated in neurodegenerative diseases, including amyotrophic lateral sclerosis and frontotemporal dementia, and are associated with persistent stress granule accumulation. While both stress granules and prions are macromolecular assemblies, they differ in both their physical properties and complexity. Prion aggregates are highly stable homopolymeric solids, while stress granules are complex dynamic biomolecular condensates driven by multivalent homotypic and heterotypic interactions. Here, we use stress granules and yeast prions as a paradigm to examine how distinct sequence and compositional features of PrLDs contribute to different types of PrLD-containing assemblies.
DOI: 10.1101/gad.1177104
2004
Cited 80 times
Prions: proteins as genes and infectious entities
Infectious proteins (prions) include the transmissible spongiform encephalopathies (TSEs) of mammals, the amyloidoses [URE3], [PSI], and [PIN] of Saccharomyces cerevisiae and [Het-s] of the filamentous fungus Podospora anserina, and the self-activating in trans vacuolar protease B of yeast, called [β]. [Het-s] and [β] carry out cellular functions, namely, heterokaryon incompatibility and protein degradation, whereas the TSEs, [URE3], and apparently [PSI] are diseases. [PIN] appears to be neutral. We review the means of discovering prions, the interactions of these “autonomous” entities with their hosts (particularly chaperones), and the relation of prions to other nonnucleic acid heritable traits. That most amyloidoses are not infectious poses a conundrum. We use the term “prion” (Prusiner 1982) to mean “infectious protein” in any organism, whatever the mechanism, and to imply the absence of a nucleic acid necessary for the infectivity. Although the word “prion” began its life as simply another name for the causative agent of the mammalian transmissible spongiform encephalopathies (TSEs), its usage changed with the discovery of the yeast infectious proteins [URE3] and [PSI] (Wickner 1994). These and the more recently discovered prions, [Het-s] of Podospora anserina (Coustou et al. 1997) and [PIN] of yeast (Derkatch et al. 2001), are all based on self-propagating amyloid forms of chromosomally encoded proteins. The latest prion, [β] of Saccharomyces, is simply the active form of the vacuolar protease B, an enzyme that in certain conditions can be required in trans for activation of its own precursor protein (Roberts and Wickner 2003). In contrast to other prions, [β] is not a self-propagating amyloid. Other reviews also deal with a selection of these subjects (Wickner 1997; Kushnirov and Ter-Avanesyan 1998; Caughey 2000; Uptain and Lindquist 2002; Chesebro 2003; Tuite and Cox 2003; Wickner et al. 2004).
DOI: 10.1146/annurev.genet.38.072902.092200
2004
Cited 75 times
Prion Genetics: New Rules for a New Kind of Gene
Just as nucleic acids can carry out enzymatic reactions, proteins can be genes. These heritable infectious proteins (prions) follow unique genetic rules that enable their identification: reversible curing, inducible "spontaneous generation," and phenotype surprises. Most prions are based on self-propagating amyloids, depend heavily on chaperones, show strain phenomena and, like other infectious elements, show species barriers to transmission. A recently identified prion is based on obligatory self-activation of an enzyme in trans. Although prions can be detrimental, they may also be beneficial to their hosts.
DOI: 10.1021/bi7024589
2008
Cited 60 times
Amyloids of Shuffled Prion Domains That Form Prions Have a Parallel In-Register β-Sheet Structure
The [URE3] and [PSI+] prions of Saccharomyces cerevisiae are self-propagating amyloid forms of Ure2p and Sup35p, respectively. The Q/N-rich N-terminal domains of each protein are necessary and sufficient for the prion properties of these proteins, forming in each case their amyloid cores. Surprisingly, shuffling either prion domain, leaving amino acid content unchanged, does not abrogate the ability of the proteins to become prions. The discovery that the amino acid composition of a polypeptide, not the specific sequence order, determines prion capability seems contrary to the standard folding paradigm that amino acid sequence determines protein fold. The shuffleability of a prion domain further suggests that the β-sheet structure is of the parallel in-register type, and indeed, the normal Ure2 and Sup35 prion domains have such a structure. We demonstrate that two shuffled Ure2 prion domains capable of being prions form parallel in-register β-sheet structures, and our data indicate the same conclusion for a single shuffled Sup35 prion domain. This result confirms our inference that shuffleability indicates parallel in-register structure.
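The shuffling experiment described above depends on one property: the order of residues changes while the amino acid composition stays exactly the same. The sketch below shows that property on a short toy sequence. The sequence, the seed, and the function name `shuffle_sequence` are assumptions for illustration and are not taken from the paper.

```python
import random
from collections import Counter


def shuffle_sequence(seq, seed=None):
    """Return a random permutation of the residues in seq.

    Composition (the count of each amino acid) is preserved by
    construction; only the linear order changes.
    """
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


if __name__ == "__main__":
    # Toy Q/N-rich sequence, invented for this example.
    toy_domain = "QNQNYGQNSQYNQGGQNYQNRQ"
    shuffled = shuffle_sequence(toy_domain, seed=1)
    print(toy_domain)
    print(shuffled)
    # Same residues, same counts, different order.
    print(Counter(toy_domain) == Counter(shuffled))
```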
DOI: 10.1371/journal.pcbi.1005465
2017
Cited 33 times
Amino acid composition predicts prion activity
Many prion-forming proteins contain glutamine/asparagine (Q/N) rich domains, and there are conflicting opinions as to the role of primary sequence in their conversion to the prion form: is this phenomenon driven primarily by amino acid composition, or, as a recent computational analysis suggested, dependent on the presence of short sequence elements with high amyloid-forming potential. The argument for the importance of short sequence elements hinged on the relatively-high accuracy obtained using a method that utilizes a collection of length-six sequence elements with known amyloid-forming potential. We weigh in on this question and demonstrate that when those sequence elements are permuted, even higher accuracy is obtained; we also propose a novel multiple-instance machine learning method that uses sequence composition alone, and achieves better accuracy than all existing prion prediction approaches. While we expect there to be elements of primary sequence that affect the process, our experiments suggest that sequence composition alone is sufficient for predicting protein sequences that are likely to form prions. A web-server for the proposed method is available at http://faculty.pieas.edu.pk/fayyaz/prank.html, and the code for reproducing our experiments is available at http://doi.org/10.5281/zenodo.167136.
DOI: 10.1128/mcb.00652-16
2017
Cited 32 times
Effects of Mutations on the Aggregation Propensity of the Human Prion-Like Protein hnRNPA2B1
Hundreds of human proteins contain prion-like domains, which are a subset of low-complexity domains with high amino acid compositional similarity to yeast prion domains. A recently characterized mutation in the prion-like domain of the human heterogeneous nuclear ribonucleoprotein hnRNPA2B1 increases the aggregation propensity of the protein and causes multisystem proteinopathy. The mutant protein forms cytoplasmic inclusions when expressed in Drosophila, the mutation accelerates aggregation in vitro, and the mutant prion-like domain can substitute for a portion of a yeast prion domain in supporting prion activity. To examine the relationship between amino acid sequence and aggregation propensity, we made a diverse set of point mutations in the hnRNPA2B1 prion-like domain. We found that the effects on prion formation in Saccharomyces cerevisiae and aggregation in vitro could be predicted entirely based on amino acid composition. However, composition was an imperfect predictor of inclusion formation in Drosophila; while most mutations showed similar behaviors in yeast, in vitro, and in Drosophila, a few showed anomalous behavior. Collectively, these results demonstrate the significant progress that has been made in predicting the effects of mutations on intrinsic aggregation propensity while also highlighting the challenges of predicting the effects of mutations in more complex organisms.
DOI: 10.1371/journal.pcbi.1007487
2020
Cited 29 times
Atypical structural tendencies among low-complexity domains in the Protein Data Bank proteome
A variety of studies have suggested that low-complexity domains (LCDs) tend to be intrinsically disordered and are relatively rare within structured proteins in the Protein Data Bank (PDB). Although LCDs are often treated as a single class, we previously found that LCDs enriched in different amino acids can exhibit substantial differences in protein metabolism and function. Therefore, we wondered whether the structural conformations of LCDs are likewise dependent on which specific amino acids are enriched within each LCD. Here, we directly examined relationships between enrichment of individual amino acids and secondary structure tendencies across the entire PDB proteome. Secondary structure tendencies varied as a function of the identity of the amino acid enriched and its degree of enrichment. Furthermore, divergence in secondary structure profiles often occurred for LCDs enriched in physicochemically similar amino acids (e.g. valine vs. leucine), indicating that LCDs composed of related amino acids can have distinct secondary structure tendencies. Comparison of LCD secondary structure tendencies with numerous pre-existing secondary structure propensity scales resulted in relatively poor correlations for certain types of LCDs, indicating that these scales may not capture secondary structure tendencies as sequence complexity decreases. Collectively, these observations provide a highly resolved view of structural tendencies among LCDs parsed by the nature and magnitude of single amino acid enrichment.
DOI: 10.1007/978-1-62703-438-8_16
2013
Cited 35 times
A Bioinformatics Method for Identifying Q/N-Rich Prion-Like Domains in Proteins
Numerous proteins contain domains that are enriched in glutamine and asparagine residues, and aggregation of some of these proteins has been linked to both prion formation in yeast and a number of human diseases. Unfortunately, predicting whether a given glutamine/asparagine-rich protein will aggregate has proven difficult. Here we describe a recently developed algorithm designed to predict the aggregation propensity of glutamine/asparagine-rich proteins. We discuss the basis for the algorithm, its limitations, and usage of recently developed online and downloadable versions of the algorithm.
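As a rough illustration of what scanning a protein for glutamine/asparagine-rich regions can look like, the sketch below slides a fixed window along a sequence and reports windows whose Q/N fraction meets a cutoff. It is not the algorithm described in the chapter, which predicts aggregation propensity rather than Q/N content alone. The window size, the cutoff, and the function name `qn_rich_windows` are hypothetical.

```python
QN = frozenset("QN")


def qn_rich_windows(seq, window=20, min_fraction=0.5):
    """Return (start, fraction) for every window whose Q/N fraction
    is at least min_fraction. Start positions are 0-based."""
    hits = []
    if window <= 0 or len(seq) < window:
        return hits
    # Count Q/N in the first window, then update incrementally.
    count = sum(1 for aa in seq[:window] if aa in QN)
    for start in range(len(seq) - window + 1):
        if start > 0:
            if seq[start - 1] in QN:
                count -= 1  # residue that just left the window
            if seq[start + window - 1] in QN:
                count += 1  # residue that just entered the window
        fraction = count / window
        if fraction >= min_fraction:
            hits.append((start, fraction))
    return hits


if __name__ == "__main__":
    # Invented sequence: a Q/N-rich stretch flanked by alanines.
    toy = "A" * 10 + "QN" * 10 + "A" * 10
    for start, fraction in qn_rich_windows(toy, window=20, min_fraction=0.88):
        print(start, round(fraction, 2))
```

A composition-only scan like this flags candidate regions. Ranking those regions by aggregation propensity would require per-residue scores such as those the chapter describes.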
DOI: 10.1128/mcb.01020-14
2015
Cited 32 times
Distinct Amino Acid Compositional Requirements for Formation and Maintenance of the [PSI+] Prion in Yeast
Multiple yeast prions have been identified that result from the structural conversion of proteins into a self-propagating amyloid form. Amyloid-based prion activity in yeast requires a series of discrete steps. First, the prion protein must form an amyloid nucleus that can recruit and structurally convert additional soluble proteins. Subsequently, maintenance of the prion during cell division requires fragmentation of these aggregates to create new heritable propagons. For the Saccharomyces cerevisiae prion protein Sup35, these different activities are encoded by different regions of the Sup35 prion domain. An N-terminal glutamine/asparagine-rich nucleation domain is required for nucleation and fiber growth, while an adjacent oligopeptide repeat domain is largely dispensable for prion nucleation and fiber growth but is required for chaperone-dependent prion maintenance. Although prion activity of glutamine/asparagine-rich proteins is predominantly determined by amino acid composition, the nucleation and oligopeptide repeat domains of Sup35 have distinct compositional requirements. Here, we quantitatively define these compositional requirements in vivo. We show that aromatic residues strongly promote both prion formation and chaperone-dependent prion maintenance. In contrast, nonaromatic hydrophobic residues strongly promote prion formation but inhibit prion propagation. These results provide insight into why some aggregation-prone proteins are unable to propagate as prions.
DOI: 10.1371/journal.pcbi.1006256
2018
Cited 29 times
Proteome-scale relationships between local amino acid composition and protein fates and functions
Proteins with low-complexity domains continue to emerge as key players in both normal and pathological cellular processes. Although low-complexity domains are often grouped into a single class, individual low-complexity domains can differ substantially with respect to amino acid composition. These differences may strongly influence the physical properties, cellular regulation, and molecular functions of low-complexity domains. Therefore, we developed a bioinformatic approach to explore relationships between amino acid composition, protein metabolism, and protein function. We find that local compositional enrichment within protein sequences is associated with differences in translation efficiency, abundance, half-life, protein-protein interaction promiscuity, subcellular localization, and molecular functions of proteins on a proteome-wide scale. However, local enrichment of related amino acids is sometimes associated with opposite effects on protein regulation and function, highlighting the importance of distinguishing between different types of low-complexity domains. Furthermore, many of these effects are discernible at amino acid compositions below those required for classification as low-complexity or statistically-biased by traditional methods and in the absence of homopolymeric amino acid repeats, indicating that thresholds employed by classical methods may not reflect biologically relevant criteria. Application of our analyses to composition-driven processes, such as the formation of membraneless organelles, reveals distinct composition profiles even for closely related organelles. Collectively, these results provide a unique perspective and detailed insights into relationships between amino acid composition, protein metabolism, and protein functions.
DOI: 10.1093/hmg/ddv627
2016
Cited 28 times
Genetic interaction of hnRNPA2B1 and DNAJB6 in a Drosophila model of multisystem proteinopathy
Adult-onset inherited myopathies with similar pathological features, including hereditary inclusion body myopathy (hIBM) and limb-girdle muscular dystrophy (LGMD), are a genetically heterogeneous group of muscle diseases. It is unclear whether these inherited myopathies initiated by mutations in distinct classes of genes are etiologically related. Here, we exploit a genetic model system to establish a mechanistic link between diseases caused by mutations in two distinct genes, hnRNPA2B1 and DNAJB6. Hrb98DE and mrj are the Drosophila melanogaster homologs of human hnRNPA2B1 and DNAJB6, respectively. We introduced disease-homologous mutations to Hrb98DE, thus capturing mutation-dependent phenotypes in a genetically tractable model system. Ectopic expression of the disease-associated mutant form of hnRNPA2B1 or Hrb98DE in fly muscle resulted in progressive, age-dependent cytoplasmic inclusion pathology, as observed in humans with hnRNPA2B1-related myopathy. Cytoplasmic inclusions consisted of hnRNPA2B1 or Hrb98DE protein in association with the stress granule marker ROX8 and additional endogenous RNA-binding proteins (RBPs), suggesting that these pathological inclusions are related to stress granules. Notably, TDP-43 was also recruited to these cytoplasmic inclusions. Remarkably, overexpression of MRJ rescued this phenotype and suppressed the formation of cytoplasmic inclusions, whereas reduction of endogenous MRJ by a classical loss of function allele enhanced it. Moreover, wild-type, but not disease-associated, mutant forms of MRJ interacted with RBPs after heat shock and prevented their accumulation in aggregates. These results indicate both genetic and physical interactions between disease-linked RBPs and DNAJB6/mrj, suggesting etiologic overlap between the pathogenesis of hIBM and LGMD initiated by mutations in hnRNPA2B1 and DNAJB6.
DOI: 10.1261/rna.079170.122
2022
Cited 10 times
Expansion and functional analysis of the SR-related protein family across the domains of life
Serine/arginine-rich (SR) proteins comprise a family of proteins that is predominantly found in eukaryotes and plays a prominent role in RNA splicing. A characteristic feature of SR proteins is the presence of an S/R-rich low-complexity domain (RS domain), often in conjunction with spatially distinct RNA recognition motifs (RRMs). To date, 52 human proteins have been classified as SR or SR-related proteins. Here, using an unbiased series of composition criteria together with enrichment for known RNA binding activity, we identified >100 putative SR-related proteins in the human proteome. This method recovers known SR and SR-related proteins with high sensitivity (∼94%), yet identifies a number of additional proteins with many of the hallmark features of true SR-related proteins. Newly identified SR-related proteins display slightly different amino acid compositions yet similar levels of post-translational modification, suggesting that these new SR-related candidates are regulated in vivo and functionally important. Furthermore, candidate SR-related proteins with known RNA-binding activity (but not currently recognized as SR-related proteins) are nevertheless strongly associated with a variety of functions related to mRNA splicing and nuclear speckles. Finally, we applied our SR search method to all available reference proteomes, and provide maps of RS domains and Pfam annotations for all putative SR-related proteins as a resource. Together, these results expand the set of SR-related proteins in humans, and identify the most common functions associated with SR-related proteins across all domains of life.
DOI: 10.1128/mcb.21.19.6598-6605.2001
2001
Cited 51 times
HMG Proteins and DNA Flexibility in Transcription Activation
The relative stiffness of naked DNA is evident from measured values of longitudinal persistence length (~150 bp) and torsional persistence length (~180 bp). These parameters predict that certain arrangements of eukaryotic transcription activator proteins in gene promoters should be much more effective than others in fostering protein-protein interactions with the basal RNA polymerase II transcription apparatus. Thus, if such interactions require some kind of DNA looping, DNA loop energies should depend sensitively on helical phasing of protein binding sites, loop size, and intrinsic DNA curvature within the loop. Using families of artificial transcription templates where these parameters were varied, we were surprised to find that the degree of transcription activation by arrays of Gal4-VP1 transcription activators in HeLa cell nuclear extract was sensitive only to the linear distance separating a basal promoter from an array of bound activators on DNA templates. We now examine the hypothesis that this unexpected result is due to factors in the extract that act to enhance apparent DNA flexibility. We demonstrate that HeLa cell nuclear extract is rich in a heat-resistant activity that dramatically enhances apparent DNA longitudinal and torsional flexibility. Recombinant mammalian high-mobility group 2 (HMG-2) protein can substitute for this activity. We propose that the abundance of HMG proteins in eukaryotic nuclei provides an environment in which DNA is made sufficiently flexible to remove many constraints on protein binding site arrangements that would otherwise limit efficient transcription activation to certain promoter geometries.
DOI: 10.4161/pri.4.2.12190
2010
Cited 34 times
The effects of amino acid composition on yeast prion formation and prion domain interactions
Yeast prions provide a powerful model system for examining prion formation and propagation in vivo. Yeast prion formation is driven primarily by amino acid composition, not by primary amino acid sequence. However, although yeast prion domains are consistently glutamine/asparagine-rich, they otherwise vary significantly in their compositions. Therefore, elucidating the exact compositional requirements for yeast prion formation has proven challenging. We have developed an in vivo method that allows for estimation of the prion propensity of each amino acid within the context of a yeast prion domain. Using these values, we are able to predict the prion-propensity of various glutamine/asparagine-rich yeast domains. These results provide insight into the basis for yeast prion formation, and may aid in the discovery of additional novel prion domains. Additionally, we examined whether amino acid composition could drive interactions between heterologous glutamine/asparagine-rich proteins. Although inefficient interactions between yeast prion domains have previously been observed, we found that one prion protein, Ure2, is able to interact with compositionally similar domains with unprecedented efficiency. This observation, combined with the growing number of yeast prions, suggests that a broad network of interactions between heterologous glutamine/asparagine-rich proteins may affect yeast prion formation.
DOI: 10.1371/journal.pone.0021953
2011
Cited 31 times
[PSI+] Maintenance Is Dependent on the Composition, Not Primary Sequence, of the Oligopeptide Repeat Domain
[PSI+], the prion form of the yeast Sup35 protein, results from the structural conversion of Sup35 from a soluble form into an infectious amyloid form. The infectivity of prions is thought to result from chaperone-dependent fiber cleavage that breaks large prion fibers into smaller, inheritable propagons. Like the mammalian prion protein PrP, Sup35 contains an oligopeptide repeat domain. Deletion analysis indicates that the oligopeptide repeat domain is critical for [PSI+] propagation, while a distinct region of the prion domain is responsible for prion nucleation. The PrP oligopeptide repeat domain can substitute for the Sup35 oligopeptide repeat domain in supporting [PSI+] propagation, suggesting a common role for repeats in supporting prion maintenance. However, randomizing the order of the amino acids in the Sup35 prion domain does not block prion formation or propagation, suggesting that amino acid composition is the primary determinant of Sup35's prion propensity. Thus, it is unclear what role the oligopeptide repeats play in [PSI+] propagation: the repeats could simply act as a non-specific spacer separating the prion nucleation domain from the rest of the protein; the repeats could contain specific compositional elements that promote prion propagation; or the repeats, while not essential for prion propagation, might explain some unique features of [PSI+]. Here, we test these three hypotheses and show that the ability of the Sup35 and PrP repeats to support [PSI+] propagation stems from their amino acid composition, not their primary sequences. Furthermore, we demonstrate that compositional requirements for the repeat domain are distinct from those of the nucleation domain, indicating that prion nucleation and propagation are driven by distinct compositional features.
DOI: 10.1093/nargab/lqab048
2021
Cited 16 times
LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains
Low complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretical complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they are all indirectly quantifying amino acid composition, which is the fundamental and biologically-relevant feature related to protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data in an accessible form as a resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs.
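To make the idea of defining domains directly by amino acid composition concrete, the sketch below marks every window in which a chosen residue set reaches a minimum fraction and merges overlapping windows into domains. It is not LCD-Composer itself: the linear dispersion criterion mentioned in the abstract is left out, and the function name `find_composition_domains` and its parameter values are assumptions made for illustration.

```python
def find_composition_domains(seq, residues, window=20, min_fraction=0.4):
    """Return a list of (start, end) half-open, 0-based spans in which
    the combined fraction of `residues` is at least min_fraction in
    every contributing window. Overlapping or touching windows are merged."""
    targets = set(residues)
    domains = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        fraction = sum(1 for aa in segment if aa in targets) / window
        if fraction < min_fraction:
            continue
        end = start + window
        if domains and start <= domains[-1][1]:
            # Extend the previous domain instead of opening a new one.
            domains[-1] = (domains[-1][0], end)
        else:
            domains.append((start, end))
    return domains


if __name__ == "__main__":
    # Invented sequence with one serine-rich and one glycine-rich stretch.
    toy = "A" * 30 + "S" * 20 + "A" * 30 + "G" * 20 + "A" * 10
    print(find_composition_domains(toy, "S", min_fraction=0.88))
    print(find_composition_domains(toy, "SG", min_fraction=0.88))
```

Passing more than one residue, as in the second call, corresponds loosely to the multi-residue composition criteria the abstract mentions. A faithful reimplementation would also need the dispersion measure.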
DOI: 10.1371/journal.pone.0089286
2014
Cited 23 times
Increasing Prion Propensity by Hydrophobic Insertion
Prion formation involves the conversion of proteins from a soluble form into an infectious amyloid form. Most yeast prion proteins contain glutamine/asparagine-rich regions that are responsible for prion aggregation. Prion formation by these domains is driven primarily by amino acid composition, not primary sequence, yet there is a surprising disconnect between the amino acids thought to have the highest aggregation propensity and those that are actually found in yeast prion domains. Specifically, a recent mutagenic screen suggested that both aromatic and non-aromatic hydrophobic residues strongly promote prion formation. However, while aromatic residues are common in yeast prion domains, non-aromatic hydrophobic residues are strongly under-represented. Here, we directly test the effects of hydrophobic and aromatic residues on prion formation. Remarkably, we found that insertion of as few as two hydrophobic residues resulted in a multiple orders-of-magnitude increase in prion formation, and significant acceleration of in vitro amyloid formation. Thus, insertion or deletion of hydrophobic residues provides a simple tool to control the prion activity of a protein. These data, combined with bioinformatics analysis, suggest a limit on the number of strongly prion-promoting residues tolerated in glutamine/asparagine-rich domains. This limit may explain the under-representation of non-aromatic hydrophobic residues in yeast prion domains. Prion activity requires not only that a protein be able to form prion fibers, but also that these fibers be cleaved to generate new independently-segregating aggregates to offset dilution by cell division. Recent studies suggest that aromatic residues, but not non-aromatic hydrophobic residues, support the fiber cleavage step. Therefore, we propose that while both aromatic and non-aromatic hydrophobic residues promote prion formation, aromatic residues are favored in yeast prion domains because they serve a dual function, promoting both prion formation and chaperone-dependent prion propagation.
DOI: 10.1073/pnas.1501072112
2015
Cited 22 times
Generating new prions by targeted mutation or segment duplication
Yeasts contain various protein-based genetic elements, termed prions, that result from the structural conversion of proteins into self-propagating amyloid forms. Most yeast prion proteins contain glutamine/asparagine (Q/N)-rich prion domains that drive prion activity. Here, we explore two mechanisms by which new prion domains could evolve. First, it has been proposed that mutation and natural selection will tend to result in proteins with aggregation propensities just low enough to function under physiological conditions and thus that a small number of mutations are often sufficient to cause aggregation. We hypothesized that if the ability to form prion aggregates was a sufficiently generic feature of Q/N-rich domains, many nonprion Q/N-rich domains might similarly have aggregation propensities on the edge of prion formation. Indeed, we tested four yeast Q/N-rich domains that had no detectable aggregation activity; in each case, a small number of rationally designed mutations were sufficient to cause the proteins to aggregate and, for two of the domains, to create prion activity. Second, oligopeptide repeats are found in multiple prion proteins, and expansion of these repeats increases prion activity. However, it is unclear whether the effects of repeat expansion are unique to these specific sequences or are a generic result of adding additional aggregation-prone segments into a protein domain. We found that within nonprion Q/N-rich domains, repeating aggregation-prone segments in tandem was sufficient to create prion activity. Duplication of DNA elements is a common source of genetic variation and may provide a simple mechanism to rapidly evolve prion activity.
DOI: 10.1371/journal.pgen.1007517
2018
Cited 19 times
Sequence features governing aggregation or degradation of prion-like proteins
Enhanced protein aggregation and/or impaired clearance of aggregates can lead to neurodegenerative disorders such as Alzheimer's Disease, Huntington's Disease, and prion diseases. Therefore, many protein quality control factors specialize in recognizing and degrading aggregation-prone proteins. Prions, which generally result from self-propagating protein aggregates, must therefore evade or outcompete these quality control systems in order to form and propagate in a cellular context. We developed a genetic screen in yeast that allowed us to explore the sequence features that promote degradation versus aggregation of a model glutamine/asparagine (Q/N)-rich prion domain from the yeast prion protein, Sup35, and two model glycine (G)-rich prion-like domains from the human proteins hnRNPA1 and hnRNPA2. Unexpectedly, we found that aggregation propensity and degradation propensity could be uncoupled in multiple ways. First, only a subset of classically aggregation-promoting amino acids elicited a strong degradation response in the G-rich prion-like domains. Specifically, large aliphatic residues enhanced degradation of the prion-like domains, whereas aromatic residues promoted prion aggregation without enhancing degradation. Second, the degradation-promoting effect of aliphatic residues was suppressed in the context of the Q/N-rich prion domain, and instead led to a dose-dependent increase in the frequency of spontaneous prion formation. Degradation suppression correlated with Q/N content of the surrounding prion domain, potentially indicating an underappreciated activity for these residues in yeast prion domains. Collectively, these results provide key insights into how certain aggregation-prone proteins may evade protein quality control degradation systems.
DOI: 10.1016/j.jbc.2024.106242
2024
Abstract 2457 Investigation of the role of divalent metals in the RquA-catalyzed synthesis of rhodoquinone
Rhodoquinone (RQ) is an essential electron carrier used in anaerobic metabolism by select bacteria, protists and animal species. The RquA protein is required for the conversion of ubiquinone (UQ) to RQ in microbes possessing the rquA gene and may be a target for anti-microbial treatments. This reaction requires S-adenosyl-L-methionine (SAM) as an amino donor and activity is enhanced by divalent metal cations in vitro. It is currently unclear whether a metal plays a structural role or is used directly in the amino-transfer reaction mechanism.
DOI: 10.1016/j.ymeth.2006.04.008
2006
Cited 25 times
Reporter assay systems for [URE3] detection and analysis
The Saccharomyces cerevisiae prion [URE3] is the infectious amyloid form of the Ure2p protein. [URE3] provides a useful model system for studying amyloid formation and stability in vivo. When grown in the presence of a good nitrogen source, [URE3] cells are able to take up ureidosuccinate, an intermediate in uracil biosynthesis, while cells lacking the [URE3] prion cannot. This ability to take up ureidosuccinate has been commonly used to assay for the presence of [URE3]. However, this assay has a number of practical limitations, affecting the range of experiments that can be performed with [URE3]. Here, we describe recently developed alternative selection methods for the presence or absence of [URE3]. They make use of the Ure2p-regulated DAL5 promoter in conjunction with ADE2, URA3, kanMX, and CAN1 reporter genes, and allow for higher stringency in selection both for and against [URE3], nonselective assay of prion variants, and direct transformation of prion filaments. We discuss advantages and limitations of each of these assays.
DOI: 10.1371/journal.pcbi.1011372
2024
Identification of Low-Complexity Domains by Compositional Signatures Reveals Class-Specific Frequencies and Functions Across the Domains of Life
Low-complexity domains (LCDs) in proteins are typically enriched in one or two predominant amino acids. As a result, LCDs often exhibit unusual structural/biophysical tendencies and can occupy functional niches. However, for each organism, protein sequences must be compatible with intracellular biomolecules and physicochemical environment, both of which vary from organism to organism. This raises the possibility that LCDs may occupy sequence spaces in select organisms that are otherwise prohibited in most organisms. Here, we report a comprehensive survey and functional analysis of LCDs in all known reference proteomes (>21k organisms), with added focus on rare and unusual types of LCDs. LCDs were classified according to both the primary amino acid and secondary amino acid in each LCD sequence, facilitating detailed comparisons of LCD class frequencies across organisms. Examination of LCD classes at different depths (i.e., domain of life, organism, protein, and per-residue levels) reveals unique facets of LCD frequencies and functions. To our surprise, all 400 LCD classes occur in nature, although some are exceptionally rare. A number of rare classes can be defined for each domain of life, with many LCD classes appearing to be eukaryote-specific. Certain LCD classes were consistently associated with identical functions across many organisms, particularly in eukaryotes. Our analysis methods enable simultaneous, direct comparison of all LCD classes between individual organisms, resulting in a proteome-scale view of differences in LCD frequencies and functions. Together, these results highlight the remarkable diversity and functional specificity of LCDs across all known life forms.
DOI: 10.4161/pri.17918
2011
Cited 16 times
Strategies for identifying new prions in yeast
The unexpected discovery of two prions, [URE3] and [PSI+], in Saccharomyces cerevisiae led to questions about how many other proteins could undergo similar prion-based structural conversions. However, [URE3] and [PSI+] were discovered by serendipity in genetic screens. Cataloging the full range of prions in yeast or in other organisms will therefore require more systematic search methods. Taking advantage of some of the unique features of prions, various researchers have developed bioinformatic and experimental methods for identifying novel prion proteins. These methods have generated long lists of prion candidates. The systematic testing of some of these prion candidates has led to notable successes; however, even in yeast, where rapid growth rate and ease of genetic manipulation aid in testing for prion activity, such candidate testing is laborious. Development of better methods to winnow the field of prion candidates will greatly aid in the discovery of new prions, both in yeast and in other organisms, and help us to better understand the role of prions in biology.
DOI: 10.1080/19336896.2017.1344806
2017
Cited 13 times
The effects of glutamine/asparagine content on aggregation and heterologous prion induction by yeast prion-like domains
Prion-like domains are low complexity, intrinsically disordered domains that compositionally resemble yeast prion domains. Many prion-like domains are involved in the formation of either functional or pathogenic protein aggregates. These aggregates range from highly dynamic liquid droplets to highly ordered detergent-insoluble amyloid-like aggregates. To better understand the amino acid sequence features that promote conversion to stable, detergent-insoluble aggregates, we used the prediction algorithm PAPA to identify predicted aggregation-prone prion-like domains with a range of compositions. While almost all of the predicted aggregation-prone domains formed foci when expressed in cells, the ability to form the detergent-insoluble aggregates was highly correlated with glutamine/asparagine (Q/N) content, suggesting that high Q/N content may specifically promote conversion to the amyloid state in vivo. We then used this data set to examine cross-seeding between prion-like proteins. The prion protein Sup35 requires the presence of a second prion, [PIN+], to efficiently form prions, but this requirement can be circumvented by the expression of various Q/N-rich protein fragments. Interestingly, almost all of the Q/N-rich domains that formed SDS-insoluble aggregates were able to promote prion formation by Sup35, highlighting the highly promiscuous nature of these interactions.
DOI: 10.1006/jmbi.2000.3562
2000
Cited 22 times
DNA constraints on transcription activation in vitro
Activators of eukaryotic transcription often function over a range of distances. It is commonly hypothesized that the intervening DNA between the transcription start site and the activator binding sites forms a loop in order to allow the activators to interact with the basal transcription apparatus, either directly or through mediators. If this hypothesis is correct, activation should be sensitive to the presence of intrinsic bends in the intervening DNA. Similarly, the precise helical phasing of such DNA bends and of the activator binding sites relative to the basal promoter should affect the degree of transcription activation. To explore these considerations, we designed transcription templates based on the adenovirus E4 promoter supplemented with upstream Gal4 activator binding sites. Surprisingly, we found that neither insertion of intrinsically curved DNA sequences between the activator binding sites and the basal promoter, nor alteration of the relative helical alignment of the activator binding sites and the basal promoter significantly affected in vitro transcription activation in HeLa cell nuclear extract. In all cases, the degree of transcription activation was a simple inverse function of the length of intervening DNA. Possible implications of these unexpected results are discussed.
DOI: 10.1534/genetics.109.109322
2009
Cited 14 times
A Promiscuous Prion: Efficient Induction of [URE3] Prion Formation by Heterologous Prion Domains
The [URE3] and [PSI(+)] prions are the infectious amyloid forms of the Saccharomyces cerevisiae proteins Ure2p and Sup35p, respectively. Randomizing the order of the amino acids in the Ure2 and Sup35 prion domains while retaining amino acid composition does not block prion formation, indicating that amino acid composition, not primary sequence, is the predominant feature driving [URE3] and [PSI(+)] formation. Here we show that Ure2p promiscuously interacts with various compositionally similar proteins to influence [URE3] levels. Overexpression of scrambled Ure2p prion domains efficiently increases de novo formation of wild-type [URE3] in vivo. In vitro, amyloid aggregates of the scrambled prion domains efficiently seed wild-type Ure2p amyloid formation, suggesting that the wild-type and scrambled prion domains can directly interact to seed prion formation. To test whether interactions between Ure2p and naturally occurring yeast proteins could similarly affect [URE3] formation, we identified yeast proteins with domains that are compositionally similar to the Ure2p prion domain. Remarkably, all but one of these domains were also able to efficiently increase [URE3] formation. These results suggest that a wide variety of proteins could potentially affect [URE3] formation.
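The scrambling idea in this abstract (randomize residue order, keep composition) can be illustrated with a minimal Python sketch. The function name `scramble_sequence` and the toy sequence are hypothetical and are not taken from the paper; the study's actual scrambled constructs were designed experimentally, so this only shows the general principle.

```python
import random
from collections import Counter
from typing import Optional


def scramble_sequence(seq: str, seed: Optional[int] = None) -> str:
    """Return a random permutation of seq; amino acid composition is unchanged."""
    rng = random.Random(seed)      # local RNG so results are reproducible per seed
    residues = list(seq)
    rng.shuffle(residues)          # in-place shuffle of the residue order
    return "".join(residues)


# Hypothetical Q/N-rich toy sequence (illustrative only, not a real prion domain).
toy_domain = "MSDSNQGNNQQNYQQYSQNGNQQQGNNRYQ"
scrambled = scramble_sequence(toy_domain, seed=0)

# Composition is preserved even though the order changes.
same_composition = Counter(scrambled) == Counter(toy_domain)
```

Comparing `Counter` objects is a quick way to confirm that a scrambled sequence differs from the original only in residue order.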
DOI: 10.1016/j.semcdb.2011.02.022
2011
Cited 9 times
Interactions between non-identical prion proteins
Prion formation involves the conversion of soluble proteins into an infectious amyloid form. This process is highly specific, with prion aggregates templating the conversion of identical proteins. However, in some cases non-identical prion proteins can interact to promote or inhibit prion formation or propagation. These interactions affect both the efficiency with which prion diseases are transmitted across species and the normal physiology of yeast prion formation and propagation. Here we examine two types of heterologous prion interactions: interactions between related proteins from different species (the species barrier) and interactions between unrelated prion proteins within a single species. Interestingly, although very subtle changes in protein sequence can significantly reduce or eliminate cross-species prion transmission, in Saccharomyces cerevisiae completely unrelated prion proteins can interact to affect prion formation and propagation.
DOI: 10.1080/19336896.2017.1356560
2017
Cited 8 times
Manipulating the aggregation activity of human prion-like proteins
Considerable advances in understanding the protein features favoring prion formation in yeast have facilitated the development of effective yeast prion prediction algorithms. Here we discuss a recent study in which we systematically explored the utility of the yeast prion prediction algorithm PAPA for designing mutations to modulate the aggregation activity of the human prion-like protein hnRNPA2B1. Mutations in hnRNPA2B1 cause multisystem proteinopathy in humans, and accelerate aggregation of the protein in vitro. Additionally, mutant hnRNPA2B1 forms cytoplasmic inclusions when expressed in Drosophila, and the mutant prion-like domain can substitute for a portion of a yeast prion domain in supporting prion activity in yeast. PAPA was quite successful at predicting the effects of PrLD mutations on prion activity in yeast and on in vitro aggregation propensity. Additionally, PAPA successfully predicted the effects of most, but not all, mutations in the PrLD of the hnRNPA2B1 protein when expressed in Drosophila. These results suggest that PAPA is quite effective at predicting the effects of mutations on intrinsic aggregation propensity, but that intracellular factors can influence aggregation and prion-like activity in vivo. A more complete understanding of these intracellular factors may inform the next generation of prion prediction algorithms.
DOI: 10.1093/bioinformatics/btac699
2022
Cited 4 times
The LCD-Composer webserver: high-specificity identification and functional analysis of low-complexity domains in proteins
Summary: Low-complexity domains (LCDs) in proteins are regions enriched in a small subset of amino acids. LCDs exist in all domains of life, often have unusual biophysical behavior, and function in both normal and pathological processes. We recently developed an algorithm to identify LCDs based predominantly on amino acid composition thresholds. Here, we have integrated this algorithm with a webserver and augmented it with additional analysis options. Specifically, users can (i) search for LCDs in whole proteomes by setting minimum composition thresholds for individual or grouped amino acids, (ii) submit a known LCD sequence to search for similar LCDs, (iii) search for and plot LCDs within a single protein, (iv) statistically test for enrichment of LCDs within a user-provided protein set and (v) specifically identify proteins with multiple types of LCDs. Availability and implementation: The LCD-Composer server can be accessed at http://lcd-composer.bmb.colostate.edu. The corresponding command-line scripts can be accessed at https://github.com/RossLabCSU/LCD-Composer/tree/master/WebserverScripts.
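To make the idea of a "minimum composition threshold" concrete, here is a simplified sliding-window sketch in Python. It is not the LCD-Composer algorithm, which applies additional criteria not described in this abstract. The function name `find_lcd_windows`, the window length of 20, and the 40% cutoff are all assumptions chosen for illustration.

```python
def find_lcd_windows(seq, residues="Q", window=20, min_fraction=0.4):
    """Return merged (start, end) regions, 0-based half-open, covered by
    windows whose combined fraction of `residues` is >= min_fraction.

    `residues` may hold one amino acid ("Q") or a group ("QN").
    """
    targets = set(residues)
    hits = []
    for start in range(len(seq) - window + 1):
        count = sum(1 for aa in seq[start:start + window] if aa in targets)
        if count / window >= min_fraction:
            hits.append((start, start + window))

    # Merge overlapping or adjacent windows into contiguous regions.
    merged = []
    for s, e in hits:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


# Hypothetical toy protein: a 20-residue Q stretch flanked by alanines.
toy_protein = "A" * 30 + "Q" * 20 + "A" * 30
regions = find_lcd_windows(toy_protein, residues="Q")
```

Because every qualifying window is reported in full, the merged region extends past the Q stretch itself; a real tool would need some boundary-trimming step, which this sketch omits.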
DOI: 10.1016/b978-0-323-99533-7.00014-5
2023
The roles of prion-like domains in amyloid formation, phase separation, and solubility
Prion-like domains (PrLDs) are low-complexity protein domains with compositional similarity to yeast prion domains. PrLDs are highly enriched in small and polar amino acids and therefore tend to be intrinsically disordered. They are common in eukaryotic proteomes, and mutations in these domains are linked to various degenerative disorders, including amyotrophic lateral sclerosis and frontotemporal dementia. PrLDs have been proposed to contribute to the formation of diverse molecular assemblies, ranging from highly ordered amyloid aggregates to dynamic, multicomponent biomolecular condensates. PrLDs have also been proposed to help proteins maintain solubility in fluctuating environments. In this chapter, we will examine these diverse activities of PrLDs.
DOI: 10.20944/preprints202306.0042.v1
2023
Principles of Artificial Neural Networks and Machine Learning for Bioinformatics Applications
With the exponential growth of machine learning and development of Artificial Neural Networks (ANNs) in recent years, there is great opportunity to leverage this approach and accelerate biological discoveries through applications on the analysis of bioinformatics data. Various types of datasets, including for example protein or gene interaction networks, molecular structures and cellular signalling pathways, have already been used for machine learning by training ANNs for inference and pattern classification. However, unlike regular data structures that are commonly used in the computer science and engineering fields, bioinformatics datasets present challenges that require unique algorithmic approaches. The recent development of the geometric and deep learning approach within the machine learning field is very promising towards accelerating analysis of complex bioinformatics datasets. The principles of ANNs and their importance for bioinformatics machine learning are demonstrated herein, through presentation of the underlying mathematical and statistical foundations from group theory, symmetry, and linear algebra. Furthermore, the structure and functions of ANN algorithms that form the core principles of artificial intelligence are explained, in relation to the bioinformatics data domain. Overall, the manuscript provides guidance for researchers to understand the principles required for practicing machine learning and artificial intelligence, with the special considerations towards bioinformatics applications.
DOI: 10.5281/zenodo.8155290
2023
Low-Complexity Domains (LCDs) in UniProt Reference Proteomes
This is a comprehensive dataset of low-complexity domains in UniProt reference proteomes. For the purposes of this dataset, LCDs were identified using the LCD-Composer algorithm with default parameters for each of the 20 canonical amino acids. These searches identify "primary" LCDs, defined as protein regions for which a single type of amino acid comprises at least 40% of the region. In addition, separate searches were performed to identify "secondary" LCDs, which are defined as regions for which a single type of amino acid comprises at least 40% of the region and a second type of amino acid comprises at least 20% of the same region. Note that secondary LCDs exhibit very strong spatial overlap with primary LCDs and may be considered, approximately speaking, a subset of primary LCDs. There are seven main components to this dataset:
1. Primary and secondary LCDs for the original reference proteomes from UniProt (downloaded 8/22/2022). These data are found within four zipped archives ending in "_LCDs.zip", one for each domain of life (Archaea, Bacteria, Eukaryota, and Viruses). Within each zipped archive, results are contained in a pair of files for each organism. The start of the file name is the organism's UniProt ID. For each organism, primary LCDs are contained within the file ending in "_LCDcomposer_RESULTS.tsv", whereas secondary LCDs are contained within the file ending in "_LCDcomposer_SecondaryLCDs_RESULTS.tsv". Reference proteomes analyzed for each organism are also provided in separate zipped archives, one for each domain of life.
2. Primary and secondary LCDs for a scrambled version of each proteome mentioned above. These searches were performed using identical search parameters and are included for statistical comparisons. When scrambling the proteomes, each protein sequence was scrambled individually to maintain its amino acid composition. File formats are identical to those described above except that all files will have "SCRAMBLED" in the name to distinguish them from analyses of original (i.e. native) proteomes.
3. The "SecondaryLCDs_by_LCDcategory.zip" archive contains all secondary LCDs from the original proteomes but parsed by LCD category rather than by organism. These LCDs are identical to those in #1 above but are provided in this format to aid those interested in specific types of LCDs and which organisms contain them.
4. The "GOAfiles.zip" archive contains gene ontology files necessary for reproducing analyses in Cascarina and Ross (2023).
5. The "Pfam_Data.zip" archive contains files with Pfam annotations in LCD-containing proteins and Pfam clan information. These files are necessary for reproducing analyses in Cascarina and Ross (2023).
6. The "Observed_vs_Scrambled_LCDfrequency_Statistics.zip" archive contains results of statistical analyses of LCD enrichment or depletion in native ("Observed") proteomes compared to scrambled proteomes. Enrichment is defined as the native proteome having more LCD-containing proteins for a particular LCD type compared to a scrambled version of that proteome. Depletion is defined as the native proteome having fewer LCD-containing proteins for a particular LCD type compared to a scrambled version of that proteome. In cases where 0 instances of an LCD class occurred in the native proteome, scrambled proteome, or both, biased estimates for the natural log of the odds ratio ("lnOR") and p-value were calculated by first adding 1 to all cells in the contingency table.
7. The "GOterm_Results_C-rich_LCDs_ModelOrganisms.zip" archive contains GO-term results for proteins with C-rich domains in model eukaryotic organisms. GO-term analyses were performed for all C-rich LCD classes (regardless of whether C is the primary or secondary amino acid) associated with 5 or more proteins within a given model organism. Results from each GO-term analysis were output to separate files, with the file name containing the UniProtID and abbreviated name of the corresponding organism, as well as the name of the C-rich LCD class (CX or XC, where "X" is any non-C amino acid).
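The "add 1 to all cells" correction described for the enrichment statistics can be sketched in Python as follows. This is a hedged sketch: the 2x2 table layout (proteins with and without a given LCD class, in native versus scrambled proteomes) and the function name `ln_odds_ratio` are assumptions, and the statistical test behind the dataset's p-values is not specified here, so only lnOR is computed. The sketch also assumes the "without" counts are nonzero.

```python
import math


def ln_odds_ratio(native_with, native_without, scrambled_with, scrambled_without):
    """Natural log of the odds ratio for a 2x2 table (assumed layout).

    If the LCD class has 0 instances in the native and/or scrambled proteome,
    1 is added to every cell first, giving a biased but finite estimate.
    Assumes native_without and scrambled_without are nonzero.
    """
    cells = [native_with, native_without, scrambled_with, scrambled_without]
    if native_with == 0 or scrambled_with == 0:
        cells = [c + 1 for c in cells]
    a, b, c, d = cells
    return math.log((a * d) / (b * c))


# Hypothetical counts: positive lnOR suggests enrichment in the native proteome,
# negative lnOR suggests depletion relative to the scrambled proteome.
enriched = ln_odds_ratio(10, 90, 5, 95)
depleted_with_zero = ln_odds_ratio(0, 100, 4, 96)
```

Without the correction, a zero count would make the odds ratio either 0 or undefined, so its logarithm could not be reported.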
DOI: 10.1534/genetics.111.135285
2012
Cited 6 times
An Integrated Biochemistry and Genetics Outreach Program Designed for Elementary School Students
Exposure to genetic and biochemical experiments typically occurs late in one’s academic career. By the time students have the opportunity to select specialized courses in these areas, many have already developed negative attitudes toward the sciences. Given little or no direct experience with the fields of genetics and biochemistry, it is likely that many young people rule these out as potential areas of study or career path. To address this problem, we developed a 7-week (∼1 hr/week) hands-on course to introduce fifth grade students to basic concepts in genetics and biochemistry. These young students performed a series of investigations (ranging from examining phenotypic variation to in vitro enzymatic assays and yeast genetic experiments) to explore scientific reasoning through direct experimentation. Despite the challenging material, the vast majority of students successfully completed each experiment, and most students reported that the experience increased their interest in science. Additionally, the experiments within the 7-week program are easily performed by instructors with basic skills in biological sciences. As such, this program can be implemented by others motivated to achieve a broader impact by increasing the accessibility of their university and communicating to a young audience a positive impression of the sciences and the potential for science as a career.
DOI: 10.1080/19336896.2015.1111506
2015
Cited 5 times
Controlling the prion propensity of glutamine/asparagine-rich proteins
The yeast Saccharomyces cerevisiae can harbor a number of distinct prions. Most of the yeast prion proteins contain a glutamine/asparagine (Q/N) rich region that drives prion formation. Prion-like domains, defined as regions with high compositional similarity to yeast prion domains, are common in eukaryotic proteomes, and mutations in various human proteins containing prion-like domains have been linked to degenerative diseases, including amyotrophic lateral sclerosis. Here, we discuss a recent study in which we utilized two strategies to generate prion activity in non-prion Q/N-rich domains. First, we made targeted mutations in four non-prion Q/N-rich domains, replacing predicted prion-inhibiting amino acids with prion-promoting amino acids. All four mutants formed foci when expressed in yeast, and two acquired bona fide prion activity. Prion activity could be generated with as few as two mutations, suggesting that many non-prion Q/N-rich proteins may be just a small number of mutations from acquiring aggregation or prion activity. Second, we created tandem repeats of short prion-prone segments, and observed length-dependent prion activity. These studies demonstrate the considerable progress that has been made in understanding the sequence basis for aggregation of prion and prion-like domains, and suggest possible mechanisms by which new prion domains could evolve.
DOI: 10.1186/s12864-019-6425-3
2020
Cited 5 times
Natural and pathogenic protein sequence variation affecting prion-like domains within and across human proteomes
Background: Impaired proteostatic regulation of proteins with prion-like domains (PrLDs) is associated with a variety of human diseases including neurodegenerative disorders, myopathies, and certain forms of cancer. For many of these disorders, current models suggest a prion-like molecular mechanism of disease, whereby proteins aggregate and spread to neighboring cells in an infectious manner. The development of prion prediction algorithms has facilitated the large-scale identification of PrLDs among “reference” proteomes for various organisms. However, the degree to which intraspecies protein sequence diversity influences predicted prion propensity has not been systematically examined. Results: Here, we explore protein sequence variation introduced at genetic, post-transcriptional, and post-translational levels, and its influence on predicted aggregation propensity for human PrLDs. We find that sequence variation is relatively common among PrLDs and in some cases can result in relatively large differences in predicted prion propensity. Sequence variation introduced at the post-transcriptional level (via alternative splicing) also commonly affects predicted aggregation propensity, often by direct inclusion or exclusion of a PrLD. Finally, analysis of a database of sequence variants associated with human disease reveals a number of mutations within PrLDs that are predicted to increase prion propensity. Conclusions: Our analyses expand the list of candidate human PrLDs, quantitatively estimate the effects of sequence variation on the aggregation propensity of PrLDs, and suggest the involvement of prion-like mechanisms in additional human diseases.
DOI: 10.1016/s0020-1693(98)00380-6
1999
Cited 13 times
The interaction of BMXD and its copper(II) complexes with glycine, aspartic acid, and histidine
The macrocycle, 3,6,9,17,20,23-hexaazatricyclo[23.3.1.1¹¹,¹⁵]triaconta-1(29),11(30),12,14,25,27-hexaene, BMXD, is shown to recognize three amino acids, glycine, aspartic acid, and histidine, to form binary species. The mono- and dinuclear copper(II) complexes are also shown to host these amino acids. The stability constants for the binary complexes of the amino acids with the macrocycle, and of the ternary complexes containing amino acid, copper(II) and macrocycle, are reported, and binding schemes are suggested for the recognition of glycine, and for the dinuclear ternary species with histidine and glycine. Aspartic acid is found to form the most stable complexes, both with and without the presence of copper(II) ion.
DOI: 10.1080/07391102.2000.10506660
2000
Cited 12 times
Relating Independent Measures of DNA Curvature: Electrophoretic Anomaly and Cyclization Efficiency
Electrophoretic methods are often used to measure DNA curvature and protein-induced DNA bending. Though convenient and widely applied, quantitative analyses are generally limited to assays for which empirical calibration standards have been developed. Alternatively, solution-based cyclization of short DNA duplexes allows analysis of DNA curvature and bending from first principles, but a detailed understanding of this assay is still lacking. In this work, we demonstrate that calibration with an independent electrophoretic assay of DNA curvature permits interpretation of cyclization assay results in a quantitatively meaningful way. We systematically measure intrinsic DNA curvature in short duplexes using a well-established empirical ligation ladder assay. We then compare the results to those obtained from the analysis of the distribution of circular products obtained in simple enzymatic cyclization assays of the same duplexes when polymerized. A strong correlation between DNA curvature estimates from these two assays is obtained for DNA fragments between 150–300 bp in length. We discuss how this result might be used to improve quantitative analysis of protein-mediated bending events evaluated by cyclization methods. Our results suggest that measurements of DNA curvature obtained under similar conditions, in solution and in an acrylamide gel matrix, can be compared directly. The ability to correlate results of these simple assays may prove convenient in monitoring DNA curvature and flexibility.
DOI: 10.1016/s0968-0004(01)02020-5
2001
Cited 11 times
Prions beget prions: the [PIN+] mystery!
Prions are infectious proteins. [PIN+] is a non-chromosomal genetic element of Saccharomyces cerevisiae that is necessary for the de novo induction of the [PSI+] prion. Recently, [PIN+] has been found to be itself a prion of the Rnq1 protein. [URE3], another yeast prion, can also promote [PSI+] generation. Thus, one prion can promote the generation of another.
DOI: 10.1101/sqb.2004.69.489
2004
Cited 9 times
Prions of Yeast Are Genes Made of Protein: Amyloids and Enzymes
In 1994 we described two infectious proteins (prions) of Saccharomyces cerevisiae, showing that the nonchromosomal genes [URE3] and [PSI] are inactive, self-propagating forms of Ure2p and Sup35p, respectively (Wickner 1994). Since then, [Het-s], a prion of Podospora anserina (Coustou et al. 1997), and [PIN+], another S. cerevisiae prion (Derkatch et al. 1997, 2001; Sondheimer and Lindquist 2000), have been found. All of the above prions are based on self-propagating amyloids. Recently, we described another prion, called [β], which is based on the trans self-activation by the yeast vacuolar protease B (Roberts and Wickner 2003). Evidence has appeared suggesting that C, the non-Mendelian gene of Podospora anserina determining Crippled Growth, is also a self-activating prion of a MAP kinase kinase kinase (Kicka and Silar 2004). Study of these yeast and fungal phenomena has established that proteins can be genes and infectious entities and has revealed many details of what makes a protein infectious...
DOI: 10.1002/yea.1143
2004
Cited 5 times
Prions of yeast fail to elicit a transcriptional response
Amyloid deposits are associated with numerous human diseases. The [URE3] prion of Saccharomyces is an infectious, inactive, amyloid form of the Ure2p protein. Despite the presence of large prion aggregates in [URE3] yeast, the only apparent phenotypes associated with the prion are attributable to loss of Ure2p function. We used cDNA microarrays to look for genes in yeast that are differentially expressed in the presence of the [URE3] prion and which might act to mitigate the detrimental effects of the prion aggregates. On comparing [URE3] vs. ure2 yeast, we were surprised to find that the only expression changes detected were attributable to the low level of residual Ure2p activity in the [URE3] cells. Interestingly, in addition to repressing the activity of genes required for utilization of poor nitrogen sources when yeast are grown in the presence of a good nitrogen source, Ure2p appears to be involved in stimulating some of these same genes in the absence of a good nitrogen source.
DOI: 10.1007/s00294-018-0890-0
2018
Aggregation and degradation scales for prion-like domains: sequence features and context weigh in
DOI: 10.1016/b978-0-323-99533-7.00020-0
2023
List of contributors
DOI: 10.1101/2023.07.24.550260
2023
Identification of Low-Complexity Domains by Compositional Signatures Reveals Class-Specific Frequencies and Functions Across Reference Proteomes
Abstract Low-complexity domains (LCDs) in proteins are typically enriched in one or two predominant amino acids. As a result, LCDs often exhibit unusual structural/biophysical tendencies and can occupy functional niches. However, for each organism, protein sequences must be compatible with intracellular biomolecules and physicochemical environment, both of which vary from organism to organism. This raises the possibility that LCDs may occupy sequence spaces in select organisms that are otherwise prohibited in most organisms. Here, we report a comprehensive survey of LCDs in all known reference proteomes (>21k organisms), with added focus on rare and unusual types of LCDs. LCDs were sorted into a rich hierarchical database according to both the primary amino acid and secondary amino acid in each LCD sequence, facilitating detailed comparisons of LCD class frequencies across organisms. Examination of LCD classes at different depths (i.e., domain of life, organism, protein, and per-residue levels) reveals unique facets of LCD frequencies and functions. To our surprise, all 400 LCD classes occur in nature, although some are exceptionally rare. A number of rare classes can be defined for each domain of life, with many LCD classes appearing to be eukaryote-specific. Multiple eukaryote-specific LCD classes could be linked to consistent sets of functions across organisms. Our analysis methods enable simultaneous, direct comparison of all LCD classes between individual organisms, resulting in a proteome-scale view of differences in LCD frequencies and functions. Together, these results highlight the remarkable diversity and functional specificity of LCDs across all known life forms.
DOI: 10.5281/zenodo.8155289
2023
Low-Complexity Domains (LCDs) in UniProt Reference Proteomes
This is a comprehensive dataset of low-complexity domains in UniProt reference proteomes. For the purposes of this dataset, LCDs were identified using the LCD-Composer algorithm with default parameters for each of the 20 canonical amino acids. These searches identify "primary" LCDs, defined as protein regions for which a single type of amino acid comprises at least 40% of the region. In addition, separate searches were performed to identify "secondary" LCDs, which are defined as regions for which a single type of amino acid comprises at least 40% of the region and a second type of amino acid comprises at least 20% of the same region. Note that secondary LCDs exhibit very strong spatial overlap with primary LCDs and may be considered, approximately speaking, a subset of primary LCDs. There are seven main components to this dataset:
1) Primary and secondary LCDs for the original reference proteomes from UniProt (downloaded 8/22/2022). These data are found within four zipped archives ending in "_LCDs.zip", one for each domain of life (Archaea, Bacteria, Eukaryota, and Viruses). Within each zipped archive, results are contained in a pair of files for each organism. The start of the file name is the organism's UniProt ID. For each organism, primary LCDs are contained within the file ending in "_LCDcomposer_RESULTS.tsv", whereas secondary LCDs are contained within the file ending in "_LCDcomposer_SecondaryLCDs_RESULTS.tsv". Reference proteomes analyzed for each organism are also provided in separate zipped archives, one for each domain of life.
2) Primary and secondary LCDs for a scrambled version of each proteome mentioned above. These searches were performed using identical search parameters and are included for statistical comparisons. When scrambling the proteomes, each protein sequence was scrambled individually to maintain its amino acid composition. File formats are identical to those described above except that all files will have "SCRAMBLED" in the name to distinguish them from analyses of original (i.e. native) proteomes.
3) The "SecondaryLCDs_by_LCDcategory.zip" archive contains all secondary LCDs from the original proteomes but parsed by LCD category rather than by organism. These LCDs are identical to those in #1 above but are provided in this format to aid those interested in specific types of LCDs and which organisms contain them.
4) The "GOA_files.zip" archive contains gene ontology files necessary for reproducing analyses in Cascarina and Ross (2023).
5) The "Pfam_Data.zip" archive contains files with Pfam annotations in LCD-containing proteins and Pfam clan information. These files are necessary for reproducing analyses in Cascarina and Ross (2023).
6) The "Observed_vs_Scrambled_LCDfrequency_Statistics.zip" archive contains results of statistical analyses of LCD enrichment or depletion in native ("Observed") proteomes compared to scrambled proteomes. Enrichment is defined as the native proteome having more LCD-containing proteins for a particular LCD type compared to a scrambled version of that proteome. Depletion is defined as the native proteome containing fewer LCD-containing proteins for a particular LCD type compared to a scrambled version of that proteome. In cases where 0 instances of an LCD class occurred in the native proteome, scrambled proteome, or both, biased estimates for the natural log of the odds ratio ("lnOR") and p-value were calculated by first adding 1 to all cells in the contingency table.
7) The "RandomlySelectedOrganisms.zip" archive contains LCD-Composer results for 50 randomly selected organisms from each domain of life with the window size and composition thresholds used during the LCD searches varied systematically.
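The primary and secondary LCD definitions in this description reduce to composition thresholds on a region. A minimal Python sketch of that check follows, assuming the region has already been delimited. It does not reproduce LCD-Composer's window scanning or linear dispersion criteria, and it only considers the two most abundant residues. `classify_region` is a hypothetical helper name, not part of LCD-Composer.

```python
from collections import Counter


def classify_region(region, primary_min=0.40, secondary_min=0.20):
    """Return (primary_aa, secondary_aa_or_None), or None if no residue reaches 40%.

    Thresholds follow the dataset description: one amino acid makes up at
    least 40% of the region (primary LCD), and optionally a second amino
    acid makes up at least 20% of the same region (secondary LCD).
    Hypothetical helper for illustration only.
    """
    if not region:
        return None
    ranked = Counter(region).most_common()  # residues sorted by count, descending
    top_aa, top_n = ranked[0]
    if top_n / len(region) < primary_min:
        return None
    secondary = None
    if len(ranked) > 1:
        second_aa, second_n = ranked[1]
        if second_n / len(region) >= secondary_min:
            secondary = second_aa
    return top_aa, secondary
```

For instance, `classify_region("QQQQNNQQSA")` treats Q (6 of 10 residues) as the primary residue and N (2 of 10) as the secondary residue, whereas a region with ten different residues returns `None`.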
DOI: 10.5281/zenodo.10369078
2023
Low-Complexity Domains (LCDs) in UniProt Reference Proteomes
This is a comprehensive dataset of low-complexity domains in UniProt reference proteomes. For the purposes of this dataset, LCDs were identified using the LCD-Composer algorithm with default parameters for each of the 20 canonical amino acids. These searches identify "primary" LCDs, defined as protein regions for which a single type of amino acid comprises at least 40% of the region. In addition, separate searches were performed to identify "secondary" LCDs, which are defined as regions for which a single type of amino acid comprises at least 40% of the region and a second type of amino acid comprises at least 20% of the same region. Note that secondary LCDs exhibit very strong spatial overlap with primary LCDs and may be considered, approximately speaking, a subset of primary LCDs. There are seven main components to this dataset:
1) Primary and secondary LCDs for the original reference proteomes from UniProt (downloaded 8/22/2022). These data are found within four zipped archives ending in "_LCDs.zip", one for each domain of life (Archaea, Bacteria, Eukaryota, and Viruses). Within each zipped archive, results are contained in a pair of files for each organism. The start of the file name is the organism's UniProt ID. For each organism, primary LCDs are contained within the file ending in "_LCDcomposer_RESULTS.tsv", whereas secondary LCDs are contained within the file ending in "_LCDcomposer_SecondaryLCDs_RESULTS.tsv". Reference proteomes analyzed for each organism are also provided in separate zipped archives, one for each domain of life.
2) Primary and secondary LCDs for a scrambled version of each proteome mentioned above. These searches were performed using identical search parameters and are included for statistical comparisons. When scrambling the proteomes, each protein sequence was scrambled individually to maintain its amino acid composition. File formats are identical to those described above except that all files will have "SCRAMBLED" in the name to distinguish them from analyses of original (i.e. native) proteomes.
3) The "SecondaryLCDs_by_LCDcategory.zip" archive contains all secondary LCDs from the original proteomes but parsed by LCD category rather than by organism. These LCDs are identical to those in #1 above but are provided in this format to aid those interested in specific types of LCDs and which organisms contain them.
4) The "GOA_files.zip" archive contains gene ontology files necessary for reproducing analyses in Cascarina and Ross (2023).
5) The "Pfam_Data.zip" archive contains files with Pfam annotations in LCD-containing proteins and Pfam clan information. These files are necessary for reproducing analyses in Cascarina and Ross (2023).
6) The "Observed_vs_Scrambled_LCDfrequency_Statistics.zip" archive contains results of statistical analyses of LCD enrichment or depletion in native ("Observed") proteomes compared to scrambled proteomes. Enrichment is defined as the native proteome having more LCD-containing proteins for a particular LCD type compared to a scrambled version of that proteome. Depletion is defined as the native proteome containing fewer LCD-containing proteins for a particular LCD type compared to a scrambled version of that proteome. In cases where 0 instances of an LCD class occurred in the native proteome, scrambled proteome, or both, biased estimates for the natural log of the odds ratio ("lnOR") and p-value were calculated by first adding 1 to all cells in the contingency table.
7) The "RandomlySelectedOrganisms.zip" archive contains LCD-Composer results for 50 randomly selected organisms from each domain of life with the window size and composition thresholds used during the LCD searches varied systematically.
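The add-one adjustment described for the enrichment statistics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the 2x2 table is assumed to hold counts of proteins with and without a given LCD class in the native and scrambled proteomes, and the function name and argument names are hypothetical. The counts of proteins without the LCD class are assumed to be non-zero.

```python
import math


def ln_odds_ratio(native_with, native_without, scrambled_with, scrambled_without):
    """Natural log of the odds ratio for a 2x2 table (hypothetical helper).

    If the LCD class has 0 instances in the native proteome, the scrambled
    proteome, or both, 1 is added to every cell first, which gives a biased
    but finite estimate, as the dataset description states.
    """
    cells = [native_with, native_without, scrambled_with, scrambled_without]
    if native_with == 0 or scrambled_with == 0:
        cells = [count + 1 for count in cells]
    a, b, c, d = cells
    # Odds of carrying the LCD class in native vs. scrambled proteins.
    return math.log((a * d) / (b * c))
```

With this convention, a positive lnOR corresponds to enrichment in the native proteome and a negative lnOR to depletion.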
DOI: 10.1101/807438
2019
Atypical Structural Tendencies Among Low-Complexity Domains in the Protein Data Bank Proteome
Abstract A variety of studies have suggested that low-complexity domains (LCDs) tend to be intrinsically disordered and are relatively rare within structured proteins in the protein data bank (PDB). Although LCDs are often treated as a single class, we previously found that LCDs enriched in different amino acids can exhibit substantial differences in protein metabolism and function. Therefore, we wondered whether the structural conformations of LCDs are likewise dependent on which specific amino acids are enriched within each LCD. Here, we directly examined relationships between enrichment of individual amino acids and secondary structure preferences across the entire PDB proteome. Secondary structure preferences varied as a function of the identity of the amino acid enriched and its degree of enrichment. Furthermore, divergence in secondary structure profiles often occurred for LCDs enriched in physicochemically similar amino acids (e.g. valine vs. leucine), indicating that LCDs composed of related amino acids can have distinct secondary structure preferences. Comparison of LCD secondary structure preferences with numerous pre-existing secondary structure propensity scales resulted in relatively poor correlations for certain types of LCDs, indicating that these scales may not capture secondary structure preferences as sequence complexity decreases. Collectively, these observations provide a highly resolved view of structural preferences among LCDs parsed by the nature and magnitude of single amino acid enrichment. Author Summary The structures that proteins adopt are directly related to their amino acid sequences. Low-complexity domains (LCDs) in protein sequences are unusual regions made up of only a few different types of amino acids. Although this is the key feature that classifies sequences as LCDs, the physical properties of LCDs will differ based on the types of amino acids that are found in each domain. 
For example, the sequences “AAAAAAAAAA”, “EEEEEEEEEE”, and “EEKRKEEEKE” will have very different properties, even though they would all be classified as LCDs by traditional methods. In a previous study, we developed a new method to further divide LCDs into categories that more closely reflect the differences in their physical properties. In this study, we apply that approach to examine the structures of LCDs when sorted into different categories based on their amino acids. This allowed us to define relationships between the types of amino acids in the LCDs and their corresponding structures. Since protein structure is closely related to protein function, this has important implications for understanding the basic functions and properties of LCDs in a variety of proteins.
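The contrast between the three example sequences can be made concrete by computing their amino acid compositions. The Python sketch below does this and also reports Shannon entropy as a generic measure of sequence complexity. The entropy calculation is an assumption added for illustration and is not the classification method used in the study.

```python
from collections import Counter
import math


def composition(seq):
    """Fraction of each amino acid in the sequence."""
    n = len(seq)
    return {aa: count / n for aa, count in Counter(seq).items()}


def shannon_entropy(seq):
    """Shannon entropy in bits; low values indicate low complexity."""
    return -sum(p * math.log2(p) for p in composition(seq).values())


for seq in ("AAAAAAAAAA", "EEEEEEEEEE", "EEKRKEEEKE"):
    comp = composition(seq)
    dominant = max(comp, key=comp.get)  # most abundant residue
    print(seq, dominant, round(comp[dominant], 2), round(shannon_entropy(seq), 2))
```

All three sequences are dominated by a single residue, but the dominant residue differs (A versus E), and the third sequence mixes in K and R, which is the kind of difference a single "LCD" label hides.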
DOI: 10.1007/s00294-019-01044-z
2019
Sky1: at the intersection of prion-like proteins and stress granule regulation
Serine‐arginine (SR) protein kinases regulate diverse cellular activities, including various steps in RNA maturation and transport. The yeast Saccharomyces cerevisiae expresses a single SR kinase, Sky1. Sky1 has a bipartite kinase domain, separated by an aggregation-prone prion-like domain (PrLD). The assembly of PrLDs is involved in the formation of various membraneless organelles, including stress granules; stress granules are reversible ribonucleoprotein assemblies that form in response to a variety of stresses. Here, we review a recent study suggesting that Sky1’s PrLD promotes Sky1 recruitment to stress granules, and that Sky1 regulates stress granule dissolution by phosphorylating the RNA-shuttling protein Npl3.
DOI: 10.3390/ijms22168944
2021
Generalizable Compositional Features Influencing the Proteostatic Fates of Polar Low-Complexity Domains
Protein aggregation is associated with a growing list of human diseases. A substantial fraction of proteins in eukaryotic proteomes constitutes a proteostasis network—a collection of proteins that work together to maintain properly folded proteins. One of the overarching functions of the proteostasis network is the prevention or reversal of protein aggregation. How proteins aggregate in spite of the anti-aggregation activity of the proteostasis machinery is incompletely understood. Exposed hydrophobic patches can trigger degradation by the ubiquitin-proteasome system, a key branch of the proteostasis network. However, in a recent study, we found that model glycine (G)-rich or glutamine/asparagine (Q/N)-rich prion-like domains differ in their susceptibility to detection and degradation by this system. Here, we expand upon this work by examining whether the features controlling the degradation of our model prion-like domains generalize broadly to G-rich and Q/N-rich domains. Experimentally, native yeast G-rich domains in isolation are sensitive to the degradation-promoting effects of hydrophobic residues, whereas native Q/N-rich domains completely resist these effects and tend to aggregate instead. Bioinformatic analyses indicate that native G-rich domains from yeast and humans tend to avoid degradation-promoting features, suggesting that the proteostasis network may act as a form of selection at the molecular level that constrains the sequence space accessible to G-rich domains. However, the sensitivity or resistance of G-rich and Q/N-rich domains, respectively, was not always preserved in their native protein contexts, highlighting that proteins can evolve other sequence features to overcome the intrinsic sensitivity of some LCDs to degradation.
DOI: 10.4161/pri.5.4.17918
2011
Strategies for identifying new prions in yeast
The unexpected discovery of two prions, [URE3] and [PSI+], in Saccharomyces cerevisiae led to questions about how many other proteins could undergo similar prion-based structural conversions. However, [URE3] and [PSI+] were discovered by serendipity in genetic screens. Cataloging the full range of prions in yeast or in other organisms will therefore require more systematic search methods. Taking advantage of some of the unique features of prions, various researchers have developed bioinformatic and experimental methods for identifying novel prion proteins. These methods have generated long lists of prion candidates. The systematic testing of some of these prion candidates has led to notable successes; however, even in yeast, where rapid growth rate and ease of genetic manipulation aid in testing for prion activity, such candidate testing is laborious. Development of better methods to winnow the field of prion candidates will greatly aid in the discovery of new prions, both in yeast and in other organisms, and help us to better understand the role of prions in biology.
DOI: 10.6084/m9.figshare.c.5118665.v3
2020
Low-Complexity Domains (LCDs) identified by LCD-Composer with default parameters
DOI: 10.1016/j.nhtm.2015.07.042
2015
Predicting Prion Propensity of Human Proteins
In humans, only a single prion-forming protein named PrPC (for “cellular prion protein”) is currently known, yet many more neurodegenerative disorders involve aberrant protein aggregation. The classical model for these diseases has involved cell-autonomous aggregation, assuming that aggregation occurs independently in each cell within a diseased patient. However, more recent models have proposed a non-cell-autonomous progression of disease in which aggregates formed in one cell may be transmitted to neighboring cells. These aggregate seeds then cause aggregation of the soluble protein in the “infected” cells, similar to the prion diseases. Within the past few years, a number of proteins that exhibit prion-like aggregation and spread to neighboring tissues have been discovered in patients with Amyotrophic Lateral Sclerosis (ALS). Although ALS has been studied for a number of decades, these proteins were only recently linked to ALS by chance. This demonstrates a clear need for an accurate method to systematically identify additional proteins that may play a pathological role in neurodegenerative disorders. Taking advantage of the compositional similarity of these proteins to the known yeast prions, I plan to use the prion prediction methodology that our lab has pioneered to develop an entirely new algorithm specifically suited for this class of neuronal proteins.
DOI: 10.1101/338202
2018
Proteome-Scale Relationships Between Local Amino Acid Composition and Protein Fates and Functions
Abstract Proteins with low-complexity domains continue to emerge as key players in both normal and pathological cellular processes. Although low-complexity domains are often grouped into a single class, individual low-complexity domains can differ substantially with respect to amino acid composition. These differences may strongly influence the physical properties, cellular regulation, and molecular functions of low-complexity domains. Therefore, we developed a bioinformatic approach to explore relationships between amino acid composition, protein metabolism, and protein function. We find that local compositional enrichment within protein sequences affects the translation efficiency, abundance, half-life, subcellular localization, and molecular functions of proteins on a proteome-wide scale. However, these effects depend upon the type of amino acid enriched in a given sequence, highlighting the importance of distinguishing between different types of low-complexity domains. Furthermore, many of these effects are discernible at amino acid compositions below those required for classification as low-complexity or statistically-biased by traditional methods and in the absence of homopolymeric amino acid repeats, indicating that thresholds employed by classical methods may not reflect biologically relevant criteria. Application of our analyses to composition-driven processes, such as the formation of membraneless organelles, reveals distinct composition profiles even for closely related organelles. Collectively, these results provide a unique perspective and detailed insights into relationships between amino acid composition, protein metabolism, and protein functions. Author Summary Low-complexity domains in protein sequences are regions that are composed of only a few amino acids in the protein “alphabet”. 
These domains often have unique chemical properties and play important biological roles in both normal and disease-related processes. While a number of approaches have been developed to define low-complexity domains, these methods each possess conceptual limitations. Therefore, we developed a complementary approach that focuses on local amino acid composition (i.e. the amino acid composition within small regions of proteins). We find that high local composition of individual amino acids is associated with pervasive effects on protein metabolism, subcellular localization, and molecular function on a proteome-wide scale. Importantly, the nature of the effects depends on the type of amino acid enriched within the examined domains, and the effects are observable in the absence of classically-defined low-complexity (and related) domains. Furthermore, we define the compositions of proteins involved in the formation of membraneless, protein-rich organelles such as stress granules and P-bodies. Our results provide a coherent view and unprecedented resolution of the effects of local amino acid enrichment on protein biology.
DOI: 10.1016/j.bpj.2021.11.628
2022
Defining the compositional requirements for recruitment of prion-like domains to stress granules
Stress granules are ribonucleoprotein assemblies formed by cells in response to various stresses. Many of the proteins within stress granules contain prion-like domains, which are low-complexity protein domains that compositionally resemble yeast prion domains. Mutations in some of these prion-like domains have been linked to degenerative disorders, including ALS. We are using yeast stress granules as a model to examine how the sequence and composition of prion-like domains affect their recruitment into complex biomolecular condensates. We performed a screen of prion-like domains, and found that many are sufficient for recruitment into stress granules. This recruitment is driven largely by amino acid composition. Interestingly, the compositional features that drive stress granule recruitment are distinct from those that drive both amyloid formation and liquid-liquid phase separation by isolated prion-like domains. Using a simple composition-based algorithm, we were able to design synthetic prion-like domains that are efficiently targeted to stress granules. We are now leveraging these synthetic prion-like domains to more rigorously define the interactions involved in recruitment of prion-like domains to stress granules. We find that both the length and compositional requirements for stress granule recruitment are highly flexible, and that the degree of partitioning between stress granules and the cytoplasm can be titrated over a broad range through simple targeted mutations.
DOI: 10.1101/626648
2019
Natural and Pathogenic Protein Sequence Variation Affecting Prion-Like Domains Within and Across Human Proteomes
ABSTRACT Protein aggregation is involved in a variety of muscular and neurodegenerative disorders. For many of these disorders, current models suggest a prion-like molecular mechanism of disease, whereby proteins aggregate and spread to neighboring cells in an infectious manner. A variety of proteins with prion-like domains (PrLDs) have recently been linked to these disorders. The development of prion prediction algorithms has facilitated the large-scale identification of PrLDs among “reference” proteomes for various organisms. However, the degree to which intraspecies protein sequence diversity influences predicted aggregation propensity for PrLDs has not been systematically examined. Here, we explore protein sequence variation introduced at genetic, post-transcriptional, and post-translational levels, and its influence on predicted aggregation propensity for human PrLDs. We find that sequence variation is relatively common among PrLDs and in some cases can result in relatively large differences in predicted aggregation propensity. Analysis of a database of sequence variants associated with human disease reveals a number of mutations within PrLDs that are predicted to increase aggregation propensity. Our analyses expand the list of candidate human PrLDs, estimate the effects of sequence variation on the aggregation propensity of PrLDs, and suggest the involvement of prion-like mechanisms in additional human diseases.
DOI: 10.1096/fasebj.2020.34.s1.04007
2020
Identifying the compositional features facilitating recruitment of prion‐like domains to stress granules
Stress granules are highly dynamic nonmembranous cytoplasmic RNA‐protein assemblies that form via interactions between mRNAs stalled in translation initiation and RNA‐binding proteins. Many of the RNA‐binding proteins in stress granules contain prion‐like domains (PrLDs). PrLDs are intrinsically disordered protein domains that compositionally resemble yeast prion domains; some PrLDs have been identified as playing a key role in stress granule formation. PrLDs are thought to be recruited to stress granules in part by liquid‐liquid phase separation (LLPS). Mutations in various RNA binding proteins containing PrLDs are associated with degenerative diseases, including Amyotrophic Lateral Sclerosis. These mutations are associated with the formation of cytoplasmic inclusions that share common components with stress granules, suggesting that the mutations perturb stress granule dynamics. However, the sequence features that drive recruitment of PrLDs into stress granules have yet to be completely defined. We recently demonstrated that many PrLDs are sufficient for stress granule recruitment in yeast, and that this recruitment is driven largely by amino acid composition. Here, we utilize synthetic prion‐like domains to rigorously examine the compositional features driving stress granule recruitment. We show that based solely on amino acid composition, we can rationally design synthetic PrLDs that are recruited to stress‐induced assemblies. Surprisingly, although aromatic amino acids are widely believed to play a key role in recruitment of PrLDs to stress granules, we find that aliphatic amino acids can functionally replace aromatic amino acids in supporting recruitment.
DOI: 10.6084/m9.figshare.12947531
2020
Eukaryota_LCDs_ExcelFormat
LCD-Composer results are stored in a separate file for each organism. Columns are ordered as follows: 1) The protein identifier (header in the FASTA proteome file), 2) the LCD sequence, 3) the location of the LCD within the protein, 4) the LCD class (i.e. the amino acid of interest used in the LCD-Composer search criteria), 5) the percent composition of the amino acid of interest within the LCD, 6) the linear dispersion of the amino acid of interest within the LCD.
DOI: 10.6084/m9.figshare.12942362
2020
Viruses_ProteinSequences
Each file contains the proteome (in FASTA format) corresponding to a single organism. Files ending in "_additional" represent additional isoforms and translation products associated with that organism. All virus proteomes were downloaded from the UniProt FTP site (ftp://ftp.uniprot.org/pub/databases/uniprot/) on 8/23/2020-8/24/2020 and are included here to ensure that LCDs correctly map to sequence locations. Please see the LICENSE_UniProt file for license information regarding UniProt sequence data.
DOI: 10.6084/m9.figshare.12949271
2020
OrganismKey_tab-delimited
Table that maps the UniProt proteome file names to their corresponding organism names. Mapping is based on the taxonomy file downloaded from UniProt.
DOI: 10.6084/m9.figshare.12923183
2020
Eukaryota_LCDs_tab-delimited
LCD-Composer results are stored in a separate file for each organism. Each file is in a tab-delimited format. Columns are ordered as follows: 1) The protein identifier (header in the FASTA proteome file), 2) the LCD sequence, 3) the location of the LCD within the protein, 4) the LCD class (i.e. the amino acid of interest used in the LCD-Composer search criteria), 5) the percent composition of the amino acid of interest within the LCD, 6) the linear dispersion of the amino acid of interest within the LCD.
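Given the six-column order listed in this description, a tab-delimited results file could be read along the following lines. This is a sketch under assumptions: the field names, the example row, and the location format shown are hypothetical, and rows whose numeric columns do not parse (for example a header line, if the files have one) are simply skipped.

```python
import csv
import io
from typing import NamedTuple


class LCDRecord(NamedTuple):
    # Field names are hypothetical; the order follows the dataset description.
    protein_id: str
    lcd_sequence: str
    location: str  # kept as text because the exact format is not specified here
    lcd_class: str
    percent_composition: float
    linear_dispersion: float


def read_lcd_rows(handle):
    """Parse tab-delimited LCD-Composer-style rows into LCDRecord tuples."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 6:
            continue  # blank or malformed line
        try:
            records.append(
                LCDRecord(row[0], row[1], row[2], row[3], float(row[4]), float(row[5]))
            )
        except ValueError:
            continue  # non-numeric columns, e.g. a header line
    return records


# Invented example row, for illustration only.
example = "sp|P00000|HYPO_YEAST\tQQQQNQQQNQ\t(15-24)\tQ\t80.0\t0.95\n"
records = read_lcd_rows(io.StringIO(example))
```

To read a real file, the same function can be passed an open file handle, for example one opened with `newline=""` as the `csv` module recommends.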
DOI: 10.6084/m9.figshare.12949208
2020
Archaea_LCDs_ExcelFormat
LCD-Composer results are stored in a separate file for each organism. Columns are ordered as follows: 1) The protein identifier (header in the FASTA proteome file), 2) the LCD sequence, 3) the location of the LCD within the protein, 4) the LCD class (i.e. the amino acid of interest used in the LCD-Composer search criteria), 5) the percent composition of the amino acid of interest within the LCD, 6) the linear dispersion of the amino acid of interest within the LCD.
DOI: 10.6084/m9.figshare.12949205
2020
Viruses_LCDs_ExcelFormat
LCD-Composer results are stored in a separate file for each organism. Columns are ordered as follows: 1) The protein identifier (header in the FASTA proteome file), 2) the LCD sequence, 3) the location of the LCD within the protein, 4) the LCD class (i.e. the amino acid of interest used in the LCD-Composer search criteria), 5) the percent composition of the amino acid of interest within the LCD, 6) the linear dispersion of the amino acid of interest within the LCD.
DOI: 10.6084/m9.figshare.12923576
2020
Viruses_LCDs_tab-delimited
LCD-Composer results are stored in a separate file for each organism. Each file is in a tab-delimited format. Columns are ordered as follows: 1) The protein identifier (header in the FASTA proteome file), 2) the LCD sequence, 3) the location of the LCD within the protein, 4) the LCD class (i.e. the amino acid of interest used in the LCD-Composer search criteria), 5) the percent composition of the amino acid of interest within the LCD, 6) the linear dispersion of the amino acid of interest within the LCD.
DOI: 10.6084/m9.figshare.11557302
2020
MOESM4 of Natural and pathogenic protein sequence variation affecting prion-like domains within and across human proteomes
Additional file 4. Comprehensive mapping of PTMs within moderately high-scoring human PrLDs. Human PTMs derived from the ActiveDriverDB were mapped to all human PrLDs with PAPA score > 0.0. For each protein the maximum PAPA score, moderately high-scoring PrLD sequence (corresponding to all overlapping regions with PAPA score > 0.0), amino acid positions bounding the PrLD sequence, and all PTMs mapping to the PrLD region are indicated. PLAAC predictions are also included, as indicated for Additional file 2.
DOI: 10.6084/m9.figshare.12923177
2020
Archaea_LCDs_tab-delimited
LCD-Composer results are stored in a separate file for each organism. Each file is in a tab-delimited format. Columns are ordered as follows: 1) The protein identifier (header in the FASTA proteome file), 2) the LCD sequence, 3) the location of the LCD within the protein, 4) the LCD class (i.e. the amino acid of interest used in the LCD-Composer search criteria), 5) the percent composition of the amino acid of interest within the LCD, 6) the linear dispersion of the amino acid of interest within the LCD.
DOI: 10.6084/m9.figshare.12923573
2020
Bacteria_LCDs_tab-delimited
LCD-Composer results are stored in a separate file for each organism. Each file is in a tab-delimited format. Columns are ordered as follows: 1) The protein identifier (header in the FASTA proteome file), 2) the LCD sequence, 3) the location of the LCD within the protein, 4) the LCD class (i.e. the amino acid of interest used in the LCD-Composer search criteria), 5) the percent composition of the amino acid of interest within the LCD, 6) the linear dispersion of the amino acid of interest within the LCD.
DOI: 10.6084/m9.figshare.12947546
2020
Bacteria_LCDs_ExcelFormat
LCD-Composer results are stored in a separate file for each organism. Columns are ordered as follows: 1) The protein identifier (header in the FASTA proteome file), 2) the LCD sequence, 3) the location of the LCD within the protein, 4) the LCD class (i.e. the amino acid of interest used in the LCD-Composer search criteria), 5) the percent composition of the amino acid of interest within the LCD, 6) the linear dispersion of the amino acid of interest within the LCD.
DOI: 10.6084/m9.figshare.12937703
2020
Eukaryota_ProteinSequences_Part1
Part 1 of 2. Each file contains the proteome (in FASTA format) corresponding to a single organism. Files ending in "_additional" represent additional isoforms and translation products associated with that organism. All eukaryotic proteomes were downloaded from the UniProt FTP site (ftp://ftp.uniprot.org/pub/databases/uniprot/) on 8/21/2020 and are included here to ensure that LCDs correctly map to sequence locations. Please see the LICENSE_UniProt file for license information regarding UniProt sequence data.
DOI: 10.6084/m9.figshare.11557281
2020
MOESM1 of Natural and pathogenic protein sequence variation affecting prion-like domains within and across human proteomes
Additional file 1. PAPA scores derived from random sampling of sequence variant combinations for proteins with high-scoring PrLDs. For all proteins with a moderately high-scoring PrLD (PAPA > 0.0) and at least one single-amino acid variant, the minimum and maximum aggregation propensity scores obtained from randomly sampling PrLD sequence variants, along with the corresponding reference sequence score, are indicated (see Methods for a complete description of sequence variant calculations).
DOI: 10.6084/m9.figshare.12939812
2020
Bacteria_ProteinSequences_Part1
Part 1 of 2. Each file contains the proteome (in FASTA format) corresponding to a single organism. Files ending in "_additional" represent additional isoforms and translation products associated with that organism. All bacterial proteomes were downloaded from the UniProt FTP site (ftp://ftp.uniprot.org/pub/databases/uniprot/) on 8/21/2020 and are included here to ensure that LCDs correctly map to sequence locations. Please see the LICENSE_UniProt file for license information regarding UniProt sequence data.
DOI: 10.6084/m9.figshare.11557314
2020
MOESM5 of Natural and pathogenic protein sequence variation affecting prion-like domains within and across human proteomes
Additional file 5. Statistical analysis of PTM enrichment/depletion within human PrLDs. For each PTM type, statistical enrichment or depletion within PrLDs was evaluated using a two-sided Fisher exact test (see Methods section for detailed description).
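For readers unfamiliar with the test named above, a two-sided Fisher exact test on a 2x2 table can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `fisher_exact_two_sided` is hypothetical, the table layout (PTM-bearing versus other residues, inside versus outside PrLDs) is an assumed arrangement rather than the one defined in the paper's Methods, and the counts are invented. The two-sided p-value here sums the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative only):
        a = modified residues inside PrLDs,  b = unmodified residues inside PrLDs
        c = modified residues outside PrLDs, d = unmodified residues outside PrLDs
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Invented counts, purely to show the call.
p_value = fisher_exact_two_sided(1, 9, 11, 3)
print(f"{p_value:.6f}")
```

A small p-value alone does not say whether the PTM type is enriched or depleted; the direction would be read from the odds ratio `(a * d) / (b * c)`.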
DOI: 10.6084/m9.figshare.12937637
2020
Archaea_ProteinSequences
Each file contains the proteome (in FASTA format) corresponding to a single organism. Files ending in "_additional" represent additional isoforms and translation products associated with that organism. All archaeal proteomes were downloaded from the UniProt FTP site (ftp://ftp.uniprot.org/pub/databases/uniprot/) on 8/21/2020 and are included here to ensure that LCDs correctly map to sequence locations. Please see the LICENSE_UniProt file for license information regarding UniProt sequence data.
DOI: 10.6084/m9.figshare.11557290
2020
MOESM2 of Natural and pathogenic protein sequence variation affecting prion-like domains within and across human proteomes
Additional file 2. Aggregation propensity scores and inter-isoform score comparison for all human protein isoforms. Predicted aggregation propensity for all “high-confidence” human protein isoforms (derived from ActiveDriverDB) was calculated using the modified PAPA algorithm (see Methods section for details). Scores and corresponding full protein sequences are indicated for all isoforms, along with the maximum PAPA score among all isoforms mapping to the same gene, the difference between the PAPA score for the indicated isoform and the maximum PAPA score among related isoforms, and the protein sequence corresponding to the highest-scoring related isoform. Additionally, the PLAAC algorithm was used to analyze the same sequences. A binary variable indicates whether the protein contains a PLAAC-predicted PrLD that overlaps with the PAPA-predicted PrLD (for high-scoring proteins only) and, if so, the position of the PLAAC-predicted PrLD.
DOI: 10.6084/m9.figshare.12939479
2020
Eukaryota_ProteinSequences_Part2
Part 2 of 2. Each file contains the proteome (in FASTA format) corresponding to a single organism. Files ending in "_additional" represent additional isoforms and translation products associated with that organism. All eukaryotic proteomes were downloaded from the UniProt FTP site (ftp://ftp.uniprot.org/pub/databases/uniprot/) on 8/21/2020 and are included here to ensure that LCDs correctly map to sequence locations. Please see the LICENSE_UniProt file for license information regarding UniProt sequence data.
DOI: 10.6084/m9.figshare.12939980
2020
Bacteria_ProteinSequences_Part2
Part 2 of 2. Each file contains the proteome (in FASTA format) corresponding to a single organism. Files ending in "_additional" represent additional isoforms and translation products associated with that organism. All bacterial proteomes were downloaded from the UniProt FTP site (ftp://ftp.uniprot.org/pub/databases/uniprot/) on 8/21/2020 and are included here to ensure that LCDs correctly map to sequence locations. Please see the LICENSE_UniProt file for license information regarding UniProt sequence data.
DOI: 10.6084/m9.figshare.11557296
2020
MOESM3 of Natural and pathogenic protein sequence variation affecting prion-like domains within and across human proteomes
Additional file 3. Comparison of wild-type and disease-associated mutant PAPA scores. For all disease-associated mutants in the ClinVar database, mutant sequences were generated by incorporating the indicated amino acid substitution at the appropriate position and re-scored using the modified PAPA algorithm. For each variant, both wild-type and mutant aggregation propensity scores are indicated, as well as the difference between mutant and wild-type scores. For each variant, the associated disease phenotype annotation is also included. PLAAC predictions are also included, as indicated for Additional file 2.
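The substitute-and-rescore procedure described above can be sketched as follows. This is a hedged illustration, not the authors' code: `apply_substitution` and `score_difference` are hypothetical names, positions are assumed to be 1-based, and the scoring function is passed in as a parameter because the modified PAPA algorithm is not reproduced here. The `toy_score` function and the short sequence are placeholders with no biological meaning.

```python
from typing import Callable


def apply_substitution(sequence: str, position: int, ref: str, alt: str) -> str:
    """Return the sequence with `ref` replaced by `alt` at a 1-based position."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + alt + sequence[position:]


def score_difference(
    sequence: str,
    position: int,
    ref: str,
    alt: str,
    score_fn: Callable[[str], float],
) -> tuple[float, float, float]:
    """Return (wild-type score, mutant score, mutant minus wild-type)."""
    mutant = apply_substitution(sequence, position, ref, alt)
    wt_score = score_fn(sequence)
    mut_score = score_fn(mutant)
    return wt_score, mut_score, mut_score - wt_score


def toy_score(seq: str) -> float:
    """Placeholder scorer (fraction of Q/N residues); NOT the PAPA algorithm."""
    return sum(seq.count(aa) for aa in "QN") / len(seq)


wt, mut, delta = score_difference("MKDNQ", 3, "D", "N", toy_score)
print(wt, mut, delta)
```

Checking the reference residue before substituting guards against applying a variant to the wrong isoform or an off-by-one position.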
2004
PRION GENETICS: New Rules for a New Kind of Gene
2002
Construction and studies of recombinant DNA delivery system for CAV