1.
Cell ; 184(9): 2441-2453.e18, 2021 Apr 29.
Article in English | MEDLINE | ID: mdl-33770501

ABSTRACT

Tn7-like transposons have co-opted CRISPR systems, including class 1 type I-F, I-B, and class 2 type V-K. Intriguingly, although these CRISPR-associated transposases (CASTs) undergo robust CRISPR RNA (crRNA)-guided transposition, they are almost never found in sites targeted by the crRNAs encoded by the cognate CRISPR array. To understand this paradox, we investigated CAST V-K and I-B systems and found two distinct modes of transposition: (1) crRNA-guided transposition and (2) CRISPR array-independent homing. We show distinct CAST systems utilize different molecular mechanisms to target their homing site. Type V-K CAST systems use a short, delocalized crRNA for RNA-guided homing, whereas type I-B CAST systems, which contain two distinct target selector proteins, use TniQ for RNA-guided DNA transposition and TnsD for homing to an attachment site. These observations illuminate a key step in the life cycle of CAST systems and highlight the diversity of molecular mechanisms mediating transposon homing.


Subject(s)
Bacteria/genetics , Bacterial Proteins/metabolism , CRISPR-Associated Proteins/metabolism , DNA Transposable Elements/physiology , DNA, Bacterial/metabolism , RNA, Guide, Kinetoplastida , Transposases/metabolism , Bacteria/metabolism , Bacterial Proteins/genetics , CRISPR-Associated Proteins/genetics , CRISPR-Cas Systems , Clustered Regularly Interspaced Short Palindromic Repeats , DNA, Bacterial/genetics , Gene Editing , Recombination, Genetic , Transposases/genetics
2.
Mol Cell ; 83(12): 2122-2136.e10, 2023 Jun 15.
Article in English | MEDLINE | ID: mdl-37267947

ABSTRACT

To spread, transposons must integrate into target sites without disruption of essential genes while avoiding host defense systems. Tn7-like transposons employ multiple mechanisms for target-site selection, including protein-guided targeting and, in CRISPR-associated transposons (CASTs), RNA-guided targeting. Combining phylogenomic and structural analyses, we conducted a broad survey of target selectors, revealing diverse mechanisms used by Tn7 to recognize target sites, including previously uncharacterized target-selector proteins found in newly discovered transposable elements (TEs). We experimentally characterized a CAST I-D system and a Tn6022-like transposon that uses TnsF, which contains an inactivated tyrosine recombinase domain, to target the comM gene. Additionally, we identified a non-Tn7 transposon, Tsy, encoding a homolog of TnsF with an active tyrosine recombinase domain, which we show also inserts into comM. Our findings show that Tn7 transposons employ modular architecture and co-opt target selectors from various sources to optimize target selection and drive transposon spread.


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats , DNA Transposable Elements , Plasmids , DNA Transposable Elements/genetics , Recombinases/genetics , Tyrosine/genetics
3.
Nature ; 620(7974): 660-668, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37380027

ABSTRACT

RNA-guided systems, which use complementarity between a guide RNA and target nucleic acid sequences for recognition of genetic elements, have a central role in biological processes in both prokaryotes and eukaryotes. For example, the prokaryotic CRISPR-Cas systems provide adaptive immunity for bacteria and archaea against foreign genetic elements. Cas effectors such as Cas9 and Cas12 perform guide-RNA-dependent DNA cleavage [1]. Although a few eukaryotic RNA-guided systems have been studied, including RNA interference [2] and ribosomal RNA modification [3], it remains unclear whether eukaryotes have RNA-guided endonucleases. Recently, a new class of prokaryotic RNA-guided systems (termed OMEGA) was reported [4,5]. The OMEGA effector TnpB is the putative ancestor of Cas12 and has RNA-guided endonuclease activity [4,6]. TnpB may also be the ancestor of the eukaryotic transposon-encoded Fanzor (Fz) proteins [4,7], raising the possibility that eukaryotes are also equipped with CRISPR-Cas or OMEGA-like programmable RNA-guided endonucleases. Here we report the biochemical characterization of Fz, showing that it is an RNA-guided DNA endonuclease. We also show that Fz can be reprogrammed for human genome engineering applications. Finally, we resolve the structure of Spizellomyces punctatus Fz at 2.7 Å using cryogenic electron microscopy, showing the conservation of core regions among Fz, TnpB and Cas12, despite diverse cognate RNA structures. Our results show that Fz is a eukaryotic OMEGA system, demonstrating that RNA-guided endonucleases are present in all three domains of life.


Subject(s)
Chytridiomycota , Endonucleases , Eukaryota , Fungal Proteins , Gene Editing , RNA , Humans , Archaea/genetics , Archaea/immunology , Bacteria/genetics , Bacteria/immunology , CRISPR-Associated Protein 9/metabolism , CRISPR-Associated Proteins/chemistry , CRISPR-Associated Proteins/metabolism , CRISPR-Associated Proteins/ultrastructure , CRISPR-Cas Systems , DNA Transposable Elements/genetics , Endonucleases/chemistry , Endonucleases/metabolism , Endonucleases/ultrastructure , Eukaryota/enzymology , Gene Editing/methods , RNA/genetics , RNA/metabolism , RNA, Guide, CRISPR-Cas Systems/genetics , RNA, Guide, CRISPR-Cas Systems/metabolism , Cryoelectron Microscopy , Fungal Proteins/chemistry , Fungal Proteins/metabolism , Fungal Proteins/ultrastructure , Evolution, Molecular , Conserved Sequence , Chytridiomycota/enzymology
4.
Nature ; 610(7932): 575-581, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36224386

ABSTRACT

RNA-guided systems, such as CRISPR-Cas, combine programmable substrate recognition with enzymatic function, a combination that has been used advantageously to develop powerful molecular technologies [1,2]. Structural studies of these systems have illuminated how the RNA and protein jointly recognize and cleave their substrates, guiding rational engineering for further technology development [3]. Recent work identified a new class of RNA-guided systems, termed OMEGA, which include IscB, the likely ancestor of Cas9, and the nickase IsrB, a homologue of IscB lacking the HNH nuclease domain [4]. IsrB consists of only around 350 amino acids, but its small size is counterbalanced by a relatively large RNA guide (roughly 300-nt ωRNA). Here, we report the cryogenic-electron microscopy structure of Desulfovirgula thermocuniculi IsrB (DtIsrB) in complex with its cognate ωRNA and a target DNA. We find the overall structure of the IsrB protein shares a common scaffold with Cas9. In contrast to Cas9, however, which uses a recognition (REC) lobe to facilitate target selection, IsrB relies on its ωRNA, part of which forms an intricate ternary structure positioned analogously to REC. Structural analyses of IsrB and its ωRNA as well as comparisons to other RNA-guided systems highlight the functional interplay between protein and RNA, advancing our understanding of the biology and evolution of these diverse systems.


Subject(s)
DNA , Deoxyribonuclease I , RNA, Guide, Kinetoplastida , CRISPR-Cas Systems , Deoxyribonuclease I/chemistry , Deoxyribonuclease I/metabolism , Deoxyribonuclease I/ultrastructure , DNA/chemistry , DNA/metabolism , DNA/ultrastructure , RNA, Guide, Kinetoplastida/chemistry , RNA, Guide, Kinetoplastida/metabolism , RNA, Guide, Kinetoplastida/ultrastructure , Cryoelectron Microscopy , CRISPR-Associated Proteins/chemistry
5.
Proc Natl Acad Sci U S A ; 121(11): e2307812120, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38437549

ABSTRACT

A number of endogenous genes in the human genome encode retroviral gag-like proteins, which were domesticated from ancient retroelements. The paraneoplastic Ma antigen (PNMA) family members encode a gag-like capsid domain, but their ability to assemble as capsids and traffic between cells remains mostly uncharacterized. Here, we systematically investigate human PNMA proteins and find that a number of PNMAs are secreted by human cells. We determine that PNMA2 forms icosahedral capsids efficiently but does not naturally encapsidate nucleic acids. We resolve the cryoelectron microscopy (cryo-EM) structure of PNMA2 and leverage the structure to design engineered PNMA2 (ePNMA2) particles with RNA packaging abilities. Recombinantly purified ePNMA2 proteins package mRNA molecules into icosahedral capsids and can function as delivery vehicles in mammalian cell lines, demonstrating the potential for engineered endogenous capsids as a nucleic acid therapy delivery modality.


Subject(s)
Antigens, Neoplasm , Capsid , Nerve Tissue Proteins , Animals , Humans , RNA, Messenger/genetics , Cryoelectron Microscopy , Mammals
6.
Proc Natl Acad Sci U S A ; 119(23): e2121335119, 2022 Jun 07.
Article in English | MEDLINE | ID: mdl-35639694

ABSTRACT

Many pathogenic viruses are endemic among human populations and can cause a broad variety of diseases, some potentially leading to devastating pandemics. How virus populations maintain diversity and what selective pressures drive population turnover are not thoroughly understood. We conducted a large-scale phylodynamic analysis of 27 human pathogenic RNA viruses spanning diverse life history traits, in search of unifying trends that shape virus evolution. For most virus species, we identify multiple, cocirculating lineages with low turnover rates. These lineages appear to be largely noncompeting and likely occupy semiindependent epidemiological niches that are not regionally or seasonally defined. Typically, intralineage mutational signatures are similar to interlineage signatures. The principal exceptions are members of the family Picornaviridae, for which mutations in capsid protein genes are primarily lineage defining. Interlineage turnover is slower than expected under a neutral model, whereas intralineage turnover is faster than the neutral expectation, further supporting the existence of independent niches. The persistence of virus lineages appears to stem from limited outbreaks within small communities, so that only a small fraction of the global susceptible population is infected at any time. As disparate communities become increasingly connected through globalization, interaction and competition between lineages might increase as well, which could result in changing selective pressures and increased diversification and/or pathogenicity. Thus, in addition to zoonotic events, ongoing surveillance of familiar, endemic viruses appears to merit global attention with respect to the prevention or mitigation of future pandemics.


Subject(s)
RNA Viruses , RNA , Virus Diseases , Disease Outbreaks/prevention & control , Global Health , Humans , Internationality , Pandemics , RNA Viruses/genetics , RNA Viruses/pathogenicity , Seasons , Virus Diseases/epidemiology , Virus Diseases/genetics
7.
Proc Natl Acad Sci U S A ; 118(29), 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-34292871

ABSTRACT

Understanding the trends in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution is paramount to control the COVID-19 pandemic. We analyzed more than 300,000 high-quality genome sequences of SARS-CoV-2 variants available as of January 2021. The results show that the ongoing evolution of SARS-CoV-2 during the pandemic is characterized primarily by purifying selection, but a small set of sites appear to evolve under positive selection. The receptor-binding domain of the spike protein and the region of the nucleocapsid protein associated with nuclear localization signals (NLS) are enriched with positively selected amino acid replacements. These replacements form a strongly connected network of apparent epistatic interactions and are signatures of major partitions in the SARS-CoV-2 phylogeny. Virus diversity within each geographic region has been steadily growing for the entirety of the pandemic, but analysis of the phylogenetic distances between pairs of regions reveals four distinct periods based on global partitioning of the tree and the emergence of key mutations. The initial period of rapid diversification into region-specific phylogenies that ended in February 2020 was followed by a major extinction event and global homogenization concomitant with the spread of D614G in the spike protein, ending in March 2020. The NLS-associated variants across multiple partitions rose to global prominence in March to July, during a period of stasis in terms of interregional diversity. Finally, beginning in July 2020, multiple mutations, some of which have since been demonstrated to enable antibody evasion, began to emerge associated with ongoing regional diversification, which might be indicative of speciation.


Subject(s)
Adaptation, Physiological/genetics , Evolution, Molecular , SARS-CoV-2/genetics , Amino Acid Substitution , COVID-19/diagnosis , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , COVID-19 Testing , Coronavirus Nucleocapsid Proteins/genetics , Epistasis, Genetic , Genome, Viral/genetics , Humans , Immune Evasion/genetics , Mutation , Nuclear Localization Signals/genetics , Phosphoproteins/genetics , Phylogeny , Protein Interaction Domains and Motifs/genetics , SARS-CoV-2/classification , Selection, Genetic , Spike Glycoprotein, Coronavirus/genetics , Vaccination
8.
Proc Natl Acad Sci U S A ; 117(26): 15193-15199, 2020 Jun 30.
Article in English | MEDLINE | ID: mdl-32522874

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses an immediate, major threat to public health across the globe. Here we report an in-depth molecular analysis to reconstruct the evolutionary origins of the enhanced pathogenicity of SARS-CoV-2 and other coronaviruses that are severe human pathogens. Using integrated comparative genomics and machine learning techniques, we identify key genomic features that differentiate SARS-CoV-2 and the viruses behind the two previous deadly coronavirus outbreaks, SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), from less pathogenic coronaviruses. These features include enhancement of the nuclear localization signals in the nucleocapsid protein and distinct inserts in the spike glycoprotein that appear to be associated with high case fatality rate of these coronaviruses as well as the host switch from animals to humans. The identified features could be crucial contributors to coronavirus pathogenicity and possible targets for diagnostics, prognostication, and interventions.


Subject(s)
Betacoronavirus/genetics , Evolution, Molecular , Genome, Viral , Nucleocapsid Proteins/genetics , Spike Glycoprotein, Coronavirus/genetics , Animals , Betacoronavirus/classification , Betacoronavirus/pathogenicity , Host Specificity , Humans , Machine Learning , Middle East Respiratory Syndrome Coronavirus/classification , Middle East Respiratory Syndrome Coronavirus/genetics , Middle East Respiratory Syndrome Coronavirus/pathogenicity , Mutagenesis, Insertional , Nuclear Localization Signals/genetics , Nucleocapsid Proteins/chemistry , Phylogeny , SARS-CoV-2 , Sequence Homology , Spike Glycoprotein, Coronavirus/chemistry , Virulence/genetics
9.
RNA Biol ; 16(4): 435-448, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30103650

ABSTRACT

Trans-activating CRISPR (tracr) RNA is a distinct RNA species that interacts with the CRISPR (cr) RNA to form the dual guide (g) RNA in type II and subtype V-B CRISPR-Cas systems. The tracrRNA-crRNA interaction is essential for pre-crRNA processing as well as target recognition and cleavage. The tracrRNA consists of an antirepeat, which forms an imperfect hybrid with the repeat in the crRNA, and a distal region containing a Rho-independent terminator. Exhaustive comparative analysis of the sequences and predicted structures of the Class 2 CRISPR guide RNAs shows that all these guide RNAs share distinct structural features, in particular, the nexus stem-loop that separates the repeat-antirepeat hybrid from the distal portion of the tracrRNA and the conserved GU pair at that end of the hybrid. These structural constraints might ensure full exposure of the spacer for target recognition. Reconstruction of tracrRNA evolution for 4 tight bacterial groups demonstrates random drift of repeat-antirepeat complementarity within a window of hybrid stability that is, apparently, maintained by selection. An evolutionary scenario is proposed whereby tracrRNAs evolved on multiple occasions, via rearrangement of a CRISPR array to form the antirepeat in different locations with respect to the array. A functional tracrRNA would form if, in the new location, the antirepeat is flanked by sequences that meet the minimal requirements for a promoter and a Rho-independent terminator. Alternatively, or additionally, the antirepeat sequence could be occasionally 'reset' by recombination with a repeat, restoring the functionality of tracrRNAs that drift beyond the required minimal hybrid stability.


Subject(s)
CRISPR-Cas Systems/genetics , Evolution, Molecular , Genomics , RNA, Bacterial/genetics , Trans-Activators/genetics , Bacteroides/genetics , Base Sequence , Conserved Sequence/genetics , Nucleic Acid Conformation , RNA, Guide, Kinetoplastida/genetics , Repetitive Sequences, Nucleic Acid/genetics , Streptococcus/genetics , Thermodynamics
11.
Nucleic Acids Res ; 44(22): 10898-10911, 2016 Dec 15.
Article in English | MEDLINE | ID: mdl-27466388

ABSTRACT

Specific structures in mRNA modulate translation rate and thus can affect protein folding. Using the protein structures from two eukaryotes and three prokaryotes, we explore the connections between protein compactness, inferred from solvent accessibility, and mRNA structure, inferred from mRNA folding energy (ΔG). In both prokaryotes and eukaryotes, the ΔG value of the most stable 30-nucleotide segment of the mRNA (ΔGmin) strongly, positively correlates with protein solvent accessibility. Thus, mRNAs containing exceptionally stable secondary structure elements typically encode compact proteins. The correlations between ΔG and protein compactness are much more pronounced in predicted ordered parts of proteins than in the predicted disordered parts, indicative of an important role of mRNA secondary structure elements in the control of protein folding. Additionally, ΔG correlates with the mRNA length and the evolutionary rate of synonymous positions. The correlations are partially independent and were used to construct multiple regression models which explain about half of the variance of protein solvent accessibility. These findings suggest a model in which the mRNA structure, particularly exceptionally stable RNA structural elements, acts as a gauge of protein co-translational folding by reducing ribosome speed when the nascent peptide needs time to form and optimize the core structure.


Subject(s)
Protein Folding , RNA, Messenger/physiology , Animals , Base Composition , Humans , Kinetics , Linear Models , Models, Molecular , Nucleic Acid Conformation , Protein Biosynthesis , Protein Structure, Secondary , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , RNA Stability , RNA, Messenger/chemistry , Thermodynamics , Transcriptome
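The ΔGmin quantity in the abstract above (the folding energy of the most stable 30-nucleotide segment of an mRNA) amounts to a sliding-window minimum. The following is a minimal sketch of that scan only, not the authors' pipeline: `delta_g_min` and `toy_energy` are hypothetical names, and `toy_energy` is a crude G/C-count placeholder standing in for a real folding-energy calculation, which the abstract does not specify.

```python
from typing import Callable


def toy_energy(segment: str) -> float:
    """Placeholder stability score, NOT a real folding energy.

    Assumption for illustration only: each G or C contributes -1.0,
    so G/C-rich segments score as more stable (more negative).
    """
    return -1.0 * sum(base in "GC" for base in segment.upper())


def delta_g_min(
    mrna: str,
    energy: Callable[[str], float] = toy_energy,
    window: int = 30,
) -> float:
    """Return the lowest (most stable) energy over all windows of length `window`."""
    if len(mrna) < window:
        raise ValueError("sequence is shorter than the window")
    # Slide the window one nucleotide at a time and keep the minimum energy.
    return min(
        energy(mrna[start:start + window])
        for start in range(len(mrna) - window + 1)
    )
```

Passing a real folding-energy function as `energy` would be needed for results comparable to the study; the placeholder only makes the window scan itself runnable.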
12.
RNA Biol ; 14(12): 1649-1654, 2017 Dec 02.
Article in English | MEDLINE | ID: mdl-28722509

ABSTRACT

Comparison of mRNA and protein structures shows that highly structured mRNAs typically encode compact protein domains, suggesting that mRNA structure controls protein folding. This function is apparently performed by distinct structural elements in the mRNA, which implies 'fine tuning' of mRNA structure under selection for optimal protein folding. We find that, during evolution, changes in the mRNA folding energy follow amino acid replacements, reinforcing the notion of an intimate connection between the structures of an mRNA and the protein it encodes, and the double encoding of protein sequence and folding in the mRNA.


Subject(s)
Adaptation, Biological , Nucleic Acid Conformation , Protein Biosynthesis , Protein Folding , RNA, Messenger/chemistry , RNA, Messenger/genetics , Animals , Biological Evolution , Humans , RNA Stability , Selection, Genetic , Structure-Activity Relationship
13.
Phys Biol ; 12(3): 035001, 2015 Apr 30.
Article in English | MEDLINE | ID: mdl-25927823

ABSTRACT

Robustness to destabilizing effects of mutations is thought of as a key factor of protein evolution. The connections between two measures of robustness, the relative core size and the computationally estimated effect of mutations on protein stability (ΔΔG), protein abundance and the selection pressure on protein-coding genes (dN/dS) were analyzed for the organisms with a large number of available protein structures including four eukaryotes, two bacteria and one archaeon. The distribution of the effects of mutations in the core on protein stability is universal and indistinguishable in eukaryotes and bacteria, centered at slightly destabilizing amino acid replacements, and with a heavy tail of more strongly destabilizing replacements. The distribution of mutational effects in the hyperthermophilic archaeon Thermococcus gammatolerans is significantly shifted toward strongly destabilizing replacements which is indicative of stronger constraints that are imposed on proteins in hyperthermophiles. The median effect of mutations is strongly, positively correlated with the relative core size, in evidence of the congruence between the two measures of protein robustness. However, both measures show only limited correlations to the expression level and selection pressure on protein-coding genes. Thus, the degree of robustness reflected in the universal distribution of mutational effects appears to be a fundamental, ancient feature of globular protein folds whereas the observed variations are largely neutral and uncoupled from short term protein evolution. A weak anticorrelation between protein core size and selection pressure is observed only for surface residues in prokaryotes but a stronger anticorrelation is observed for all residues in eukaryotic proteins. This substantial difference between proteins of prokaryotes and eukaryotes is likely to stem from the demonstrable higher compactness of prokaryotic proteins.


Subject(s)
Archaea/genetics , Bacteria/genetics , Eukaryota/genetics , Evolution, Molecular , Protein Stability , Animals , Archaea/metabolism , Bacteria/metabolism , Eukaryota/metabolism , Humans , Mutation , Proteins/chemistry
14.
Proteins ; 82(6): 897-903, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24130156

ABSTRACT

Several studies have recently shown that germline mutations in RTEL1, an essential DNA helicase involved in telomere regulation and DNA repair, cause Hoyeraal-Hreidarsson syndrome (HHS), a severe form of dyskeratosis congenita. Using new software that facilitates the delineation of the different domains of the protein and the identification of remote relationships for orphan domains, we show here that the C-terminal extension of RTEL1, downstream of its catalytic domain and including several HHS-associated mutations, contains a previously unidentified tandem of harmonin-N-like domains, which may serve as a hub for partner interaction. This finding highlights the potentially critical role of this region for the function of RTEL1 and gives insights into the impact that the identified mutations would have on the structure and function of these domains.


Subject(s)
DNA Helicases/chemistry , Dyskeratosis Congenita/genetics , Fetal Growth Retardation/genetics , Intellectual Disability/genetics , Microcephaly/genetics , Amino Acid Sequence , Conserved Sequence , DNA Helicases/genetics , Dyskeratosis Congenita/enzymology , Fetal Growth Retardation/enzymology , Gene Duplication , Germ-Line Mutation , Humans , Hydrophobic and Hydrophilic Interactions , Intellectual Disability/enzymology , Microcephaly/enzymology , Models, Molecular , Molecular Sequence Data , Protein Structure, Tertiary
15.
Bioinformatics ; 29(14): 1726-33, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23677940

ABSTRACT

MOTIVATION: Describing domain architecture is a critical step in the functional characterization of proteins. However, some orphan domains do not match any profile stored in dedicated domain databases and are therefore difficult to analyze. RESULTS: We present here a novel approach, called TREMOLO-HCA, for the analysis of orphan domain sequences, inspired by our experience in the use of Hydrophobic Cluster Analysis (HCA). Hidden relationships between protein sequences can be more easily identified from the PSI-BLAST results, using information on domain architecture, HCA plots and the conservation degree of amino acids that may participate in the protein core. This can reveal remote relationships with known families of domains, as illustrated here with the identification of a hidden Tudor tandem in the human BAHCC1 protein and a hidden ET domain in the Saccharomyces cerevisiae Taf14p and human AF9 proteins. The results obtained in this way are consistent with those provided by HHPRED, based on pairwise comparisons of HMMs. Our approach can, however, be applied even in the absence of domain profiles or known 3D structures for the identification of novel families of domains. It can also be used in reverse to refine domain profiles, by starting from known protein domain families and identifying highly divergent members, hitherto considered as orphans. AVAILABILITY: We provide a possible integration of this approach in an open TREMOLO-HCA package, which is fully implemented in Python v2.7 and is available on request. Instructions are available at http://www.impmc.upmc.fr/~callebau/tremolohca.html. CONTACT: isabelle.callebaut@impmc.upmc.fr SUPPLEMENTARY INFORMATION: Supplementary Data are available at Bioinformatics online.


Subject(s)
Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, Protein/methods , Amino Acid Sequence , Cluster Analysis , Humans , Hydrophobic and Hydrophilic Interactions , Molecular Sequence Data , Proteins/chemistry , Saccharomyces cerevisiae Proteins/chemistry , Transcription Factor TFIID/chemistry
16.
Bioinformatics ; 29(14): 1742-9, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23652426

ABSTRACT

MOTIVATION: Structural prediction of protein interactions remains a challenging but fundamental goal. In particular, progress in scoring functions is critical for the efficient discrimination of near-native interfaces among large sets of decoys. Many functions have been developed using knowledge-based potentials, but few make use of multi-body interactions or evolutionary information, although multi-residue interactions are crucial for protein-protein binding and protein interfaces undergo significant selection pressure to maintain their interactions. RESULTS: This article presents InterEvScore, a novel scoring function using a coarse-grained statistical potential including two- and three-body interactions, which provides each residue with the opportunity to contribute in its most favorable local structural environment. Combination of this potential with evolutionary information considerably improves scoring results on the 54 test cases from the widely used protein docking benchmark for which evolutionary information can be collected. We analyze how our method of including evolutionary information gradually increases the discriminative power of InterEvScore. Comparison with several previously published scoring functions (ZDOCK, ZRANK and SPIDER) shows the significant progress brought by InterEvScore. AVAILABILITY: http://biodev.cea.fr/interevol/interevscore CONTACT: guerois@cea.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Molecular Docking Simulation/methods , Protein Interaction Mapping/methods , Software , Data Interpretation, Statistical , Evolution, Molecular , Knowledge Bases
17.
PLoS Comput Biol ; 9(10): e1003280, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24204229

ABSTRACT

To obtain a comprehensive repertoire of foldable domains within whole proteomes, including orphan domains, we developed a novel procedure, called SEG-HCA. Using only the information in a single amino acid sequence, SEG-HCA automatically delineates segments possessing high densities of hydrophobic clusters, as defined by Hydrophobic Cluster Analysis (HCA). These hydrophobic clusters mainly correspond to regular secondary structures, which together form structured or foldable regions. Genome-wide analyses revealed that SEG-HCA is the opposite of disorder predictors, the two addressing distinct structural states. Interestingly, there is nevertheless an overlap between the two predictions, including small segments of disordered sequences that undergo coupled folding and binding. SEG-HCA thus gives access to these specific domains, which are generally poorly represented in domain databases. Comparison of the whole set of SEG-HCA predictions with the Conserved Domain Database (CDD) also highlighted a large proportion of predicted large (length >50 amino acids) segments that are CDD orphans. These orphan sequences may either correspond to highly divergent members of already known families or belong to new families of domains. Their comprehensive description thus opens new avenues to investigate new functional and/or structural features, which have so far remained undiscovered. Altogether, the data described here provide new insights into protein architecture and organization throughout the three kingdoms of life.


Subject(s)
Genomics/methods , Protein Folding , Protein Structure, Tertiary , Proteins , Amino Acid Sequence , Cluster Analysis , Fungal Proteins , Genome , Humans , Hydrophobic and Hydrophilic Interactions , Models, Molecular , Models, Statistical , Molecular Sequence Data , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Proteome , Protozoan Proteins , Sequence Analysis, DNA , Sequence Analysis, Protein
18.
Nucleic Acids Res ; 40(Database issue): D847-56, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22053089

ABSTRACT

Capturing how the structures of interacting partners evolved at their binding interfaces is a fundamental issue for understanding interactome evolution. To that end, the InterEvol database was designed for exploring 3D structures of homologous interfaces of protein complexes. For every chain forming a complex in the Protein Data Bank (PDB), close and remote structural interologs were identified, providing essential snapshots for studying interface evolution. The database provides tools to retrieve and visualize these structures. In addition, pre-computed multiple sequence alignments of the most likely interologs retrieved from a wide range of species can be downloaded to enrich the analysis. The database can be queried either directly by PDB code or keyword, or from the sequence of one or two partners. Multiple sequence alignments of interologs can also be recomputed online with tailored parameters using the InterEvolAlign facility. Lastly, an InterEvol PyMol plugin was developed to improve interactive exploration of structures versus sequence alignments at the interfaces of complexes. Based on a series of automatic methods to extract structural and sequence data, the database will be updated monthly. Structure coordinates and sequence alignments can be queried and downloaded from the InterEvol web interface at http://biodev.cea.fr/interevol/.


Subject(s)
Databases, Protein , Evolution, Molecular , Multiprotein Complexes/chemistry , Protein Conformation , Protein Interaction Mapping , Sequence Alignment , Sequence Analysis, Protein
19.
Proc Natl Acad Sci U S A ; 108(31): 12663-8, 2011 Aug 02.
Article in English | MEDLINE | ID: mdl-21768349

ABSTRACT

Cernunnos/XLF is a core protein of the nonhomologous DNA end-joining (NHEJ) pathway that processes the majority of DNA double-strand breaks in mammals. Cernunnos stimulates the final ligation step catalyzed by the complex between DNA ligase IV and Xrcc4 (X4). Here we present the crystal structure of the X4(1-157)-Cernunnos(1-224) complex at 5.5-Å resolution and identify the relative positions of the two factors and their binding sites. The X-ray structure reveals a filament arrangement for X4(1-157) and Cernunnos(1-224) homodimers mediated by repeated interactions through their N-terminal head domains. A filament arrangement of the X4-Cernunnos complex was confirmed by transmission electron microscopy analyses both with truncated and full-length proteins. We further modeled the interface and used structure-based site-directed mutagenesis and calorimetry to characterize the roles of various residues at the X4-Cernunnos interface. We identified four X4 residues (Glu(55), Asp(58), Met(61), and Phe(106)) essential for the interaction with Cernunnos. These findings provide new insights into the molecular bases for stimulatory and bridging roles of Cernunnos in the final DNA ligation step.


Subject(s)
DNA Repair Enzymes/metabolism , DNA Repair , DNA-Binding Proteins/metabolism , Amino Acid Sequence , Aspartic Acid/chemistry , Aspartic Acid/genetics , Aspartic Acid/metabolism , Binding Sites/genetics , Blotting, Western , Calorimetry , Crystallography, X-Ray , DNA Repair Enzymes/chemistry , DNA Repair Enzymes/genetics , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Glutamic Acid/chemistry , Glutamic Acid/genetics , Glutamic Acid/metabolism , Humans , Methionine/chemistry , Methionine/genetics , Methionine/metabolism , Microscopy, Electron, Transmission , Models, Molecular , Molecular Sequence Data , Mutation , Peptide Fragments/chemistry , Peptide Fragments/metabolism , Peptide Fragments/ultrastructure , Phenylalanine/chemistry , Phenylalanine/genetics , Phenylalanine/metabolism , Protein Binding , Protein Multimerization , Protein Structure, Tertiary , Sequence Homology, Amino Acid
20.
Mob DNA ; 15(1): 12, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38863000

ABSTRACT

Eukaryotic retroelements are generally divided into two classes: long terminal repeat (LTR) retrotransposons and non-LTR retrotransposons. A third class of eukaryotic retroelement, the Penelope-like elements (PLEs), has been well-characterized bioinformatically, but relatively little is known about the transposition mechanism of these elements. PLEs share some features with the R2 retrotransposon from Bombyx mori, which uses a target-primed reverse transcription (TPRT) mechanism, but their distinct phylogeny suggests PLEs may utilize a novel mechanism of mobilization. Using protein purified from E. coli, we report unique in vitro properties of a PLE from the green anole (Anolis carolinensis), revealing mechanistic aspects not shared by other retrotransposons. We found that reverse transcription is initiated at two adjacent sites within the transposon RNA that is not homologous to the cleaved DNA, a feature that is reflected in the genomic "tail" signature shared between and unique to PLEs. Our results for the first active PLE in vitro provide a starting point for understanding PLE mobilization and biology.
