1.
J Am Chem Soc ; 144(31): 14057-14070, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35895935

ABSTRACT

Dehydroamino acids are important structural motifs and biosynthetic intermediates for natural products. Many bioactive natural products of nonribosomal origin contain dehydroamino acids; however, the biosynthesis of dehydroamino acids in most nonribosomal peptides is not well understood. Here, we provide biochemical and bioinformatic evidence in support of the role of a unique class of condensation domains in dehydration (CmodAA). We also obtain the crystal structure of a CmodAA domain, which is part of the nonribosomal peptide synthetase AmbE in the biosynthesis of the antibiotic methoxyvinylglycine. Biochemical analysis reveals that AmbE-CmodAA modifies a peptide substrate that is attached to the donor carrier protein. Mutational studies of AmbE-CmodAA identify several key residues for activity, including four residues that are mostly conserved in the CmodAA subfamily. Alanine mutation of these conserved residues either significantly increases or decreases AmbE activity. AmbE exhibits a dimeric conformation, which is uncommon and could enable transfer of an intermediate between different protomers. Our discovery highlights a central dehydrating function for CmodAA domains that unifies dehydroamino acid biosynthesis in diverse nonribosomal peptide pathways. Our work also begins to shed light on the mechanism of CmodAA domains. Understanding CmodAA domain function may facilitate identification of new natural products that contain dehydroamino acids and enable engineering of dehydroamino acids into nonribosomal peptides.


Subject(s)
Biological Products , Peptide Biosynthesis, Nucleic Acid-Independent , Anti-Bacterial Agents , Peptide Synthases/metabolism , Peptides/chemistry
2.
Proc Natl Acad Sci U S A ; 119(4), 2022 01 25.
Article in English | MEDLINE | ID: mdl-35022216

ABSTRACT

The emergence of new variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a major concern given their potential impact on the transmissibility and pathogenicity of the virus as well as the efficacy of therapeutic interventions. Here, we predict the mutability of all positions in SARS-CoV-2 protein domains to forecast the appearance of unseen variants. Using sequence data from other coronaviruses that predate SARS-CoV-2, we build statistical models that capture not only amino acid conservation but also more complex patterns resulting from epistasis. We show that these models are notably superior to conservation profiles in estimating the already observable SARS-CoV-2 variability. In the receptor binding domain of the spike protein, we observe that the predicted mutability correlates well with experimental measures of protein stability and that both are reliable mutability predictors (receiver operating characteristic areas under the curve ∼0.8). Most interestingly, we observe an increasing agreement between our model and the observed variability as more data become available over time, proving the anticipatory capacity of our model. When combined with data concerning the immune response, our approach identifies positions where current variants of concern are highly overrepresented. These results could assist studies on viral evolution and future viral outbreaks and, in particular, guide the exploration and anticipation of potentially harmful future SARS-CoV-2 variants.


Subject(s)
COVID-19/virology , Epistasis, Genetic , Epitopes , Mutation , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Viral Proteins/chemistry , Algorithms , Area Under Curve , Computational Biology/methods , DNA Mutational Analysis , Databases, Protein , Deep Learning , Epitopes/chemistry , Genome, Viral , Humans , Models, Statistical , Mutagenesis , Probability , Protein Domains , ROC Curve
3.
Mol Biol Evol ; 39(1), 2022 01 07.
Article in English | MEDLINE | ID: mdl-34751386

ABSTRACT

During their evolution, proteins explore sequence space via an interplay between random mutations and phenotypic selection. Here, we build upon recent progress in reconstructing data-driven fitness landscapes for families of homologous proteins, to propose stochastic models of experimental protein evolution. These models predict quantitatively important features of experimentally evolved sequence libraries, like fitness distributions and position-specific mutational spectra. They also allow us to efficiently simulate sequence libraries for a vast array of combinations of experimental parameters like sequence divergence, selection strength, and library size. We showcase the potential of the approach in reanalyzing two recent experiments to determine protein structure from signals of epistasis emerging in experimental sequence libraries. To be detectable, these signals require sufficiently large and sufficiently diverged libraries. Our modeling framework offers a quantitative explanation for different outcomes of recently published experiments. Furthermore, we can forecast the outcome of time- and resource-intensive evolution experiments, opening thereby a way to computationally optimize experimental protocols.


Subject(s)
Epistasis, Genetic , Space Flight , Evolution, Molecular , Genetic Fitness , Models, Genetic , Mutation , Proteins/genetics
4.
Nucleic Acids Res ; 46(D1): D213-D217, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069475

ABSTRACT

The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the 'principal' isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein-coding genes, and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database: new developments that include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species, and refinements in the core methods that make up the annotation pipeline. In addition, APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants.


Subject(s)
Databases, Genetic , Protein Isoforms/genetics , Alternative Splicing , Amino Acid Sequence , Animals , Humans , Models, Molecular , Molecular Sequence Annotation , Protein Conformation , Protein Isoforms/chemistry , Proteome/genetics , Reproducibility of Results , Sequence Alignment
5.
Proc Natl Acad Sci U S A ; 113(52): 15018-15023, 2016 12 27.
Article in English | MEDLINE | ID: mdl-27965389

ABSTRACT

Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.


Subject(s)
Computational Biology/methods , Eukaryota/chemistry , Evolution, Molecular , Mutation , Proteins/chemistry , Biological Evolution , Catalytic Domain , Databases, Protein , Humans , Models, Statistical , Prokaryotic Cells/chemistry , Protein Binding , Protein Interaction Mapping , Protein Multimerization , Reproducibility of Results , Sequence Alignment , Sequence Homology
6.
PLoS Comput Biol ; 11(6): e1004325, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26061177

ABSTRACT

Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectroscopy experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by high-resolution mass spectroscopy. We find fewer splice events than would be expected: we identified peptides for almost 64% of human protein-coding genes, but detected just 282 splice events. These data suggest that most genes have a single dominant isoform at the protein level. Many of the alternative isoforms that we could identify were only subtly different from the main splice isoform. Very few of the splice events identified at the protein level disrupted functional domains, in stark contrast to the two-thirds of splice events annotated in the human genome that would lead to the loss or damage of functional domains. The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from the frequency of these events in the genome. These homologous exon substitution events were remarkably conserved (all the homologous exons we identified evolved over 460 million years ago), and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing indicates that isoforms generated from homologous exons may have important cellular roles.


Subject(s)
Alternative Splicing/genetics , Exons/genetics , Protein Isoforms/genetics , Amino Acid Sequence , Animals , Computational Biology , Databases, Genetic , Humans , Mice , Models, Molecular , Molecular Sequence Data , Organ Specificity/genetics , Peptides/chemistry , Peptides/genetics , Peptides/metabolism , Protein Conformation , Protein Isoforms/chemistry , Protein Isoforms/metabolism , Sequence Alignment , Sequence Analysis, DNA