1.
Mol Syst Biol ; 17(9): e10079, 2021 09.
Article in English | MEDLINE | ID: mdl-34519429

ABSTRACT

We modeled 3D structures of all SARS-CoV-2 proteins, generating 2,060 models that span 69% of the viral proteome and provide details not available elsewhere. We found that ~6% of the proteome mimicked human proteins, while ~7% was implicated in hijacking mechanisms that reverse post-translational modifications, block host translation, and disable host defenses; a further ~29% self-assembled into heteromeric states that provided insight into how the viral replication and translation complex forms. To make these 3D models more accessible, we devised a structural coverage map, a novel visualization method to show what is, and is not, known about the 3D structure of the viral proteome. We integrated the coverage map into an accompanying online resource (https://aquaria.ws/covid) that can be used to find and explore models corresponding to the 79 structural states identified in this work. The resulting Aquaria-COVID resource helps scientists use emerging structural data to understand the mechanisms underlying coronavirus infection and draws attention to the 31% of the viral proteome that remains structurally unknown or dark.


Subject(s)
Angiotensin-Converting Enzyme 2/metabolism , Host-Pathogen Interactions/genetics , Protein Processing, Post-Translational , SARS-CoV-2/metabolism , Spike Glycoprotein, Coronavirus/metabolism , Amino Acid Transport Systems, Neutral/chemistry , Amino Acid Transport Systems, Neutral/genetics , Amino Acid Transport Systems, Neutral/metabolism , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/genetics , Binding Sites , COVID-19/genetics , COVID-19/metabolism , COVID-19/virology , Computational Biology/methods , Coronavirus Envelope Proteins/chemistry , Coronavirus Envelope Proteins/genetics , Coronavirus Envelope Proteins/metabolism , Coronavirus Nucleocapsid Proteins/chemistry , Coronavirus Nucleocapsid Proteins/genetics , Coronavirus Nucleocapsid Proteins/metabolism , Humans , Mitochondrial Membrane Transport Proteins/chemistry , Mitochondrial Membrane Transport Proteins/genetics , Mitochondrial Membrane Transport Proteins/metabolism , Mitochondrial Precursor Protein Import Complex Proteins , Models, Molecular , Molecular Mimicry , Neuropilin-1/chemistry , Neuropilin-1/genetics , Neuropilin-1/metabolism , Phosphoproteins/chemistry , Phosphoproteins/genetics , Phosphoproteins/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Protein Multimerization , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Viral Matrix Proteins/chemistry , Viral Matrix Proteins/genetics , Viral Matrix Proteins/metabolism , Viroporin Proteins/chemistry , Viroporin Proteins/genetics , Viroporin Proteins/metabolism , Virus Replication
2.
Nucleic Acids Res ; 49(W1): W535-W540, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33999203

ABSTRACT

Since 1992, PredictProtein (https://predictprotein.org) has been a one-stop online resource for protein sequence analysis, with its main site hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and queried monthly by over 3,000 users in 2020. PredictProtein was the first Internet server for protein predictions. It pioneered combining evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges) and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA binding). PredictProtein's infrastructure has moved to the LCSB, increasing throughput; the use of MMseqs2 sequence search reduced runtime five-fold (apparently without lowering performance of prediction methods); user interface elements improved usability, and new prediction methods were added. PredictProtein recently included predictions from deep learning embeddings (GO and secondary structure) and a method for the prediction of proteins and residues binding DNA, RNA, or other proteins. PredictProtein.org aspires to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings.


Subject(s)
Protein Conformation , Software , Binding Sites , Coronavirus Nucleocapsid Proteins/chemistry , DNA-Binding Proteins/chemistry , Phosphoproteins/chemistry , Protein Structure, Secondary , Proteins/chemistry , Proteins/physiology , RNA-Binding Proteins/chemistry , Sequence Alignment , Sequence Analysis, Protein
3.
Proteomics ; 18(21-22): e1800227, 2018 11.
Article in English | MEDLINE | ID: mdl-30318701

ABSTRACT

Despite substantial and successful projects for structural genomics, many proteins remain for which neither experimental structures nor homology-based models are known for any part of the amino acid sequence. These have been called "dark proteins," in contrast to non-dark proteins, in which at least part of the sequence has a known or inferred structure. It has been hypothesized that non-dark proteins may be more abundantly expressed than dark proteins, which are known to have far fewer sequence relatives. Surprisingly, the opposite has been observed: human dark and non-dark proteins had quite similar levels of expression, in terms of both mRNA and protein abundance. Such high levels of expression strongly indicate that dark proteins, as a group, are important for cellular function. This is remarkable, given how carefully structural biologists have focused on proteins crucial for function, and highlights the important challenge posed by dark proteins in future research.


Subject(s)
Databases, Protein , Proteome/analysis , Computational Biology , Protein Conformation
4.
PLoS One ; 12(9): e0184119, 2017.
Article in English | MEDLINE | ID: mdl-28902868

ABSTRACT

In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set.


Subject(s)
DNA, Intergenic/genetics , Escherichia coli O157/genetics , Genes, Bacterial , Genome, Bacterial , Conserved Sequence , DNA, Bacterial/genetics , Genetic Association Studies , High-Throughput Nucleotide Sequencing , Open Reading Frames/genetics , RNA, Bacterial/genetics , Transcriptome
5.
BMC Genomics ; 17: 133, 2016 Feb 24.
Article in English | MEDLINE | ID: mdl-26911138

ABSTRACT

BACKGROUND: Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome). RESULTS: Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other Enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription when grown under eleven diverse growth conditions, suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization. CONCLUSIONS: These findings demonstrate that ribosomal footprinting can be used to detect novel protein-coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.


Subject(s)
Escherichia coli O157/genetics , Evolution, Molecular , Genes, Bacterial , Proteome/genetics , Transcriptome , Animals , Cattle , Computational Biology , Escherichia coli Proteins/genetics , Mass Spectrometry , Phenotype , RNA, Bacterial/genetics , Sequence Analysis, RNA
6.
Proc Natl Acad Sci U S A ; 112(52): 15898-903, 2015 Dec 29.
Article in English | MEDLINE | ID: mdl-26578815

ABSTRACT

We surveyed the "dark" proteome, that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44-54% of the proteome in eukaryotes and viruses was dark, compared with only ∼14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology.


Subject(s)
Computational Biology/methods , Databases, Protein , Proteins/metabolism , Proteome/metabolism , Algorithms , Animals , Archaea/genetics , Archaea/metabolism , Bacteria/genetics , Bacteria/metabolism , Eukaryota/metabolism , Humans , Models, Molecular , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteome/chemistry , Proteome/genetics , Viruses/genetics , Viruses/metabolism
7.
BMC Bioinformatics ; 16 Suppl 11: S7, 2015.
Article in English | MEDLINE | ID: mdl-26329268

ABSTRACT

BACKGROUND: To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. RESULTS: To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. CONCLUSIONS: The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria.


Subject(s)
Amyloid beta-Protein Precursor/chemistry , Computational Biology/methods , Computer Graphics , Sequence Analysis, Protein/methods , Software , src-Family Kinases/chemistry , Amyloid beta-Protein Precursor/metabolism , B-Lymphocytes/metabolism , Databases, Protein , Humans , Protein Conformation , Protein Processing, Post-Translational , src-Family Kinases/metabolism
9.
Structure ; 22(7): 938-9, 2014 Jul 08.
Article in English | MEDLINE | ID: mdl-25007224

ABSTRACT

Structure comparisons are now the first step when a new experimental high-resolution protein structure has been determined. In this issue of Structure, Wiederstein and colleagues describe their latest tool for comparing structures, which gives us the unprecedented power to discover crucial structural connections between whole complexes of proteins in the full structural database in real time.


Subject(s)
Computational Biology/methods , Information Storage and Retrieval/methods , Multiprotein Complexes/chemistry , Protein Structure, Quaternary
10.
Nucleic Acids Res ; 42(Web Server issue): W337-43, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24799431

ABSTRACT

PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein-protein binding sites (ISIS2), protein-polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org.


Subject(s)
Protein Conformation , Software , Amino Acid Substitution , Binding Sites , Gene Ontology , Internet , Intrinsically Disordered Proteins/chemistry , Membrane Proteins/chemistry , Mutation , Protein Interaction Mapping , Proteins/analysis , Proteins/genetics , Proteins/metabolism , Sequence Alignment , Sequence Analysis, Protein , Sequence Homology, Amino Acid
11.
Nat Methods ; 7(3 Suppl): S42-55, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20195256

ABSTRACT

Structural biology is rapidly accumulating a wealth of detailed information about protein function, binding sites, RNA, large assemblies and molecular motions. These data are increasingly of interest to a broader community of life scientists, not just structural experts. Visualization is a primary means for accessing and using these data, yet visualization is also a stumbling block that prevents many life scientists from benefiting from three-dimensional structural data. In this review, we focus on key biological questions where visualizing three-dimensional structures can provide insight and describe available methods and tools.


Subject(s)
Image Processing, Computer-Assisted , Macromolecular Substances , Crystallography, X-Ray , Internet , Models, Molecular , Molecular Conformation
12.
Bioinformatics ; 20(15): 2476-8, 2004 Oct 12.
Article in English | MEDLINE | ID: mdl-15087318

ABSTRACT

UNLABELLED: In this paper we present SRS 3D, a new service that allows users to easily and rapidly find all related structures for a given target sequence; structures can then be viewed together with sequences, alignments and sequence features (currently from UniProt, InterPro and PDB). Extensive user feedback confirms that SRS 3D is intuitive and useful especially for those not expert in structures. AVAILABILITY: An SRS 3D server is provided at http://srs3d.ebi.ac.uk/.


Subject(s)
Database Management Systems , Databases, Protein , Information Storage and Retrieval/methods , Models, Molecular , Proteins/chemistry , Sequence Analysis, Protein/methods , User-Computer Interface , Protein Conformation , Software , Systems Integration
13.
Nucleic Acids Res ; 31(1): 494-8, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520061

ABSTRACT

We introduce the PSSH ('Protein Sequence-to-Structure Homologies') database derived from HSSP2, an improved version of the HSSP ('Homology-derived Secondary Structure of Proteins') database [Dodge et al. (1998) Nucleic Acids Res., 26, 313-315]. Whereas each HSSP entry lists all protein sequences related to a given 3D structure, PSSH is the 'inverse', with each entry listing all structures related to a given sequence. In addition, we introduce two other derived databases: HSSPchain, in which each entry lists all sequences related to a given PDB chain, and HSSPalign, in which each entry gives details of one sequence aligned onto one PDB chain. This re-organization makes it easier to navigate from sequence to structure, and to map sequence features onto 3D structures. Currently (September 2002), PSSH provides structural information for over 400 000 protein sequences, covering 48% of SWALL and 61% of SWISS-PROT sequences; HSSPchain provides sequence information for over 25 000 PDB chains, and HSSPalign gives over 14 million sequence-to-structure alignments. The databases can be accessed via SRS 3D, an extension to the SRS system, at http://srs3d.ebi.ac.uk/.
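The 'inverse' relationship described in this abstract can be illustrated with a minimal Python sketch. It is not the PSSH build procedure: the function name and the identifiers are hypothetical, and the sketch assumes only that an HSSP-like index maps each structure to its related sequences.

```python
from collections import defaultdict


def invert_structure_index(structure_to_sequences):
    """Turn a structure -> sequences mapping into a sequence -> structures mapping."""
    sequence_to_structures = defaultdict(list)
    for structure_id, sequence_ids in structure_to_sequences.items():
        for sequence_id in sequence_ids:
            sequence_to_structures[sequence_id].append(structure_id)
    # Sort each list so the result does not depend on input ordering.
    return {seq: sorted(structs) for seq, structs in sequence_to_structures.items()}


# Made-up identifiers, for illustration only (not real HSSP or PSSH entries).
hssp_like = {
    "structA": ["seq1", "seq2"],
    "structB": ["seq2", "seq3"],
}
pssh_like = invert_structure_index(hssp_like)
print(pssh_like["seq2"])  # ['structA', 'structB']
```

In the inverted form, finding every structure related to a given sequence is a single lookup, which is the sequence-to-structure navigation the abstract describes.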


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Sequence Alignment , Structural Homology, Protein , Animals , Sequence Analysis, Protein , Software