Results 1 - 13 of 13
1.
Nucleic Acids Res ; 49(W1): W535-W540, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33999203

ABSTRACT

Since 1992, PredictProtein (https://predictprotein.org) has been a one-stop online resource for protein sequence analysis; its main site is hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and was queried monthly by over 3,000 users in 2020. PredictProtein was the first Internet server for protein predictions. It pioneered combining evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges) and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA-binding). PredictProtein's infrastructure has moved to the LCSB, increasing throughput; the use of MMseqs2 sequence search reduced runtime five-fold (apparently without lowering the performance of the prediction methods); user interface elements improved usability; and new prediction methods were added. PredictProtein recently included predictions from deep learning embeddings (GO and secondary structure) and a method for the prediction of proteins and residues binding DNA, RNA, or other proteins. PredictProtein.org aspires to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings.


Subject(s)
Protein Conformation , Software , Binding Sites , Coronavirus Nucleocapsid Proteins/chemistry , DNA-Binding Proteins/chemistry , Phosphoproteins/chemistry , Protein Structure, Secondary , Proteins/chemistry , Proteins/physiology , RNA-Binding Proteins/chemistry , Sequence Alignment , Sequence Analysis, Protein
2.
Mol Syst Biol ; 17(9): e10079, 2021 09.
Article in English | MEDLINE | ID: mdl-34519429

ABSTRACT

We modeled 3D structures of all SARS-CoV-2 proteins, generating 2,060 models that span 69% of the viral proteome and provide details not available elsewhere. We found that ~6% of the proteome mimicked human proteins, while ~7% was implicated in hijacking mechanisms that reverse post-translational modifications, block host translation, and disable host defenses; a further ~29% self-assembled into heteromeric states that provided insight into how the viral replication and translation complex forms. To make these 3D models more accessible, we devised a structural coverage map, a novel visualization method to show what is, and is not, known about the 3D structure of the viral proteome. We integrated the coverage map into an accompanying online resource (https://aquaria.ws/covid) that can be used to find and explore models corresponding to the 79 structural states identified in this work. The resulting Aquaria-COVID resource helps scientists use emerging structural data to understand the mechanisms underlying coronavirus infection and draws attention to the 31% of the viral proteome that remains structurally unknown or dark.


Subject(s)
Angiotensin-Converting Enzyme 2/metabolism , Host-Pathogen Interactions/genetics , Protein Processing, Post-Translational , SARS-CoV-2/metabolism , Spike Glycoprotein, Coronavirus/metabolism , Amino Acid Transport Systems, Neutral/chemistry , Amino Acid Transport Systems, Neutral/genetics , Amino Acid Transport Systems, Neutral/metabolism , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/genetics , Binding Sites , COVID-19/genetics , COVID-19/metabolism , COVID-19/virology , Computational Biology/methods , Coronavirus Envelope Proteins/chemistry , Coronavirus Envelope Proteins/genetics , Coronavirus Envelope Proteins/metabolism , Coronavirus Nucleocapsid Proteins/chemistry , Coronavirus Nucleocapsid Proteins/genetics , Coronavirus Nucleocapsid Proteins/metabolism , Humans , Mitochondrial Membrane Transport Proteins/chemistry , Mitochondrial Membrane Transport Proteins/genetics , Mitochondrial Membrane Transport Proteins/metabolism , Mitochondrial Precursor Protein Import Complex Proteins , Models, Molecular , Molecular Mimicry , Neuropilin-1/chemistry , Neuropilin-1/genetics , Neuropilin-1/metabolism , Phosphoproteins/chemistry , Phosphoproteins/genetics , Phosphoproteins/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Protein Multimerization , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Viral Matrix Proteins/chemistry , Viral Matrix Proteins/genetics , Viral Matrix Proteins/metabolism , Viroporin Proteins/chemistry , Viroporin Proteins/genetics , Viroporin Proteins/metabolism , Virus Replication
3.
Proteomics ; 18(21-22): e1800227, 2018 11.
Article in English | MEDLINE | ID: mdl-30318701

ABSTRACT

Despite substantial and successful projects for structural genomics, many proteins remain for which neither experimental structures nor homology-based models are known for any part of the amino acid sequence. These have been called "dark proteins," in contrast to non-dark proteins, in which at least part of the sequence has a known or inferred structure. It has been hypothesized that non-dark proteins may be more abundantly expressed than dark proteins, which are known to have far fewer sequence relatives. Surprisingly, the opposite has been observed: human dark and non-dark proteins had quite similar levels of expression, in terms of both mRNA and protein abundance. Such high levels of expression strongly indicate that dark proteins, as a group, are important for cellular function. This is remarkable, given how carefully structural biologists have focused on proteins crucial for function, and highlights the important challenge posed by dark proteins in future research.


Subject(s)
Databases, Protein , Proteome/analysis , Computational Biology , Protein Conformation
4.
Proc Natl Acad Sci U S A ; 112(52): 15898-903, 2015 Dec 29.
Article in English | MEDLINE | ID: mdl-26578815

ABSTRACT

We surveyed the "dark" proteome, that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44-54% of the proteome in eukaryotes and viruses was dark, compared with only ~14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology.
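
The distinction drawn above between the dark proteome (residues with no structural coverage) and dark proteins (sequences that are dark along their entire length) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 1-based inclusive region convention, and the toy numbers are all assumptions made for illustration only.

```python
def dark_fraction(length, covered_regions):
    """Fraction of residues not covered by any structure or homology model.

    `covered_regions` is a list of (start, end) pairs, 1-based and inclusive
    (an assumed convention for this sketch); overlapping regions are allowed.
    """
    if length <= 0:
        raise ValueError("sequence length must be positive")
    covered = [False] * length
    for start, end in covered_regions:
        # Clamp each region to the sequence and mark its residues as covered.
        for i in range(max(start, 1) - 1, min(end, length)):
            covered[i] = True
    return covered.count(False) / length


def is_dark_protein(length, covered_regions):
    """A protein counts as dark only if every residue is dark."""
    return dark_fraction(length, covered_regions) == 1.0


# Hypothetical example: a 100-residue protein with two overlapping matches
# covering residues 1-50, and an 80-residue protein with no matches at all.
partial = dark_fraction(100, [(1, 30), (21, 50)])
fully_dark = is_dark_protein(80, [])
```

Under these assumptions, a proteome-wide dark percentage would be the number of dark residues summed over all proteins divided by the total number of residues, rather than the mean of the per-protein fractions.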


Subject(s)
Computational Biology/methods , Databases, Protein , Proteins/metabolism , Proteome/metabolism , Algorithms , Animals , Archaea/genetics , Archaea/metabolism , Bacteria/genetics , Bacteria/metabolism , Eukaryota/metabolism , Humans , Models, Molecular , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteome/chemistry , Proteome/genetics , Viruses/genetics , Viruses/metabolism
5.
BMC Genomics ; 17: 133, 2016 Feb 24.
Article in English | MEDLINE | ID: mdl-26911138

ABSTRACT

BACKGROUND: Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome). RESULTS: Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay, indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other Enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription when grown under eleven diverse growth conditions, suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization. CONCLUSIONS: These findings demonstrate that ribosomal footprinting can be used to detect novel protein-coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.


Subject(s)
Escherichia coli O157/genetics , Evolution, Molecular , Genes, Bacterial , Proteome/genetics , Transcriptome , Animals , Cattle , Computational Biology , Escherichia coli Proteins/genetics , Mass Spectrometry , Phenotype , RNA, Bacterial/genetics , Sequence Analysis, RNA
6.
Nucleic Acids Res ; 42(Web Server issue): W337-43, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24799431

ABSTRACT

PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein-protein binding sites (ISIS2), protein-polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org.


Subject(s)
Protein Conformation , Software , Amino Acid Substitution , Binding Sites , Gene Ontology , Internet , Intrinsically Disordered Proteins/chemistry , Membrane Proteins/chemistry , Mutation , Protein Interaction Mapping , Proteins/analysis , Proteins/genetics , Proteins/metabolism , Sequence Alignment , Sequence Analysis, Protein , Sequence Homology, Amino Acid
7.
BMC Bioinformatics ; 16 Suppl 11: S7, 2015.
Article in English | MEDLINE | ID: mdl-26329268

ABSTRACT

BACKGROUND: To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. RESULTS: To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. CONCLUSIONS: The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria.


Subject(s)
Amyloid beta-Protein Precursor/chemistry , Computational Biology/methods , Computer Graphics , Sequence Analysis, Protein/methods , Software , src-Family Kinases/chemistry , Amyloid beta-Protein Precursor/metabolism , B-Lymphocytes/metabolism , Databases, Protein , Humans , Protein Conformation , Protein Processing, Post-Translational , src-Family Kinases/metabolism
9.
Nat Methods ; 7(3 Suppl): S42-55, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20195256

ABSTRACT

Structural biology is rapidly accumulating a wealth of detailed information about protein function, binding sites, RNA, large assemblies and molecular motions. These data are increasingly of interest to a broader community of life scientists, not just structural experts. Visualization is a primary means for accessing and using these data, yet visualization is also a stumbling block that prevents many life scientists from benefiting from three-dimensional structural data. In this review, we focus on key biological questions where visualizing three-dimensional structures can provide insight and describe available methods and tools.


Subject(s)
Image Processing, Computer-Assisted , Macromolecular Substances , Crystallography, X-Ray , Internet , Models, Molecular , Molecular Conformation
10.
PLoS One ; 12(9): e0184119, 2017.
Article in English | MEDLINE | ID: mdl-28902868

ABSTRACT

In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set.


Subject(s)
DNA, Intergenic/genetics , Escherichia coli O157/genetics , Genes, Bacterial , Genome, Bacterial , Conserved Sequence , DNA, Bacterial/genetics , Genetic Association Studies , High-Throughput Nucleotide Sequencing , Open Reading Frames/genetics , RNA, Bacterial/genetics , Transcriptome
11.
Nucleic Acids Res ; 31(1): 494-8, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520061

ABSTRACT

We introduce the PSSH ('Protein Sequence-to-Structure Homologies') database derived from HSSP2, an improved version of the HSSP ('Homology-derived Secondary Structure of Proteins') database [Dodge et al. (1998) Nucleic Acids Res., 26, 313-315]. Whereas each HSSP entry lists all protein sequences related to a given 3D structure, PSSH is the 'inverse', with each entry listing all structures related to a given sequence. In addition, we introduce two other derived databases: HSSPchain, in which each entry lists all sequences related to a given PDB chain, and HSSPalign, in which each entry gives details of one sequence aligned onto one PDB chain. This re-organization makes it easier to navigate from sequence to structure, and to map sequence features onto 3D structures. Currently (September 2002), PSSH provides structural information for over 400 000 protein sequences, covering 48% of SWALL and 61% of SWISS-PROT sequences; HSSPchain provides sequence information for over 25 000 PDB chains, and HSSPalign gives over 14 million sequence-to-structure alignments. The databases can be accessed via SRS 3D, an extension to the SRS system, at http://srs3d.ebi.ac.uk/.
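
The "inverse" relationship described above, where HSSP lists sequences per structure and PSSH lists structures per sequence, amounts to inverting a one-to-many mapping. The sketch below illustrates that idea only; it does not reproduce the actual HSSP or PSSH file formats, and the function name and identifiers are hypothetical.

```python
from collections import defaultdict


def invert_structure_index(structure_to_sequences):
    """Turn a structure -> sequences mapping into sequence -> structures.

    The input mimics the HSSP organisation (one entry per 3D structure);
    the output mimics the PSSH organisation (one entry per sequence).
    """
    sequence_to_structures = defaultdict(set)
    for structure_id, sequence_ids in structure_to_sequences.items():
        for sequence_id in sequence_ids:
            sequence_to_structures[sequence_id].add(structure_id)
    # Sort for a stable, readable result.
    return {seq: sorted(structs) for seq, structs in sequence_to_structures.items()}


# Hypothetical toy identifiers, not real database entries.
hssp_like = {
    "1abc": ["P00001", "P00002"],
    "2xyz": ["P00002", "P00003"],
}
pssh_like = invert_structure_index(hssp_like)
```

With the inverted index, the question "which structures relate to this sequence?" becomes a single lookup instead of a scan over every structure entry, which is the navigational benefit the abstract describes.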


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Sequence Alignment , Structural Homology, Protein , Animals , Sequence Analysis, Protein , Software
12.
Structure ; 22(7): 938-9, 2014 Jul 08.
Article in English | MEDLINE | ID: mdl-25007224

ABSTRACT

Structure comparisons are now the first step when a new experimental high-resolution protein structure has been determined. In this issue of Structure, Wiederstein and colleagues describe their latest tool for comparing structures, which gives us the unprecedented power to discover crucial structural connections between whole complexes of proteins in the full structural database in real time.


Subject(s)
Computational Biology/methods , Information Storage and Retrieval/methods , Multiprotein Complexes/chemistry , Protein Structure, Quaternary
13.
Bioinformatics ; 20(15): 2476-8, 2004 Oct 12.
Article in English | MEDLINE | ID: mdl-15087318

ABSTRACT

In this paper we present SRS 3D, a new service that allows users to easily and rapidly find all related structures for a given target sequence; structures can then be viewed together with sequences, alignments and sequence features (currently from UniProt, InterPro and PDB). Extensive user feedback confirms that SRS 3D is intuitive and useful, especially for those not expert in structures. AVAILABILITY: An SRS 3D server is provided at http://srs3d.ebi.ac.uk/.


Subject(s)
Database Management Systems , Databases, Protein , Information Storage and Retrieval/methods , Models, Molecular , Proteins/chemistry , Sequence Analysis, Protein/methods , User-Computer Interface , Protein Conformation , Software , Systems Integration