Results 1 - 17 of 17
1.
Cell ; 176(1-2): 391-403.e19, 2019 01 10.
Article in English | MEDLINE | ID: mdl-30528433

ABSTRACT

Proteins and RNA functionally and physically intersect in multiple biological processes; however, no universal method is currently available to purify protein-RNA complexes. Here, we introduce XRNAX, a method for the generic purification of protein-crosslinked RNA, and demonstrate its versatility to study the composition and dynamics of protein-RNA interactions by various transcriptomic and proteomic approaches. We show that XRNAX captures all RNA biotypes and use this to characterize the sub-proteomes that interact with coding and non-coding RNAs (ncRNAs) and to identify hundreds of protein-RNA interfaces. Exploiting the quantitative nature of XRNAX, we observe drastic remodeling of the RNA-bound proteome during arsenite-induced stress, distinct from autophagy-related changes in the total proteome. In addition, we combine XRNAX with crosslinking immunoprecipitation sequencing (CLIP-seq) to validate the interaction of ncRNA with lamin B1 and EXOSC2. Thus, XRNAX is a resourceful approach to study structural and compositional aspects of protein-RNA interactions to address fundamental questions in RNA biology.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , RNA-Binding Proteins/isolation & purification , RNA/isolation & purification , Binding Sites , Exosome Multienzyme Ribonuclease Complex/metabolism , Humans , Immunoprecipitation/methods , Lamin Type B/metabolism , Protein Binding/genetics , Protein Binding/physiology , Protein Biosynthesis/genetics , Protein Biosynthesis/physiology , Protein Processing, Post-Translational , Proteins/isolation & purification , Proteins/metabolism , Proteome/metabolism , Proteomics/methods , RNA/genetics , RNA/metabolism , RNA, Messenger/metabolism , RNA, Untranslated/metabolism , RNA-Binding Proteins/metabolism , Transcriptome
2.
Nucleic Acids Res ; 52(D1): D107-D114, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37992296

ABSTRACT

Expression Atlas (www.ebi.ac.uk/gxa) and its newest counterpart the Single Cell Expression Atlas (www.ebi.ac.uk/gxa/sc) are EMBL-EBI's knowledgebases for gene and protein expression and localisation in bulk and at single cell level. These resources aim to allow users to investigate gene and protein expression in normal tissue (baseline) or in response to perturbations such as disease or changes to genotype (differential) across multiple species. Users are invited to search for genes or metadata terms across species or biological conditions in a standardised, consistent interface. Alongside these data, new features in Single Cell Expression Atlas allow users to query metadata through our new cell type wheel search. At the experiment level, data can be explored through two types of dimensionality reduction plots, t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP), overlaid with either clustering or metadata information to assist users' understanding. Data are also visualised as marker gene heatmaps identifying genes that help confer cluster identity. For some data, additional visualisations are available as interactive cell level anatomograms and cell type gene expression heatmaps.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Proteomics , Genotype , Metadata , Single-Cell Analysis , Internet , Humans , Animals
3.
PLoS Comput Biol ; 20(1): e1011828, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38252632

ABSTRACT

The cancer biomarker field has been an object of thorough investigation in the last decades. Despite this, colorectal cancer (CRC) heterogeneity makes it challenging to identify and validate effective prognostic biomarkers for patient classification according to outcome and treatment response. Although a massive amount of proteomics data has been deposited in public data repositories, this rich source of information is vastly underused. Here, we attempted to reuse public proteomics datasets with two main objectives: (i) to generate hypotheses (detection of biomarkers) for their downstream validation, and (ii) to validate, using an orthogonal approach, a previously described biomarker panel. Twelve CRC public proteomics datasets (mostly from the PRIDE database) were re-analysed and integrated to create a landscape of protein expression. Samples from both solid and liquid biopsies were included in the reanalysis. Integrating this data with survival annotation data, we have validated in silico a six-gene signature for CRC classification at the protein level, and identified five new blood-detectable biomarkers (CD14, PPIA, MRC2, PRDX1, and TXNDC5) associated with CRC prognosis. The prognostic value of these blood-derived proteins was confirmed using additional public datasets, supporting their potential clinical value. In conclusion, this proof-of-concept study demonstrates the value of re-using public proteomics datasets as the basis to create a useful resource for biomarker discovery and validation. The protein expression data has been made available in the public resource Expression Atlas.


Subject(s)
Colorectal Neoplasms , Proteomics , Humans , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/genetics , Colorectal Neoplasms/metabolism , Biomarkers, Tumor/metabolism , Blood Proteins , Protein Disulfide-Isomerases
4.
J Proteome Res ; 23(6): 1948-1959, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38717300

ABSTRACT

The availability of an increasingly large amount of public proteomics data sets presents an opportunity for performing combined analyses to generate comprehensive organism-wide protein expression maps across different organisms and biological conditions. Sus scrofa (the domestic pig) is a model organism relevant for food production and for human biomedical research. Here, we reanalyzed 14 public proteomics data sets from the PRIDE database coming from pig tissues to assess baseline (without any biological perturbation) protein abundance in 14 organs, encompassing a total of 20 healthy tissues from 128 samples. The analysis involved the quantification of protein abundance in 599 mass spectrometry runs. We compared protein expression patterns among different pig organs and examined the distribution of proteins across these organs. We then studied how protein abundances compared across different data sets and studied the tissue specificity of the detected proteins. Of particular interest, we conducted a comparative analysis of protein expression between pig and human tissues, revealing a high degree of correlation in protein expression among orthologs, particularly in brain, kidney, heart, and liver samples. We have integrated the protein expression results into the Expression Atlas resource for easy access and visualization of the protein expression data individually or alongside gene expression data.


Subject(s)
Kidney , Proteomics , Animals , Proteomics/methods , Humans , Swine , Kidney/metabolism , Kidney/chemistry , Organ Specificity , Liver/metabolism , Liver/chemistry , Databases, Protein , Brain/metabolism , Myocardium/metabolism , Myocardium/chemistry , Sus scrofa/metabolism , Sus scrofa/genetics , Proteome/metabolism , Proteome/analysis , Mass Spectrometry
5.
J Proteome Res ; 23(7): 2518-2531, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38810119

ABSTRACT

Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have reanalyzed publicly available mass spectrometry proteomics data sets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,565 phosphosites on serine, threonine, and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and linked motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes, to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety and clustered the data to identify groups of sites with similar patterns across rice family groups. The data has been loaded into UniProt Knowledge-Base (enabling researchers to visualize sites alongside other data on rice proteins, e.g., structural models from AlphaFold2), PeptideAtlas, and the PRIDE database (enabling visualization of source evidence, including scores and supporting mass spectra).


Subject(s)
Genome, Plant , Oryza , Phosphoproteins , Plant Proteins , Proteomics , Signal Transduction , Oryza/genetics , Oryza/metabolism , Oryza/chemistry , Proteomics/methods , Phosphoproteins/metabolism , Phosphoproteins/genetics , Phosphoproteins/chemistry , Phosphoproteins/analysis , Plant Proteins/genetics , Plant Proteins/metabolism , Phosphorylation , Protein Processing, Post-Translational , Phosphopeptides/metabolism , Phosphopeptides/analysis , Databases, Protein , Amino Acid Motifs , Mass Spectrometry
6.
Nucleic Acids Res ; 50(D1): D543-D552, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34723319

ABSTRACT

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of datasets submitted to PRIDE Archive (the archival component of PRIDE) averaged around 500 per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidence across PRIDE Archive. Furthermore, we describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.


Subject(s)
Databases, Protein , Metadata/statistics & numerical data , Molecular Sequence Annotation/statistics & numerical data , Peptides/chemistry , Proteins/chemistry , Software , Amino Acid Sequence , Bibliometrics , Datasets as Topic , Humans , Information Storage and Retrieval , Internet , Mass Spectrometry , Peptides/genetics , Peptides/metabolism , Proteins/genetics , Proteins/metabolism , Proteomics/instrumentation , Proteomics/methods , Sequence Alignment
7.
Nucleic Acids Res ; 50(D1): D129-D140, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850121

ABSTRACT

The EMBL-EBI Expression Atlas is an added-value knowledge base that enables researchers to answer the question of where (tissue, organism part, developmental stage, cell type) and under which conditions (disease, treatment, gender, etc.) a gene or protein of interest is expressed. Expression Atlas brings together data from >4500 expression studies from >65 different species, across different conditions and tissues. It makes these data freely available in an easy-to-visualise form, after expert curation to accurately represent the intended experimental design, re-analysed via standardised pipelines that rely on open-source, community-developed tools. Each study's metadata are annotated using ontologies. The data are re-analysed with the aim of reproducing the original conclusions of the underlying experiments. Expression Atlas is currently divided into Bulk Expression Atlas and Single Cell Expression Atlas. Expression Atlas contains data from differential studies (microarray and bulk RNA-Seq) and baseline studies (bulk RNA-Seq and proteomics), whereas Single Cell Expression Atlas is currently dedicated to Single Cell RNA-Sequencing (scRNA-Seq) studies. The resource has been in continuous development since 2009 and it is available at https://www.ebi.ac.uk/gxa.


Subject(s)
Databases, Genetic , Proteins/genetics , Proteomics , Software , Computational Biology , Gene Expression Profiling , Humans , Proteins/chemistry , RNA-Seq , Sequence Analysis, RNA , Single-Cell Analysis
8.
J Proteome Res ; 22(3): 729-742, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36577097

ABSTRACT

The availability of proteomics datasets in the public domain, and in the PRIDE database, in particular, has increased dramatically in recent years. This unprecedented large-scale availability of data provides an opportunity for combined analyses of datasets to get organism-wide protein abundance data in a consistent manner. We have reanalyzed 24 public proteomics datasets from healthy human individuals to assess baseline protein abundance in 31 organs. We defined tissue as a distinct functional or structural region within an organ. Overall, the aggregated dataset contains 67 healthy tissues, corresponding to 3,119 mass spectrometry runs covering 498 samples from 489 individuals. We compared protein abundances between different organs and studied the distribution of proteins across these organs. We also compared the results with data generated in analogous studies. Additionally, we performed gene ontology and pathway-enrichment analyses to identify organ-specific enriched biological processes and pathways. As a key point, we have integrated the protein abundance results into the resource Expression Atlas, where they can be accessed and visualized either individually or together with gene expression data coming from transcriptomics datasets. We believe this is a good mechanism to make proteomics data more accessible for life scientists.


Subject(s)
Proteome , Proteomics , Humans , Proteome/analysis , Proteomics/methods , Gene Expression Profiling , Databases, Factual , Mass Spectrometry/methods , Databases, Protein
9.
PLoS Comput Biol ; 18(6): e1010174, 2022 06.
Article in English | MEDLINE | ID: mdl-35714157

ABSTRACT

The increasingly large amount of proteomics data in the public domain enables, among other applications, the combined analyses of datasets to create comparative protein expression maps covering different organisms and different biological conditions. Here we have reanalysed public proteomics datasets from mouse and rat tissues (14 and 9 datasets, respectively), to assess baseline protein abundance. Overall, the aggregated dataset contained 23 individual datasets, including a total of 211 samples coming from 34 different tissues across 14 organs, comprising 9 mouse and 3 rat strains. In all cases, we studied the distribution of canonical proteins between the different organs. The number of canonical proteins per dataset ranged from 273 (tendon) to 9,715 (liver) in mouse, and from 101 (tendon) to 6,130 (kidney) in rat. Then, we studied how protein abundances compared across different datasets and organs for both species. As a key point, we carried out a comparative analysis of protein expression between mouse, rat and human tissues. We observed a high level of correlation of protein expression among orthologs between all three species in brain, kidney, heart and liver samples, whereas the correlation of protein expression was generally slightly lower between organs within the same species. Protein expression results have been integrated into the resource Expression Atlas for widespread dissemination.


Subject(s)
Proteins , Proteomics , Animals , Brain/metabolism , Mice , Proteins/metabolism , Rats
10.
J Proteome Res ; 21(7): 1603-1615, 2022 07 01.
Article in English | MEDLINE | ID: mdl-35640880

ABSTRACT

Phosphoproteomic methods are commonly employed to identify and quantify phosphorylation sites on proteins. In recent years, various tools have been developed, incorporating scores or statistics related to whether a given phosphosite has been correctly identified or to estimate the global false localization rate (FLR) within a given data set for all sites reported. These scores have generally been calibrated using synthetic datasets, and their statistical reliability on real datasets is largely unknown, potentially leading to studies reporting incorrectly localized phosphosites, due to inadequate statistical control. In this work, we develop the concept of scoring modifications on a decoy amino acid, that is, one that cannot be modified, to allow for independent estimation of global FLR. We test a variety of amino acids, on both synthetic and real data sets, demonstrating that the selection can make a substantial difference to the estimated global FLR. We conclude that while several different amino acids might be appropriate, the most reliable FLR results were achieved using alanine and leucine as decoys. We propose the use of a decoy amino acid to control false reporting in the literature and in public databases that re-distribute the data. Data are available via ProteomeXchange with identifier PXD028840.


Subject(s)
Amino Acids , Tandem Mass Spectrometry , Databases, Protein , Reproducibility of Results , Tandem Mass Spectrometry/methods
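
The decoy amino acid idea summarised in entry 10 can be illustrated with a small sketch. It is not the authors' implementation: the function name `running_decoy_flr`, the `scale` factor (an assumed correction for how often target and decoy residues occur), and the toy scores are all hypothetical. The sketch ranks reported sites by score and treats each hit on a residue that cannot be modified as evidence of false localizations.

```python
from typing import List, Tuple


def running_decoy_flr(
    sites: List[Tuple[str, float]],
    decoy: str = "A",
    scale: float = 1.0,
) -> List[float]:
    """Estimate a global false localization rate (FLR) at each rank.

    sites: (residue, score) pairs, one per reported modification site.
    decoy: residue assumed unable to carry the modification (hypothetical default).
    scale: assumed ratio of target to decoy residue frequency (hypothetical).
    """
    # Best-scoring sites first, as in a target-decoy style ranking.
    ranked = sorted(sites, key=lambda s: s[1], reverse=True)
    flr: List[float] = []
    decoys = 0
    targets = 0
    for residue, _score in ranked:
        if residue == decoy:
            decoys += 1
        else:
            targets += 1
        # Scaled decoy hits approximate the number of wrong target sites.
        estimate = scale * decoys / targets if targets else 1.0
        flr.append(min(1.0, estimate))
    return flr


# Toy data: four target sites (S/T/Y) and one decoy hit on alanine.
example = [("S", 0.99), ("T", 0.95), ("A", 0.90), ("S", 0.80), ("Y", 0.70)]
print(running_decoy_flr(example))
```

In practice a score threshold would be chosen where the running estimate stays below a desired FLR; how the decoy counts should be scaled is exactly the kind of choice the abstract reports as making a substantial difference.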
11.
RNA ; 23(10): 1479-1492, 2017 10.
Article in English | MEDLINE | ID: mdl-28701522

ABSTRACT

This article describes the creation of the first expert, manually curated noncoding RNA interaction networks for S. cerevisiae. The RNA-RNA and RNA-protein interaction networks have been carefully extracted from the experimental literature and made available through the IntAct database (www.ebi.ac.uk/intact). We provide an initial network analysis and compare their properties to the much larger protein-protein interaction network. We find that the proteins that bind to ncRNAs in the network contain only a small proportion of classical RNA binding domains. We also see an enrichment of WD40 domains, suggesting their direct involvement in ncRNA interactions. We discuss the challenges in collecting noncoding RNA interaction data and the opportunities for worldwide collaboration to fill the unmet need for this data.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks , RNA, Untranslated/genetics , Saccharomyces cerevisiae/genetics , Gene Ontology , RNA, Fungal , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
12.
bioRxiv ; 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-38014076

ABSTRACT

Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have re-analysed publicly available mass spectrometry proteomics datasets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,522 phosphosites on serine, threonine and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and linked motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes, to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety. The data was clustered to identify groups of sites with similar patterns across rice family groups, for example those highly conserved in Japonica but mostly absent in Aus type rice varieties, which are known to have different responses to drought. These resources can assist rice researchers to discover alleles with significantly different functional effects across rice varieties. The data has been loaded into UniProt Knowledge-Base (enabling researchers to visualise sites alongside other data on rice proteins, e.g. structural models from AlphaFold2), PeptideAtlas and the PRIDE database (enabling visualisation of source evidence, including scores and supporting mass spectra).

13.
Sci Data ; 9(1): 335, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35701420

ABSTRACT

The number of mass spectrometry (MS)-based proteomics datasets in the public domain keeps increasing, particularly those generated by Data Independent Acquisition (DIA) approaches such as SWATH-MS. Unlike Data Dependent Acquisition datasets, the re-use of DIA datasets has been rather limited to date, despite its high potential, due to the technical challenges involved. We introduce a (re-)analysis pipeline for public SWATH-MS datasets which includes a combination of metadata annotation protocols, automated workflows for MS data analysis, statistical analysis, and the integration of the results into the Expression Atlas resource. Automation is orchestrated with Nextflow, using containerised open analysis software tools, rendering the pipeline readily available and reproducible. To demonstrate its utility, we reanalysed 10 public DIA datasets from the PRIDE database, comprising 1,278 SWATH-MS runs. The robustness of the analysis was evaluated, and the results compared to those obtained in the original publications. The final expression values were integrated into Expression Atlas, making SWATH-MS experiments more widely available and combining them with expression data originating from other proteomics and transcriptomics datasets.


Subject(s)
Proteomics , Software , Data Analysis , Databases, Protein , Datasets as Topic , Mass Spectrometry/methods , Proteomics/methods
14.
Curr Protoc Bioinformatics ; 60: 3.15.1-3.15.23, 2017 12 08.
Article in English | MEDLINE | ID: mdl-29220076

ABSTRACT

Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user-defined parameters. © 2017 by John Wiley & Sons, Inc.


Subject(s)
Databases, Protein , Sequence Homology, Amino Acid , Software , Algorithms , Computational Biology , Humans , Internet , Markov Chains , Proteins , Sequence Alignment
15.
J Exp Med ; 214(4): 1111-1128, 2017 04 03.
Article in English | MEDLINE | ID: mdl-28351984

ABSTRACT

The phagocyte respiratory burst is crucial for innate immunity. The transfer of electrons to oxygen is mediated by a membrane-bound heterodimer, comprising gp91phox and p22phox subunits. Deficiency of either subunit leads to severe immunodeficiency. We describe Eros (essential for reactive oxygen species), a protein encoded by the previously undefined mouse gene bc017643, and show that it is essential for host defense via the phagocyte NADPH oxidase. Eros is required for expression of the NADPH oxidase components, gp91phox and p22phox. Consequently, Eros-deficient mice quickly succumb to infection. Eros also contributes to the formation of neutrophil extracellular traps (NETs) and impacts on the immune response to melanoma metastases. Eros is an ortholog of the plant protein Ycf4, which is necessary for expression of proteins of the photosynthetic photosystem 1 complex, itself also an NADPH oxidoreductase. We thus describe the key role of the previously uncharacterized protein Eros in host defense.


Subject(s)
Membrane Proteins/physiology , Phagocytes/physiology , Reactive Oxygen Species/metabolism , Respiratory Burst/physiology , Animals , Cytochrome b Group/analysis , Cytochrome b Group/physiology , Endoplasmic Reticulum/metabolism , HEK293 Cells , Humans , Immunity, Innate , Macrophages/immunology , Membrane Glycoproteins/analysis , Membrane Glycoproteins/physiology , Mice , Mice, Inbred C57BL , NADPH Oxidase 2 , NADPH Oxidases/analysis , NADPH Oxidases/physiology , Neutrophils/immunology , Phagocytosis
16.
Genome Biol ; 16: 88, 2015 Apr 30.
Article in English | MEDLINE | ID: mdl-25924720

ABSTRACT

BACKGROUND: Protein domains display a range of structural diversity, with numerous additions and deletions of secondary structural elements between related domains. We have observed a small number of cases of surprising large-scale deletions of core elements of structural domains. We propose a new concept called domain atrophy, where protein domains lose a significant number of core structural elements. RESULTS: Here, we implement a new pipeline to systematically identify new cases of domain atrophy across all known protein sequences. The output of this pipeline was carefully checked by hand, which filtered out partial domain instances that were unlikely to represent true domain atrophy due to misannotations or un-annotated sequence fragments. We identify 75 cases of domain atrophy, of which eight cases are found in a three-dimensional protein structure and 67 cases have been inferred based on mapping to a known homologous structure. Domains with structural variations include ancient folds such as the TIM-barrel and Rossmann folds. Most of these domains are observed to show structural loss that does not affect their functional sites. CONCLUSION: Our analysis has significantly increased the known cases of domain atrophy. We discuss specific instances of domain atrophy and see that there has often been a compensatory mechanism that helps to maintain the stability of the partial domain. Our study indicates that although domain atrophy is an extremely rare phenomenon, protein domains under certain circumstances can tolerate extreme mutations giving rise to partial, but functional, domains.


Subject(s)
Bacterial Proteins/genetics , Gene Deletion , Genes, Bacterial , Luciferases/genetics , Oxidoreductases/genetics , Bacterial Proteins/metabolism , Burkholderia cenocepacia/enzymology , Burkholderia cenocepacia/genetics , Carrier Proteins/genetics , Carrier Proteins/metabolism , Cryptococcus/enzymology , Cryptococcus/genetics , Escherichia coli/enzymology , Escherichia coli/genetics , Evolution, Molecular , Humans , Lactobacillus/enzymology , Lactobacillus/genetics , Luciferases/metabolism , Models, Molecular , Oxidoreductases/metabolism , Photobacterium/enzymology , Photobacterium/genetics , Phylogeny , Protein Structure, Tertiary , Pyrococcus furiosus/enzymology , Pyrococcus furiosus/genetics , Staphylococcus aureus/enzymology , Staphylococcus aureus/genetics
17.
PLoS One ; 6(11): e25570, 2011.
Article in English | MEDLINE | ID: mdl-22073138

ABSTRACT

Vibrio cholerae, an enteropathogenic Gram-negative bacterium, is one of the main causative agents of waterborne diseases like cholera. About one-third of the organism's genome is uncharacterised, with many protein coding genes lacking structure and functional information. These proteins form a significant fraction of the genome and are crucial in understanding the organism's complete functional makeup. In this study we report the general structure and function of a family of hypothetical proteins, Domain of Unknown Function 3233 (DUF3233), which are conserved across Gram-negative gammaproteobacteria (especially in Vibrio sp. and similar bacteria). Profile and HMM based sequence search methods were used to screen homologues of DUF3233. The I-TASSER fold recognition method was used to build a three dimensional structural model of the domain. The structure resembles the transmembrane beta-barrel with an axial N-terminal helix and twelve antiparallel beta-strands. Using a combination of amphipathy and discrimination analysis we analysed the potential transmembrane beta-barrel forming properties of DUF3233. Sequence, structure and phylogenetic analyses of DUF3233 indicate that this Gram-negative bacterial hypothetical protein resembles the beta-barrel translocation unit of the autotransporter Va secretory mechanism, with a gene organisation that differs from the conventional Va system.


Subject(s)
Carrier Proteins/metabolism , Proteobacteria/metabolism , Carrier Proteins/chemistry , Models, Molecular , Protein Transport