Results 1 - 12 of 12
1.
Nature ; 593(7860): 553-557, 2021 May.
Article in English | MEDLINE | ID: mdl-33911286

ABSTRACT

Asgard is a recently discovered superphylum of archaea that appears to include the closest archaeal relatives of eukaryotes [1-5]. Debate continues as to whether the archaeal ancestor of eukaryotes belongs within the Asgard superphylum or whether this ancestor is a sister group to all other archaea (that is, a two-domain versus a three-domain tree of life) [6-8]. Here we present a comparative analysis of 162 complete or nearly complete genomes of Asgard archaea, including 75 metagenome-assembled genomes that, to our knowledge, have not previously been reported. Our results substantially expand the phylogenetic diversity of Asgard and lead us to propose six additional phyla that include a deep branch that we have provisionally named Wukongarchaeota. Our phylogenomic analysis does not unequivocally resolve the evolutionary relationship between eukaryotes and Asgard archaea; instead, depending on the choice of species and conserved genes used to build the phylogeny, it supports either the origin of eukaryotes from within Asgard (as a sister group to the expanded Heimdallarchaeota-Wukongarchaeota branch) or a deeper branch for the eukaryote ancestor within archaea. Our comprehensive protein domain analysis using the 162 Asgard genomes results in a major expansion of the set of eukaryotic signature proteins. The Asgard eukaryotic signature proteins show variable phyletic distributions and domain architectures, which is suggestive of dynamic evolution through horizontal gene transfer, gene loss, gene duplication and domain shuffling. The phylogenomics of the Asgard archaea points to the accumulation of the components of the mobile archaeal 'eukaryome' in the archaeal ancestor of eukaryotes (within or outside Asgard) through extensive horizontal gene transfer.


Subject(s)
Archaea/classification , Genome, Archaeal , Phylogeny , Biological Evolution , Eukaryota , Metagenomics
2.
Methods Mol Biol ; 484: 465-90, 2008.
Article in English | MEDLINE | ID: mdl-18592196

ABSTRACT

Genome sequencing projects have resulted in a rapid accumulation of predicted protein sequences. With experimentally verified information on protein function lagging far behind, computational methods are used for functional annotation of proteins. Here we describe a number of protocols for protein sequence and structure analysis that can be used to infer the function of uncharacterized proteins. These protocols rely on publicly available computational resources and tools and can be used by anyone with Internet access.


Subject(s)
Amino Acid Sequence , Proteins , Sequence Homology, Amino Acid , Animals , Databases, Protein , Humans , Models, Molecular , Molecular Sequence Data , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Sequence Alignment
3.
Methods Enzymol ; 422: 47-74, 2007.
Article in English | MEDLINE | ID: mdl-17628134

ABSTRACT

The availability of complete genome sequences of diverse bacteria and archaea makes comparative sequence analysis a powerful tool for analyzing signal transduction systems encoded in these genomes. However, most signal transduction proteins consist of two or more individual protein domains, which significantly complicates their functional annotation and makes automated annotation of these proteins in the course of large-scale genome sequencing projects particularly unreliable. This chapter describes certain common-sense protocols for sequence analysis of two-component histidine kinases and response regulators, as well as other components of the prokaryotic signal transduction machinery: Ser/Thr/Tyr protein kinases and protein phosphatases, adenylate and diguanylate cyclases, and c-di-GMP phosphodiesterases. These protocols rely on publicly available computational tools and databases and can be utilized by anyone with Internet access.


Subject(s)
Bacteria/genetics , Protein Kinases/genetics , Signal Transduction/physiology , Bacteria/enzymology , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Binding Sites , Conserved Sequence , Databases, Protein , Histidine Kinase , Protein Kinases/chemistry , Protein Kinases/metabolism , Sequence Alignment , Sequence Analysis, Protein , Sequence Homology, Amino Acid
4.
Nucleic Acids Res ; 35(Database issue): D224-8, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17202162

ABSTRACT

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.


Subject(s)
Databases, Protein , Internet , Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Proteins/physiology , Sequence Analysis, Protein , Systems Integration , User-Computer Interface
5.
Evol Bioinform Online ; 2: 197-209, 2007 Feb 10.
Article in English | MEDLINE | ID: mdl-19455212

ABSTRACT

The PIRSF protein classification system (http://pir.georgetown.edu/pirsf/) reflects evolutionary relationships of full-length proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, including sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports with graphical viewers for taxonomic distribution, domain architecture, family hierarchy, and multiple alignment and phylogenetic tree. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain or fold-based searches allow identification of evolutionarily related protein families sharing domains or structural folds. Functional convergence and functional divergence are revealed by the relationships between protein classification and curated family functions. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer. Here we demonstrate, with illustrative examples, how to use the web-based PIRSF system as a tool for functional and evolutionary studies of protein families.
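A homeomorphic family, as defined in the PIRSF abstract above, requires its members to share a common domain architecture. The minimal Python sketch below models only that one criterion by grouping proteins on their ordered list of domains. The protein IDs and domain names are hypothetical, and the sketch deliberately ignores the full-length sequence similarity and expert curation that PIRSF also requires.

```python
from collections import defaultdict


def group_by_architecture(proteins):
    """Group protein IDs by ordered domain architecture (a tuple of domain names).

    Only the 'common domain architecture' half of the homeomorphic criterion is
    modelled here; full-length sequence similarity is not checked.
    """
    groups = defaultdict(list)
    for protein_id, domains in proteins.items():
        # Order matters: A-B and B-A are treated as different architectures.
        groups[tuple(domains)].append(protein_id)
    return dict(groups)


# Hypothetical proteins; IDs and domain names are illustrative only
proteins = {
    "protA": ["HisKA", "HATPase_c"],
    "protB": ["HisKA", "HATPase_c"],
    "protC": ["Response_reg", "LytTR"],
}
groups = group_by_architecture(proteins)
print(groups[("HisKA", "HATPase_c")])  # ['protA', 'protB']
```

Keying on an ordered tuple rather than a set keeps domain order significant, which matches the idea that homeomorphic proteins share the same arrangement of domains, not merely the same domains.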

6.
Nucleic Acids Res ; 33(Database issue): D201-5, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608177

ABSTRACT

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).


Subject(s)
Databases, Protein , Proteins/chemistry , Proteins/classification , Sequence Analysis, Protein , Databases, Protein/trends , Humans , Protein Structure, Tertiary , Sequence Alignment , Systems Integration
7.
Genome Biol ; 5(2): R7, 2004.
Article in English | MEDLINE | ID: mdl-14759257

ABSTRACT

BACKGROUND: Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes. RESULTS: We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For around 20 that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. 
The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes. CONCLUSIONS: The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.


Subject(s)
Eukaryotic Cells/classification , Genome , Phylogeny , Proteins/classification , Animals , Caenorhabditis elegans/genetics , Evolution, Molecular , Gene Deletion , Humans , Prokaryotic Cells/classification , Protein Structure, Tertiary , Proteins/genetics , Proteins/physiology , Sequence Analysis, Protein , Yeasts/genetics
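The KOG study above classifies clusters by how many of the seven genomes they are represented in (about 40% of KOGs occur in six or seven species). The Python sketch below shows that tally on a presence/absence mapping. The three-letter species codes and the five toy KOGs are assumptions for illustration, not data from the paper.

```python
# Seven genomes; these three-letter codes are assumed for illustration
SPECIES = {"ath", "cel", "dme", "hsa", "sce", "spo", "ecu"}


def conserved_fraction(kog_species, min_species=6):
    """Fraction of KOGs whose phyletic pattern covers at least `min_species` genomes."""
    if not kog_species:
        return 0.0
    conserved = sum(
        1 for species in kog_species.values() if len(species & SPECIES) >= min_species
    )
    return conserved / len(kog_species)


# Hypothetical phyletic patterns (KOG -> set of genomes it is found in)
kogs = {
    "KOG_a": set(SPECIES),            # pan-eukaryotic
    "KOG_b": SPECIES - {"ecu"},       # lost only in the microsporidian
    "KOG_c": {"cel", "dme", "hsa"},   # animal-specific
    "KOG_d": {"sce", "spo"},          # fungus-specific
    "KOG_e": {"ath"},                 # plant-specific
}
print(conserved_fraction(kogs))  # 0.4
```

Representing each KOG as a set of genomes makes lineage-specific loss easy to read off: KOG_b above is the pattern expected for a gene eliminated in a single lineage.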
8.
BMC Bioinformatics ; 4: 41, 2003 Sep 11.
Article in English | MEDLINE | ID: mdl-12969510

ABSTRACT

BACKGROUND: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. RESULTS: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included in the KOGs; addition of new eukaryotic genomes is expected to result in a substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set. 
This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. CONCLUSION: The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.


Subject(s)
Databases, Protein/trends , Eukaryotic Cells , Proteins/classification , Proteins/genetics , Animals , Databases, Nucleic Acid/trends , Eukaryotic Cells/chemistry , Eukaryotic Cells/physiology , Evolution, Molecular , Humans , National Institutes of Health (U.S.) , Proteins/physiology , Terminology as Topic , United States
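The coverage figures in the COG/KOG update above are simple ratios of clustered proteins to all predicted proteins. This small Python check uses only the counts given in the abstract and reproduces the rounded values (about 75% for COGs and about 54% for KOGs).

```python
def coverage(clustered, total):
    """Percentage of proteins assigned to clusters, rounded to one decimal place."""
    return round(100 * clustered / total, 1)


cog = coverage(138_458, 185_505)  # proteins in COGs / proteins in 66 unicellular genomes
kog = coverage(59_838, 110_655)   # proteins in KOGs / gene products of 7 eukaryotic genomes
print(cog, kog)  # 74.6 54.1
```

Both results round to the percentages stated in the abstract, which makes the gap in coverage between the prokaryotic and eukaryotic sets (roughly 20 percentage points) explicit.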
9.
Nucleic Acids Res ; 31(1): 383-7, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520028

ABSTRACT

The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE(R). This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the World Wide Web at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. Users may also employ the CD-Search service to identify conserved domains in new sequences, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search results, and pre-computed links from Entrez's protein database, are calculated using the RPS-BLAST algorithm and Position Specific Score Matrices (PSSMs) derived from CDD alignments. CD-Searches are also run by default for protein-protein queries submitted to BLAST(R) at http://www.ncbi.nlm.nih.gov/BLAST. CDD mirrors the publicly available domain alignment collections SMART and PFAM, and now also contains alignment models curated at NCBI. Structure information is used to identify the core substructure likely to be present in all family members, and to produce sequence alignments consistent with structure conservation. This alignment model allows NCBI curators to annotate 'columns' corresponding to functional sites conserved among family members.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Amino Acid Sequence , Animals , Conserved Sequence , Information Storage and Retrieval , Models, Molecular , Sequence Alignment
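CD-Search, described above, scores query sequences against position-specific score matrices (PSSMs) using RPS-BLAST. As a rough illustration of what a PSSM encodes, the Python sketch below slides a toy three-column matrix along a sequence and reports the best ungapped window. The matrix values and the `best_ungapped_hit` helper are hypothetical, and the sketch omits the seeding, gapped extension and E-value statistics that RPS-BLAST performs.

```python
from typing import Dict, List, Tuple

Pssm = List[Dict[str, int]]  # one column per motif position: residue -> score


def best_ungapped_hit(sequence: str, pssm: Pssm, default: int = -1) -> Tuple[int, int]:
    """Slide the PSSM along the sequence without gaps; return (start, score) of the best window.

    Residues absent from a column receive `default`. This shows only the core
    position-specific scoring idea, not the RPS-BLAST algorithm itself.
    """
    width = len(pssm)
    if width == 0 or len(sequence) < width:
        raise ValueError("sequence must be at least as long as a non-empty PSSM")
    best_start, best_score = 0, None
    for start in range(len(sequence) - width + 1):
        # Sum the column-specific score of each residue in this window.
        score = sum(column.get(sequence[start + i], default) for i, column in enumerate(pssm))
        if best_score is None or score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical 3-column PSSM favouring the motif "HDG"
toy_pssm = [{"H": 5, "K": 2}, {"D": 4, "E": 2}, {"G": 3}]
print(best_ungapped_hit("AAHDGKK", toy_pssm))  # (2, 12)
```

Unlike a single substitution matrix such as BLOSUM62, each column carries its own scores, so the same residue can be rewarded at one motif position and penalized at another.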
10.
Nucleic Acids Res ; 31(1): 474-7, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520055

ABSTRACT

Three-dimensional structures are now known within most protein families and it is likely, when searching a sequence database, that one will identify a homolog of known structure. The goal of Entrez's 3D-structure database is to make structure information and the functional annotation it can provide easily accessible to molecular biologists. To this end, Entrez's search engine provides several powerful features: (i) links between databases, for example between a protein's sequence and structure; (ii) pre-computed sequence and structure neighbors; and (iii) structure and sequence/structure alignment visualization. Here, we focus on a new feature of Entrez's Molecular Modeling Database (MMDB): Graphical summaries of the biological annotation available for each 3D structure, based on the results of automated comparative analysis. MMDB is available at: http://www.ncbi.nlm.nih.gov/Entrez/structure.html.


Subject(s)
Databases, Protein , Models, Molecular , Structural Homology, Protein , Animals , Computer Graphics , Imaging, Three-Dimensional , Protein Structure, Tertiary , Proteins/chemistry
11.
J Bacteriol ; 185(1): 285-94, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12486065

ABSTRACT

Transmembrane receptors in microorganisms, such as sensory histidine kinases and methyl-accepting chemotaxis proteins, are molecular devices for monitoring environmental changes. We report here that sensory domain sharing is widespread among different classes of transmembrane receptors. We have identified two novel conserved extracellular sensory domains, named CHASE2 and CHASE3, that are found in at least four classes of transmembrane receptors: histidine kinases, adenylate cyclases, predicted diguanylate cyclases, and either serine/threonine protein kinases (CHASE2) or methyl-accepting chemotaxis proteins (CHASE3). Three other extracellular sensory domains were shared by at least two different classes of transmembrane receptors: histidine kinases and either diguanylate cyclases, adenylate cyclases, or phosphodiesterases. These observations suggest that microorganisms use similar conserved domains to sense similar environmental signals and transmit this information via different signal transduction pathways to different regulatory circuits: transcriptional regulation (histidine kinases), chemotaxis (methyl-accepting proteins), catabolite repression (adenylate cyclases), and modulation of enzyme activity (diguanylate cyclases and phosphodiesterases). The variety of signaling pathways using the CHASE-type domains indicates that these domains sense some critically important extracellular signals.


Subject(s)
Archaea/chemistry , Bacteria/chemistry , Receptors, Cell Surface/chemistry , Signal Transduction , Adenylyl Cyclases/chemistry , Adenylyl Cyclases/genetics , Amino Acid Sequence , Archaea/genetics , Archaea/metabolism , Archaeal Proteins/metabolism , Bacteria/genetics , Bacteria/metabolism , Bacterial Proteins/metabolism , Chemotaxis , Computational Biology , Databases, Genetic , Guanylate Cyclase/chemistry , Guanylate Cyclase/genetics , Histidine Kinase , Molecular Sequence Data , Protein Kinases/chemistry , Protein Kinases/genetics , Receptors, Cell Surface/genetics , Sequence Alignment
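The observation in the CHASE abstract above, that one sensory domain appears in several classes of transmembrane receptors, amounts to inverting a domain-to-receptor-class table. The Python sketch below does that for a list of (sensor domain, receptor class) pairs. The CHASE2 and CHASE3 rows mirror the receptor classes named in the abstract, while `domainX` and the `shared_sensor_domains` helper are hypothetical.

```python
from collections import defaultdict


def shared_sensor_domains(receptors, min_classes=2):
    """Return sensor domains found in at least `min_classes` receptor classes.

    `receptors` is an iterable of (sensor_domain, receptor_class) pairs, for
    example one pair per predicted transmembrane receptor.
    """
    classes_by_domain = defaultdict(set)
    for domain, receptor_class in receptors:
        classes_by_domain[domain].add(receptor_class)
    # Keep only domains shared across enough classes; sort class names for stable output.
    return {d: sorted(c) for d, c in classes_by_domain.items() if len(c) >= min_classes}


# Illustrative pairs; "domainX" is a made-up single-class domain
pairs = [
    ("CHASE2", "histidine kinase"),
    ("CHASE2", "adenylate cyclase"),
    ("CHASE2", "diguanylate cyclase"),
    ("CHASE2", "Ser/Thr kinase"),
    ("CHASE3", "histidine kinase"),
    ("CHASE3", "methyl-accepting chemotaxis protein"),
    ("domainX", "histidine kinase"),
]
shared = shared_sensor_domains(pairs)
print(shared["CHASE3"])  # ['histidine kinase', 'methyl-accepting chemotaxis protein']
```

Using a set per domain means repeated receptors of the same class are counted once, so the threshold reflects the number of distinct receptor classes rather than the number of proteins.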
12.
Nucleic Acids Res ; 30(11): 2453-9, 2002 Jun 01.
Article in English | MEDLINE | ID: mdl-12034833

ABSTRACT

Sequence analysis of bacterial genomes revealed a novel DNA-binding domain. This domain is found in several response regulators of the two-component signal transduction system, such as Pseudomonas aeruginosa AlgR, which is involved in the regulation of alginate biosynthesis and in the pathogenesis of cystic fibrosis; Clostridium perfringens VirR, a regulator of virulence factors; and several regulators of bacteriocin biosynthesis previously unified in the AgrA/ComE family. Most of the transcriptional regulators that contain this DNA-binding domain are involved in biosynthesis of extracellular polysaccharides, fimbriation, expression of exoproteins, including toxins, and quorum sensing. We refer to it as the LytTR ('litter') domain, after Bacillus subtilis LytT and Staphylococcus aureus LytR response regulators, involved in regulation of cell autolysis. In addition to response regulators, the LytTR domain is found in combination with MHYT, PAS and other sensor domains.


Subject(s)
Bacterial Proteins/metabolism , Conserved Sequence/genetics , DNA/metabolism , Trans-Activators , Transcription Factors/chemistry , Transcription Factors/metabolism , Amino Acid Sequence , Bacterial Proteins/chemistry , Binding Sites , Clostridium perfringens/chemistry , Clostridium perfringens/genetics , Computational Biology , DNA/genetics , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Databases, Protein , Gene Expression Regulation, Bacterial , Helix-Turn-Helix Motifs , Molecular Sequence Data , Phylogeny , Protein Binding , Protein Structure, Tertiary , Pseudomonas aeruginosa/chemistry , Pseudomonas aeruginosa/genetics , Sequence Alignment , Signal Transduction , Virulence/genetics