1.
Nature ; 593(7860): 553-557, 2021 May.
Article in English | MEDLINE | ID: mdl-33911286

ABSTRACT

Asgard is a recently discovered superphylum of archaea that appears to include the closest archaeal relatives of eukaryotes [1-5]. Debate continues as to whether the archaeal ancestor of eukaryotes belongs within the Asgard superphylum or whether this ancestor is a sister group to all other archaea (that is, a two-domain versus a three-domain tree of life) [6-8]. Here we present a comparative analysis of 162 complete or nearly complete genomes of Asgard archaea, including 75 metagenome-assembled genomes that, to our knowledge, have not previously been reported. Our results substantially expand the phylogenetic diversity of Asgard and lead us to propose six additional phyla that include a deep branch that we have provisionally named Wukongarchaeota. Our phylogenomic analysis does not resolve unequivocally the evolutionary relationship between eukaryotes and Asgard archaea, but instead, depending on the choice of species and conserved genes used to build the phylogeny, supports either the origin of eukaryotes from within Asgard (as a sister group to the expanded Heimdallarchaeota-Wukongarchaeota branch) or a deeper branch for the eukaryote ancestor within archaea. Our comprehensive protein domain analysis using the 162 Asgard genomes results in a major expansion of the set of eukaryotic signature proteins. The Asgard eukaryotic signature proteins show variable phyletic distributions and domain architectures, which is suggestive of dynamic evolution through horizontal gene transfer, gene loss, gene duplication and domain shuffling. The phylogenomics of the Asgard archaea points to the accumulation of the components of the mobile archaeal 'eukaryome' in the archaeal ancestor of eukaryotes (within or outside Asgard) through extensive horizontal gene transfer.


Subjects
Archaea/classification, Archaeal Genome, Phylogeny, Biological Evolution, Eukaryota, Metagenomics
2.
Nucleic Acids Res ; 35(Database issue): D224-8, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17202162

ABSTRACT

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). The InterProScan search tool is now also available via a web service at http://www.ebi.ac.uk/Tools/webservices/WSInterProScan.html.


Subjects
Protein Databases, Internet, Protein Tertiary Structure, Proteins/chemistry, Proteins/classification, Proteins/physiology, Protein Sequence Analysis, Systems Integration, User-Computer Interface
3.
Methods Mol Biol ; 484: 465-90, 2008.
Article in English | MEDLINE | ID: mdl-18592196

ABSTRACT

Genome sequencing projects have resulted in a rapid accumulation of predicted protein sequences. With experimentally verified information on protein function lagging far behind, computational methods are used for functional annotation of proteins. Here we describe a number of protocols for protein sequence and structure analysis that can be used to infer the function of uncharacterized proteins. These protocols rely on publicly available computational resources and tools and can be utilized by anyone with Internet access.


Subjects
Amino Acid Sequence, Proteins, Amino Acid Sequence Homology, Animals, Protein Databases, Humans, Molecular Models, Molecular Sequence Data, Protein Conformation, Proteins/chemistry, Proteins/genetics, Proteins/metabolism, Sequence Alignment
4.
Methods Enzymol ; 422: 47-74, 2007.
Article in English | MEDLINE | ID: mdl-17628134

ABSTRACT

The availability of complete genome sequences of diverse bacteria and archaea makes comparative sequence analysis a powerful tool for analyzing signal transduction systems encoded in these genomes. However, most signal transduction proteins consist of two or more individual protein domains, which significantly complicates their functional annotation and makes automated annotation of these proteins in the course of large-scale genome sequencing projects particularly unreliable. This chapter describes certain common-sense protocols for sequence analysis of two-component histidine kinases and response regulators, as well as other components of the prokaryotic signal transduction machinery: Ser/Thr/Tyr protein kinases and protein phosphatases, adenylate and diguanylate cyclases, and c-di-GMP phosphodiesterases. These protocols rely on publicly available computational tools and databases and can be utilized by anyone with Internet access.


Subjects
Bacteria/genetics, Protein Kinases/genetics, Signal Transduction/physiology, Bacteria/enzymology, Bacterial Proteins/chemistry, Bacterial Proteins/genetics, Binding Sites, Conserved Sequence, Protein Databases, Histidine Kinase, Protein Kinases/chemistry, Protein Kinases/metabolism, Sequence Alignment, Protein Sequence Analysis, Amino Acid Sequence Homology
5.
Nucleic Acids Res ; 33(Database issue): D201-5, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608177

ABSTRACT

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).


Subjects
Protein Databases, Proteins/chemistry, Proteins/classification, Protein Sequence Analysis, Protein Databases/trends, Humans, Protein Tertiary Structure, Sequence Alignment, Systems Integration
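
The release 8.0 figures quoted in entry 5 can be checked with simple arithmetic: the per-type counts should add up to the stated total of 11 007 entries. The sketch below does only that; the dictionary name and its keys are illustrative labels chosen here, not an InterPro data structure or API.

```python
# Entry counts by type, copied from the InterPro release 8.0 abstract.
# The dictionary and key names are illustrative assumptions.
interpro_8_counts = {
    "domain": 2573,
    "family": 8166,
    "repeat": 201,
    "active_site": 26,
    "binding_site": 21,
    "ptm_site": 20,
}

# Sum the per-type counts and compare with the total given in the abstract.
total_entries = sum(interpro_8_counts.values())
print(total_entries)  # 11007
```

The sum agrees with the total given in the abstract, so the six entry types listed account for every entry in that release.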
6.
Nucleic Acids Res ; 30(11): 2453-9, 2002 Jun 01.
Article in English | MEDLINE | ID: mdl-12034833

ABSTRACT

Sequence analysis of bacterial genomes revealed a novel DNA-binding domain. This domain is found in several response regulators of the two-component signal transduction system, such as Pseudomonas aeruginosa AlgR, involved in the regulation of alginate biosynthesis and in the pathogenesis of cystic fibrosis; Clostridium perfringens VirR, a regulator of virulence factors; and several regulators of bacteriocin biosynthesis, previously unified in the AgrA/ComE family. Most of the transcriptional regulators that contain this DNA-binding domain are involved in biosynthesis of extracellular polysaccharides, fimbriation, expression of exoproteins, including toxins, and quorum sensing. We refer to it as the LytTR ('litter') domain, after Bacillus subtilis LytT and Staphylococcus aureus LytR response regulators, involved in regulation of cell autolysis. In addition to response regulators, the LytTR domain is found in combination with MHYT, PAS and other sensor domains.


Subjects
Bacterial Proteins/metabolism, Conserved Sequence/genetics, DNA/metabolism, Trans-Activators, Transcription Factors/chemistry, Transcription Factors/metabolism, Amino Acid Sequence, Bacterial Proteins/chemistry, Binding Sites, Clostridium perfringens/chemistry, Clostridium perfringens/genetics, Computational Biology, DNA/genetics, DNA-Binding Proteins/chemistry, DNA-Binding Proteins/metabolism, Protein Databases, Bacterial Gene Expression Regulation, Helix-Turn-Helix Motifs, Molecular Sequence Data, Phylogeny, Protein Binding, Protein Tertiary Structure, Pseudomonas aeruginosa/chemistry, Pseudomonas aeruginosa/genetics, Sequence Alignment, Signal Transduction, Virulence/genetics
7.
Nucleic Acids Res ; 31(1): 383-7, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520028

ABSTRACT

The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE®. This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the World Wide Web at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. Users may also employ the CD-Search service to identify conserved domains in new sequences, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search results, and pre-computed links from Entrez's protein database, are calculated using the RPS-BLAST algorithm and Position Specific Score Matrices (PSSMs) derived from CDD alignments. CD-Searches are also run by default for protein-protein queries submitted to BLAST® at http://www.ncbi.nlm.nih.gov/BLAST. CDD mirrors the publicly available domain alignment collections SMART and Pfam, and now also contains alignment models curated at NCBI. Structure information is used to identify the core substructure likely to be present in all family members, and to produce sequence alignments consistent with structure conservation. This alignment model allows NCBI curators to annotate 'columns' corresponding to functional sites conserved among family members.


Subjects
Protein Databases, Protein Tertiary Structure, Amino Acid Sequence, Animals, Conserved Sequence, Information Storage and Retrieval, Molecular Models, Sequence Alignment
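
Entry 7 mentions Position Specific Score Matrices (PSSMs) derived from alignments. As a rough illustration of the idea only, the sketch below builds a log-odds PSSM from a tiny ungapped alignment and slides it along a query. It is not RPS-BLAST and does not reproduce CDD's scoring: the uniform background frequencies, the pseudocount, the function names `build_pssm` and `best_window`, and the toy sequences are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, ungapped aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)  # assumption: uniform background frequencies
    denom = len(alignment) + pseudocount * len(alphabet)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        # Log2 ratio of the smoothed column frequency to the background.
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / denom) / background)
            for aa in alphabet
        })
    return pssm

def best_window(pssm, query):
    """Score every ungapped window of the query; return (best_score, start)."""
    width = len(pssm)
    if len(query) < width:
        raise ValueError("query is shorter than the PSSM")
    scored = [
        (sum(pssm[i][query[start + i]] for i in range(width)), start)
        for start in range(len(query) - width + 1)
    ]
    return max(scored)

# Toy data, invented for illustration.
toy_alignment = ["HELK", "HELR", "HDLK"]
pssm = build_pssm(toy_alignment)
score, start = best_window(pssm, "GGGHELKGGG")
print(start, round(score, 2))
```

Residues that are frequent in a column receive positive scores and unseen residues receive negative ones, so the window that matches the aligned motif scores highest. Real profile searches add gapped alignment and statistical significance estimates, which this sketch omits.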
8.
Nucleic Acids Res ; 31(1): 474-7, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520055

ABSTRACT

Three-dimensional structures are now known within most protein families and it is likely, when searching a sequence database, that one will identify a homolog of known structure. The goal of Entrez's 3D-structure database is to make structure information and the functional annotation it can provide easily accessible to molecular biologists. To this end, Entrez's search engine provides several powerful features: (i) links between databases, for example between a protein's sequence and structure; (ii) pre-computed sequence and structure neighbors; and (iii) structure and sequence/structure alignment visualization. Here, we focus on a new feature of Entrez's Molecular Modeling Database (MMDB): Graphical summaries of the biological annotation available for each 3D structure, based on the results of automated comparative analysis. MMDB is available at: http://www.ncbi.nlm.nih.gov/Entrez/structure.html.


Subjects
Protein Databases, Molecular Models, Protein Structural Homology, Animals, Computer Graphics, Three-Dimensional Imaging, Protein Tertiary Structure, Proteins/chemistry
9.
BMC Bioinformatics ; 4: 41, 2003 Sep 11.
Article in English | MEDLINE | ID: mdl-12969510

ABSTRACT

BACKGROUND: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. RESULTS: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. CONCLUSION: The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.


Subjects
Protein Databases/trends, Eukaryotic Cells, Proteins/classification, Proteins/genetics, Animals, Nucleic Acid Databases/trends, Eukaryotic Cells/chemistry, Eukaryotic Cells/physiology, Molecular Evolution, Humans, National Institutes of Health (U.S.), Proteins/physiology, Terminology as Topic, United States
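
The coverage percentages stated in entry 9 follow directly from the protein counts given there. The short sketch below recomputes them; the helper name `coverage_percent` and the variable names are illustrative choices, not part of the COG or KOG resources.

```python
# Counts copied from the COG/KOG update abstract (entry 9).
cog_proteins, cog_total = 138_458, 185_505  # proteins in COGs / proteins in 66 genomes
kog_proteins, kog_total = 59_838, 110_655   # proteins in KOGs / eukaryotic gene products

def coverage_percent(included, total):
    """Return `included` as a percentage of `total`."""
    return 100.0 * included / total

cog_cov = coverage_percent(cog_proteins, cog_total)
kog_cov = coverage_percent(kog_proteins, kog_total)
print(f"COG coverage: {cog_cov:.1f}%")  # 74.6%
print(f"KOG coverage: {kog_cov:.1f}%")  # 54.1%
```

Rounded to whole numbers, these give the 75% and approximately 54% reported in the abstract.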
10.
Evol Bioinform Online ; 2: 197-209, 2007 Feb 10.
Article in English | MEDLINE | ID: mdl-19455212

ABSTRACT

The PIRSF protein classification system (http://pir.georgetown.edu/pirsf/) reflects evolutionary relationships of full-length proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, including sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports with graphical viewers for taxonomic distribution, domain architecture, family hierarchy, and multiple alignment and phylogenetic tree. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain or fold-based searches allow identification of evolutionarily related protein families sharing domains or structural folds. Functional convergence and functional divergence are revealed by the relationships between protein classification and curated family functions. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer. Here we demonstrate, with illustrative examples, how to use the web-based PIRSF system as a tool for functional and evolutionary studies of protein families.

11.
J Bacteriol ; 185(1): 285-94, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12486065

ABSTRACT

Transmembrane receptors in microorganisms, such as sensory histidine kinases and methyl-accepting chemotaxis proteins, are molecular devices for monitoring environmental changes. We report here that sensory domain sharing is widespread among different classes of transmembrane receptors. We have identified two novel conserved extracellular sensory domains, named CHASE2 and CHASE3, that are found in at least four classes of transmembrane receptors: histidine kinases, adenylate cyclases, predicted diguanylate cyclases, and either serine/threonine protein kinases (CHASE2) or methyl-accepting chemotaxis proteins (CHASE3). Three other extracellular sensory domains were shared by at least two different classes of transmembrane receptors: histidine kinases and either diguanylate cyclases, adenylate cyclases, or phosphodiesterases. These observations suggest that microorganisms use similar conserved domains to sense similar environmental signals and transmit this information via different signal transduction pathways to different regulatory circuits: transcriptional regulation (histidine kinases), chemotaxis (methyl-accepting proteins), catabolite repression (adenylate cyclases), and modulation of enzyme activity (diguanylate cyclases and phosphodiesterases). The variety of signaling pathways using the CHASE-type domains indicates that these domains sense some critically important extracellular signals.


Subjects
Archaea/chemistry, Bacteria/chemistry, Cell Surface Receptors/chemistry, Signal Transduction, Adenylyl Cyclases/chemistry, Adenylyl Cyclases/genetics, Amino Acid Sequence, Archaea/genetics, Archaea/metabolism, Archaeal Proteins/metabolism, Bacteria/genetics, Bacteria/metabolism, Bacterial Proteins/metabolism, Chemotaxis, Computational Biology, Genetic Databases, Guanylate Cyclase/chemistry, Guanylate Cyclase/genetics, Histidine Kinase, Molecular Sequence Data, Protein Kinases/chemistry, Protein Kinases/genetics, Cell Surface Receptors/genetics, Sequence Alignment
12.
Genome Biol ; 5(2): R7, 2004.
Article in English | MEDLINE | ID: mdl-14759257

ABSTRACT

BACKGROUND: Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes. RESULTS: We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For around 20 that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes. CONCLUSIONS: The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.


Subjects
Eukaryotic Cells/classification, Genome, Phylogeny, Proteins/classification, Animals, Caenorhabditis elegans/genetics, Molecular Evolution, Gene Deletion, Humans, Prokaryotic Cells/classification, Protein Tertiary Structure, Proteins/genetics, Proteins/physiology, Protein Sequence Analysis, Yeasts/genetics
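
Entry 12 groups KOGs by phyletic pattern, for example those represented in six or seven of the seven species. A minimal sketch of that kind of filter is given below, assuming each cluster is stored as the set of species in which it occurs. The three-letter species codes, the cluster names `KOG_A` to `KOG_D`, and both function names are hypothetical and invented for this example; they are not real KOG identifiers or the authors' method.

```python
# Seven genomes named in the abstract; the three-letter codes are illustrative.
SPECIES = ("Cel", "Dme", "Hsa", "Ath", "Sce", "Spo", "Ecu")

# Hypothetical clusters mapped to the species in which they are represented.
toy_kogs = {
    "KOG_A": {"Cel", "Dme", "Hsa", "Ath", "Sce", "Spo", "Ecu"},
    "KOG_B": {"Cel", "Dme", "Hsa", "Ath", "Sce", "Spo"},
    "KOG_C": {"Cel", "Dme", "Hsa"},
    "KOG_D": {"Sce", "Spo"},
}

def phyletic_pattern(present):
    """Encode presence/absence across SPECIES as a string of 1s and 0s."""
    return "".join("1" if sp in present else "0" for sp in SPECIES)

def widely_conserved(kogs, min_species=6):
    """Return the names of clusters found in at least `min_species` genomes."""
    return sorted(name for name, present in kogs.items()
                  if len(present) >= min_species)

for name, present in toy_kogs.items():
    print(name, phyletic_pattern(present))
print(widely_conserved(toy_kogs))
```

With the toy data, only the two clusters present in six or seven genomes pass the filter, while the animal-specific and fungus-specific clusters are excluded.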