Results 1 - 20 of 39
1.
Int J Cancer ; 143(11): 2800-2813, 2018 Dec 01.
Article in English | MEDLINE | ID: mdl-29987844

ABSTRACT

In many families with suspected Lynch syndrome (LS), no germline mutation in the causative mismatch repair (MMR) genes is detected during routine diagnostics. To identify novel causative genes for LS, the present study investigated 77 unrelated, mutation-negative patients with clinically suspected LS and a loss of MSH2 in tumor tissue. An analysis for genomic copy number variants (CNV) was performed, with subsequent next generation sequencing (NGS) of selected candidate genes in a subgroup of the cohort. Genomic DNA was genotyped using Illumina's HumanOmniExpress Bead Array. After quality control and filtering, 25 deletions and 16 duplications encompassing 73 genes were identified in 28 patients. No recurrent CNV was detected, and none of the CNVs affected the regulatory regions of MSH2. A total of 49 candidate genes from genomic regions implicated by the present CNV analysis and 30 known or assumed risk genes for colorectal cancer (CRC) were then sequenced in a subset of 38 patients using a customized NGS gene panel and Sanger sequencing. Single nucleotide variants were identified in 14 candidate genes from the CNV analysis. The most promising of these candidate genes were: (i) PRKCA, PRKDC, and MCM4, as a functional relation to MSH2 is predicted by network analysis, and (ii) CSMD1, as this is commonly mutated in CRC. Furthermore, six patients harbored POLE variants outside the exonuclease domain, suggesting that these might be implicated in hereditary CRC. Analyses in larger cohorts of suspected LS patients recruited via international collaborations are warranted to verify the present findings.


Subjects
Colorectal Neoplasms, Hereditary Nonpolyposis/genetics; DNA Copy Number Variations/genetics; Adult; Colorectal Neoplasms/genetics; DNA Mismatch Repair/genetics; Female; Genotype; Germ-Line Mutation/genetics; High-Throughput Nucleotide Sequencing/methods; Humans; Male
2.
Nucleic Acids Res ; 43(Database issue): D257-60, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25300481

ABSTRACT

SMART (Simple Modular Architecture Research Tool) is a web resource (http://smart.embl.de/) providing simple identification and extensive annotation of protein domains and the exploration of protein domain architectures. In the current version, SMART contains manually curated models for more than 1200 protein domains, with ~200 new models since our last update article. The underlying protein databases were synchronized with UniProt, Ensembl and STRING, bringing the total number of annotated domains and other protein features above 100 million. SMART's 'Genomic' mode, which annotates proteins from completely sequenced genomes, was greatly expanded and now includes 2031 species, compared to 1133 in the previous release. SMART analysis results pages have been completely redesigned and include links to several new information sources. A new, vector-based display engine has been developed for protein schematics in SMART, which can also be exported as high-resolution bitmap images for easy inclusion into other documents. Taxonomic tree displays in SMART have been significantly improved, and can be easily navigated using the integrated search engine.


Subjects
Databases, Protein; Protein Structure, Tertiary; Data Curation; Protein Interaction Mapping; Protein Structure, Tertiary/genetics
3.
Nucleic Acids Res ; 42(22): 13525-33, 2014 Dec 16.
Article in English | MEDLINE | ID: mdl-25398899

ABSTRACT

The thermophilic fungus Chaetomium thermophilum holds great promise for structural biology. To increase the efficiency of its biochemical and structural characterization and to explore its thermophilic properties beyond those of individual proteins, we obtained transcriptomics and proteomics data, and integrated them with computational annotation methods and a multitude of biochemical experiments conducted by the structural biology community. We considerably improved the genome annotation of Chaetomium thermophilum and characterized the transcripts and expression of thousands of genes. We furthermore show that the composition and structure of the expressed proteome of Chaetomium thermophilum are similar to those of its mesophilic relatives. Data were deposited in a publicly available repository and provide a rich resource for the structural biology community.


Subjects
Chaetomium/genetics; Genome, Fungal; Molecular Sequence Annotation; Fungal Proteins/genetics; Fungal Proteins/metabolism; Genes, Fungal; Introns; Proteome/metabolism; Pseudogenes; Transcriptome
4.
Nucleic Acids Res ; 40(Database issue): D302-5, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22053084

ABSTRACT

SMART (Simple Modular Architecture Research Tool) is an online resource (http://smart.embl.de/) for the identification and annotation of protein domains and the analysis of protein domain architectures. SMART version 7 contains manually curated models for 1009 protein domains, 200 more than in the previous version. The current release introduces several novel features and a streamlined user interface resulting in a faster and more comfortable workflow. The underlying protein databases were greatly expanded, resulting in a 2-fold increase in number of annotated domains and features. The database of completely sequenced genomes now includes 1133 species, compared to 630 in the previous release. Domain architecture analysis results can now be exported and visualized through the iTOL phylogenetic tree viewer. 'metaSMART' was introduced as a novel subresource dedicated to the exploration and analysis of domain architectures in various metagenomics data sets. An advanced full text search engine was implemented, covering the complete annotations for SMART and Pfam domains, as well as the complete set of protein descriptions, allowing users to quickly find relevant information.


Subjects
Databases, Protein; Molecular Sequence Annotation; Protein Structure, Tertiary; Computer Graphics; Metagenomics; Protein Interaction Maps
5.
Nucleic Acids Res ; 40(Database issue): D284-9, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22096231

ABSTRACT

Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721,801 orthologous groups, encompassing a total of 4,396,591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101,208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is amended by multiple sequence alignments and maximum-likelihood trees and broad functional descriptions are provided for 450,904 orthologous groups (62.5%).
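
The counts quoted in this abstract can be cross-checked with a little arithmetic. The sketch below uses only numbers stated above; the variable names are ours, and the count of groups at narrower taxonomic ranges is a plain subtraction, not a figure reported by the authors.

```python
# Figures as stated in the eggNOG v3 abstract.
total_groups = 721_801        # all orthologous groups
universal_groups = 101_208    # groups at the universal level
described_groups = 450_904    # groups with a broad functional description

# Share of groups that carry a functional description, in percent.
described_pct = round(100 * described_groups / total_groups, 1)

# Groups that belong to the 40 more limited taxonomic ranges.
other_level_groups = total_groups - universal_groups

print(described_pct)
print(other_level_groups)
```

The first value reproduces the 62.5% given in the abstract.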


Subjects
Databases, Genetic; Phylogeny; Genomics; Proteins/genetics; Proteins/physiology; Sequence Homology; User-Computer Interface
6.
Bioessays ; 33(10): 769-80, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21853451

ABSTRACT

The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests to evaluate the accuracy of predictions and to explore biases caused by biological and technical factors are therefore required. We used 70 manually curated families to analyze the performance of five public methods in Metazoa. We analyzed the strengths and weaknesses of the methods and quantified the impact of biological and technical challenges. From the latter part of the analysis, genome annotation emerged as the largest single influencer, affecting up to 30% of the performance. Generally, most methods did well in assigning orthologous groups, but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols, which is of utmost importance for many fields of biology and should be tackled by a broad scientific community.


Subjects
Computational Biology/methods; Genes; Proteins/genetics; Algorithms; Animals; Databases, Genetic; Databases, Protein; Internet; Molecular Sequence Annotation; Mucins/genetics; Mucins/metabolism; Phylogeny; Proteins/metabolism; Reproducibility of Results; Species Specificity; User-Computer Interface
7.
Nucleic Acids Res ; 39(Database issue): D561-8, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21045058

ABSTRACT

An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein-protein interaction information can be error-prone and require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.org.


Subjects
Databases, Protein; Protein Interaction Mapping/methods; Systems Integration; User-Computer Interface
8.
Br J Haematol ; 157(2): 180-7, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22296450

ABSTRACT

Transient myeloproliferative disorder (TMD) of the newborn and acute megakaryoblastic leukaemia (AMKL) in children with Down syndrome (DS) represent paradigmatic models of leukaemogenesis. Chromosome 21 gene dosage effects and truncating mutations of the X-chromosomal transcription factor GATA1 synergize to trigger TMD and AMKL in most patients. Here, we report the occurrence of TMD, which spontaneously remitted and later progressed to AMKL in a patient without DS but with a distinct dysmorphic syndrome. Genetic analysis of the leukaemic clone revealed somatic trisomy 21 and a truncating GATA1 mutation. The analysis of the patient's normal blood cell DNA on a genomic single nucleotide polymorphism (SNP) array revealed a de novo germ line 2·58 Mb 15q24 microdeletion including 41 known genes encompassing the tumour suppressor PML. Genomic context analysis of proteins encoded by genes that are included in the microdeletion, chromosome 21-encoded proteins and GATA1 suggests that the microdeletion may trigger leukaemogenesis by disturbing the balance of a hypothetical regulatory network of normal megakaryopoiesis involving PML, SUMO3 and GATA1. The 15q24 microdeletion may thus represent the first genetic hit to initiate leukaemogenesis and implicates PML and SUMO3 as novel components of the leukaemogenic network in TMD/AMKL.


Subjects
Chromosomes, Human, Pair 15/genetics; Down Syndrome/genetics; Leukemia, Megakaryoblastic, Acute/genetics; Myeloproliferative Disorders/genetics; Nuclear Proteins/genetics; Sequence Deletion; Transcription Factors/genetics; Tumor Suppressor Proteins/genetics; Ubiquitins/genetics; Child; Child, Preschool; Down Syndrome/pathology; GATA1 Transcription Factor/genetics; Humans; Infant; Leukemia, Megakaryoblastic, Acute/drug therapy; Leukemia, Megakaryoblastic, Acute/pathology; Male; Myeloproliferative Disorders/drug therapy; Myeloproliferative Disorders/pathology; Promyelocytic Leukemia Protein
9.
PLoS Comput Biol ; 7(12): e1002269, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22144877

ABSTRACT

The identification of single copy (1-to-1) orthologs in any group of organisms is important for functional classification and phylogenetic studies. The Metazoa are no exception, but only recently has there been a wide-enough distribution of taxa with sufficiently high quality sequenced genomes to gain confidence in the wide-spread single copy status of a gene. Here, we present a phylogenetic approach for identifying overlooked single copy orthologs from multigene families and apply it to the Metazoa. Using 18 sequenced metazoan genomes of high quality we identified a robust set of 1,126 orthologous groups that have been retained in single copy since the last common ancestor of Metazoa. We found that the use of the phylogenetic procedure increased the number of single copy orthologs found by over a third more than standard taxon-count approaches. The orthologs represented a wide range of functional categories, expression profiles and levels of divergence. To demonstrate the value of our set of single copy orthologs, we used them to assess the completeness of 24 currently published metazoan genomes and 62 EST datasets. We found that the annotated genes in published genomes vary in coverage from 79% (Ciona intestinalis) to 99.8% (human) with an average of 92%, suggesting a value for the underlying error rate in genome annotation, and a strategy for identifying single copy orthologs in larger datasets. In contrast, the vast majority of EST datasets with no corresponding genome sequence available are largely under-sampled and probably do not accurately represent the actual genomic complement of the organisms from which they are derived.
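
To make two ideas from this abstract concrete, the sketch below shows a plain taxon-count filter (keep a group only if every species contributes exactly one gene) and a coverage percentage against a reference set of single copy groups. It is a minimal illustration on toy data: the function names and input layout are hypothetical, and the authors' phylogenetic procedure is not reproduced here.

```python
from collections import Counter


def single_copy_groups(groups, species):
    """Return ids of groups with exactly one gene in every listed species.

    `groups` maps a group id to a list of (species, gene) pairs.
    """
    selected = []
    for group_id, members in groups.items():
        counts = Counter(sp for sp, _gene in members)
        if all(counts.get(sp, 0) == 1 for sp in species):
            selected.append(group_id)
    return sorted(selected)


def coverage_percent(annotated_groups, reference_groups):
    """Percentage of reference single copy groups found in an annotation."""
    reference = set(reference_groups)
    found = reference & set(annotated_groups)
    return 100.0 * len(found) / len(reference)


# Toy example (invented identifiers).
toy_groups = {
    "OG1": [("human", "h1"), ("fly", "f1")],                   # 1-to-1
    "OG2": [("human", "h2"), ("human", "h3"), ("fly", "f2")],  # duplicated
    "OG3": [("human", "h4")],                                  # missing in fly
}
print(single_copy_groups(toy_groups, ["human", "fly"]))
print(coverage_percent(["OG1", "OG3"], ["OG1", "OG2", "OG3", "OG4"]))
```

A strict count like this discards OG2 outright, which is the kind of case a tree-based approach could rescue when the extra copy turns out not to be a true ortholog.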


Subjects
Gene Dosage; Genome/genetics; Genomics/methods; Phylogeny; Animals; Databases, Genetic; Evolution, Molecular; Expressed Sequence Tags; Humans; Multigene Family
10.
Nucleic Acids Res ; 37(Database issue): D229-32, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18978020

ABSTRACT

Simple modular architecture research tool (SMART) is an online tool (http://smart.embl.de/) for the identification and annotation of protein domains. It provides a user-friendly platform for the exploration and comparative study of domain architectures in both proteins and genes. The current release of SMART contains manually curated models for 784 protein domains. Recent developments were focused on further data integration and improving user friendliness. The underlying protein database based on completely sequenced genomes was greatly expanded and now includes 630 species, compared to 191 in the previous release. As an initial step towards integrating information on biological pathways into SMART, our domain annotations were extended with data on metabolic pathways and links to several pathways resources. The interaction network view was completely redesigned and is now available for more than 2 million proteins. In addition to the standard web access to the database, users can now query SMART using distributed annotation system (DAS) or through a simple object access protocol (SOAP) based web service.


Subjects
Databases, Protein; Protein Structure, Tertiary; Internet; Metabolic Networks and Pathways; Protein Interaction Mapping; Software; User-Computer Interface
11.
Nucleic Acids Res ; 37(Database issue): D412-6, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18940858

ABSTRACT

Functional partnerships between proteins are at the core of complex cellular phenotypes, and the networks formed by interacting proteins provide researchers with crucial scaffolds for modeling, data reduction and annotation. STRING is a database and web resource dedicated to protein-protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a meta-database that maps all interaction evidence onto a common set of genomes and proteins. The most important new developments in STRING 8 over previous releases include a URL-based programming interface, which can be used to query STRING from other resources, improved interaction prediction via genomic neighborhood in prokaryotes, and the inclusion of protein structures. Version 8.0 of STRING covers about 2.5 million proteins from 630 organisms, providing the most comprehensive view on protein-protein interactions currently available. STRING can be reached at http://string-db.org/.


Subjects
Databases, Protein; Protein Interaction Mapping; Proteins/metabolism; Genomics; Multiprotein Complexes/metabolism; Proteins/chemistry; Proteins/genetics; User-Computer Interface
12.
Nucleic Acids Res ; 36(Database issue): D250-4, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17942413

ABSTRACT

The identification of orthologous genes forms the basis for most comparative genomics studies. Existing approaches either lack functional annotation of the identified orthologous groups, hampering the interpretation of subsequent results, or are manually annotated and thus lag behind the rapid sequencing of new genomes. Here we present the eggNOG database ('evolutionary genealogy of genes: Non-supervised Orthologous Groups'), which contains orthologous groups constructed from Smith-Waterman alignments through identification of reciprocal best matches and triangular linkage clustering. Applying this procedure to 312 bacterial, 26 archaeal and 35 eukaryotic genomes yielded 43 582 coarse-grained orthologous groups of which 9724 are extended versions of those from the original COG/KOG database. We also constructed more fine-grained groups for selected subsets of organisms, such as the 19 914 mammalian orthologous groups. We automatically annotated our non-supervised orthologous groups with functional descriptions, which were derived by identifying common denominators for the genes based on their individual textual descriptions, annotated functional categories, and predicted protein domains. The orthologous groups in eggNOG contain 1 241 751 genes and provide at least a broad functional description for 77% of them. Users can query the resource for individual genes via a web interface or download the complete set of orthologous groups at http://eggnog.embl.de.
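
The two clustering steps named in this abstract, reciprocal best matches followed by triangular linkage, can be sketched on toy data as follows. This is an illustrative reading of the general idea, not the eggNOG pipeline: the function names, the input layout (a precomputed best hit per gene and target genome) and the rule of merging triangles that share an edge are assumptions, and alignment scoring is left out entirely.

```python
from itertools import combinations


def reciprocal_best_hits(best_hit, genome_of):
    """Return unordered gene pairs that are each other's best hit.

    `best_hit[gene][genome]` is the best-matching gene in another genome.
    """
    pairs = set()
    for gene, hits in best_hit.items():
        for genome, partner in hits.items():
            if genome == genome_of[gene]:
                continue  # ignore within-genome hits in this sketch
            if best_hit.get(partner, {}).get(genome_of[gene]) == gene:
                pairs.add(frozenset((gene, partner)))
    return pairs


def triangle_clusters(pairs):
    """Merge triangles of reciprocal best hits that share an edge."""
    neighbours = {}
    for pair in pairs:
        a, b = tuple(pair)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # A triangle is three genes that are pairwise reciprocal best hits.
    triangles = set()
    for pair in pairs:
        a, b = tuple(pair)
        for c in neighbours[a] & neighbours[b]:
            triangles.add(frozenset((a, b, c)))

    # Union-find over triangles; triangles sharing an edge are joined.
    parent = {t: t for t in triangles}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    edge_owner = {}
    for t in triangles:
        for edge in combinations(sorted(t), 2):
            if edge in edge_owner:
                parent[find(t)] = find(edge_owner[edge])
            else:
                edge_owner[edge] = t

    groups = {}
    for t in triangles:
        groups.setdefault(find(t), set()).update(t)
    return sorted(sorted(g) for g in groups.values())
```

Pairs that never close a triangle (for example, a reciprocal best hit seen in only two genomes) are left unclustered by this sketch.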


Subjects
Databases, Genetic; Genes; Genomics; Phylogeny; Proteins/genetics; Animals; Databases, Genetic/standards; Databases, Genetic/statistics & numerical data; Internet; Proteins/classification; Proteins/physiology; Quality Control; User-Computer Interface
13.
J Bacteriol ; 191(1): 32-41, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18849420

ABSTRACT

The emerging coverage of diverse habitats by metagenomic shotgun data opens new avenues of discovering functional novelty using computational tools. Here, we apply three different concepts for predicting novel functions within light-mediated microbial pathways in five diverse environments. Using phylogenetic approaches, we discovered two novel deep-branching subfamilies of photolyases (involved in light-mediated repair) distributed abundantly in high-UV environments. Using neighborhood approaches, we were able to assign seven novel functional partners in luciferase synthesis, nitrogen metabolism, and quorum sensing to BLUF domain-containing proteins (involved in light sensing). Finally, by domain analysis, for RcaE proteins (involved in chromatic adaptation), we predict 16 novel domain architectures that indicate novel functionalities in habitats with little or no light. Quantification of protein abundance in the various environments supports our findings that bacteria utilize light for sensing, repair, and adaptation far more widely than previously thought. While the discoveries illustrate the opportunities in function discovery, we also discuss the immense conceptual and practical challenges that come along with this new type of data.


Subjects
Bacteria/genetics; Genes/radiation effects; Genomics/methods; Bacteria/classification; Bacteria/growth & development; Bacteria/radiation effects; Bacterial Proteins/genetics; Ecosystem; Environment; Genome, Bacterial; Light; Phylogeny; Plants/classification; Plants/genetics; Ultraviolet Rays
14.
Trends Genet ; 22(11): 585-9, 2006 Nov.
Article in English | MEDLINE | ID: mdl-16979784

ABSTRACT

In a genome-wide analysis, we have identified 85 human genes encoding 103 protein isoforms that resemble retroviral Gag proteins. These genes were domesticated from retrotransposons in at least five independent events during vertebrate evolution and were subsequently duplicated further in mammals. Structural insights into the mammalian proteins can be inferred by homology to Gag from viruses such as HIV; in turn, the cellular roles of the mammalian Gag homologs, such as apoptosis-related functions and binding to ubiquitin ligases, might hint at further functionality of viral Gag itself.


Subjects
Evolution, Molecular; Gene Products, gag/physiology; Genome, Human; Viral Proteins/genetics; Animals; Gene Products, gag/genetics; HIV Long Terminal Repeat/genetics; Humans; Mammals; Protein Isoforms/genetics; Protein Isoforms/physiology; Retroelements/genetics
15.
Nucleic Acids Res ; 35(Database issue): D358-62, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17098935

ABSTRACT

Information on protein-protein interactions is still mostly limited to a small number of model organisms, and originates from a wide variety of experimental and computational techniques. The database and online resource STRING generalizes access to protein interaction data, by integrating known and predicted interactions from a variety of sources. The underlying infrastructure includes a consistent body of completely sequenced genomes and exhaustive orthology classifications, based on which interaction evidence is transferred between organisms. Although primarily developed for protein interaction analysis, the resource has also been successfully applied to comparative genomics, phylogenetics and network studies, which are all facilitated by programmatic access to the database backend and the availability of compact download files. As of release 7, STRING has almost doubled to 373 distinct organisms, and contains more than 1.5 million proteins for which associations have been pre-computed. Novel features include AJAX-based web-navigation, inclusion of additional resources such as BioGRID, and detailed protein domain annotation. STRING is available at http://string.embl.de/


Subjects
Databases, Protein; Protein Interaction Mapping; Databases, Protein/standards; Internet; Sequence Homology, Amino Acid; Systems Integration; User-Computer Interface
16.
Curr Mol Med ; 8(8): 768-73, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19075674

ABSTRACT

Human lipoxygenases and the products of their catalytic reactions have a well-established connection to many human diseases. Despite their importance in inflammation, cancer, cardiorenal and other ailments, drug development is impaired by the lack of structural details needed to understand their intricate specificity and function in molecular and cellular signaling. The major effort so far has been directed towards understanding the determinants of their specificity and the inhibition of their active site with the iron cofactor. Their structure is believed to consist of only two domains: a regulatory beta-sandwich domain, important for membrane binding, and a mostly helical catalytic domain. However, recently published cohort studies on single nucleotide polymorphisms and the occurrence of diseases, SAXS analysis and new biochemical data throw new light on lipoxygenases, suggesting a symbiosis of regulatory functions with an allosteric mechanism and a more flexible structure than anticipated. The goal of this brief review is to direct attention to the structural features of an anticipated topology and to stimulate discussion and research to prove or disprove our hypothesis that lipoxygenases may possess PDZ-like fragments of approximately 110 amino acids that are of functional importance. If they do have a second regulatory domain, it might help to explain their association with other molecules and their role in signaling pathways, and present a new avenue to explore the regulation of their behavior, and thus intervention in the course of diseases.


Subjects
Lipoxygenase/chemistry; Amino Acid Sequence; Humans; Lipoxygenase/genetics; Lipoxygenase/metabolism; Models, Molecular; Molecular Sequence Data; PDZ Domains/genetics; Sequence Homology, Amino Acid
17.
Trends Biochem Sci ; 27(3): 113-5, 2002 Mar.
Article in English | MEDLINE | ID: mdl-11893501

ABSTRACT

This article describes a new extracellular domain--AMOP, for adhesion-associated domain in MUC4 and other proteins. This domain occurs in putative cell adhesion molecules and in some splice variants of MUC4. MUC4 splice variants are overexpressed in several tumours; in particular, they are highly expressed in pancreatic carcinomas but not in normal pancreas. The presence of AMOP in cell adhesion molecules could be indicative of a role for this domain in adhesion.


Subjects
Alternative Splicing; Mucins/genetics; Pancreatic Neoplasms/genetics; Amino Acid Sequence; Biomarkers, Tumor/metabolism; Cell Adhesion/physiology; Humans; Molecular Sequence Data; Mucin-4; Prognosis; Sequence Homology, Amino Acid
18.
Trends Biochem Sci ; 27(4): 168-70, 2002 Apr.
Article in English | MEDLINE | ID: mdl-11943536

ABSTRACT

This article describes a novel domain, BSD, that is present in basal transcription factors, synapse-associated proteins and several hypothetical proteins. It occurs in a variety of species ranging from primal protozoan to human. The BSD domain is characterized by three predicted alpha helices, which probably form a three-helical bundle, as well as by conserved tryptophan and phenylalanine residues, located at the C terminus of the domain.


Subjects
Amino Acid Motifs; Neuropeptides/chemistry; Transcription Factors/chemistry; Amino Acid Sequence; Animals; Humans; Molecular Sequence Data; Sequence Homology, Amino Acid
19.
Trends Biochem Sci ; 27(2): 59-62, 2002 Feb.
Article in English | MEDLINE | ID: mdl-11852237

ABSTRACT

In this article, we describe a novel, widespread domain (CASH) that is shared by many carbohydrate-binding proteins and sugar hydrolases. This domain occurs in more than 1000 proteins distributed among all three kingdoms of life. The CASH domain is characterized by internal repetitions of glycines and hydrophobic residues that correspond to the repetitive units of a predicted or observed right-handed beta-helix structure of the pectate lyase superfamily.


Subjects
Amino Acid Motifs; Polysaccharide-Lyases/chemistry; Animals; Binding Sites; Carbohydrate Metabolism; Carrier Proteins/metabolism; Humans; Protein Folding; Protein Structure, Secondary; Protein Structure, Tertiary
20.
PLoS Biol ; 3(5): e134, 2005 May.
Article in English | MEDLINE | ID: mdl-15799710

ABSTRACT

One of the major challenges of functional genomics is to unravel the connection between genotype and phenotype. So far no global analysis has attempted to explore those connections in the light of the large phenotypic variability seen in nature. Here, we use an unsupervised, systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis. We first mine the MEDLINE literature database for terms that reflect phenotypic similarities of species. Subsequently we predict the likely genomic determinants: genes specifically present in the respective genomes. In a global analysis involving 92 prokaryotic genomes we retrieve 323 clusters containing a total of 2,700 significant gene-phenotype associations. Some clusters contain mostly known relationships, such as genes involved in motility or plant degradation, often with additional hypothetical proteins associated with those phenotypes. Other clusters comprise unexpected associations; for example, a group of terms related to food and spoilage is linked to genes predicted to be involved in bacterial food poisoning. Among the clusters, we observe an enrichment of pathogenicity-related associations, suggesting that the approach reveals many novel genes likely to play a role in infectious diseases.
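
One way to picture the step of linking a phenotype to "genes specifically present in the respective genomes" is an overlap test between two sets of genomes: those sharing a phenotype term and those carrying a gene. The sketch below uses a one-sided hypergeometric tail for this. The abstract does not state which statistic the authors used, so the choice of test, the function name and the toy genome labels are all assumptions made for illustration.

```python
from math import comb


def overlap_enrichment(n_genomes, with_trait, with_gene):
    """Return (overlap, p) for genomes sharing both a trait and a gene.

    p is the hypergeometric probability of seeing at least the observed
    overlap if the gene were spread over genomes independently of the trait.
    """
    overlap = len(with_trait & with_gene)
    n_trait, n_gene = len(with_trait), len(with_gene)
    tail = sum(
        comb(n_trait, i) * comb(n_genomes - n_trait, n_gene - i)
        for i in range(overlap, min(n_trait, n_gene) + 1)
    )
    return overlap, tail / comb(n_genomes, n_gene)


# Toy example: six genomes, three of which share a phenotype term.
trait_genomes = {"g1", "g2", "g3"}
print(overlap_enrichment(6, trait_genomes, {"g1", "g2", "g3"}))  # perfect match
print(overlap_enrichment(6, trait_genomes, {"g1", "g4", "g5"}))  # weak overlap
```

With only six genomes even a perfect match gives p = 1/20, which shows why a large and varied genome set is needed before such associations become convincing.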


Subjects
Databases, Genetic; Genes; Genome; Publishing; Bacteria/genetics; Enzymes/genetics; Foodborne Diseases/microbiology; Genotype; Humans; MEDLINE; Phenotype; Plants/enzymology; Plants/genetics