Results 1 - 16 of 16
1.
Methods Mol Biol ; 1446: 25-37, 2017.
Article in English | MEDLINE | ID: mdl-27812933

ABSTRACT

The Gene Ontology (GO) project is the largest resource for cataloguing gene function. The combination of solid conceptual underpinnings and a practical set of features has made the GO a widely adopted resource in the research community and an essential resource for data analysis. In this chapter, we provide a concise primer for all users of the GO. We briefly introduce the structure of the ontology and explain how to interpret annotations associated with the GO.


Subject(s)
Computational Biology/methods; Gene Ontology; Animals; DNA/genetics; Databases, Genetic; Humans; Internet; Proteins/genetics; RNA/genetics
2.
Methods Mol Biol ; 1446: 97-109, 2017.
Article in English | MEDLINE | ID: mdl-27812938

ABSTRACT

Two avenues to understanding gene function are complementary and often overlapping: experimental work and computational prediction. While experimental annotation generally produces high-quality annotations, it is low throughput. Conversely, computational annotations have broad coverage, but the quality of annotations may be variable, and therefore evaluating the quality of computational annotations is a critical concern. In this chapter, we provide an overview of strategies to evaluate the quality of computational annotations. First, we discuss why evaluating quality in this setting is not trivial. We highlight the various issues that threaten to bias the evaluation of computational annotations, most of which stem from the incompleteness of biological databases. Second, we discuss solutions that address these issues, for example, targeted selection of new experimental annotations and leveraging the existing experimental annotations.


Subject(s)
Computational Biology/methods; Gene Ontology; Molecular Sequence Annotation/methods; Animals; Computer Simulation; Databases, Genetic; Genome; Humans; Models, Biological; Proteins/genetics; Proteins/metabolism
3.
Methods Mol Biol ; 1446: 207-220, 2017.
Article in English | MEDLINE | ID: mdl-27812945

ABSTRACT

Contemporary techniques in biology produce readouts for large numbers of genes simultaneously, the typical example being differential gene expression measurements. Moreover, those genes are often richly annotated using GO terms that describe gene function and that can be used to summarize the results of the genome-scale experiments. However, making sense of such GO enrichment analyses may be challenging. For instance, overrepresented GO functions in a set of differentially expressed genes are typically output as a flat list, a format not adequate to capture the complexities of the hierarchical structure of the GO annotation labels. In this chapter, we survey various methods to visualize large, difficult-to-interpret lists of GO terms. We catalog their availability (Web-based or standalone), the main principles they employ in summarizing large lists of GO terms, and the visualization styles they support. These brief commentaries on each software are intended as a helpful inventory, rather than comprehensive descriptions of the underlying algorithms. Instead, we show examples of their use and suggest that the choice of an appropriate visualization tool may be crucial to the utility of GO in biological discovery.


Subject(s)
Gene Ontology; Animals; Audiovisual Aids; Databases, Genetic; Gene Expression Profiling; Humans; Internet; Software
4.
Genome Biol ; 17(1): 184, 2016 09 07.
Article in English | MEDLINE | ID: mdl-27604469

ABSTRACT

BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.


Subject(s)
Computational Biology; Proteins/chemistry; Software; Structure-Activity Relationship; Algorithms; Databases, Protein; Gene Ontology; Humans; Molecular Sequence Annotation; Proteins/genetics
5.
PLoS Comput Biol ; 11(5): e1004095, 2015 May.
Article in English | MEDLINE | ID: mdl-26020646

ABSTRACT

Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate the investigations of evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events.
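The composition-based ("parametric") idea described in this abstract can be illustrated with a minimal sketch. It assumes GC content as the only compositional signal, fixed non-overlapping windows, and a simple z-score cutoff; the function names, window size, and cutoff are illustrative choices, not taken from the article, and real parametric methods typically use richer signatures.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_windows(genome: str, window: int = 1000, z_cutoff: float = 2.0):
    """Return (start, gc, z) for windows whose GC content deviates strongly.

    The genome is cut into non-overlapping windows; a window is reported when
    its GC content lies more than z_cutoff standard deviations away from the
    mean over all windows. Both parameters are hypothetical defaults.
    """
    starts = list(range(0, len(genome) - window + 1, window))
    gcs = [gc_content(genome[s:s + window]) for s in starts]
    if len(gcs) < 2:
        return []
    mu, sigma = mean(gcs), pstdev(gcs)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [
        (s, gc, (gc - mu) / sigma)
        for s, gc in zip(starts, gcs)
        if abs(gc - mu) / sigma > z_cutoff
    ]
```

A flagged window is only a candidate: compositional deviation can have causes other than transfer, which is one reason the abstract notes that different methods tend to infer different events.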


Subject(s)
Gene Transfer, Horizontal; Base Composition; Computational Biology; Computer Simulation; DNA, Bacterial/genetics; Databases, Genetic; Drug Resistance, Bacterial/genetics; Evolution, Molecular; Genomics/statistics & numerical data; Humans; Models, Genetic; Models, Statistical; Phylogeny
6.
Database (Oxford) ; 2015: bav043, 2015.
Article in English | MEDLINE | ID: mdl-25957950

ABSTRACT

Biocuration has become a cornerstone for analyses in biology, and to meet needs, the amount of annotations has considerably grown in recent years. However, the reliability of these annotations varies; it has thus become necessary to be able to assess the confidence in annotations. Although several resources already provide confidence information about the annotations that they produce, a standard way of providing such information has yet to be defined. This lack of standardization undermines the propagation of knowledge across resources, as well as the credibility of results from high-throughput analyses. Seeded at a workshop during the Biocuration 2012 conference, a working group has been created to address this problem. We present here the elements that were identified as essential for assessing confidence in annotations, as well as a draft ontology, the Confidence Information Ontology, to illustrate how the problems identified could be addressed. We hope that this effort will provide a home for discussing this major issue among the biocuration community.
Tracker URL: https://github.com/BgeeDB/confidence-information-ontology
Ontology URL: https://raw.githubusercontent.com/BgeeDB/confidence-information-ontology/master/src/ontology/cio-simple.obo


Subject(s)
Biological Ontologies; Data Curation/standards; Congresses as Topic
7.
Curr Biol ; 25(10): 1347-53, 2015 May 18.
Article in English | MEDLINE | ID: mdl-25866392

ABSTRACT

The interrelationships of the flatworms (phylum Platyhelminthes) are poorly resolved despite decades of morphological and molecular phylogenetic studies. The earliest-branching clades (Catenulida, Macrostomorpha, and Polycladida) share spiral cleavage and entolecithal eggs with other lophotrochozoans. Lecithoepitheliata have primitive spiral cleavage but derived ectolecithal eggs. Other orders (Rhabdocoela, Proseriata, Tricladida and relatives, and Bothrioplanida) all have derived ectolecithal eggs but have uncertain affinities to one another. The orders of parasitic Neodermata emerge from an uncertain position from within these ectolecithal classes. To tackle these problems, we have sequenced transcriptomes from 18 flatworms and 5 other metazoan groups. The addition of published data produces an alignment of >107,000 amino acids with less than 28% missing data from 27 flatworm taxa in 11 orders covering all major clades. Our phylogenetic analyses show that Platyhelminthes consist of the two clades Catenulida and Rhabditophora. Within Rhabditophora, we show the earliest-emerging branch is Macrostomorpha, not Polycladida. We show Lecithoepitheliata are not members of Neoophora but are sister group of Polycladida, implying independent origins of the ectolecithal eggs found in Lecithoepitheliata and Neoophora. We resolve Rhabdocoela as the most basally branching euneoophoran taxon. Tricladida, Bothrioplanida, and Neodermata constitute a group that appears to have lost both spiral cleavage and centrosomes. We identify Bothrioplanida as the long-sought closest free-living sister group of the parasitic Neodermata. Among parasitic orders, we show that Cestoda are closer to Trematoda than to Monogenea, rejecting the concept of the Cercomeromorpha. Our results have important implications for understanding the evolution of this major phylum.


Subject(s)
Phylogeny; Platyhelminths/genetics; Animals; Biological Evolution; Centromere/genetics; Female; Gene Expression Profiling; Male; Ovum/physiology; Planarians/genetics; Platyhelminths/physiology; Spermatozoa/physiology; Terminology as Topic
8.
PLoS One ; 10(2): e0114701, 2015.
Article in English | MEDLINE | ID: mdl-25679783

ABSTRACT

Phylogenetic profiling is a well-established approach for predicting gene function based on patterns of gene presence and absence across species. Many of the recent developments have focused on methodological improvements, but relatively little is known about the effect of input data size on the quality of predictions. In this work, we ask: how many genomes and functional annotations need to be considered for phylogenetic profiling to be effective? Phylogenetic profiling generally benefits from an increased amount of input data. However, by decomposing this improvement in predictive accuracy in terms of the contribution of additional genomes and of additional annotations, we observed diminishing returns in adding more than ~100 genomes, whereas increasing the number of annotations remained strongly beneficial throughout. We also observed that maximising phylogenetic diversity within a clade of interest improves predictive accuracy, but the effect is small compared to changes in the number of genomes under comparison. Finally, we show that these findings are supported in light of the Open World Assumption, which posits that functional annotation databases are inherently incomplete. All the tools and data used in this work are available for reuse from http://lab.dessimoz.org/14_phylprof. Scripts used to analyse the data are available on request from the authors.
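As a rough illustration of the presence/absence idea behind phylogenetic profiling, the sketch below transfers function labels from annotated genes to a query gene when their species profiles are similar. The Jaccard index, the 0.5 threshold, and all names are assumptions made for this example; they are not the method evaluated in the article.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_functions(query, profiles, annotations, min_similarity=0.5):
    """Suggest function labels for `query` from genes with similar profiles.

    profiles:    gene -> set of species in which the gene is present
    annotations: gene -> set of function labels already known for that gene
    Returns label -> highest profile similarity among genes carrying it.
    """
    scores = {}
    for gene, labels in annotations.items():
        if gene == query or gene not in profiles:
            continue
        sim = jaccard(profiles[query], profiles[gene])
        if sim >= min_similarity:
            for label in labels:
                scores[label] = max(scores.get(label, 0.0), sim)
    return scores
```

In this toy form, adding genomes lengthens every profile, while adding annotations enlarges the pool of genes that can donate labels. Those are the two input dimensions whose contributions the abstract separates.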


Subject(s)
Genomics/methods; Phylogeny; Genetic Variation; Molecular Sequence Annotation
9.
Nucleic Acids Res ; 43(Database issue): D240-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25399418

ABSTRACT

The Orthologous Matrix (OMA) project is a method and associated database inferring evolutionary relationships amongst currently 1706 complete proteomes (i.e. the protein sequence associated for every protein-coding gene in all genomes). In this update article, we present six major new developments in OMA: (i) a new web interface; (ii) Gene Ontology function predictions as part of the OMA pipeline; (iii) better support for plant genomes and in particular homeologs in the wheat genome; (iv) a new synteny viewer providing the genomic context of orthologs; (v) statically computed hierarchical orthologous groups subsets downloadable in OrthoXML format; and (vi) possibility to export parts of the all-against-all computations and to combine them with custom data for 'client-side' orthology prediction. OMA can be accessed through the OMA Browser and various programmatic interfaces at http://omabrowser.org.


Subject(s)
Databases, Protein; Plant Proteins/genetics; Proteome/chemistry; Sequence Homology, Amino Acid; Algorithms; Gene Ontology; Genome, Plant; Humans; Internet; Plant Proteins/chemistry; Proteome/genetics; Synteny; Triticum/genetics
10.
Nat Biotechnol ; 32(12): 1250-5, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25402615

ABSTRACT

The domestic ferret (Mustela putorius furo) is an important animal model for multiple human respiratory diseases. It is considered the 'gold standard' for modeling human influenza virus infection and transmission. Here we describe the 2.41 Gb draft genome assembly of the domestic ferret, constituting 2.28 Gb of sequence plus gaps. We annotated 19,910 protein-coding genes on this assembly using RNA-seq data from 21 ferret tissues. We characterized the ferret host response to two influenza virus infections by RNA-seq analysis of 42 ferret samples from influenza time-course data and showed distinct signatures in ferret trachea and lung tissues specific to 1918 or 2009 human pandemic influenza virus infections. Using microarray data from 16 ferret samples reflecting cystic fibrosis disease progression, we showed that transcriptional changes in the CFTR-knockout ferret lung reflect pathways of early disease that cannot be readily studied in human infants with cystic fibrosis disease.


Subject(s)
Ferrets/genetics; Genome; Influenza, Human/genetics; Sequence Analysis, DNA; Animals; Base Sequence; Chromosome Mapping; Disease Models, Animal; High-Throughput Nucleotide Sequencing; Humans; Influenza, Human/transmission; Influenza, Human/virology; Molecular Sequence Annotation; Molecular Sequence Data; Orthomyxoviridae/genetics; Orthomyxoviridae/pathogenicity
12.
PLoS Comput Biol ; 9(1): e1002852, 2013.
Article in English | MEDLINE | ID: mdl-23308060

ABSTRACT

New microbial genomes are sequenced at a high pace, allowing insight into the genetics of not only cultured microbes, but a wide range of metagenomic collections such as the human microbiome. To understand the deluge of genomic data we face, computational approaches for gene functional annotation are invaluable. We introduce a novel model for computational annotation that refines two established concepts: annotation based on homology and annotation based on phyletic profiling. The phyletic profiling-based model that includes both inferred orthologs and paralogs (homologs separated by a speciation and a duplication event, respectively) provides more annotations at the same average Precision than the model that includes only inferred orthologs. For experimental validation, we selected 38 poorly annotated Escherichia coli genes for which the model assigned one of three GO terms with high confidence: involvement in DNA repair, protein translation, or cell wall synthesis. Results of antibiotic stress survival assays on E. coli knockout mutants showed high agreement with our model's estimates of accuracy: out of 38 predictions obtained at the reported Precision of 60%, we confirmed 25 predictions, indicating that our confidence estimates can be used to make informed decisions on experimental validation. Our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time. Our predictions for 998 prokaryotic genomes include ~400,000 specific annotations with the estimated Precision of 90%, ~19,000 of which are highly specific (e.g. "penicillin binding," "tRNA aminoacylation for protein translation," or "pathogenesis") and are freely available at http://gorbi.irb.hr/.
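The validation figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (38 predictions tested, 25 confirmed, reported Precision of 60%); the function names are illustrative.

```python
def observed_precision(confirmed: int, tested: int) -> float:
    """Fraction of tested predictions that were experimentally confirmed."""
    if tested <= 0:
        raise ValueError("tested must be a positive number of predictions")
    return confirmed / tested


def expected_confirmations(reported_precision: float, tested: int) -> float:
    """Number of confirmations expected if the reported Precision holds."""
    return reported_precision * tested


if __name__ == "__main__":
    # Figures taken from the abstract: 25 of 38 confirmed at 60% Precision.
    print(f"observed: {observed_precision(25, 38):.3f}")
    print(f"expected confirmations: {expected_confirmations(0.60, 38):.1f}")
```

The observed fraction, 25/38, is about 0.66, slightly above the reported 60%, which would predict roughly 23 confirmations out of 38. This is consistent with the abstract's statement that the experiments agreed well with the model's accuracy estimates.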


Subject(s)
Gene Expression Profiling; Phylogeny; Escherichia coli/genetics; Genes, Bacterial; Models, Theoretical
13.
PLoS Comput Biol ; 8(5): e1002533, 2012 May.
Article in English | MEDLINE | ID: mdl-22693439

ABSTRACT

Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we not only found that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon, an important outcome given that >98% of all annotations are inferred without direct curation.


Subject(s)
Computational Biology/methods; Databases, Genetic; Molecular Sequence Annotation/methods; Vocabulary, Controlled; Database Management Systems; Reproducibility of Results
14.
PLoS One ; 6(7): e21800, 2011.
Article in English | MEDLINE | ID: mdl-21789182

ABSTRACT

Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret. REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures. Furthermore, REVIGO visualizes this non-redundant GO term set in multiple ways to assist in interpretation: multidimensional scaling and graph-based visualizations accurately render the subdivisions and the semantic relationships in the data, while treemaps and tag clouds are also offered as alternative views. REVIGO is freely available at http://revigo.irb.hr/.
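The general idea of picking a representative subset of terms by semantic similarity can be sketched as a greedy filter. This is not REVIGO's actual algorithm, which the abstract does not spell out; the similarity function is supplied by the caller, and the cutoff and term names below are hypothetical.

```python
def reduce_redundancy(pvalues, similarity, cutoff=0.7):
    """Greedily keep terms that are not too similar to any kept term.

    pvalues:    term -> enrichment p-value (smaller means more significant)
    similarity: callable (term, term) -> semantic similarity in [0, 1]
    cutoff:     a term is dropped if its similarity to an already kept,
                more significant term is at or above this value
    """
    kept = []
    for term in sorted(pvalues, key=pvalues.get):
        if all(similarity(term, rep) < cutoff for rep in kept):
            kept.append(term)
    return kept


if __name__ == "__main__":
    # Toy similarity table with placeholder term names, not real GO terms.
    sims = {frozenset({"term_A", "term_B"}): 0.9}

    def sim(x, y):
        return sims.get(frozenset({x, y}), 0.0)

    print(reduce_redundancy({"term_A": 0.001, "term_B": 0.01, "term_C": 0.02}, sim))
```

Processing terms in order of significance means that the retained representative of each group of near-duplicates is its most significant member.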


Subject(s)
Algorithms; Computational Biology/methods; Molecular Sequence Annotation; Gene Expression Regulation; Humans; Internet; Software; Transcription Factors/genetics; Transcription Factors/metabolism; User-Computer Interface
15.
Brief Bioinform ; 12(6): 723-35, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21330331

ABSTRACT

With high-throughput technologies providing vast amounts of data, it has become more important to provide systematic, quality annotations. The Gene Ontology (GO) project is the largest resource for cataloguing gene function. Nonetheless, its use is not yet ubiquitous and is still fraught with pitfalls. In this review, we provide a short primer to the GO for bioinformaticians. We summarize important aspects of the structure of the ontology, describe sources and types of functional annotations, survey measures of GO annotation similarity, review typical uses of GO and discuss other important considerations pertaining to the use of GO in bioinformatics applications.


Subject(s)
Computational Biology/methods; Molecular Sequence Annotation; Databases, Genetic; Genes; Vocabulary, Controlled
16.
PLoS Genet ; 6(6): e1001004, 2010 Jun 24.
Article in English | MEDLINE | ID: mdl-20585573

ABSTRACT

Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns represented as sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome (between 5% and 33%, depending on genome size) while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of, codon-optimized genes, the trends of enrichment/depletion being conserved between Archaea and Bacteria. Prominent exceptions from these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxication of oxygen radicals and ammonia and to possible misannotations of asparaginyl-tRNA synthetases.
Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an "adaptome" by highlighting gene functions with expression levels elevated specifically in thermophilic Bacteria and Archaea.
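The classifier described in this abstract works from per-gene codon frequencies. A minimal sketch of how such a feature vector might be computed is given below. It assumes an in-frame coding sequence and ignores any trailing partial codon; the function name is illustrative, and the article's own feature construction may differ.

```python
from collections import Counter


def codon_frequencies(cds: str) -> dict:
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not form a complete codon are ignored.
    Returns an empty dict for sequences shorter than one codon.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}
```

Frequencies like these, computed for each gene and for the surrounding intergenic DNA, are the kind of input a supervised classifier could compare against the ribosomal protein genes.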


Subject(s)
Prokaryotic Cells/metabolism; Protein Biosynthesis; Archaea/genetics; Bacteria/genetics; Codon; Gene Expression Regulation; Genome; Models, Genetic; Ribosomal Proteins/genetics