Results 1 - 16 of 16
1.
Methods Mol Biol ; 1446: 25-37, 2017.
Article in English | MEDLINE | ID: mdl-27812933

ABSTRACT

The Gene Ontology (GO) project is the largest resource for cataloguing gene function. The combination of solid conceptual underpinnings and a practical set of features has made the GO a widely adopted resource in the research community and an essential resource for data analysis. In this chapter, we provide a concise primer for all users of the GO. We briefly introduce the structure of the ontology and explain how to interpret annotations associated with the GO.


Subjects
Computational Biology/methods , Gene Ontology , Animals , DNA/genetics , Databases, Genetic , Humans , Internet , Proteins/genetics , RNA/genetics
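
Interpreting GO annotations depends on the ontology's structure: terms form a directed acyclic graph, and an annotation to a term implies annotation to all of its ancestors (often called the true path rule). A minimal Python sketch of that upward propagation follows, assuming a hypothetical toy ontology with made-up term names and only is_a edges; real GO releases also use other relations such as part_of, which this sketch ignores.

```python
from typing import Dict, Set

# Hypothetical toy ontology: each term maps to its direct parents (is_a only).
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "protein_binding": {"binding"},
    "enzyme_binding": {"protein_binding"},
    "catalytic_activity": {"root"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every ancestor of `term`, excluding the term itself."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(direct: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Close a gene's direct annotations upward through the ontology."""
    full = set(direct)
    for term in direct:
        full |= ancestors(term, parents)
    return full


print(sorted(propagate({"enzyme_binding"}, PARENTS)))
# → ['binding', 'enzyme_binding', 'protein_binding', 'root']
```

A gene annotated only to `enzyme_binding` is therefore implicitly annotated to `protein_binding`, `binding` and `root` as well.
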
2.
Methods Mol Biol ; 1446: 97-109, 2017.
Article in English | MEDLINE | ID: mdl-27812938

ABSTRACT

Two avenues to understanding gene function are complementary and often overlapping: experimental work and computational prediction. While experimental annotation generally produces high-quality annotations, it is low throughput. Conversely, computational annotations have broad coverage, but the quality of annotations may be variable, and therefore evaluating the quality of computational annotations is a critical concern. In this chapter, we provide an overview of strategies to evaluate the quality of computational annotations. First, we discuss why evaluating quality in this setting is not trivial. We highlight the various issues that threaten to bias the evaluation of computational annotations, most of which stem from the incompleteness of biological databases. Second, we discuss solutions that address these issues, for example, targeted selection of new experimental annotations and leveraging the existing experimental annotations.


Subjects
Computational Biology/methods , Gene Ontology , Molecular Sequence Annotation/methods , Animals , Computer Simulation , Databases, Genetic , Genome , Humans , Models, Biological , Proteins/genetics , Proteins/metabolism
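
The incompleteness problem described above can be seen with plain precision and recall. The sketch below uses hypothetical genes and placeholder term names, and scores one fixed set of predictions against two snapshots of an experimental reference: a prediction that is merely untested counts as a false positive until a later annotation confirms it. It only illustrates the bias; it is not the chapter's proposed evaluation strategy.

```python
from typing import Set, Tuple

# An annotation is a (gene, GO term) pair; all identifiers are hypothetical placeholders.
Annotation = Tuple[str, str]


def precision_recall(predicted: Set[Annotation],
                     reference: Set[Annotation]) -> Tuple[float, float]:
    """Score predictions against a reference set of experimental annotations."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


reference_v1 = {("geneA", "term_X"), ("geneB", "term_Y")}
predicted = {("geneA", "term_X"), ("geneB", "term_Z"), ("geneC", "term_X")}
print(precision_recall(predicted, reference_v1))  # → (0.3333333333333333, 0.5)

# Suppose a later experiment confirms ("geneB", "term_Z"): the same predictions now score higher.
reference_v2 = reference_v1 | {("geneB", "term_Z")}
print(precision_recall(predicted, reference_v2))  # → (0.6666666666666666, 0.6666666666666666)
```
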
3.
Methods Mol Biol ; 1446: 207-220, 2017.
Article in English | MEDLINE | ID: mdl-27812945

ABSTRACT

Contemporary techniques in biology produce readouts for large numbers of genes simultaneously, the typical example being differential gene expression measurements. Moreover, those genes are often richly annotated using GO terms that describe gene function and that can be used to summarize the results of the genome-scale experiments. However, making sense of such GO enrichment analyses may be challenging. For instance, overrepresented GO functions in a set of differentially expressed genes are typically output as a flat list, a format not adequate to capture the complexities of the hierarchical structure of the GO annotation labels. In this chapter, we survey various methods to visualize large, difficult-to-interpret lists of GO terms. We catalog their availability (Web-based or standalone), the main principles they employ in summarizing large lists of GO terms, and the visualization styles they support. These brief commentaries on each tool are intended as a helpful inventory, rather than comprehensive descriptions of the underlying algorithms. Instead, we show examples of their use and suggest that the choice of an appropriate visualization tool may be crucial to the utility of GO in biological discovery.


Subjects
Gene Ontology , Animals , Audiovisual Aids , Databases, Genetic , Gene Expression Profiling , Humans , Internet , Software
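
The flat lists of overrepresented GO terms that these tools summarize are typically produced by a per-term overrepresentation test. One common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The chapter does not prescribe it, so treat the sketch below as an assumed, illustrative upstream step with hypothetical gene counts. It uses only the standard library and omits the multiple-testing correction that a real analysis over thousands of terms would need.

```python
from math import comb


def enrichment_pvalue(k: int, population: int, annotated: int, selected: int) -> float:
    """One-sided hypergeometric tail P(X >= k).

    population: genes in the background (e.g. all measured genes)
    annotated:  background genes annotated to the GO term (after propagation)
    selected:   genes in the list of interest (e.g. differentially expressed)
    k:          selected genes that carry the term
    """
    upper = min(annotated, selected)
    tail = sum(comb(annotated, i) * comb(population - annotated, selected - i)
               for i in range(k, upper + 1))
    return tail / comb(population, selected)


# Hypothetical numbers: 20 genes measured, 5 carry the term, 4 are selected, 3 of those carry it.
p = enrichment_pvalue(3, 20, 5, 4)
print(round(p, 4))  # → 0.032
```
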
4.
Genome Biol ; 17(1): 184, 2016 09 07.
Article in English | MEDLINE | ID: mdl-27604469

ABSTRACT

BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.


Subjects
Computational Biology , Proteins/chemistry , Software , Structure-Activity Relationship , Algorithms , Databases, Protein , Gene Ontology , Humans , Molecular Sequence Annotation , Proteins/genetics
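
One of the assessment metrics used in the CAFA series is the protein-centric maximum F-measure (Fmax). The sketch below follows its usual definition: at each score threshold, precision is averaged over proteins with at least one prediction at or above the threshold, recall is averaged over all benchmark proteins, and Fmax is the best harmonic mean across thresholds. It is a simplified illustration, not the official CAFA assessment code. It assumes predictions and ground truth are already propagated to ancestor terms and that every benchmark protein has at least one true term, and it uses a hypothetical handful of proteins and terms.

```python
from typing import Dict, Iterable, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         thresholds: Iterable[float]) -> float:
    """Protein-centric maximum F-measure over the given score thresholds."""
    best = 0.0
    for t in thresholds:
        precisions = []       # only proteins with >= 1 prediction at this threshold
        recall_sum = 0.0      # summed over all benchmark proteins
        for protein, true_terms in truth.items():
            predicted = {term for term, score in predictions.get(protein, {}).items()
                         if score >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # precision is undefined when nothing is predicted
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


truth = {"P1": {"a", "b"}, "P2": {"c"}}
predictions = {"P1": {"a": 0.9, "b": 0.4, "d": 0.8}, "P2": {"c": 0.3}}
print(round(fmax(predictions, truth, [0.2, 0.5, 0.85]), 4))  # → 0.9091
```
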
5.
PLoS Comput Biol ; 11(5): e1004095, 2015 May.
Article in English | MEDLINE | ID: mdl-26020646

ABSTRACT

Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate investigations of the evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events.


Subjects
Gene Transfer, Horizontal , Base Composition , Computational Biology , Computer Simulation , DNA, Bacterial/genetics , Databases, Genetic , Drug Resistance, Bacterial/genetics , Evolution, Molecular , Genomics/statistics & numerical data , Humans , Models, Genetic , Models, Statistical , Phylogeny
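
Parametric methods flag genes whose sequence composition deviates from the genomic average. A minimal sketch of the simplest such signal, GC content, follows. Each gene is scored as a z-score against the mean over all genes; the 2-standard-deviation cutoff and the toy sequences are assumptions for illustration. Published parametric methods typically use richer signatures such as codon usage or oligonucleotide frequencies, and, as the abstract notes, different methods often disagree on real data.

```python
from statistics import mean, pstdev
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_outliers(genes: Dict[str, str], z_cutoff: float = 2.0) -> List[str]:
    """Names of genes whose GC content lies more than z_cutoff SDs from the mean."""
    gc = {name: gc_content(s) for name, s in genes.items()}
    values = list(gc.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []  # all genes identical in composition: nothing stands out
    return sorted(name for name, v in gc.items() if abs(v - mu) / sd > z_cutoff)


# Hypothetical toy genome: five genes at 50% GC and one GC-rich gene.
genes = {f"gene{i}": "ATGC" * 5 for i in range(1, 6)}
genes["hgt_candidate"] = "GGCC" * 5
print(gc_outliers(genes))  # → ['hgt_candidate']
```
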
6.
Database (Oxford) ; 2015: bav043, 2015.
Article in English | MEDLINE | ID: mdl-25957950

ABSTRACT

Biocuration has become a cornerstone for analyses in biology, and to meet these needs, the number of annotations has grown considerably in recent years. However, the reliability of these annotations varies; it has thus become necessary to be able to assess the confidence in annotations. Although several resources already provide confidence information about the annotations that they produce, a standard way of providing such information has yet to be defined. This lack of standardization undermines the propagation of knowledge across resources, as well as the credibility of results from high-throughput analyses. Seeded at a workshop during the Biocuration 2012 conference, a working group has been created to address this problem. We present here the elements that were identified as essential for assessing confidence in annotations, as well as a draft ontology, the Confidence Information Ontology, to illustrate how the problems identified could be addressed. We hope that this effort will provide a home for discussing this major issue among the biocuration community. Tracker URL: https://github.com/BgeeDB/confidence-information-ontology Ontology URL: https://raw.githubusercontent.com/BgeeDB/confidence-information-ontology/master/src/ontology/cio-simple.obo


Subjects
Biological Ontologies , Data Curation/standards , Congresses as Topic
7.
Curr Biol ; 25(10): 1347-53, 2015 May 18.
Article in English | MEDLINE | ID: mdl-25866392

ABSTRACT

The interrelationships of the flatworms (phylum Platyhelminthes) are poorly resolved despite decades of morphological and molecular phylogenetic studies. The earliest-branching clades (Catenulida, Macrostomorpha, and Polycladida) share spiral cleavage and entolecithal eggs with other lophotrochozoans. Lecithoepitheliata have primitive spiral cleavage but derived ectolecithal eggs. Other orders (Rhabdocoela, Proseriata, Tricladida and relatives, and Bothrioplanida) all have derived ectolecithal eggs but have uncertain affinities to one another. The orders of parasitic Neodermata emerge from an uncertain position from within these ectolecithal classes. To tackle these problems, we have sequenced transcriptomes from 18 flatworms and 5 other metazoan groups. The addition of published data produces an alignment of >107,000 amino acids with less than 28% missing data from 27 flatworm taxa in 11 orders covering all major clades. Our phylogenetic analyses show that Platyhelminthes consist of the two clades Catenulida and Rhabditophora. Within Rhabditophora, we show the earliest-emerging branch is Macrostomorpha, not Polycladida. We show Lecithoepitheliata are not members of Neoophora but are sister group of Polycladida, implying independent origins of the ectolecithal eggs found in Lecithoepitheliata and Neoophora. We resolve Rhabdocoela as the most basally branching euneoophoran taxon. Tricladida, Bothrioplanida, and Neodermata constitute a group that appears to have lost both spiral cleavage and centrosomes. We identify Bothrioplanida as the long-sought closest free-living sister group of the parasitic Neodermata. Among parasitic orders, we show that Cestoda are closer to Trematoda than to Monogenea, rejecting the concept of the Cercomeromorpha. Our results have important implications for understanding the evolution of this major phylum.


Subjects
Phylogeny , Platyhelminths/genetics , Animals , Biological Evolution , Centromere/genetics , Female , Gene Expression Profiling , Male , Ovum/physiology , Planarians/genetics , Platyhelminths/physiology , Spermatozoa/physiology , Terminology as Topic
8.
PLoS One ; 10(2): e0114701, 2015.
Article in English | MEDLINE | ID: mdl-25679783

ABSTRACT

Phylogenetic profiling is a well-established approach for predicting gene function based on patterns of gene presence and absence across species. Much of the recent development has focused on methodological improvements, but relatively little is known about the effect of input data size on the quality of predictions. In this work, we ask: how many genomes and functional annotations need to be considered for phylogenetic profiling to be effective? Phylogenetic profiling generally benefits from an increased amount of input data. However, by decomposing this improvement in predictive accuracy in terms of the contribution of additional genomes and of additional annotations, we observed diminishing returns in adding more than ∼100 genomes, whereas increasing the number of annotations remained strongly beneficial throughout. We also observed that maximising phylogenetic diversity within a clade of interest improves predictive accuracy, but the effect is small compared to changes in the number of genomes under comparison. Finally, we show that these findings are supported in light of the Open World Assumption, which posits that functional annotation databases are inherently incomplete. All the tools and data used in this work are available for reuse from http://lab.dessimoz.org/14_phylprof. Scripts used to analyse the data are available on request from the authors.


Subjects
Genomics/methods , Phylogeny , Genetic Variation , Molecular Sequence Annotation
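
Phylogenetic profiling represents each gene by its pattern of presence (1) and absence (0) across a set of genomes and transfers function between genes with similar patterns. The sketch below is a deliberately minimal nearest-neighbour version using Hamming distance. The genes, profiles and function labels are hypothetical, and the study's actual predictors are not reproduced here. Adding genomes lengthens every profile, while adding annotations enlarges the pool of labelled neighbours; these are the two kinds of input data whose contributions the study separates.

```python
from typing import Dict, Tuple

# 1 = gene present in that genome, 0 = absent; one position per genome.
Profile = Tuple[int, ...]


def hamming(a: Profile, b: Profile) -> int:
    """Number of genomes in which two profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))


def predict_function(query: Profile,
                     annotated: Dict[str, Tuple[Profile, str]]) -> str:
    """Transfer the function label of the annotated gene with the closest profile."""
    _, best_function = min(annotated.values(), key=lambda pf: hamming(query, pf[0]))
    return best_function


# Hypothetical annotated genes with profiles over six genomes.
annotated = {
    "geneA": ((1, 1, 0, 0, 1, 0), "function_A"),
    "geneB": ((0, 1, 1, 1, 0, 1), "function_B"),
}
print(predict_function((1, 1, 0, 0, 1, 1), annotated))  # → function_A
```
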
9.
Nucleic Acids Res ; 43(Database issue): D240-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25399418

ABSTRACT

The Orthologous Matrix (OMA) project is a method and associated database that infers evolutionary relationships among (currently) 1706 complete proteomes (i.e. the protein sequences associated with every protein-coding gene in all genomes). In this update article, we present six major new developments in OMA: (i) a new web interface; (ii) Gene Ontology function predictions as part of the OMA pipeline; (iii) better support for plant genomes and in particular homeologs in the wheat genome; (iv) a new synteny viewer providing the genomic context of orthologs; (v) statically computed subsets of hierarchical orthologous groups, downloadable in OrthoXML format; and (vi) the possibility to export parts of the all-against-all computations and to combine them with custom data for 'client-side' orthology prediction. OMA can be accessed through the OMA Browser and various programmatic interfaces at http://omabrowser.org.


Subjects
Databases, Protein , Plant Proteins/genetics , Proteome/chemistry , Sequence Homology, Amino Acid , Algorithms , Gene Ontology , Genome, Plant , Humans , Internet , Plant Proteins/chemistry , Proteome/genetics , Synteny , Triticum/genetics
10.
Nat Biotechnol ; 32(12): 1250-5, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25402615

ABSTRACT

The domestic ferret (Mustela putorius furo) is an important animal model for multiple human respiratory diseases. It is considered the 'gold standard' for modeling human influenza virus infection and transmission. Here we describe the 2.41 Gb draft genome assembly of the domestic ferret, constituting 2.28 Gb of sequence plus gaps. We annotated 19,910 protein-coding genes on this assembly using RNA-seq data from 21 ferret tissues. We characterized the ferret host response to two influenza virus infections by RNA-seq analysis of 42 ferret samples from influenza time-course data and showed distinct signatures in ferret trachea and lung tissues specific to 1918 or 2009 human pandemic influenza virus infections. Using microarray data from 16 ferret samples reflecting cystic fibrosis disease progression, we showed that transcriptional changes in the CFTR-knockout ferret lung reflect pathways of early disease that cannot be readily studied in human infants with cystic fibrosis disease.


Subjects
Ferrets/genetics , Genome , Influenza, Human/genetics , Sequence Analysis, DNA , Animals , Base Sequence , Chromosome Mapping , Disease Models, Animal , High-Throughput Nucleotide Sequencing , Humans , Influenza, Human/transmission , Influenza, Human/virology , Molecular Sequence Annotation , Molecular Sequence Data , Orthomyxoviridae/genetics , Orthomyxoviridae/pathogenicity
12.
PLoS Comput Biol ; 9(1): e1002852, 2013.
Article in English | MEDLINE | ID: mdl-23308060

ABSTRACT

New microbial genomes are sequenced at a high pace, allowing insight into the genetics of not only cultured microbes, but also a wide range of metagenomic collections such as the human microbiome. To understand the deluge of genomic data we face, computational approaches for gene functional annotation are invaluable. We introduce a novel model for computational annotation that refines two established concepts: annotation based on homology and annotation based on phyletic profiling. The phyletic profiling-based model that includes both inferred orthologs and paralogs (homologs separated by a speciation and a duplication event, respectively) provides more annotations at the same average Precision than the model that includes only inferred orthologs. For experimental validation, we selected 38 poorly annotated Escherichia coli genes for which the model assigned one of three GO terms with high confidence: involvement in DNA repair, protein translation, or cell wall synthesis. Results of antibiotic stress survival assays on E. coli knockout mutants showed high agreement with our model's estimates of accuracy: out of 38 predictions obtained at the reported Precision of 60%, we confirmed 25 predictions, indicating that our confidence estimates can be used to make informed decisions on experimental validation. Our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time. Our predictions for 998 prokaryotic genomes include ~400000 specific annotations with an estimated Precision of 90%, ~19000 of which are highly specific (e.g. "penicillin binding," "tRNA aminoacylation for protein translation," or "pathogenesis"); the predictions are freely available at http://gorbi.irb.hr/.


Subjects
Gene Expression Profiling , Phylogeny , Escherichia coli/genetics , Genes, Bacterial , Models, Theoretical
13.
PLoS Comput Biol ; 8(5): e1002533, 2012 May.
Article in English | MEDLINE | ID: mdl-22693439

ABSTRACT

Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we not only found that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon, an important outcome given that >98% of all annotations are inferred without direct curation.


Subjects
Computational Biology/methods , Databases, Genetic , Molecular Sequence Annotation/methods , Vocabulary, Controlled , Database Management Systems , Reproducibility of Results
14.
PLoS One ; 6(7): e21800, 2011.
Article in English | MEDLINE | ID: mdl-21789182

ABSTRACT

Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret. REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures. Furthermore, REVIGO visualizes this non-redundant GO term set in multiple ways to assist in interpretation: multidimensional scaling and graph-based visualizations accurately render the subdivisions and the semantic relationships in the data, while treemaps and tag clouds are also offered as alternative views. REVIGO is freely available at http://revigo.irb.hr/.


Subjects
Algorithms , Computational Biology/methods , Molecular Sequence Annotation , Gene Expression Regulation , Humans , Internet , Software , Transcription Factors/genetics , Transcription Factors/metabolism , User-Computer Interface
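
REVIGO reduces redundancy by choosing representative terms and grouping similar terms under them, based on a semantic similarity measure. The sketch below is a simplified greedy stand-in for that idea, not REVIGO's actual algorithm. Terms are visited from most to least significant; a term joins the first already-chosen representative whose similarity to it reaches a cutoff, and otherwise becomes a representative itself. The similarity function, the 0.7 cutoff, and the term names are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional


def reduce_redundancy(pvalues: Dict[str, float],
                      similarity: Callable[[str, str], float],
                      cutoff: float = 0.7) -> Dict[str, str]:
    """Map every term to the representative that absorbs it (itself if kept)."""
    representatives: List[str] = []
    assignment: Dict[str, str] = {}
    # Most significant (smallest p-value) terms get first claim on being representatives.
    for term in sorted(pvalues, key=lambda t: pvalues[t]):
        match: Optional[str] = next(
            (rep for rep in representatives if similarity(term, rep) >= cutoff), None)
        if match is None:
            representatives.append(term)
            assignment[term] = term
        else:
            assignment[term] = match
    return assignment


# Hypothetical pairwise similarities between three placeholder terms.
SIM = {frozenset({"t1", "t2"}): 0.9, frozenset({"t1", "t3"}): 0.2, frozenset({"t2", "t3"}): 0.3}


def toy_similarity(a: str, b: str) -> float:
    return 1.0 if a == b else SIM[frozenset({a, b})]


pvalues = {"t1": 1e-8, "t2": 1e-6, "t3": 1e-3}
print(reduce_redundancy(pvalues, toy_similarity))  # → {'t1': 't1', 't2': 't1', 't3': 't3'}
```

Raising the cutoff keeps more terms as their own representatives, trading a shorter list for less aggressive merging.
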
15.
Brief Bioinform ; 12(6): 723-35, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21330331

ABSTRACT

With high-throughput technologies providing vast amounts of data, it has become more important to provide systematic, quality annotations. The Gene Ontology (GO) project is the largest resource for cataloguing gene function. Nonetheless, its use is not yet ubiquitous and is still fraught with pitfalls. In this review, we provide a short primer to the GO for bioinformaticians. We summarize important aspects of the structure of the ontology, describe sources and types of functional annotations, survey measures of GO annotation similarity, review typical uses of GO and discuss other important considerations pertaining to the use of GO in bioinformatics applications.


Subjects
Computational Biology/methods , Molecular Sequence Annotation , Databases, Genetic , Genes , Vocabulary, Controlled
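
One widely used measure of GO term similarity is Resnik's. The information content of a term is IC(t) = -log p(t), where p(t) is the fraction of (propagated) annotations at or below t, and the similarity of two terms is the IC of their most informative common ancestor. The sketch below assumes a hypothetical toy ontology and made-up annotation counts that are already propagated to ancestors. It is illustrative and is not taken from the review.

```python
from math import log
from typing import Dict, Set

# Hypothetical toy ontology (term -> direct parents) and propagated annotation counts.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "protein_binding": {"binding"},
    "dna_binding": {"binding"},
    "catalytic_activity": {"root"},
}
COUNTS: Dict[str, int] = {
    "root": 100, "binding": 40, "protein_binding": 10,
    "dna_binding": 20, "catalytic_activity": 50,
}


def with_ancestors(term: str) -> Set[str]:
    """The term itself plus all of its ancestors."""
    result: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(PARENTS[t])
    return result


def information_content(term: str) -> float:
    """IC(t) = -log p(t) = log(total / count(t)); the root has IC 0."""
    return log(COUNTS["root"] / COUNTS[term])


def resnik(a: str, b: str) -> float:
    """IC of the most informative common ancestor of a and b."""
    return max(information_content(t) for t in with_ancestors(a) & with_ancestors(b))


print(round(resnik("protein_binding", "dna_binding"), 3))  # → 0.916
print(resnik("protein_binding", "catalytic_activity"))     # → 0.0
```

Terms that share only the root score zero, while terms meeting at a rarely used ancestor score high.
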
16.
PLoS Genet ; 6(6): e1001004, 2010 Jun 24.
Article in English | MEDLINE | ID: mdl-20585573

ABSTRACT

Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns represented as sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome (between 5% and 33%, depending on genome size) while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of, codon-optimized genes, the trends of enrichment/depletion being conserved between Archaea and Bacteria. Prominent exceptions from these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxication of oxygen radicals and ammonia and to possible misannotations of asparaginyl-tRNA synthetases.
Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an "adaptome" by highlighting gene functions with expression levels elevated specifically in thermophilic Bacteria and Archaea.


Subjects
Prokaryotic Cells/metabolism , Protein Biosynthesis , Archaea/genetics , Bacteria/genetics , Codon , Gene Expression Regulation , Genome , Models, Genetic , Ribosomal Proteins/genetics
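
The abstract benchmarks its Random Forest classifier against distance-based measures such as the codon adaptation index (CAI). For reference, here is a minimal sketch of CAI as commonly defined. Each codon's relative adaptiveness w is its count in a reference set of highly expressed genes (such as ribosomal protein genes) divided by the count of the most used synonymous codon, and a gene's CAI is the geometric mean of w over its codons. The two-amino-acid synonym table and the reference counts are hypothetical. A full implementation covers all synonymous families, usually excludes single-codon amino acids and stop codons, and handles codons unseen in the reference, which this sketch simply skips. This shows the baseline only, not the paper's classifier.

```python
from math import exp, log
from typing import Dict, List

# Hypothetical, deliberately tiny synonym table: two amino acids only.
SYNONYMS: Dict[str, List[str]] = {"Lys": ["AAA", "AAG"], "Glu": ["GAA", "GAG"]}


def relative_adaptiveness(ref_counts: Dict[str, int]) -> Dict[str, float]:
    """w(codon) = count / count of the most frequent synonymous codon in the reference set."""
    w: Dict[str, float] = {}
    for codons in SYNONYMS.values():
        top = max(ref_counts.get(c, 0) for c in codons)
        for c in codons:
            if ref_counts.get(c, 0) > 0:  # unseen codons get no weight here
                w[c] = ref_counts[c] / top
    return w


def cai(seq: str, w: Dict[str, float]) -> float:
    """Geometric mean of w over the gene's codons; codons without a weight are skipped."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    logs = [log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return exp(sum(logs) / len(logs))


# Hypothetical codon counts from a reference set of highly expressed genes.
reference = {"AAA": 30, "AAG": 10, "GAA": 20, "GAG": 20}
w = relative_adaptiveness(reference)
print(round(cai("AAAAAGGAA", w), 4))  # → 0.6934
```
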