Results 1-10 of 10
1.
Nucleic Acids Res ; 52(D1): D513-D521, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37962356

ABSTRACT

In this update paper, we present the latest developments in the OMA browser knowledgebase, which aims to provide high-quality orthology inferences and facilitate the study of gene families, genomes and their evolution. First, we discuss the addition of new species in the database, particularly an expanded representation of prokaryotic species. The OMA browser now offers Ancestral Genome pages and an Ancestral Gene Order viewer, allowing users to explore the evolutionary history and gene content of ancestral genomes. We also introduce a revamped Local Synteny Viewer to compare genomic neighborhoods across both extant and ancestral genomes. Hierarchical Orthologous Groups (HOGs) are now annotated with Gene Ontology terms, and users can easily perform extant or ancestral GO enrichments. Finally, we recap new tools in the OMA Ecosystem, including OMAmer for proteome mapping, OMArk for proteome quality assessment, OMAMO for model organism selection and Read2Tree for phylogenetic species tree construction from reads. These new features provide exciting opportunities for orthology analysis and comparative genomics. OMA is accessible at https://omabrowser.org.


Subjects
Genetic Databases, Ecosystem, Genome, Proteome, Genome/genetics, Phylogeny, Synteny, Internet, Gene Order/genetics
2.
Nucleic Acids Res ; 49(D1): D373-D379, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33174605

ABSTRACT

OMA is an established resource for elucidating evolutionary relationships among genes from, currently, 2,326 genomes covering all domains of life. OMA provides pairwise and groupwise orthologs, functional annotations, local and global gene order conservation (synteny) information, among many other functions. This update paper describes the reorganisation of the database into gene-, group- and genome-centric pages. Other new and improved features are detailed, such as reporting of the evolutionarily best conserved isoforms of alternatively spliced genes, the inferred local order of ancestral genes, phylogenetic profiling, better cross-references, fast genome mapping, semantic data sharing via RDF, as well as a special coronavirus OMA with 119 viruses from the Nidovirales order, including SARS-CoV-2, the agent of the COVID-19 pandemic. We conclude with improvements to the documentation of the resource through primers, tutorials and short videos. OMA is accessible at https://omabrowser.org.


Subjects
Algorithms, Genetic Databases, Gene Order/genetics, Genome/genetics, Animals, COVID-19/epidemiology, COVID-19/prevention & control, COVID-19/virology, Chromosome Mapping, Molecular Evolution, Gene Ontology, Humans, Internet, Pandemics, Phylogeny, SARS-CoV-2/genetics, SARS-CoV-2/physiology, Species Specificity, Synteny
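Phylogenetic profiling, listed among the new OMA features in the abstract above, rests on a simple idea: genes whose orthologs are present and absent in the same set of genomes are candidates for functional linkage. The Python sketch below illustrates only that idea; the gene names and presence/absence data are hypothetical, and the Jaccard similarity used to compare profiles is an assumption for illustration, not necessarily the measure OMA uses.

```python
"""Phylogenetic-profiling sketch on hypothetical presence/absence data."""


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets of genome identifiers (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_profile(query: str, profiles: dict) -> list:
    """Rank the other genes by similarity of their phylogenetic profile to the query's.

    `profiles` maps each gene to the set of genomes in which it has an ortholog.
    """
    reference = profiles[query]
    scores = [(gene, jaccard(reference, genomes))
              for gene, genomes in profiles.items() if gene != query]
    # Most similar first; ties broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical profiles: genomes containing an ortholog of each gene.
    profiles = {
        "geneA": {"HUMAN", "MOUSE", "YEAST", "ARATH"},
        "geneB": {"HUMAN", "MOUSE", "YEAST", "ARATH"},
        "geneC": {"HUMAN", "MOUSE"},
        "geneD": {"ECOLI"},
    }
    for gene, score in rank_by_profile("geneA", profiles):
        print(f"{gene}\t{score:.2f}")
```

In this toy input, geneB (identical profile) ranks first with 1.00, followed by geneC with 0.50 and geneD with 0.00.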
3.
Genome Res ; 29(7): 1152-1163, 2019 07.
Article in English | MEDLINE | ID: mdl-31235654

ABSTRACT

Genomes and transcriptomes are now typically sequenced by individual laboratories, but analyzing them often remains challenging. One essential step in many analyses lies in identifying orthologs (corresponding genes across multiple species), but this is far from trivial. The Orthologous MAtrix (OMA) database is a leading resource for identifying orthologs among publicly available, complete genomes. Here, we describe the OMA pipeline available as a standalone program for Linux and Mac. When run on a cluster, it has native support for the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes. Another key feature of OMA standalone is that users can combine their own data with existing public data by exporting genomes and precomputed alignments from the OMA database, which currently contains over 2,100 complete genomes. We compare OMA standalone to other methods in the context of phylogenetic tree inference by inferring a phylogeny of Lophotrochozoa, a challenging clade within the protostomes. We also discuss other potential applications of OMA standalone, including identifying gene families that have undergone duplications or losses in specific clades, and identifying potential drug targets in nonmodel organisms. OMA standalone is available under the permissive open source Mozilla Public License Version 2.0.


Subjects
Genetic Databases, Genome, Invertebrates/classification, Software, Transcriptome, Animals, Invertebrates/genetics, Phylogeny
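The basic notion behind pairwise ortholog calling can be illustrated with a reciprocal-best-hit sketch: two genes from different genomes are paired when each is the other's closest match. This is a deliberate simplification assumed for illustration; OMA's own pipeline works on evolutionary distances with additional safeguards, and none of the function names, gene names or distances below come from OMA standalone.

```python
"""Reciprocal-best-hit sketch (a simplification, not OMA's inference algorithm)."""
from typing import Dict, List, Optional, Tuple

# Hypothetical symmetric distances between genes of two genomes; either key order is accepted.
Distances = Dict[Tuple[str, str], float]


def distance(dist: Distances, a: str, b: str) -> Optional[float]:
    """Look up the distance between two genes, whichever order the pair was stored in."""
    if (a, b) in dist:
        return dist[(a, b)]
    return dist.get((b, a))


def closest(gene: str, candidates: List[str], dist: Distances) -> Optional[str]:
    """Return the candidate with the smallest known distance to `gene` (None if none is known)."""
    best: Optional[Tuple[float, str]] = None
    for cand in candidates:
        d = distance(dist, gene, cand)
        if d is not None and (best is None or (d, cand) < best):
            best = (d, cand)
    return best[1] if best else None


def reciprocal_best_hits(genome1: List[str], genome2: List[str],
                         dist: Distances) -> List[Tuple[str, str]]:
    """Pairs (g1, g2) where g2 is closest to g1 and g1 is closest to g2."""
    pairs = []
    for g1 in genome1:
        g2 = closest(g1, genome2, dist)
        if g2 is not None and closest(g2, genome1, dist) == g1:
            pairs.append((g1, g2))
    return pairs


if __name__ == "__main__":
    dist = {("H1", "M1"): 0.1, ("H1", "M2"): 0.5, ("H2", "M1"): 0.3, ("H2", "M2"): 0.2}
    print(reciprocal_best_hits(["H1", "H2"], ["M1", "M2"], dist))
```

Reciprocal best hits yield only one-to-one pairs, so on their own they miss the one-to-many and many-to-many relationships created by gene duplications.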
4.
Bioinformatics ; 37(18): 2866-2873, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33787851

ABSTRACT

MOTIVATION: Assigning new sequences to known protein families and subfamilies is a prerequisite for many functional, comparative and evolutionary genomics analyses. Such assignment is commonly achieved by looking for the closest sequence in a reference database, using a method such as BLAST. However, ignoring the gene phylogeny can be misleading because a query sequence does not necessarily belong to the same subfamily as its closest sequence. For example, a hemoglobin which branched out prior to the hemoglobin alpha/beta duplication could be closest to a hemoglobin alpha or beta sequence, whereas it is neither. To overcome this problem, phylogeny-driven tools have emerged but rely on gene trees, whose inference is computationally expensive. RESULTS: Here, we first show that in multiple animal and plant datasets, 18-62% of assignments by closest sequence are misassigned, typically to an over-specific subfamily. Then, we introduce OMAmer, a novel alignment-free protein subfamily assignment method, which limits over-specific subfamily assignments and is suited to phylogenomic databases with thousands of genomes. OMAmer is based on an innovative method using evolutionarily informed k-mers for alignment-free mapping to ancestral protein subfamilies. Whilst able to reject non-homologous family-level assignments, we show that OMAmer provides better and quicker subfamily-level assignments than approaches relying on the closest sequence, whether inferred exactly by Smith-Waterman or by the fast heuristic DIAMOND. AVAILABILITY AND IMPLEMENTATION: OMAmer is available from the Python Package Index (as omamer), with the source code and a precomputed database available at https://github.com/DessimozLab/omamer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Software, Animals, Sequence Alignment, Proteins/genetics, Biological Evolution, Phylogeny
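To make alignment-free assignment concrete, the sketch below scores a query against protein families by the fraction of its k-mers shared with each family, and rejects the query when no family reaches a threshold. It is a generic k-mer matching illustration with assumed parameters (`k`, `min_score`) and toy sequences; it does not reproduce OMAmer's evolutionarily informed k-mers or its placement into ancestral subfamilies.

```python
"""Toy k-mer-based family assignment (generic illustration, not OMAmer's algorithm)."""
from typing import Dict, List, Optional, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(families: Dict[str, List[str]], k: int) -> Dict[str, Set[str]]:
    """Map each family to the union of k-mers found in its member sequences."""
    return {fam: set().union(*(kmers(s, k) for s in seqs)) for fam, seqs in families.items()}


def assign(query: str, index: Dict[str, Set[str]], k: int,
           min_score: float = 0.5) -> Tuple[Optional[str], float]:
    """Return (best family, score), or (None, score) if no family reaches `min_score`.

    The score is the fraction of the query's k-mers present in a family; the
    threshold plays the role of rejecting non-homologous queries.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        return None, 0.0
    best_family, best_score = None, 0.0
    for family in sorted(index):
        score = len(query_kmers & index[family]) / len(query_kmers)
        if score > best_score:
            best_family, best_score = family, score
    if best_score < min_score:
        return None, best_score
    return best_family, best_score


if __name__ == "__main__":
    # Hypothetical toy families; real protein families and k values would differ.
    index = build_index({"famA": ["MKTAYIAKQR", "MKTAYLAKQR"], "famB": ["GGHHEEWWPP"]}, k=3)
    print(assign("MKTAYIAK", index, k=3))   # all query 3-mers occur in famA
    print(assign("QQQQQQ", index, k=3))     # no shared 3-mers: rejected
```

A flat family-level score like this cannot by itself avoid the over-specific subfamily placements the abstract describes; that requires reasoning over the nested hierarchy of subfamilies.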
5.
Bioinformatics ; 36(Suppl_1): i210-i218, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657372

ABSTRACT

MOTIVATION: With the ever-increasing number and diversity of sequenced species, the challenge of characterizing genes with functional information is ever more important. In most species, this characterization almost entirely relies on automated electronic methods. As such, it is critical to benchmark the various methods. The Critical Assessment of protein Function Annotation algorithms (CAFA) series of community experiments provides the most comprehensive benchmark, with a time-delayed analysis leveraging newly curated experimentally supported annotations. However, the definition of a false positive in CAFA has not fully accounted for the open world assumption (OWA), leading to a systematic underestimation of precision. The main reason for this limitation is the relative paucity of negative experimental annotations. RESULTS: This article introduces a new, OWA-compliant, benchmark based on a balanced test set of positive and negative annotations. The negative annotations are derived from expert-curated annotations of protein families on phylogenetic trees. This approach results in a large increase in the average information content of negative annotations. The benchmark has been tested using the naïve and BLAST baseline methods, as well as two orthology-based methods. This new benchmark could complement existing ones in future CAFA experiments. AVAILABILITY AND IMPLEMENTATION: All data, as well as code used for analysis, is available from https://lab.dessimoz.org/20_not. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Benchmarking, Proteins, Computational Biology, Gene Ontology, Humans, Molecular Sequence Annotation, Phylogeny, Proteins/genetics
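The effect of the open world assumption on precision can be shown with a small worked example. Under a closed-world reading, every predicted term that is not a known positive counts as a false positive; with explicit negative annotations, only predictions that contradict a known negative count as false positives, and predictions about terms of unknown status are left out. The term labels below are placeholders, and the two functions are a minimal sketch of this distinction rather than the benchmark's released code.

```python
"""Closed-world vs open-world precision on placeholder annotation sets."""
from typing import Set


def closed_world_precision(predicted: Set[str], positives: Set[str]) -> float:
    """Every predicted term that is not a known positive counts as a false positive."""
    return len(predicted & positives) / len(predicted) if predicted else 0.0


def open_world_precision(predicted: Set[str], positives: Set[str], negatives: Set[str]) -> float:
    """Only predicted terms that are known negatives count as false positives.

    Predicted terms absent from both sets have unknown status and are ignored.
    Assumes `positives` and `negatives` are disjoint.
    """
    true_pos = len(predicted & positives)
    false_pos = len(predicted & negatives)
    return true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0


if __name__ == "__main__":
    predicted = {"term1", "term2", "term3", "term4"}
    positives = {"term1"}           # experimentally supported
    negatives = {"term2"}           # experimentally ruled out
    print(closed_world_precision(predicted, positives))             # 0.25
    print(open_world_precision(predicted, positives, negatives))    # 0.5
```

Here term3 and term4 have no experimental evidence either way; counting them as errors halves the measured precision in this toy case, which is the kind of systematic underestimation the abstract refers to.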
6.
Nucleic Acids Res ; 46(D1): D477-D485, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29106550

ABSTRACT

The Orthologous Matrix (OMA) is a leading resource to relate genes across many species from all of life. In this update paper, we review the recent algorithmic improvements in the OMA pipeline, describe increases in species coverage (particularly in plants and early-branching eukaryotes) and introduce several new features in the OMA web browser. Notable improvements include: (i) a scalable, interactive viewer for hierarchical orthologous groups; (ii) protein domain annotations and domain-based links between orthologous groups; (iii) functionality to retrieve phylogenetic marker genes for a subset of species of interest; (iv) a new synteny dot plot viewer; and (v) an overhaul of the programmatic access (REST API and semantic web), which will facilitate incorporation of OMA analyses in computational pipelines and integration with other bioinformatic resources. OMA can be freely accessed at https://omabrowser.org.


Subjects
Biological Evolution, Genetic Databases, Genome, Molecular Sequence Annotation, Proteins/genetics, Synteny, Algorithms, Animals, Archaea/classification, Archaea/genetics, Archaea/metabolism, Bacteria/classification, Bacteria/genetics, Bacteria/metabolism, Computational Biology/methods, Fungi/classification, Fungi/genetics, Fungi/metabolism, Gene Ontology, Humans, Internet, Phylogeny, Plants/classification, Plants/genetics, Plants/metabolism, Protein Domains, Proteins/chemistry, Proteins/metabolism, Web Browser
7.
Bioinformatics ; 34(17): i612-i619, 2018 09 01.
Article in English | MEDLINE | ID: mdl-30423067

ABSTRACT

Motivation: A key goal in plant biotechnology applications is the identification of genes associated with particular phenotypic traits (for example, yield, fruit size or root length). Quantitative Trait Loci (QTL) studies identify genomic regions associated with a trait of interest. However, to infer potential causal genes in these regions, each of which can contain hundreds of genes, these data are usually intersected with prior functional knowledge of the genes. This process is, however, laborious, particularly if the experiment is performed in a non-model species, and the statistical significance of the inferred candidates is typically unknown. Results: This paper introduces QTLSearch, a method and software tool to search for candidate causal genes in QTL studies by combining Gene Ontology annotations across many species, leveraging hierarchical orthologous groups. The usefulness of this approach is demonstrated by re-analysing two metabolic QTL studies: one in Arabidopsis thaliana, the other in Oryza sativa subsp. indica. Even after controlling for statistical significance, QTLSearch inferred potential causal genes for more QTL than BLAST-based functional propagation against UniProtKB/Swiss-Prot, and for more QTL than in the original studies. Availability and implementation: QTLSearch is distributed under the LGPLv3 license. It is available to install from the Python Package Index (as qtlsearch), with the source available from https://bitbucket.org/alex-warwickvesztrocy/qtlsearch. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Quantitative Trait Loci, Software, Arabidopsis/genetics, Genomics, Molecular Sequence Annotation, Oryza/genetics
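One ingredient of this approach, borrowing Gene Ontology annotations from other members of a gene's orthologous group, can be sketched as follows. The data structures, gene and term names, and the simple union-of-annotations rule are assumptions for illustration; QTLSearch's actual scoring, including its control of statistical significance, is not reproduced here.

```python
"""Toy annotation propagation through orthologous groups (not QTLSearch's scoring)."""
from typing import Dict, List, Set


def propagated_terms(gene: str, group_of: Dict[str, str],
                     members: Dict[str, List[str]],
                     annotations: Dict[str, Set[str]]) -> Set[str]:
    """Terms annotated to any member of the gene's orthologous group (including itself)."""
    group = group_of.get(gene)
    if group is None:
        # Gene in no group: fall back to its own annotations, if any.
        return set(annotations.get(gene, set()))
    terms: Set[str] = set()
    for member in members[group]:
        terms |= annotations.get(member, set())
    return terms


def candidate_genes(qtl_genes: List[str], trait_terms: Set[str],
                    group_of: Dict[str, str], members: Dict[str, List[str]],
                    annotations: Dict[str, Set[str]]) -> List[str]:
    """Genes in the QTL region whose propagated annotations include a trait-related term."""
    return [g for g in qtl_genes
            if propagated_terms(g, group_of, members, annotations) & trait_terms]


if __name__ == "__main__":
    # Hypothetical QTL genes (q*) grouped with annotated genes from other species (r*).
    group_of = {"q1": "G1", "q2": "G2"}
    members = {"G1": ["q1", "r1"], "G2": ["q2", "r2"]}
    annotations = {"r1": {"termA"}, "r2": {"termB"}}
    print(candidate_genes(["q1", "q2", "q3"], {"termA"}, group_of, members, annotations))
```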
8.
Nat Biotechnol ; 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38383603

ABSTRACT

In the era of biodiversity genomics, it is crucial to ensure that annotations of protein-coding gene repertoires are accurate. State-of-the-art tools to assess genome annotations measure the completeness of a gene repertoire but are blind to other errors, such as gene overprediction or contamination. We introduce OMArk, a software package that relies on fast, alignment-free sequence comparisons between a query proteome and precomputed gene families across the tree of life. OMArk assesses not only the completeness but also the consistency of the gene repertoire as a whole relative to closely related species and reports likely contamination events. Analysis of 1,805 UniProt Eukaryotic Reference Proteomes with OMArk demonstrated strong evidence of contamination in 73 proteomes and identified error propagation in avian gene annotation resulting from the use of a fragmented zebra finch proteome as a reference. This study illustrates the importance of comparing and prioritizing proteomes based on their quality measures.
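The two quantities described above can be illustrated with a toy example: completeness as the share of conserved families that receive at least one query protein, and consistency as a tally of whether each query protein's family is expected in the query's lineage. The family identifiers, lineage labels and the three categories used here are simplifying assumptions; OMArk's own classification is richer, for instance in how it reports contamination.

```python
"""Toy completeness and consistency summary (simplified; not OMArk's implementation)."""
from collections import Counter
from typing import Dict, Optional, Set


def completeness(conserved: Set[str], placements: Dict[str, Optional[str]]) -> float:
    """Fraction of conserved families that receive at least one query protein."""
    if not conserved:
        return 0.0
    found = {fam for fam in placements.values() if fam in conserved}
    return len(found) / len(conserved)


def consistency(placements: Dict[str, Optional[str]],
                family_lineages: Dict[str, Set[str]],
                expected_lineage: str) -> Counter:
    """Tally query proteins as consistent, inconsistent or unknown with respect to a lineage."""
    counts: Counter = Counter()
    for family in placements.values():
        if family is None:
            counts["unknown"] += 1          # no family found for this protein
        elif expected_lineage in family_lineages.get(family, set()):
            counts["consistent"] += 1       # family expected in this lineage
        else:
            counts["inconsistent"] += 1     # family from elsewhere, e.g. possible contamination
    return counts


if __name__ == "__main__":
    conserved = {"F1", "F2", "F3", "F4"}                  # hypothetical conserved families
    placements = {"p1": "F1", "p2": "F2", "p3": "F2", "p4": "X9", "p5": None}
    lineages = {"F1": {"Aves"}, "F2": {"Aves"}, "X9": {"Bacteria"}}
    print(completeness(conserved, placements))             # 0.5
    print(consistency(placements, lineages, "Aves"))
```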

9.
Ecol Evol ; 10(5): 2284-2298, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32184981

ABSTRACT

New genomic tools open doors to the study of the ecology, evolution, and population genomics of wild animals. For the barn owl species complex, a cosmopolitan nocturnal raptor, a very fragmented draft genome was assembled for the American species (Tyto furcata pratincola) (Jarvis et al. 2014). To improve the genome, we performed a de novo assembly of Illumina short-read and Pacific Biosciences (PacBio) long-read sequences of its European counterpart (Tyto alba alba). This 1.219 Gbp genome assembly comprises 21,509 scaffolds, with an N50 of 4,615,526 bp. BUSCO (Benchmarking Universal Single-Copy Orthologs) analysis revealed an assembly completeness of 94.8%, with only 1.8% of the 4,915 avian orthologs searched missing, a proportion similar to that found in the genomes of the zebra finch (Taeniopygia guttata) and the collared flycatcher (Ficedula albicollis). By mapping the reads of the female American barn owl to the male European barn owl reads, we detected several structural variants and identified 70 Mbp of the Z chromosome. The barn owl scaffolds were further mapped to the chromosomes of the zebra finch. In addition, the completeness of the European barn owl genome is demonstrated by the retrieval, in European barn owl transcripts, of 94 of 128 proteins missing from the chicken genome. This improved genome will help future barn owl population genomic investigations.
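Because the assembly is summarised by its N50, a short sketch of that statistic may help: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The lengths in the example are made up for illustration.

```python
"""N50 of an assembly from its scaffold (or contig) lengths."""
from typing import List


def n50(lengths: List[int]) -> int:
    """Length L such that scaffolds of length >= L cover at least half of the assembly."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # integer comparison avoids rounding issues with total / 2
            return length
    raise AssertionError("unreachable: the loop always reaches half of the total")


if __name__ == "__main__":
    # Made-up scaffold lengths (bp); total 300, so half is 150.
    print(n50([100, 80, 50, 40, 30]))   # 80, since 100 + 80 = 180 >= 150
```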

10.
Sci Rep ; 8(1): 10872, 2018 Jul 18.
Article in English | MEDLINE | ID: mdl-30022098

ABSTRACT

The biological interpretation of gene lists with interesting shared properties, such as up- or down-regulation in a particular experiment, is typically accomplished using gene ontology enrichment analysis tools. Given a list of genes, a gene ontology (GO) enrichment analysis may return hundreds of statistically significant GO results in a "flat" list, which can be challenging to summarize. It can also be difficult to keep pace with rapidly expanding biological knowledge, which often results in daily changes to any of the over 47,000 GO terms that describe biological knowledge. GOATOOLS, a Python-based library, makes it more efficient to stay current with the latest ontologies and annotations, perform gene ontology enrichment analyses to determine over- and under-represented terms, and organize results for greater clarity and easier interpretation using a novel GOATOOLS GO grouping method. We performed functional analyses on both stochastic simulation data and real data from a published RNA-seq study to compare the enrichment results from GOATOOLS to two other popular tools: DAVID and GOstats. GOATOOLS is freely available through GitHub: https://github.com/tanghaibao/goatools.


Subjects
Alzheimer Disease/genetics, Biomarkers/analysis, Computational Biology/methods, Animal Disease Models, Developmental Gene Expression Regulation, Gene Ontology, Software, Algorithms, Alzheimer Disease/pathology, Animals, Gene Expression Profiling, Mice
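The over-representation test at the heart of GO enrichment analysis can be sketched with the Python standard library alone. For each term, a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2×2 table) asks whether the study genes carry the term more often than expected from the population, and the Benjamini-Hochberg procedure then adjusts for testing many terms. This is an independent sketch with hypothetical gene and term names, not the GOATOOLS API; as described above, GOATOOLS also tests for under-representation.

```python
"""GO over-representation sketch: hypergeometric test plus Benjamini-Hochberg (not the GOATOOLS API)."""
from math import comb
from typing import Dict, Iterable, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment(study: Iterable[str], population: Set[str],
               annotations: Dict[str, Set[str]]) -> List[Tuple[str, int, int, float, float]]:
    """(term, study hits, population hits, p-value, BH-adjusted p) for every annotated term."""
    study_set = set(study) & population
    N, n = len(population), len(study_set)
    terms = sorted({t for g in population for t in annotations.get(g, set())})
    rows = []
    for term in terms:
        K = sum(term in annotations.get(g, set()) for g in population)
        k = sum(term in annotations.get(g, set()) for g in study_set)
        rows.append((term, k, K, hypergeom_sf(k, N, K, n)))
    adjusted = benjamini_hochberg([row[3] for row in rows])
    return [(t, k, K, p, q) for (t, k, K, p), q in zip(rows, adjusted)]


if __name__ == "__main__":
    population = {f"g{i}" for i in range(1, 11)}                 # 10 hypothetical genes
    annotations = {"g1": {"termX"}, "g2": {"termX"}, "g3": {"termX"},
                   **{f"g{i}": {"termY"} for i in range(4, 11)}}
    for row in enrichment(["g1", "g2", "g3"], population, annotations):
        print(row)
```

In this toy run, all three study genes carry termX, which only 3 of the 10 population genes have, giving p = 1/120 (about 0.0083) before correction and 1/60 after Benjamini-Hochberg adjustment; termY, absent from the study set, gets p = 1.0.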