Results 1 - 17 of 17
1.
Nat Biotechnol ; 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38383603

ABSTRACT

In the era of biodiversity genomics, it is crucial to ensure that annotations of protein-coding gene repertoires are accurate. State-of-the-art tools to assess genome annotations measure the completeness of a gene repertoire but are blind to other errors, such as gene overprediction or contamination. We introduce OMArk, a software package that relies on fast, alignment-free sequence comparisons between a query proteome and precomputed gene families across the tree of life. OMArk assesses not only the completeness but also the consistency of the gene repertoire as a whole relative to closely related species and reports likely contamination events. Analysis of 1,805 UniProt Eukaryotic Reference Proteomes with OMArk demonstrated strong evidence of contamination in 73 proteomes and identified error propagation in avian gene annotation resulting from the use of a fragmented zebra finch proteome as a reference. This study illustrates the importance of comparing and prioritizing proteomes based on their quality measures.
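
The idea of an alignment-free comparison between a query protein and precomputed gene families can be illustrated with a toy k-mer overlap score. This is only a minimal sketch, not OMArk's actual algorithm or interface: the function names, the choice of k = 6, the Jaccard score and the example sequences are all assumptions made for illustration.

```python
def kmers(seq: str, k: int = 6) -> set[str]:
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_family(query: str, families: dict[str, str], k: int = 6) -> tuple[str, float]:
    """Pick the family whose representative shares the most k-mers with the query.

    `families` maps a hypothetical family name to one representative sequence.
    """
    q = kmers(query, k)
    scores = {name: jaccard(q, kmers(ref, k)) for name, ref in families.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data, invented for this example.
families = {
    "famA": "MKTAYIAKQRQISFVKSHFSRQ",
    "famB": "GSHMLEDPVDAFKLLNE",
}
name, score = best_family("MKTAYIAKQRQISFVK", families)
```

No sequence alignment is computed here; the comparison relies only on set operations over short substrings, which is what makes this family of methods fast.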

2.
Nucleic Acids Res ; 52(D1): D513-D521, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37962356

ABSTRACT

In this update paper, we present the latest developments in the OMA browser knowledgebase, which aims to provide high-quality orthology inferences and facilitate the study of gene families, genomes and their evolution. First, we discuss the addition of new species in the database, particularly an expanded representation of prokaryotic species. The OMA browser now offers Ancestral Genome pages and an Ancestral Gene Order viewer, allowing users to explore the evolutionary history and gene content of ancestral genomes. We also introduce a revamped Local Synteny Viewer to compare genomic neighborhoods across both extant and ancestral genomes. Hierarchical Orthologous Groups (HOGs) are now annotated with Gene Ontology annotations, and users can easily perform extant or ancestral GO enrichments. Finally, we recap new tools in the OMA Ecosystem, including OMAmer for proteome mapping, OMArk for proteome quality assessment, OMAMO for model organism selection and Read2Tree for phylogenetic species tree construction from reads. These new features provide exciting opportunities for orthology analysis and comparative genomics. OMA is accessible at https://omabrowser.org.


Subjects
Databases, Genetic; Ecosystem; Genome; Proteome; Genome/genetics; Phylogeny; Synteny; Internet; Gene Order/genetics
3.
Genome Biol ; 24(1): 135, 2023 Jun 08.
Article in English | MEDLINE | ID: mdl-37291671

ABSTRACT

BACKGROUND: In every living species, the function of a protein depends on its organization of structural domains, and the length of a protein is a direct reflection of this. Because every species evolved under different evolutionary pressures, the protein length distribution, much like other genomic features, is expected to vary across species but has so far been scarcely studied. RESULTS: Here we evaluate this diversity by comparing protein length distribution across 2326 species (1688 bacteria, 153 archaea, and 485 eukaryotes). We find that proteins tend to be on average slightly longer in eukaryotes than in bacteria or archaea, but that the variation of length distribution across species is low, especially compared to the variation of other genomic features (genome size, number of proteins, gene length, GC content, isoelectric points of proteins). Moreover, most cases of atypical protein length distribution appear to be due to artifactual gene annotation, suggesting the actual variation of protein length distribution across species is even smaller. CONCLUSIONS: These results open the way for developing a genome annotation quality metric based on protein length distribution to complement conventional quality measures. Overall, our findings show that protein length distribution between living species is more uniform than previously thought. Furthermore, we also provide evidence for a universal selection on protein length, yet its mechanism and fitness effect remain intriguing open questions.
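
A protein length distribution of the kind compared in this study can be summarized from a proteome in FASTA format with a few lines of code. The sketch below is an illustration only, not the authors' pipeline; the function names and the tiny in-memory FASTA example are assumptions.

```python
import statistics
from io import StringIO


def protein_lengths(fasta_handle) -> list[int]:
    """Return the length of every sequence in a FASTA-formatted text stream."""
    lengths: list[int] = []
    current = 0
    seen_header = False
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seen_header:
                lengths.append(current)
            seen_header = True
            current = 0
        elif line:
            current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def summarize(lengths: list[int]) -> dict[str, float]:
    """Basic descriptive statistics of a length distribution."""
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }


# Toy proteome with two invented sequences.
toy = StringIO(">p1\nMKT\nAY\n>p2\nMK\n")
lengths = protein_lengths(toy)
summary = summarize(lengths)
```

Running the same summary on many proteomes and comparing the results is one simple way to spot annotations whose length distribution looks atypical.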


Subjects
Molecular Sequence Annotation; Proteins; Sequence Analysis, Protein; Amino Acid Sequence; Molecular Sequence Annotation/methods; Proteins/chemistry; Proteins/classification; Proteome; Sequence Analysis, Protein/methods; Eukaryota; Bacteria; Archaea
4.
Nucleic Acids Res ; 49(D1): D373-D379, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33174605

ABSTRACT

OMA is an established resource to elucidate evolutionary relationships among genes from currently 2326 genomes covering all domains of life. OMA provides pairwise and groupwise orthologs, functional annotations, local and global gene order conservation (synteny) information, among many other functions. This update paper describes the reorganisation of the database into gene-, group- and genome-centric pages. Other new and improved features are detailed, such as reporting of the evolutionarily best conserved isoforms of alternatively spliced genes, the inferred local order of ancestral genes, phylogenetic profiling, better cross-references, fast genome mapping, semantic data sharing via RDF, as well as a special coronavirus OMA with 119 viruses from the Nidovirales order, including SARS-CoV-2, the agent of the COVID-19 pandemic. We conclude with improvements to the documentation of the resource through primers, tutorials and short videos. OMA is accessible at https://omabrowser.org.


Subjects
Algorithms; Databases, Genetic; Gene Order/genetics; Genome/genetics; Animals; COVID-19/epidemiology; COVID-19/prevention & control; COVID-19/virology; Chromosome Mapping; Evolution, Molecular; Gene Ontology; Humans; Internet; Pandemics; Phylogeny; SARS-CoV-2/genetics; SARS-CoV-2/physiology; Species Specificity; Synteny
5.
F1000Res ; 9: 665, 2020.
Article in English | MEDLINE | ID: mdl-32676187

ABSTRACT

The OMA Collection is a resource for users of Orthologous Matrix. In this collection, we provide tutorials and protocols on how to leverage the tools provided by OMA to analyse your data. Here, I explain the motivation for this collection and its published works thus far.


Subjects
Databases, Genetic; Genomics; Software
6.
PLoS One ; 15(6): e0235120, 2020.
Article in English | MEDLINE | ID: mdl-32584851

ABSTRACT

Two low-phytate soybean (Glycine max (L.) Merr.) mutant lines, V99-5089 (mips mutation on chromosome 11) and CX-1834 (mrp-l and mrp-n mutations on chromosomes 19 and 3, respectively), have proven to be valuable resources for breeding of low-phytate, high-sucrose, and low-raffinosaccharide soybeans, traits that are highly desirable from a nutritional and environmental standpoint. A recombinant inbred population derived from the cross CX-1834 x V99-5089 provides an opportunity to study the effect of different combinations of these three mutations on soybean phytate and oligosaccharide levels. Of the 173 recombinant inbred lines tested, 163 lines were homozygous for various combinations of MIPS and two MRP loci alleles. These individuals were grouped into eight genotypic classes based on the combination of SNP alleles at the three mutant loci. The two genotypic classes that were homozygous mrp-l/mrp-n and either homozygous wild-type or mutant at the mips locus (MIPS/mrp-l/mrp-n or mips/mrp-l/mrp-n) displayed relatively similar ~55% reductions in seed phytate, 6.94 mg g-1 and 6.70 mg g-1 respectively, as compared with 15.2 mg g-1 in the wild-type MIPS/MRP-L/MRP-N seed. Therefore, in the presence of the double mutant mrp-l/mrp-n, the mips mutation did not cause a substantially greater decrease in seed phytate level. However, the nutritionally desirable high-sucrose/low-stachyose/low-raffinose seed phenotype originally observed in soybeans homozygous for the mips allele was reversed in the presence of mrp-l/mrp-n mutations: homozygous mips/mrp-l/mrp-n seed displayed low-sucrose (7.70%), high-stachyose (4.18%), and the highest observed raffinose (0.94%) contents per gram of dry seed. Perhaps the block in phytic acid transport from its cytoplasmic synthesis site to its storage site, conditioned by mrp-l/mrp-n, alters myo-inositol flux in mips seeds in a way that restores to wild-type levels the mips-conditioned reductions in raffinosaccharides. Overall, this study determined the combinatorial effects of three low-phytic-acid-causing mutations on the regulation of seed phytate and oligosaccharides in soybean.
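
The "~55% reduction" and the "eight genotypic classes" follow directly from the figures quoted in the abstract. The short check below uses only those reported values; the variable and function names are chosen for this example, and the wild-type/mutant labels are a simplification of the homozygous allele states.

```python
from itertools import product

WILD_TYPE_PHYTATE = 15.2  # mg per g, wild-type MIPS/MRP-L/MRP-N seed

# Seed phytate (mg per g) in the two double-mrp classes reported above.
phytate = {
    "MIPS/mrp-l/mrp-n": 6.94,
    "mips/mrp-l/mrp-n": 6.70,
}


def percent_reduction(value: float, reference: float) -> float:
    """Reduction of `value` relative to `reference`, in percent."""
    return 100 * (reference - value) / reference


reductions = {
    genotype: round(percent_reduction(value, WILD_TYPE_PHYTATE), 1)
    for genotype, value in phytate.items()
}

# Three loci, each homozygous for either the wild-type or the mutant allele,
# give 2 ** 3 possible genotypic classes.
genotype_classes = list(product(("wild-type", "mutant"), repeat=3))
```

The two reductions come out at about 54.3% and 55.9%, which is consistent with the rounded figure of roughly 55% given in the abstract.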


Subjects
Genetic Loci; Glycine max; Mutation; Oligosaccharides; Phytic Acid/metabolism; Seeds; Oligosaccharides/genetics; Oligosaccharides/metabolism; Seeds/genetics; Seeds/metabolism; Glycine max/genetics; Glycine max/metabolism
7.
Nucleic Acids Res ; 48(W1): W538-W545, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32374845

ABSTRACT

The identification of orthologs (genes in different species which descended from the same gene in their last common ancestor) is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.benchmarkservice.org). Furthermore, consensus ortholog calls derived from public benchmark submissions are provided on the Alliance of Genome Resources website, the joint portal of NIH-funded model organism databases.


Subjects
Multigene Family; Proteome; Software; Animals; Benchmarking; Consensus; Genomics; Humans; Mice; Phylogeny; Rats
8.
F1000Res ; 9: 27, 2020.
Article in English | MEDLINE | ID: mdl-32089838

ABSTRACT

The Orthologous Matrix (OMA) is a method and database that allows users to identify orthologs among many genomes. OMA provides three different types of orthologs: pairwise orthologs, OMA Groups and Hierarchical Orthologous Groups (HOGs). This Primer is organized in two parts. In the first part, we provide all the necessary background information to understand the concepts of orthology, how we infer them and the different subtypes of orthology in OMA, as well as what types of analyses they should be used for. In the second part, we describe protocols for using the OMA browser to find a specific gene and its various types of orthologs. By the end of the Primer, readers should be able to (i) understand homology and the different types of orthologs reported in OMA; (ii) understand the best type of orthologs to use for a particular analysis; (iii) find particular genes of interest in the OMA browser; and (iv) identify orthologs for a given gene. The data can be freely accessed from the OMA browser at https://omabrowser.org.


Subjects
Genome; Genomics/methods; Software; Computational Biology
9.
F1000Res ; 9: 511, 2020.
Article in English | MEDLINE | ID: mdl-35722083

ABSTRACT

Knowledge of species phylogeny is critical to many fields of biology. In an era of genome data availability, the most common way to make a phylogenetic species tree is by using multiple protein-coding genes, conserved in multiple species. This methodology is composed of several steps: orthology inference, multiple sequence alignment and inference of the phylogeny with dedicated tools. This can be a difficult task, and orthology inference, in particular, is usually computationally intensive and error prone if done ad hoc. This tutorial provides protocols to make use of OMA Orthologous Groups, a set of genes all orthologous to each other, to infer a phylogenetic species tree. It is designed to be user-friendly and computationally inexpensive, by providing two options: (1) Using only precomputed groups with species available on the OMA Browser, or (2) Computing orthologs using OMA Standalone for additional species, with the option of using precomputed orthology relations for those present in OMA. A protocol for downstream analyses is provided as well, including creating a supermatrix, tree inference, and visualization. All protocols use publicly available software, and we provide scripts and code snippets to facilitate data handling. The protocols are accompanied with practical examples.
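
The supermatrix step mentioned among the downstream analyses amounts to concatenating per-gene alignments and padding missing species with gaps. The following is a minimal sketch under that assumption; it is not taken from the tutorial's own scripts, and the function name and toy alignments are hypothetical.

```python
def build_supermatrix(alignments: list[dict[str, str]], gap: str = "-") -> dict[str, str]:
    """Concatenate per-gene alignments into one sequence per species.

    Each alignment maps a species name to its aligned sequence. A species
    missing from an alignment is padded with gap characters of that
    alignment's width.
    """
    species = sorted({sp for aln in alignments for sp in aln})
    parts: dict[str, list[str]] = {sp: [] for sp in species}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("all sequences in one alignment must have equal length")
        width = widths.pop()
        for sp in species:
            parts[sp].append(aln.get(sp, gap * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Two toy gene alignments with invented species labels A, B and C.
toy_alignments = [
    {"A": "MK-", "B": "MKT"},
    {"A": "GG", "C": "GA"},
]
supermatrix = build_supermatrix(toy_alignments)
```

Every row of the resulting matrix has the same length, which is the property tree inference programs expect from a concatenated alignment.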

10.
Methods Mol Biol ; 1910: 149-175, 2019.
Article in English | MEDLINE | ID: mdl-31278664

ABSTRACT

The distinction between orthologs and paralogs, genes that started diverging by speciation versus duplication, is relevant in a wide range of contexts, most notably phylogenetic tree inference and protein function annotation. In this chapter, we provide an overview of the methods used to infer orthology and paralogy. We survey both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation. We discuss conceptual differences among the various orthology inference methods and databases and examine the difficult issue of verifying and benchmarking orthology predictions. Finally, we review typical applications of orthologous genes, groups, and reconciled trees and conclude with thoughts on future methodological developments.
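
One of the simplest graph-based heuristics for orthology inference is the reciprocal best hit: two genes from different species are paired when each is the other's highest-scoring match. The sketch below illustrates that idea only; it is not claimed to be the procedure of any particular method surveyed in the chapter, and the gene names and similarity scores are invented.

```python
def best_hits(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each query gene, return the target gene with the highest score."""
    return {
        query: max(hits, key=hits.get)
        for query, hits in scores.items()
        if hits
    }


def reciprocal_best_hits(
    scores_ab: dict[str, dict[str, float]],
    scores_ba: dict[str, dict[str, float]],
) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(scores_ab)
    best_ba = best_hits(scores_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between genes of species A and species B.
scores_ab = {
    "a1": {"b1": 90.0, "b2": 40.0},
    "a2": {"b1": 70.0, "b2": 35.0},
}
scores_ba = {
    "b1": {"a1": 88.0, "a2": 72.0},
    "b2": {"a1": 41.0, "a2": 30.0},
}
pairs = reciprocal_best_hits(scores_ab, scores_ba)
```

In this toy case only a1 and b1 are paired: a2 also prefers b1, but b1 prefers a1, which shows how a pairwise criterion can leave out genes affected by duplication.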


Subjects
Computational Biology/methods; Evolution, Molecular; Genomics; Phylogeny; Algorithms; Animals; Genome; Genomics/methods; Humans; Multigene Family
11.
Genome Res ; 29(7): 1152-1163, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31235654

ABSTRACT

Genomes and transcriptomes are now typically sequenced by individual laboratories but analyzing them often remains challenging. One essential step in many analyses lies in identifying orthologs (corresponding genes across multiple species), but this is far from trivial. The Orthologous MAtrix (OMA) database is a leading resource for identifying orthologs among publicly available, complete genomes. Here, we describe the OMA pipeline available as a standalone program for Linux and Mac. When run on a cluster, it has native support for the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes. Another key feature of OMA standalone is that users can combine their own data with existing public data by exporting genomes and precomputed alignments from the OMA database, which currently contains over 2100 complete genomes. We compare OMA standalone to other methods in the context of phylogenetic tree inference, by inferring a phylogeny of Lophotrochozoa, a challenging clade within the protostomes. We also discuss other potential applications of OMA standalone, including identifying gene families having undergone duplications/losses in specific clades, and identifying potential drug targets in nonmodel organisms. OMA standalone is available under the permissive open source Mozilla Public License Version 2.0.


Subjects
Databases, Genetic; Genome; Invertebrates/classification; Software; Transcriptome; Animals; Invertebrates/genetics; Phylogeny
12.
PeerJ ; 6: e6231, 2019.
Article in English | MEDLINE | ID: mdl-30648004

ABSTRACT

In polyploid genomes, homoeologs are a specific subtype of homologs, and can be thought of as orthologs between subgenomes. In Orthologous MAtrix, we infer homoeologs in three polyploid plant species: upland cotton (Gossypium hirsutum), rapeseed (Brassica napus), and bread wheat (Triticum aestivum). While we can typically recognize the features of a "good" homoeolog prediction (a consistent evolutionary distance, high synteny, and a one-to-one relationship), none of them is a hard-and-fast criterion. We devised a novel fuzzy logic-based method to assign confidence scores to each pair of predicted homoeologs. We inferred homoeolog pairs and used the new and improved method to assign confidence scores, which ranged from 0 to 100. Most confidence scores were between 70 and 100, but the distribution varied between genomes. The new confidence scores show an improvement over our previous method and were manually evaluated using a subset from various confidence ranges.

13.
Nucleic Acids Res ; 46(D1): D477-D485, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29106550

ABSTRACT

The Orthologous Matrix (OMA) is a leading resource to relate genes across many species from all of life. In this update paper, we review the recent algorithmic improvements in the OMA pipeline, describe increases in species coverage (particularly in plants and early-branching eukaryotes) and introduce several new features in the OMA web browser. Notable improvements include: (i) a scalable, interactive viewer for hierarchical orthologous groups; (ii) protein domain annotations and domain-based links between orthologous groups; (iii) functionality to retrieve phylogenetic marker genes for a subset of species of interest; (iv) a new synteny dot plot viewer; and (v) an overhaul of the programmatic access (REST API and semantic web), which will facilitate incorporation of OMA analyses in computational pipelines and integration with other bioinformatic resources. OMA can be freely accessed at https://omabrowser.org.


Subjects
Biological Evolution; Databases, Genetic; Genome; Molecular Sequence Annotation; Proteins/genetics; Synteny; Algorithms; Animals; Archaea/classification; Archaea/genetics; Archaea/metabolism; Bacteria/classification; Bacteria/genetics; Bacteria/metabolism; Computational Biology/methods; Fungi/classification; Fungi/genetics; Fungi/metabolism; Gene Ontology; Humans; Internet; Phylogeny; Plants/classification; Plants/genetics; Plants/metabolism; Protein Domains; Proteins/chemistry; Proteins/metabolism; Web Browser
14.
Bioinformatics ; 33(14): i75-i82, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881964

ABSTRACT

MOTIVATION: Accurate orthology inference is a fundamental step in many phylogenetic and comparative analyses. Many methods have been proposed, including OMA (Orthologous MAtrix). Yet substantial challenges remain, in particular in coping with fragmented genes or genes evolving at different rates after duplication, and in scaling to large datasets. With more and more genomes available, it is necessary to improve the scalability and robustness of orthology inference methods. RESULTS: We present improvements in the OMA algorithm: (i) refining the pairwise orthology inference step to account for same-species paralogs evolving at different rates, and (ii) minimizing errors in the pairwise orthology verification step by testing the consistency of pairwise distance estimates, which can be problematic in the presence of fragmentary sequences. In addition, we introduce a more scalable procedure for hierarchical orthologous group (HOG) clustering, which is several orders of magnitude faster on large datasets. Using the Quest for Orthologs consortium orthology benchmark service, we show that these changes translate into substantial improvement on multiple empirical datasets. AVAILABILITY AND IMPLEMENTATION: This new OMA 2.0 algorithm is used in the OMA database (http://omabrowser.org) from the March 2017 release onwards, and can be run on custom genomes using OMA standalone version 2.0 and above (http://omabrowser.org/standalone). CONTACT: christophe.dessimoz@unil.ch or adrian.altenhoff@inf.ethz.ch.


Subjects
Evolution, Molecular; Genomics/methods; Mutation Rate; Phylogeny; Software; Algorithms; Animals; Humans; Mammals/genetics
15.
Front Plant Sci ; 7: 610, 2016.
Article in English | MEDLINE | ID: mdl-27242817

ABSTRACT

It is trite to say "publish or perish," yet many early career researchers are often at a loss on how to best get their work published. With strong competition and many manuscripts submitted, it is difficult to convince editors and reviewers to opt for acceptance. A pragmatic approach to publishing may increase one's odds of success. Here, we, a group of postdocs in the field of plant science, present specific recommendations for early career scientists on advanced levels. We cannot provide a recipe-like set of instructions with success guaranteed, but we come from a broad background in plant science, with experience publishing in a number of journals of varying topics and impact factors. We provide tips, tricks, and tools for collaboration, journal selection, and achieving acceptance.

16.
Trends Plant Sci ; 21(7): 609-621, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27021699

ABSTRACT

The evolutionary history of nearly all flowering plants includes a polyploidization event. Homologous genes resulting from allopolyploidy are commonly referred to as 'homoeologs', although this term has not always been used precisely or consistently in the literature. With several allopolyploid genome sequencing projects under way, there is a pressing need for computational methods for homoeology inference. Here we review the definition of homoeology in historical and modern contexts and propose a precise and testable definition highlighting the connection between homoeologs and orthologs. In the second part, we survey experimental and computational methods of homoeolog inference, considering the strengths and limitations of each approach. Establishing a precise and evolutionarily meaningful definition of homoeology is essential for understanding the evolutionary consequences of polyploidization.


Subjects
Genes, Plant/genetics; Evolution, Molecular; Gene Expression Regulation, Plant/genetics; Genome, Plant/genetics; Polyploidy
17.
Genome Biol ; 16: 188, 2015 Sep 09.
Article in English | MEDLINE | ID: mdl-26353816

ABSTRACT

BACKGROUND: Bread wheat is not only an important crop, but its large (17 Gb), highly repetitive, and hexaploid genome makes it a good model to study the organization and evolution of complex genomes. Recently, we produced a high quality reference sequence of wheat chromosome 3B (774 Mb), which provides an excellent opportunity to study the evolutionary dynamics of a large and polyploid genome, specifically the impact of single gene duplications. RESULTS: We find that 27% of the 3B predicted genes are non-syntenic with the orthologous chromosomes of Brachypodium distachyon, Oryza sativa, and Sorghum bicolor, whereas, by applying the same criteria, non-syntenic genes represent on average only 10% of the predicted genes in these three model grasses. These non-syntenic genes on 3B have high sequence similarity to at least one other gene in the wheat genome, indicating that hexaploid wheat has undergone massive small-scale interchromosomal gene duplications compared to other grasses. Insertions of non-syntenic genes occurred at a similar rate along the chromosome, but these genes tend to be retained at a higher frequency in the distal, recombinogenic regions. The ratio of non-synonymous to synonymous substitution rates showed a more relaxed selection pressure for non-syntenic genes compared to syntenic genes, and gene ontology analysis indicated that non-syntenic genes may be enriched in functions involved in disease resistance. CONCLUSION: Our results highlight the major impact of single gene duplications on the wheat gene complement and confirm the accelerated evolution of the Triticeae lineage among grasses.


Subjects
Chromosomes, Plant; Evolution, Molecular; Gene Duplication; Genes, Plant; Triticum/genetics; Poaceae/genetics