Results 1 - 20 of 110
1.
Genome Biol Evol ; 2024 May 14.
Article in English | MEDLINE | ID: mdl-38742690

ABSTRACT

Studying gene family evolution strongly benefits from insightful visualisations. However, the ever-growing number of sequenced genomes is leading to increasingly large gene families, which challenges existing gene tree visualisations. Indeed, most of them present users with a dilemma: display complete but intractable gene trees, or collapse subtrees, thereby hiding their children's information. Here, we introduce Matreex, a new dynamic tool to scale up the visualisation of gene families. Matreex's key idea is to use "phylogenetic" profiles, which are dense representations of gene repertoires, to minimise the information loss when collapsing subtrees. We illustrate Matreex's usefulness with three biological applications. First, we demonstrate on the MutS family the power of combining gene trees and phylogenetic profiles to delve into precise evolutionary analyses of large multi-copy gene families. Second, by displaying 22 intraflagellar transport gene families across 622 species, totalling 5,500 representatives, we show how Matreex can be used to automate large-scale analyses of gene presence-absence. Notably, we report for the first time the complete loss of intraflagellar transport in the myxozoan Thelohanellus kitauei. Finally, using the textbook example of visual opsins, we show Matreex's potential to create easily interpretable figures for teaching and outreach. Matreex is available from the Python Package Index (pip install matreex) with the source code and documentation available at https://github.com/DessimozLab/matreex.
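
The idea of a phylogenetic profile can be illustrated with a minimal Python sketch. This is not Matreex's implementation or API: the function name, the input format (pairs of gene identifier and species) and the toy data are all assumptions made for illustration. It summarises the genes of a collapsed subtree as one copy-number value per species, from which a presence-absence vector follows directly.

```python
from collections import Counter

def phylogenetic_profile(genes_in_subtree, species_order):
    """Return the gene copy number per species for a collapsed subtree.

    genes_in_subtree: iterable of (gene_id, species) pairs (hypothetical toy format).
    species_order: list of species that fixes the column order of the profile.
    """
    counts = Counter(species for _, species in genes_in_subtree)
    return [counts.get(sp, 0) for sp in species_order]

# Toy data, invented for illustration only.
subtree = [("g1", "human"), ("g2", "human"), ("g3", "mouse"), ("g4", "zebrafish")]
species = ["human", "mouse", "chicken", "zebrafish"]

profile = phylogenetic_profile(subtree, species)   # copy numbers per species
presence = [int(n > 0) for n in profile]           # presence-absence per species
print(profile)
print(presence)
```

A zero in the profile marks a species with no gene in the subtree, which is the kind of signal a collapsed node would otherwise hide.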

2.
Sci Data ; 11(1): 268, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38443367
3.
Nat Ecol Evol ; 8(5): 854-855, 2024 May.
Article in English | MEDLINE | ID: mdl-38528190

Subject(s)
Humans, Animals
4.
Nat Biotechnol ; 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38383603

ABSTRACT

In the era of biodiversity genomics, it is crucial to ensure that annotations of protein-coding gene repertoires are accurate. State-of-the-art tools to assess genome annotations measure the completeness of a gene repertoire but are blind to other errors, such as gene overprediction or contamination. We introduce OMArk, a software package that relies on fast, alignment-free sequence comparisons between a query proteome and precomputed gene families across the tree of life. OMArk assesses not only the completeness but also the consistency of the gene repertoire as a whole relative to closely related species and reports likely contamination events. Analysis of 1,805 UniProt Eukaryotic Reference Proteomes with OMArk demonstrated strong evidence of contamination in 73 proteomes and identified error propagation in avian gene annotation resulting from the use of a fragmented zebra finch proteome as a reference. This study illustrates the importance of comparing and prioritizing proteomes based on their quality measures.

5.
Nat Biotechnol ; 42(1): 139-147, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37081138

ABSTRACT

Current methods for inference of phylogenetic trees require running complex pipelines at substantial computational and labor costs, with additional constraints in sequencing coverage, assembly and annotation quality, especially for large datasets. To overcome these challenges, we present Read2Tree, which directly processes raw sequencing reads into groups of corresponding genes and bypasses traditional steps in phylogeny inference, such as genome assembly, annotation and all-versus-all sequence comparisons, while retaining accuracy. In a benchmark encompassing a broad variety of datasets, Read2Tree is 10-100 times faster than assembly-based approaches and in most cases more accurate, the exception being when sequencing coverage is high and reference species very distant. Here, to illustrate the broad applicability of the tool, we reconstruct a yeast tree of life of 435 species spanning 590 million years of evolution. We also apply Read2Tree to >10,000 Coronaviridae samples, accurately classifying highly diverse animal samples and near-identical severe acute respiratory syndrome coronavirus 2 sequences on a single tree. The speed, accuracy and versatility of Read2Tree enable comparative genomics at scale.


Subject(s)
Genomics, Animals, Phylogeny, Sequence Analysis, Genomics/methods
6.
Nucleic Acids Res ; 52(D1): D513-D521, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37962356

ABSTRACT

In this update paper, we present the latest developments in the OMA browser knowledgebase, which aims to provide high-quality orthology inferences and facilitate the study of gene families, genomes and their evolution. First, we discuss the addition of new species in the database, particularly an expanded representation of prokaryotic species. The OMA browser now offers Ancestral Genome pages and an Ancestral Gene Order viewer, allowing users to explore the evolutionary history and gene content of ancestral genomes. We also introduce a revamped Local Synteny Viewer to compare genomic neighborhoods across both extant and ancestral genomes. Hierarchical Orthologous Groups (HOGs) are now annotated with Gene Ontology annotations, and users can easily perform extant or ancestral GO enrichments. Finally, we recap new tools in the OMA Ecosystem, including OMAmer for proteome mapping, OMArk for proteome quality assessment, OMAMO for model organism selection and Read2Tree for phylogenetic species tree construction from reads. These new features provide exciting opportunities for orthology analysis and comparative genomics. OMA is accessible at https://omabrowser.org.


Subject(s)
Genetic Databases, Ecosystem, Genome, Proteome, Genome/genetics, Phylogeny, Synteny, Internet, Gene Order/genetics
7.
Biol Methods Protoc ; 8(1): bpad040, 2023.
Article in English | MEDLINE | ID: mdl-38152108

ABSTRACT

Evolution stands as a foundational pillar within modern biology, shaping our understanding of life. Studies related to evolution, for example constructing phylogenetic trees, are often carried out using DNA or protein sequences. These data, readily accessible from public databases, represent a treasure trove of resources that can be harnessed to create engaging activities with the public. At the heart of our project lies a collection of "stories" about evolution, each rooted in genuine scientific publications that furnish both biological context and supporting evidence. These narratives serve as the focal point of our LightOfEvolution.org website. Each story is accompanied by a dedicated "Your Turn to Play" section. Within this section, we furnish user-friendly activities and step-by-step guidelines, equipping visitors with the means to replicate analyses showcased in the highlighted publications. For example, the website OhMyGenes.org, relying on authentic scientific data, provides the capability to compute the proportion of shared genes across different species. Here, visitors can address the captivating question: "How many genes do we share with a banana?" To extend the educational reach, we have developed a series of modular activities, also related to the stories. These activities have been thoughtfully designed to be adaptable for face-to-face workshops held in classrooms or presented during public events. We aim to create stories and activities that resonate with participants, offering a tangible and enjoyable experience. By providing opportunities that reflect real-world scientific practices, we seek to offer participants valuable insights into the current workings of scientists "in the light of evolution."

8.
Genome Biol ; 24(1): 135, 2023 06 08.
Article in English | MEDLINE | ID: mdl-37291671

ABSTRACT

BACKGROUND: In every living species, the function of a protein depends on its organization of structural domains, and the length of a protein is a direct reflection of this. Because every species evolved under different evolutionary pressures, the protein length distribution, much like other genomic features, is expected to vary across species but has so far been scarcely studied. RESULTS: Here we evaluate this diversity by comparing protein length distribution across 2326 species (1688 bacteria, 153 archaea, and 485 eukaryotes). We find that proteins tend to be on average slightly longer in eukaryotes than in bacteria or archaea, but that the variation of length distribution across species is low, especially compared to the variation of other genomic features (genome size, number of proteins, gene length, GC content, isoelectric points of proteins). Moreover, most cases of atypical protein length distribution appear to be due to artifactual gene annotation, suggesting the actual variation of protein length distribution across species is even smaller. CONCLUSIONS: These results open the way for developing a genome annotation quality metric based on protein length distribution to complement conventional quality measures. Overall, our findings show that protein length distribution between living species is more uniform than previously thought. Furthermore, we also provide evidence for a universal selection on protein length, yet its mechanism and fitness effect remain intriguing open questions.


Subject(s)
Molecular Sequence Annotation, Proteins, Protein Sequence Analysis, Amino Acid Sequence, Molecular Sequence Annotation/methods, Proteins/chemistry, Proteins/classification, Proteome, Protein Sequence Analysis/methods, Eukaryota, Bacteria, Archaea
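
As a rough illustration of what comparing protein length distributions across species involves, the following Python sketch computes basic summary statistics per proteome. It is not the authors' pipeline: the input format (a dict mapping protein identifiers to amino-acid sequences), the function name and the toy sequences are assumptions made for this example.

```python
import statistics

def length_summary(proteome):
    """Summarise protein lengths for one proteome.

    proteome: dict mapping protein ID -> amino-acid sequence (hypothetical toy format).
    """
    lengths = [len(seq) for seq in proteome.values()]
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }

# Toy proteomes, invented for illustration only.
proteomes = {
    "species_A": {"p1": "MK", "p2": "MKLV", "p3": "MKLVAA"},
    "species_B": {"q1": "MKL", "q2": "MKLVA"},
}

summaries = {name: length_summary(p) for name, p in proteomes.items()}
for name, summary in summaries.items():
    print(name, summary)
```

On real data, an outlying mean or median relative to related species would be a prompt to inspect the gene annotation, in line with the quality-metric idea raised in the conclusions.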
9.
Proc Natl Acad Sci U S A ; 120(19): e2305013120, 2023 05 09.
Article in English | MEDLINE | ID: mdl-37126713
10.
Plant Cell ; 35(7): 2635-2653, 2023 06 26.
Article in English | MEDLINE | ID: mdl-36972404

ABSTRACT

PHYTOCHROME KINASE SUBSTRATE (PKS) proteins are involved in light-modulated changes in growth orientation. They act downstream of phytochromes to control hypocotyl gravitropism in the light and act early in phototropin signaling. Despite their importance for plant development, little is known about their molecular mode of action, except that they belong to a protein complex comprising phototropins at the plasma membrane (PM). Identifying evolutionary conservation is one approach to revealing biologically important protein motifs. Here, we show that PKS sequences are restricted to seed plants and that these proteins share 6 motifs (A to F from the N to the C terminus). Motifs A and D are also present in BIG GRAIN, while the remaining 4 are specific to PKSs. We provide evidence that motif C is S-acylated on highly conserved cysteines, which mediates the association of PKS proteins with the PM. Motif C is also required for PKS4-mediated phototropism and light-regulated hypocotyl gravitropism. Finally, our data suggest that the mode of PKS4 association with the PM is important for its biological activity. Our work, therefore, identifies conserved cysteines contributing to PM association of PKS proteins and strongly suggests that this is their site of action to modulate environmentally regulated organ positioning.


Subject(s)
Arabidopsis Proteins, Arabidopsis, Phytochrome, Phytochrome/metabolism, Arabidopsis Proteins/genetics, Arabidopsis Proteins/metabolism, Arabidopsis/metabolism, Protein S/metabolism, Light, Phototropism, Hypocotyl, Acylation
11.
F1000Res ; 12: 936, 2023.
Article in English | MEDLINE | ID: mdl-38434623

ABSTRACT

Background: Comparative genomic analyses to delineate gene evolutionary histories inform the understanding of organismal biology by characterising gene and gene family origins, trajectories, and dynamics, as well as enabling the tracing of speciation, duplication, and loss events, and facilitating the transfer of gene functional information across species. Genomic data are available for an increasing number of species from the genus Drosophila; however, a dedicated resource exploiting these data to provide the research community with browsable results from genus-wide orthology delineation has been lacking. Methods: Using the OMA Orthologous Matrix orthology inference approach and browser deployment framework, we catalogued orthologues across a selected set of Drosophila species with high-quality annotated genomes. We developed and deployed a dedicated instance of the OMA browser to facilitate intuitive exploration, visualisation, and downloading of the genus-wide orthology delineation results. Results: DrosOMA - the Drosophila Orthologous Matrix browser, accessible from https://drosoma.dcsr.unil.ch/ - presents the results of orthology delineation for 36 drosophilids from across the genus and four outgroup dipterans. It enables querying and browsing of the orthology data through a feature-rich web interface, with gene-view, orthologous group-view, and genome-view pages, including comprehensive gene name and identifier cross-references together with available functional annotations and protein domain architectures, as well as tools to visualise local and global synteny conservation. Conclusions: The DrosOMA browser demonstrates the deployability of the OMA browser framework for building user-friendly orthology databases with dense sampling of a selected taxonomic group. It provides the Drosophila research community with a tailored resource of browsable results from genus-wide orthology delineation.


Subject(s)
Drosophila, Molecular Evolution, Animals, Drosophila/genetics, Comparative Genomic Hybridization, Factual Databases, Genomics
12.
bioRxiv ; 2022 Dec 13.
Article in English | MEDLINE | ID: mdl-36561179

ABSTRACT

The inference of phylogenetic trees is foundational to biology. However, state-of-the-art phylogenomics requires running complex pipelines, at significant computational and labour costs, with additional constraints in sequencing coverage, assembly and annotation quality. To overcome these challenges, we present Read2Tree, which directly processes raw sequencing reads into groups of corresponding genes. In a benchmark encompassing a broad variety of datasets, our assembly-free approach was 10-100x faster than conventional approaches, and in most cases more accurate, the exception being when sequencing coverage was high and reference species very distant. To illustrate the broad applicability of the tool, we reconstructed a yeast tree of life of 435 species spanning 590 million years of evolution. Applied to Coronaviridae samples, Read2Tree accurately classified highly diverse animal samples and near-identical SARS-CoV-2 sequences on a single tree, thereby exhibiting remarkable breadth and depth. The speed, accuracy, and versatility of Read2Tree enable comparative genomics at scale.

13.
Distrib Parallel Databases ; 40(2-3): 409-440, 2022.
Article in English | MEDLINE | ID: mdl-36097541

ABSTRACT

The problem of natural language processing over structured data has become a growing research field, both within the relational database and the Semantic Web community, with significant efforts involved in question answering over knowledge graphs (KGQA). However, many of these approaches are either specifically targeted at open-domain question answering using DBpedia, or require large training datasets to translate a natural language question to SPARQL in order to query the knowledge graph. Hence, these approaches often cannot be applied directly to complex scientific datasets where no prior training data is available. In this paper, we focus on the challenges of natural language processing over knowledge graphs of scientific datasets. In particular, we introduce Bio-SODA, a natural language processing engine that does not require training data in the form of question-answer pairs for generating SPARQL queries. Bio-SODA uses a generic graph-based approach for translating user questions to a ranked list of SPARQL candidate queries. Furthermore, Bio-SODA uses a novel ranking algorithm that includes node centrality as a measure of relevance for selecting the best SPARQL candidate query. Our experiments with real-world datasets across several scientific domains, including the official bioinformatics Question Answering over Linked Data (QALD) challenge, as well as the CORDIS dataset of European projects, show that Bio-SODA outperforms publicly available KGQA systems by an F1-score of at least 20% and by an even higher factor on more complex bioinformatics datasets. Finally, we introduce Bio-SODA UX, a graphical user interface designed to assist users in the exploration of large knowledge graphs and in dynamically disambiguating natural language questions that target the data available in these graphs.
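
For readers unfamiliar with the F1-score used to compare question-answering systems, here is a minimal Python sketch of the standard definition (harmonic mean of precision and recall) for a single question. The function name and the toy answer sets are assumptions for illustration; the abstract does not describe how the evaluation was implemented.

```python
def f1_score(returned, gold):
    """Standard F1 for one question: harmonic mean of precision and recall.

    returned: answers produced by the system's query (hypothetical toy input).
    gold: reference answers for the same question.
    """
    returned, gold = set(returned), set(gold)
    true_positives = len(returned & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(returned)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy example, invented for illustration: 2 of 3 returned answers are correct,
# and 2 of the 4 reference answers are found.
score = f1_score({"P1", "P2", "P3"}, {"P1", "P2", "P4", "P5"})
print(round(score, 3))
```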

14.
Nat Commun ; 13(1): 3880, 2022 07 06.
Article in English | MEDLINE | ID: mdl-35794124

ABSTRACT

Sexual reproduction consists of genome reduction by meiosis and subsequent gamete fusion. The presence of genes homologous to eukaryotic meiotic genes in archaea and bacteria suggests that DNA repair mechanisms evolved towards meiotic recombination. However, fusogenic proteins resembling those found in gamete fusion in eukaryotes have so far not been found in prokaryotes. Here, we identify archaeal proteins that are homologs of fusexins, a superfamily of fusogens that mediate eukaryotic gamete and somatic cell fusion, as well as virus entry. The crystal structure of a trimeric archaeal fusexin (Fusexin1 or Fsx1) reveals an archetypical fusexin architecture with unique features such as a six-helix bundle and an additional globular domain. Ectopically expressed Fusexin1 can fuse mammalian cells, and this process involves the additional globular domain and a conserved fusion loop. Furthermore, archaeal fusexin genes are found within integrated mobile elements, suggesting potential roles in cell-cell fusion and gene exchange in archaea, as well as different scenarios for the evolutionary history of fusexins.


Subject(s)
Archaea, Eukaryota, Animals, Archaea/genetics, Cell Fusion, Eukaryota/genetics, Eukaryotic Cells, Germ Cells/metabolism, Mammals
15.
Bioinformatics ; 38(10): 2965-2966, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561194

ABSTRACT

SUMMARY: The conservation of pathways and genes across species has allowed scientists to use non-human model organisms to gain a deeper understanding of human biology. However, the use of traditional model systems such as mice, rats and zebrafish is costly, time-consuming and increasingly raises ethical concerns, which highlights the need to search for less complex model organisms. Existing tools only focus on the few well-studied model systems, most of which are complex animals. To address these issues, we have developed Orthologous Matrix and Alternative Model Organism (OMAMO), a software tool and web service that provides the user with the best non-complex organism for research into a biological process of interest based on orthologous relationships between human and the species. The outputs provided by OMAMO were supported by a systematic literature review. AVAILABILITY AND IMPLEMENTATION: https://omabrowser.org/omamo/, https://github.com/DessimozLab/omamo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software, Zebrafish, Animals, Mice, Rats, Zebrafish/genetics
16.
Nucleic Acids Res ; 50(W1): W623-W632, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35552456

ABSTRACT

The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource to compare existing and new methods of orthology inference (the bedrock for many comparative genomics and phylogenetic analyses) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to maintaining the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertions from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.


Subject(s)
Benchmarking, Genomics, Phylogeny, Genomics/methods, Proteome
17.
Syst Biol ; 71(6): 1391-1403, 2022 10 12.
Article in English | MEDLINE | ID: mdl-35426933

ABSTRACT

A large variety of pairwise measures of similarity or dissimilarity have been developed for comparing phylogenetic trees, for example, species trees or gene trees. Due to its intuitive definition in terms of tree clades and bipartitions and its computational efficiency, the Robinson-Foulds (RF) distance is the most widely used for trees with unweighted edges and labels restricted to leaves (representing the genetic elements being compared). However, in the case of gene trees, an important piece of information revealing the nature of the homologous relation between gene pairs (orthologs, paralogs, and xenologs) is the type of event associated with each internal node of the tree, typically speciations or duplications, but other types of events may also be considered, such as horizontal gene transfers. This labeling of internal nodes is usually inferred from a gene tree/species tree reconciliation method. Here, we address the problem of comparing such event-labeled trees. The problem differs from the classical problem of comparing uniformly labeled trees (all labels belonging to the same alphabet) that may be done using the Tree Edit Distance (TED) mainly due to the fact that, in our case, two different alphabets are considered for the leaves and internal nodes of the tree, and leaves are not affected by edit operations. We propose an extension of the RF distance to event-labeled trees, based on edit operations comparable to those considered for TED: node insertion, node deletion, and label substitution. We show that this new Labeled Robinson-Foulds (LRF) distance can be computed in linear time, in addition to maintaining other desirable properties: being a metric, reducing to RF for trees with no labels on internal nodes and maintaining an intuitive interpretation. The algorithm for computing the LRF distance enables novel analyses on event-labeled trees such as reconciled gene trees. Here, we use it to study the impact of taxon sampling on labeled gene tree inference and conclude that denser taxon sampling yields trees with better topology but worse labeling. [Algorithms; combinatorics; gene trees; phylogenetics; Robinson-Foulds; tree distance.].


Subject(s)
Algorithms, Horizontal Gene Transfer, Phylogeny
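
To make the starting point of this work concrete, the sketch below computes the classic (unlabeled) Robinson-Foulds distance between two rooted trees as the size of the symmetric difference of their clade sets. It does not implement the Labeled Robinson-Foulds distance proposed in the paper, and it is a simple set-based version rather than a linear-time algorithm. The nested-tuple tree encoding and the function names are assumptions chosen for this example.

```python
def clades(tree):
    """Return (leaf set of this subtree, set of clades at or below its root).

    A tree is a leaf name (str) or a tuple of subtrees (hypothetical toy encoding).
    Single-leaf clades are omitted because every tree on the same leaves shares them.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found

def rf_distance(tree_a, tree_b):
    """Classic rooted RF distance: number of clades present in exactly one tree."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf set")
    return len(clades_a ^ clades_b)

# Toy trees on four leaves, invented for illustration only.
balanced = (("A", "B"), ("C", "D"))
swapped = (("A", "C"), ("B", "D"))
ladder = ((("A", "B"), "C"), "D")

print(rf_distance(balanced, swapped))
print(rf_distance(balanced, ladder))
```

Because internal nodes carry no information here, two trees with identical topology but different speciation or duplication labels would be at distance zero, which is the gap the labeled extension addresses.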
18.
Mol Biol Evol ; 39(2)2022 02 03.
Article in English | MEDLINE | ID: mdl-34730808

ABSTRACT

Protein posttranslational modifications add great sophistication to biological systems. Citrullination, a key regulatory mechanism in human physiology and pathophysiology, is enigmatic from an evolutionary perspective. Although the citrullinating enzymes peptidylarginine deiminases (PADIs) are ubiquitous across vertebrates, they are absent from yeast, worms, and flies. Based on this distribution PADIs were proposed to have been horizontally transferred, but this has been contested. Here, we map the evolutionary trajectory of PADIs into the animal lineage. We present strong phylogenetic support for a clade encompassing animal and cyanobacterial PADIs that excludes fungal and other bacterial homologs. The animal and cyanobacterial PADI proteins share functionally relevant primary and tertiary synapomorphic sequences that are distinct from a second PADI type present in fungi and actinobacteria. Molecular clock calculations and sequence divergence analyses using the fossil record estimate the last common ancestor of the cyanobacterial and animal PADIs to be less than 1 billion years old. Additionally, under an assumption of vertical descent, PADI sequence change during this evolutionary time frame is anachronistically low, even when compared with products of likely endosymbiont gene transfer, mitochondrial proteins, and some of the most highly conserved sequences in life. The consilience of evidence indicates that PADIs were introduced from cyanobacteria into animals by horizontal gene transfer (HGT). The ancestral cyanobacterial PADI is enzymatically active and can citrullinate eukaryotic proteins, suggesting that the PADI HGT event introduced a new catalytic capability into the regulatory repertoire of animals. This study reveals the unusual evolution of a pleiotropic protein modification.


Subject(s)
Cyanobacteria, Horizontal Gene Transfer, Animals, Citrullination, Conserved Sequence, Cyanobacteria/genetics, Molecular Evolution, Phylogeny
19.
Bioinformatics ; 37(Suppl_1): i7-i8, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252970
20.
Genome Biol Evol ; 13(6)2021 06 08.
Article in English | MEDLINE | ID: mdl-33871639

ABSTRACT

Homoeologs are pairs of genes or chromosomes in the same species that originated by speciation and were brought back together in the same genome by allopolyploidization. Bioinformatic methods for accurate homoeology inference are crucial for studying the evolutionary consequences of polyploidization, and homoeology is typically inferred on the basis of bidirectional best hit (BBH) and/or positional conservation (synteny). However, these methods neglect the fact that genes can duplicate and move, both prior to and after the allopolyploidization event. These duplications and movements can result in many-to-many and/or nonsyntenic homoeologs, which thus remain undetected and unstudied. Here, using the allotetraploid upland cotton (Gossypium hirsutum) as a case study, we show that conventional approaches indeed miss a substantial proportion of homoeologs. Additionally, we found that many of the missed pairs of homoeologs are broadly and highly expressed. A gene ontology analysis revealed that a high proportion of the nonsyntenic and non-BBH homoeologs are involved in protein translation and are likely to contribute to the functional repertoire of cotton. Thus, from an evolutionary and functional genomics standpoint, choosing a homoeolog inference method which does not solely rely on 1:1 relationship cardinality or synteny is crucial for not missing these potentially important homoeolog pairs.


Subject(s)
Gossypium/genetics, Synteny, Protein Biosynthesis
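
The limitation of the bidirectional best hit (BBH) criterion can be seen in a minimal Python sketch. This is a generic illustration of BBH, not the method evaluated in the paper: the score dictionaries, gene names and function names are invented for the example. Each gene keeps only its top-scoring partner, so a duplicated gene whose best match is already paired goes undetected.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target.

    scores: dict mapping (query, target) -> similarity score (hypothetical toy format).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}

def bidirectional_best_hits(a_to_b, b_to_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_to_b)
    best_ba = best_hits(b_to_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Toy scores between two subgenomes, invented for illustration only.
# a2 is a duplicate of a1, so both match d1 better than d2.
a_to_d = {("a1", "d1"): 95, ("a1", "d2"): 60, ("a2", "d1"): 90, ("a2", "d2"): 55}
d_to_a = {("d1", "a1"): 95, ("d1", "a2"): 90, ("d2", "a1"): 60, ("d2", "a2"): 55}

pairs = bidirectional_best_hits(a_to_d, d_to_a)
print(pairs)
```

In this toy case only one pair survives, and the relationship between a2 and d1 is dropped even though their score is high, which mirrors the many-to-many cases the abstract says conventional approaches miss.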