Search | VHL Search Portal

1.

Protein S-acylation controls the subcellular localization and biological activity of PHYTOCHROME KINASE SUBSTRATE.

Lopez Vazquez, Ana; Allenbach Petrolati, Laure; Legris, Martina; Dessimoz, Christophe; Lampugnani, Edwin R; Glover, Natasha; Fankhauser, Christian.

Plant Cell ; 35(7): 2635-2653, 2023 06 26.

Article in English | MEDLINE | ID: mdl-36972404

ABSTRACT

PHYTOCHROME KINASE SUBSTRATE (PKS) proteins are involved in light-modulated changes in growth orientation. They act downstream of phytochromes to control hypocotyl gravitropism in the light and act early in phototropin signaling. Despite their importance for plant development, little is known about their molecular mode of action, except that they belong to a protein complex comprising phototropins at the plasma membrane (PM). Identifying evolutionary conservation is one approach to revealing biologically important protein motifs. Here, we show that PKS sequences are restricted to seed plants and that these proteins share 6 motifs (A to F from the N to the C terminus). Motifs A and D are also present in BIG GRAIN, while the remaining 4 are specific to PKSs. We provide evidence that motif C is S-acylated on highly conserved cysteines, which mediates the association of PKS proteins with the PM. Motif C is also required for PKS4-mediated phototropism and light-regulated hypocotyl gravitropism. Finally, our data suggest that the mode of PKS4 association with the PM is important for its biological activity. Our work, therefore, identifies conserved cysteines contributing to PM association of PKS proteins and strongly suggests that this is their site of action to modulate environmentally regulated organ positioning.

Subjects

Arabidopsis Proteins, Arabidopsis, Phytochrome, Phytochrome/metabolism, Arabidopsis Proteins/genetics, Arabidopsis Proteins/metabolism, Arabidopsis/metabolism, Protein S/metabolism, Light, Phototropism, Hypocotyl, Acylation

2.

OMA orthology in 2024: improved prokaryote coverage, ancestral and extant GO enrichment, a revamped synteny viewer and more in the OMA Ecosystem.

Altenhoff, Adrian M; Warwick Vesztrocy, Alex; Bernard, Charles; Train, Clement-Marie; Nicheperovich, Alina; Prieto Baños, Silvia; Julca, Irene; Moi, David; Nevers, Yannis; Majidian, Sina; Dessimoz, Christophe; Glover, Natasha M.

Nucleic Acids Res ; 52(D1): D513-D521, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37962356

ABSTRACT

In this update paper, we present the latest developments in the OMA browser knowledgebase, which aims to provide high-quality orthology inferences and facilitate the study of gene families, genomes and their evolution. First, we discuss the addition of new species in the database, particularly an expanded representation of prokaryotic species. The OMA browser now offers Ancestral Genome pages and an Ancestral Gene Order viewer, allowing users to explore the evolutionary history and gene content of ancestral genomes. We also introduce a revamped Local Synteny Viewer to compare genomic neighborhoods across both extant and ancestral genomes. Hierarchical Orthologous Groups (HOGs) are now annotated with Gene Ontology annotations, and users can easily perform extant or ancestral GO enrichments. Finally, we recap new tools in the OMA Ecosystem, including OMAmer for proteome mapping, OMArk for proteome quality assessment, OMAMO for model organism selection and Read2Tree for phylogenetic species tree construction from reads. These new features provide exciting opportunities for orthology analysis and comparative genomics. OMA is accessible at https://omabrowser.org.

Subjects

Genetic Databases, Ecosystem, Genome, Proteome, Genome/genetics, Phylogeny, Synteny, Internet, Gene Order/genetics

3.

The Quest for Orthologs orthology benchmark service in 2022.

Nevers, Yannis; Jones, Tamsin E M; Jyothi, Dushyanth; Yates, Bethan; Ferret, Meritxell; Portell-Silva, Laura; Codo, Laia; Cosentino, Salvatore; Marcet-Houben, Marina; Vlasova, Anna; Poidevin, Laetitia; Kress, Arnaud; Hickman, Mark; Persson, Emma; Pilizota, Ivana; Guijarro-Clarke, Cristina; Iwasaki, Wataru; Lecompte, Odile; Sonnhammer, Erik; Roos, David S; Gabaldón, Toni; Thybert, David; Thomas, Paul D; Hu, Yanhui; Emms, David M; Bruford, Elspeth; Capella-Gutierrez, Salvador; Martin, Maria J; Dessimoz, Christophe; Altenhoff, Adrian.

Nucleic Acids Res ; 50(W1): W623-W632, 2022 07 05.

Article in English | MEDLINE | ID: mdl-35552456

ABSTRACT

The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource to compare existing and new methods of orthology inference (the bedrock for many comparative genomics and phylogenetic analysis) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to maintaining the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertion from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.

Subjects

Benchmarking, Genomics, Phylogeny, Genomics/methods, Proteome

4.

Citrullination Was Introduced into Animals by Horizontal Gene Transfer from Cyanobacteria.

Cummings, Thomas F M; Gori, Kevin; Sanchez-Pulido, Luis; Gavriilidis, Gavriil; Moi, David; Wilson, Abigail R; Murchison, Elizabeth; Dessimoz, Christophe; Ponting, Chris P; Christophorou, Maria A.

Mol Biol Evol ; 39(2)2022 02 03.

Article in English | MEDLINE | ID: mdl-34730808

ABSTRACT

Protein posttranslational modifications add great sophistication to biological systems. Citrullination, a key regulatory mechanism in human physiology and pathophysiology, is enigmatic from an evolutionary perspective. Although the citrullinating enzymes peptidylarginine deiminases (PADIs) are ubiquitous across vertebrates, they are absent from yeast, worms, and flies. Based on this distribution PADIs were proposed to have been horizontally transferred, but this has been contested. Here, we map the evolutionary trajectory of PADIs into the animal lineage. We present strong phylogenetic support for a clade encompassing animal and cyanobacterial PADIs that excludes fungal and other bacterial homologs. The animal and cyanobacterial PADI proteins share functionally relevant primary and tertiary synapomorphic sequences that are distinct from a second PADI type present in fungi and actinobacteria. Molecular clock calculations and sequence divergence analyses using the fossil record estimate the last common ancestor of the cyanobacterial and animal PADIs to be less than 1 billion years old. Additionally, under an assumption of vertical descent, PADI sequence change during this evolutionary time frame is anachronistically low, even when compared with products of likely endosymbiont gene transfer, mitochondrial proteins, and some of the most highly conserved sequences in life. The consilience of evidence indicates that PADIs were introduced from cyanobacteria into animals by horizontal gene transfer (HGT). The ancestral cyanobacterial PADI is enzymatically active and can citrullinate eukaryotic proteins, suggesting that the PADI HGT event introduced a new catalytic capability into the regulatory repertoire of animals. This study reveals the unusual evolution of a pleiotropic protein modification.

Subjects

Cyanobacteria, Horizontal Gene Transfer, Animals, Citrullination, Conserved Sequence, Cyanobacteria/genetics, Molecular Evolution, Phylogeny

5.

OMAMO: orthology-based alternative model organism selection.

Nicheperovich, Alina; Altenhoff, Adrian M; Dessimoz, Christophe; Majidian, Sina.

Bioinformatics ; 38(10): 2965-2966, 2022 05 13.

Article in English | MEDLINE | ID: mdl-35561194

ABSTRACT

SUMMARY: The conservation of pathways and genes across species has allowed scientists to use non-human model organisms to gain a deeper understanding of human biology. However, the use of traditional model systems such as mice, rats and zebrafish is costly, time-consuming and increasingly raises ethical concerns, which highlights the need to search for less complex model organisms. Existing tools only focus on the few well-studied model systems, most of which are complex animals. To address these issues, we have developed Orthologous Matrix and Alternative Model Organism (OMAMO), a software and a web service that provides the user with the best non-complex organism for research into a biological process of interest based on orthologous relationships between human and the species. The outputs provided by OMAMO were supported by a systematic literature review. AVAILABILITY AND IMPLEMENTATION: https://omabrowser.org/omamo/, https://github.com/DessimozLab/omamo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Software, Zebrafish, Animals, Mice, Rats, Zebrafish/genetics

6.

A Linear Time Solution to the Labeled Robinson-Foulds Distance Problem.

Briand, Samuel; Dessimoz, Christophe; El-Mabrouk, Nadia; Nevers, Yannis.

Syst Biol ; 71(6): 1391-1403, 2022 10 12.

Article in English | MEDLINE | ID: mdl-35426933

ABSTRACT

A large variety of pairwise measures of similarity or dissimilarity have been developed for comparing phylogenetic trees, for example, species trees or gene trees. Due to its intuitive definition in terms of tree clades and bipartitions and its computational efficiency, the Robinson-Foulds (RF) distance is the most widely used for trees with unweighted edges and labels restricted to leaves (representing the genetic elements being compared). However, in the case of gene trees, an important piece of information revealing the nature of the homologous relation between gene pairs (orthologs, paralogs, and xenologs) is the type of event associated with each internal node of the tree, typically speciations or duplications, but other types of events may also be considered, such as horizontal gene transfers. This labeling of internal nodes is usually inferred from a gene tree/species tree reconciliation method. Here, we address the problem of comparing such event-labeled trees. The problem differs from the classical problem of comparing uniformly labeled trees (all labels belonging to the same alphabet) that may be done using the Tree Edit Distance (TED) mainly due to the fact that, in our case, two different alphabets are considered for the leaves and internal nodes of the tree, and leaves are not affected by edit operations. We propose an extension of the RF distance to event-labeled trees, based on edit operations comparable to those considered for TED: node insertion, node deletion, and label substitution. We show that this new Labeled Robinson-Foulds (LRF) distance can be computed in linear time, in addition to maintaining other desirable properties: being a metric, reducing to RF for trees with no labels on internal nodes and maintaining an intuitive interpretation. The algorithm for computing the LRF distance enables novel analyses on event-labeled trees such as reconciled gene trees. Here, we use it to study the impact of taxon sampling on labeled gene tree inference and conclude that denser taxon sampling yields trees with better topology but worse labeling. [Algorithms; combinatorics; gene trees; phylogenetics; Robinson-Foulds; tree distance.].

Subjects

Algorithms, Horizontal Gene Transfer, Phylogeny

7.

OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more.

Altenhoff, Adrian M; Train, Clément-Marie; Gilbert, Kimberly J; Mediratta, Ishita; Mendes de Farias, Tarcisio; Moi, David; Nevers, Yannis; Radoykova, Hale-Seda; Rossier, Victor; Warwick Vesztrocy, Alex; Glover, Natasha M; Dessimoz, Christophe.

Nucleic Acids Res ; 49(D1): D373-D379, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33174605

ABSTRACT

OMA is an established resource to elucidate evolutionary relationships among genes from currently 2326 genomes covering all domains of life. OMA provides pairwise and groupwise orthologs, functional annotations, local and global gene order conservation (synteny) information, among many other functions. This update paper describes the reorganisation of the database into gene-, group- and genome-centric pages. Other new and improved features are detailed, such as reporting of the evolutionarily best conserved isoforms of alternatively spliced genes, the inferred local order of ancestral genes, phylogenetic profiling, better cross-references, fast genome mapping, semantic data sharing via RDF, as well as a special coronavirus OMA with 119 viruses from the Nidovirales order, including SARS-CoV-2, the agent of the COVID-19 pandemic. We conclude with improvements to the documentation of the resource through primers, tutorials and short videos. OMA is accessible at https://omabrowser.org.

Subjects

Algorithms, Genetic Databases, Gene Order/genetics, Genome/genetics, Animals, COVID-19/epidemiology, COVID-19/prevention & control, COVID-19/virology, Chromosome Mapping, Molecular Evolution, Gene Ontology, Humans, Internet, Pandemics, Phylogeny, SARS-CoV-2/genetics, SARS-CoV-2/physiology, Species Specificity, Synteny

8.

Ten Years of Collaborative Progress in the Quest for Orthologs.

Linard, Benjamin; Ebersberger, Ingo; McGlynn, Shawn E; Glover, Natasha; Mochizuki, Tomohiro; Patricio, Mateus; Lecompte, Odile; Nevers, Yannis; Thomas, Paul D; Gabaldón, Toni; Sonnhammer, Erik; Dessimoz, Christophe; Uchiyama, Ikuo.

Mol Biol Evol ; 38(8): 3033-3045, 2021 07 29.

Article in English | MEDLINE | ID: mdl-33822172

ABSTRACT

Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology (evolutionary relatedness) is in many cases readily determined based on sequence similarity analysis. By contrast, whether or not two genes directly descended from a common ancestor by a speciation event (orthologs) or duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit, from domains to networks. This meeting brought to light several challenges to come: leveraging orthology computations to get the most out of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.

Subjects

Genetic Speciation, Genomics/trends, Phylogeny, Viral Genome, Genomics/methods

9.

OMA standalone: orthology inference among public and custom genomes and transcriptomes.

Altenhoff, Adrian M; Levy, Jeremy; Zarowiecki, Magdalena; Tomiczek, Bartlomiej; Warwick Vesztrocy, Alex; Dalquen, Daniel A; Müller, Steven; Telford, Maximilian J; Glover, Natasha M; Dylus, David; Dessimoz, Christophe.

Genome Res ; 29(7): 1152-1163, 2019 07.

Article in English | MEDLINE | ID: mdl-31235654

ABSTRACT

Genomes and transcriptomes are now typically sequenced by individual laboratories but analyzing them often remains challenging. One essential step in many analyses lies in identifying orthologs (corresponding genes across multiple species), but this is far from trivial. The Orthologous MAtrix (OMA) database is a leading resource for identifying orthologs among publicly available, complete genomes. Here, we describe the OMA pipeline available as a standalone program for Linux and Mac. When run on a cluster, it has native support for the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes. Another key feature of OMA standalone is that users can combine their own data with existing public data by exporting genomes and precomputed alignments from the OMA database, which currently contains over 2100 complete genomes. We compare OMA standalone to other methods in the context of phylogenetic tree inference, by inferring a phylogeny of Lophotrochozoa, a challenging clade within the protostomes. We also discuss other potential applications of OMA standalone, including identifying gene families having undergone duplications/losses in specific clades, and identifying potential drug targets in nonmodel organisms. OMA standalone is available under the permissive open source Mozilla Public License Version 2.0.

Subjects

Genetic Databases, Genome, Invertebrates/classification, Software, Transcriptome, Animals, Invertebrates/genetics, Phylogeny

10.

OMAmer: tree-driven and alignment-free protein assignment to subfamilies outperforms closest sequence approaches.

Rossier, Victor; Warwick Vesztrocy, Alex; Robinson-Rechavi, Marc; Dessimoz, Christophe.

Bioinformatics ; 37(18): 2866-2873, 2021 09 29.

Article in English | MEDLINE | ID: mdl-33787851

ABSTRACT

MOTIVATION: Assigning new sequences to known protein families and subfamilies is a prerequisite for many functional, comparative and evolutionary genomics analyses. Such assignment is commonly achieved by looking for the closest sequence in a reference database, using a method such as BLAST. However, ignoring the gene phylogeny can be misleading because a query sequence does not necessarily belong to the same subfamily as its closest sequence. For example, a hemoglobin which branched out prior to the hemoglobin alpha/beta duplication could be closest to a hemoglobin alpha or beta sequence, whereas it is neither. To overcome this problem, phylogeny-driven tools have emerged but rely on gene trees, whose inference is computationally expensive. RESULTS: Here, we first show that in multiple animal and plant datasets, 18-62% of assignments by closest sequence are misassigned, typically to an over-specific subfamily. Then, we introduce OMAmer, a novel alignment-free protein subfamily assignment method, which limits over-specific subfamily assignments and is suited to phylogenomic databases with thousands of genomes. OMAmer is based on an innovative method using evolutionarily informed k-mers for alignment-free mapping to ancestral protein subfamilies. Whilst able to reject non-homologous family-level assignments, we show that OMAmer provides better and quicker subfamily-level assignments than approaches relying on the closest sequence, whether inferred exactly by Smith-Waterman or by the fast heuristic DIAMOND. AVAILABILITY AND IMPLEMENTATION: OMAmer is available from the Python Package Index (as omamer), with the source code and a precomputed database available at https://github.com/DessimozLab/omamer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms, Software, Animals, Sequence Alignment, Proteins/genetics, Biological Evolution, Phylogeny

11.

Want to track pandemic variants faster? Fix the bioinformatics bottleneck.

Hodcroft, Emma B; De Maio, Nicola; Lanfear, Rob; MacCannell, Duncan R; Minh, Bui Quang; Schmidt, Heiko A; Stamatakis, Alexandros; Goldman, Nick; Dessimoz, Christophe.

Nature ; 591(7848): 30-33, 2021 03.

Article in English | MEDLINE | ID: mdl-33649511

Subjects

COVID-19/epidemiology, COVID-19/virology, Molecular Evolution, Genomics/methods, Genomics/trends, Mutation, SARS-CoV-2/genetics, Animals, Automation/methods, Basic Reproduction Number, COVID-19/immunology, COVID-19/transmission, COVID-19 Vaccines/immunology, Viral Genome/genetics, Humans, Mink/virology, Pandemics/statistics & numerical data, Phylogeny, Public Health/methods, Public Health/trends, SARS-CoV-2/immunology, SARS-CoV-2/isolation & purification, SARS-CoV-2/pathogenicity, Social Media, Uncertainty

12.

The Quest for Orthologs benchmark service and consensus calls in 2020.

Altenhoff, Adrian M; Garrayo-Ventas, Javier; Cosentino, Salvatore; Emms, David; Glover, Natasha M; Hernández-Plaza, Ana; Nevers, Yannis; Sundesha, Vicky; Szklarczyk, Damian; Fernández, José M; Codó, Laia; The Quest for Orthologs Consortium; Gelpi, Josep Ll; Huerta-Cepas, Jaime; Iwasaki, Wataru; Kelly, Steven; Lecompte, Odile; Muffato, Matthieu; Martin, Maria J; Capella-Gutierrez, Salvador; Thomas, Paul D; Sonnhammer, Erik; Dessimoz, Christophe.

Nucleic Acids Res ; 48(W1): W538-W545, 2020 07 02.

Article in English | MEDLINE | ID: mdl-32374845

ABSTRACT

The identification of orthologs (genes in different species which descended from the same gene in their last common ancestor) is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.benchmarkservice.org). Furthermore, consensus ortholog calls derived from public benchmark submissions are provided on the Alliance of Genome Resources website, the joint portal of NIH-funded model organism databases.

Subjects

Multigene Family, Proteome, Software, Animals, Benchmarking, Consensus, Genomics, Humans, Mice, Phylogeny, Rats

13.

Bio-SODA UX: enabling natural language question answering over knowledge graphs with user disambiguation.

Sima, Ana Claudia; Mendes de Farias, Tarcisio; Anisimova, Maria; Dessimoz, Christophe; Robinson-Rechavi, Marc; Zbinden, Erich; Stockinger, Kurt.

Distrib Parallel Databases ; 40(2-3): 409-440, 2022.

Article in English | MEDLINE | ID: mdl-36097541

ABSTRACT

The problem of natural language processing over structured data has become a growing research field, both within the relational database and the Semantic Web community, with significant efforts involved in question answering over knowledge graphs (KGQA). However, many of these approaches are either specifically targeted at open-domain question answering using DBpedia, or require large training datasets to translate a natural language question to SPARQL in order to query the knowledge graph. Hence, these approaches often cannot be applied directly to complex scientific datasets where no prior training data is available. In this paper, we focus on the challenges of natural language processing over knowledge graphs of scientific datasets. In particular, we introduce Bio-SODA, a natural language processing engine that does not require training data in the form of question-answer pairs for generating SPARQL queries. Bio-SODA uses a generic graph-based approach for translating user questions to a ranked list of SPARQL candidate queries. Furthermore, Bio-SODA uses a novel ranking algorithm that includes node centrality as a measure of relevance for selecting the best SPARQL candidate query. Our experiments with real-world datasets across several scientific domains, including the official bioinformatics Question Answering over Linked Data (QALD) challenge, as well as the CORDIS dataset of European projects, show that Bio-SODA outperforms publicly available KGQA systems by an F1-score of at least 20% and by an even higher factor on more complex bioinformatics datasets. Finally, we introduce Bio-SODA UX, a graphical user interface designed to assist users in the exploration of large knowledge graphs and in dynamically disambiguating natural language questions that target the data available in these graphs.

14.

Benchmarking gene ontology function predictions using negative annotations.

Warwick Vesztrocy, Alex; Dessimoz, Christophe.

Bioinformatics ; 36(Suppl_1): i210-i218, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32657372

ABSTRACT

MOTIVATION: With the ever-increasing number and diversity of sequenced species, the challenge to characterize genes with functional information is even more important. In most species, this characterization almost entirely relies on automated electronic methods. As such, it is critical to benchmark the various methods. The Critical Assessment of protein Function Annotation algorithms (CAFA) series of community experiments provide the most comprehensive benchmark, with a time-delayed analysis leveraging newly curated experimentally supported annotations. However, the definition of a false positive in CAFA has not fully accounted for the open world assumption (OWA), leading to a systematic underestimation of precision. The main reason for this limitation is the relative paucity of negative experimental annotations. RESULTS: This article introduces a new, OWA-compliant, benchmark based on a balanced test set of positive and negative annotations. The negative annotations are derived from expert-curated annotations of protein families on phylogenetic trees. This approach results in a large increase in the average information content of negative annotations. The benchmark has been tested using the naïve and BLAST baseline methods, as well as two orthology-based methods. This new benchmark could complement existing ones in future CAFA experiments. AVAILABILITY AND IMPLEMENTATION: All data, as well as code used for analysis, is available from https://lab.dessimoz.org/20_not. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Benchmarking, Proteins, Computational Biology, Gene Ontology, Humans, Molecular Sequence Annotation, Phylogeny, Proteins/genetics

15.

Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes.

Moi, David; Kilchoer, Laurent; Aguilar, Pablo S; Dessimoz, Christophe.

PLoS Comput Biol ; 16(7): e1007553, 2020 07.

Article in English | MEDLINE | ID: mdl-32697802

ABSTRACT

Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families which tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been more widely used with prokaryotes than eukaryotes, because the method is thought to require many diverse genomes. There are now many eukaryotic genomes available, but these are considerably larger, and typical phylogenetic profiling methods require at least quadratic time as a function of the number of genes. We introduce a fast, scalable phylogenetic profiling approach entitled HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles. We show that the approach outperforms Enhanced Phylogenetic Tree, a phylogeny-based method, and use the tool to reconstruct networks and query for interactors of the kinetochore complex as well as conserved proteins involved in sexual reproduction: Hap2, Spo11 and Gex1. HogProf enables large-scale phylogenetic profiling across the three domains of life, and will be useful to predict biological pathways among the hundreds of thousands of eukaryotic species that will become available in the coming few years. HogProf is available at https://github.com/DessimozLab/HogProf.

Subjects

Computational Biology/methods, Eukaryota, Phylogeny, Reproduction/genetics, Cluster Analysis, Eukaryota/classification, Eukaryota/genetics, Kinetochores/metabolism, Statistical Models

16.

Phylogenetic profiling in eukaryotes comes of age.

Moi, David; Dessimoz, Christophe.

Proc Natl Acad Sci U S A ; 120(19): e2305013120, 2023 05 09.

Article in English | MEDLINE | ID: mdl-37126713

Subjects

Eukaryota, Molecular Evolution, Humans, Phylogeny

17.

A generalized Robinson-Foulds distance for labeled trees.

Briand, Samuel; Dessimoz, Christophe; El-Mabrouk, Nadia; Lafond, Manuel; Lobinska, Gabriela.

BMC Genomics ; 21(Suppl 10): 779, 2020 Nov 18.

Article in English | MEDLINE | ID: mdl-33208096

ABSTRACT

BACKGROUND: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). RESULTS: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting "good" edges, i.e. edges shared between the two trees. CONCLUSIONS: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions. Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf.

Subjects

Algorithms, Phylogeny

18.

Advances and Applications in the Quest for Orthologs.

Glover, Natasha; Dessimoz, Christophe; Ebersberger, Ingo; Forslund, Sofia K; Gabaldón, Toni; Huerta-Cepas, Jaime; Martin, Maria-Jesus; Muffato, Matthieu; Patricio, Mateus; Pereira, Cécile; da Silva, Alan Sousa; Wang, Yan; Sonnhammer, Erik; Thomas, Paul D.

Mol Biol Evol ; 36(10): 2157-2164, 2019 10 01.

Article in English | MEDLINE | ID: mdl-31241141

ABSTRACT

Gene families evolve by the processes of speciation (creating orthologs), gene duplication (paralogs), and horizontal gene transfer (xenologs), in addition to sequence divergence and gene loss. Orthologs in particular play an essential role in comparative genomics and phylogenomic analyses. With the continued sequencing of organisms across the tree of life, the data are available to reconstruct the unique evolutionary histories of tens of thousands of gene families. Accurate reconstruction of these histories, however, is a challenging computational problem, and the focus of the Quest for Orthologs Consortium. We review the recent advances and outstanding challenges in this field, as revealed at a symposium and meeting held at the University of Southern California in 2017. Key advances have been made both at the level of orthology algorithm development and with respect to coordination across the community of algorithm developers and orthology end-users. Applications spanned a broad range, including gene function prediction, phylostratigraphy, genome evolution, and phylogenomics. The meetings highlighted the increasing use of meta-analyses integrating results from multiple different algorithms, and discussed ongoing challenges in orthology inference as well as the next steps toward improvement and integration of orthology resources.

Subjects

Molecular Evolution, Genomics/trends, Multigene Family, Algorithms, Animals, Genomics/methods, Humans

19.

Phylogenetic approaches to identifying fragments of the same gene, with application to the wheat genome.

Pilizota, Ivana; Train, Clément-Marie; Altenhoff, Adrian; Redestig, Henning; Dessimoz, Christophe.

Bioinformatics ; 35(7): 1159-1166, 2019 04 01.

Article in English | MEDLINE | ID: mdl-30184069

ABSTRACT

MOTIVATION: As the time and cost of sequencing decrease, the number of available genomes and transcriptomes rapidly increases. Yet the quality of the assemblies and the gene annotations varies considerably and often remains poor, affecting downstream analyses. This is particularly true when fragments of the same gene are annotated as distinct genes, which may cause them to be mistaken as paralogs. RESULTS: In this study, we introduce two novel phylogenetic tests to infer non-overlapping or partially overlapping genes that are in fact parts of the same gene. One approach collapses branches with low bootstrap support and the other computes a likelihood ratio test. We extensively validated these methods by (i) introducing and recovering fragmentation on the bread wheat, Triticum aestivum cv. Chinese Spring, chromosome 3B; (ii) by applying the methods to the low-quality 3B assembly and validating predictions against the high-quality 3B assembly; and (iii) by comparing the performance of the proposed methods to the performance of existing methods, namely Ensembl Compara and ESPRIT. Application of this combination to a draft shotgun assembly of the entire bread wheat genome revealed 1221 pairs of genes that are highly likely to be fragments of the same gene. Our approach demonstrates the power of fine-grained evolutionary inferences across multiple species to improving genome assemblies and annotations. AVAILABILITY AND IMPLEMENTATION: An open source software tool is available at https://github.com/DessimozLab/esprit2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Triticum , Plant Genome , Molecular Sequence Annotation , Phylogeny , Software

20.

iHam and pyHam: visualizing and processing hierarchical orthologous groups.

Train, Clément-Marie; Pignatelli, Miguel; Altenhoff, Adrian; Dessimoz, Christophe.

Bioinformatics ; 35(14): 2504-2506, 2019 07 15.

Article in English | MEDLINE | ID: mdl-30508066

ABSTRACT

SUMMARY: The evolutionary history of gene families can be complex due to duplications and losses. This complexity is compounded by the large number of genomes simultaneously considered in contemporary comparative genomic analyses. As provided by several orthology databases, hierarchical orthologous groups (HOGs) are sets of genes that are inferred to have descended from a common ancestral gene within a species clade. This implies that the set of HOGs defined for a particular clade corresponds to the ancestral genes found in its last common ancestor. Furthermore, by keeping track of HOG composition along the species tree, it is possible to infer the emergence, duplications and losses of genes within a gene family of interest. However, the lack of tools to manipulate and analyse HOGs has made it difficult to extract, display and interpret this type of information. To address this, we introduce iHam (interactive HOG analysis method), an interactive JavaScript widget to visualize and explore gene family history encoded in HOGs, and pyHam (python HOG analysis method), a Python library for programmatic processing of gene families. These complementary open source tools greatly ease adoption of HOGs as a scalable and interpretable concept to relate genes across multiple species. AVAILABILITY AND IMPLEMENTATION: iHam's code is available at https://github.com/DessimozLab/iHam or can be loaded dynamically. pyHam's code is available at https://github.com/DessimozLab/pyHam or via the pip package 'pyham'.

Subjects

Software , Biological Evolution , Genome
