1.

Analysis of the Human Kinome and Phosphatome by Mass Cytometry Reveals Overexpression-Induced Effects on Cancer-Related Signaling.

Lun, Xiao-Kang; Szklarczyk, Damian; Gábor, Attila; Dobberstein, Nadine; Zanotelli, Vito Riccardo Tomaso; Saez-Rodriguez, Julio; von Mering, Christian; Bodenmiller, Bernd.

Mol Cell ; 74(5): 1086-1102.e5, 2019 06 06.

Article in English | MEDLINE | ID: mdl-31101498

ABSTRACT

Kinase and phosphatase overexpression drives tumorigenesis and drug resistance. We previously developed a mass-cytometry-based single-cell proteomics approach that enables quantitative assessment of overexpression effects on cell signaling. Here, we applied this approach in a human kinome- and phosphatome-wide study to assess how 649 individually overexpressed proteins modulated cancer-related signaling in HEK293T cells in an abundance-dependent manner. Based on these data, we expanded the functional classification of human kinases and phosphatases and showed that the overexpression effects include non-catalytic roles. We detected 208 previously unreported signaling relationships. The signaling dynamics analysis indicated that the overexpression of ERK-specific phosphatases sustains proliferative signaling. This suggests a phosphatase-driven mechanism of cancer progression. Moreover, our analysis revealed a drug-resistant mechanism through which overexpression of tyrosine kinases, including SRC, FES, YES1, and BLK, induced MEK-independent ERK activation in melanoma A375 cells. These proteins could predict drug sensitivity to BRAF-MEK concurrent inhibition in cells carrying BRAF mutations.

Subjects

Carcinogenesis/genetics , Melanoma/genetics , Phosphoric Monoester Hydrolases/genetics , Phosphotransferases/genetics , Proto-Oncogene Proteins B-raf/genetics , Cell Proliferation/genetics , Drug Resistance, Neoplasm/genetics , Gene Expression Regulation, Neoplastic/drug effects , HEK293 Cells , Humans , Melanoma/enzymology , Melanoma/pathology , Mutation , Phosphorylation/genetics , Protein Kinase Inhibitors/pharmacology , Proteomics , Signal Transduction/drug effects

2.

PaxDb 5.0: Curated Protein Quantification Data Suggests Adaptive Proteome Changes in Yeasts.

Huang, Qingyao; Szklarczyk, Damian; Wang, Mingcong; Simonovic, Milan; von Mering, Christian.

Mol Cell Proteomics ; 22(10): 100640, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37659604

ABSTRACT

The "Protein Abundances Across Organisms" database (PaxDb) is an integrative metaresource dedicated to protein abundance levels, in tissue-specific or whole-organism proteomes. PaxDb focuses on computing best-estimate abundances for proteins in normal/healthy contexts and expresses abundance values for each protein in "parts per million" in relation to all other protein molecules in the cell. The uniform data reprocessing, quality scoring, and integrated orthology relations have made PaxDb one of the preferred tools for comparisons between individual datasets, tissues, or organisms. In describing the latest version 5.0 of PaxDb, we particularly emphasize the data integration from various types of raw data and how we expanded the number of organisms and tissue groups as well as the proteome coverage. The current collection of PaxDb includes 831 original datasets from 170 species, including 22 Archaea, 81 Bacteria, and 67 Eukaryota. Apart from detailing the data update, we also present a comparative analysis of the human proteome subset of PaxDb against the two most widely used human proteome data resources: Human Protein Atlas and Genotype-Tissue Expression. Lastly, through our protein abundance data, we reveal an evolutionary trend in the usage of sulfur-containing amino acids in the proteomes of Fungi.
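As an illustration of the "parts per million" scale described in this abstract, the following minimal Python sketch rescales raw abundance estimates so that they sum to one million. The function name `to_ppm` and the example values are hypothetical and not taken from PaxDb; the actual PaxDb pipeline also involves data reprocessing and quality scoring that is not shown here.

```python
def to_ppm(raw_abundances):
    """Rescale raw per-protein abundance estimates so they sum to one million."""
    total = sum(raw_abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    # Each protein's share of all protein molecules, expressed in parts per million.
    return {protein: 1_000_000 * value / total
            for protein, value in raw_abundances.items()}


# Hypothetical raw values in arbitrary units (not real PaxDb data).
example = {"protA": 30.0, "protB": 10.0, "protC": 60.0}
ppm = to_ppm(example)
print(ppm)
```

Because every value is expressed relative to the same total, ppm values from different datasets can be compared directly, provided each dataset covers a comparable part of the proteome.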

3.

eggNOG 6.0: enabling comparative genomics across 12 535 organisms.

Hernández-Plaza, Ana; Szklarczyk, Damian; Botas, Jorge; Cantalapiedra, Carlos P; Giner-Lamia, Joaquín; Mende, Daniel R; Kirsch, Rebecca; Rattei, Thomas; Letunic, Ivica; Jensen, Lars J; Bork, Peer; von Mering, Christian; Huerta-Cepas, Jaime.

Nucleic Acids Res ; 51(D1): D389-D394, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36399505

ABSTRACT

The eggNOG (evolutionary gene genealogy Non-supervised Orthologous Groups) database is a bioinformatics resource providing orthology data and comprehensive functional information for organisms from all domains of life. Here, we present a major update of the database and website (version 6.0), which increases the number of covered organisms to 12 535 reference species, expands functional annotations, and implements new functionality. In total, eggNOG 6.0 provides a hierarchy of over 17M orthologous groups (OGs) computed at 1601 taxonomic levels, spanning 10 756 bacterial, 457 archaeal and 1322 eukaryotic organisms. OGs have been thoroughly annotated using recent knowledge from functional databases, including KEGG, Gene Ontology, UniProtKB, BiGG, CAZy, CARD, PFAM and SMART. eggNOG also offers phylogenetic trees for all OGs, maximising utility and versatility for end users while allowing researchers to investigate the evolutionary history of speciation and duplication events as well as the phylogenetic distribution of functional terms within each OG. Furthermore, the eggNOG 6.0 website contains new functionality to mine orthology and functional data with ease, including the possibility of generating phylogenetic profiles for multiple OGs across species or identifying single-copy OGs at custom taxonomic levels. eggNOG 6.0 is available at http://eggnog6.embl.de.

Subjects

Databases, Genetic , Genomics , Phylogeny , Computational Biology , Eukaryota/genetics

4.

The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest.

Szklarczyk, Damian; Kirsch, Rebecca; Koutrouli, Mikaela; Nastou, Katerina; Mehryary, Farrokh; Hachilif, Radja; Gable, Annika L; Fang, Tao; Doncheva, Nadezhda T; Pyysalo, Sampo; Bork, Peer; Jensen, Lars J; von Mering, Christian.

Nucleic Acids Res ; 51(D1): D638-D646, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36370105

ABSTRACT

Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein-protein interactions, both physical interactions and functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.
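The abstract notes that STRING data can be accessed programmatically. A minimal sketch of how a request URL for a network query might be assembled is given here. The endpoint layout (`/api/tsv/network`) and the parameter names (`identifiers`, `species`, `required_score`, `caller_identity`) are assumptions based on the public STRING API documentation and should be checked against the current version; the helper name `build_network_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed base URL of the STRING REST API; verify against the current documentation.
STRING_API_BASE = "https://string-db.org/api"


def build_network_url(identifiers, species, required_score=400,
                      caller_identity="example_script"):
    """Build (but do not send) a URL for a tab-separated network query."""
    params = {
        # Assumption: multiple identifiers are separated by a carriage return.
        "identifiers": "\r".join(identifiers),
        "species": species,                  # NCBI taxonomy ID, e.g. 9606 for human
        "required_score": required_score,    # assumed confidence cutoff on a 0-1000 scale
        "caller_identity": caller_identity,  # free-text label identifying the client
    }
    return f"{STRING_API_BASE}/tsv/network?{urlencode(params)}"


url = build_network_url(["TP53", "MDM2"], species=9606)
print(url)
```

Keeping URL construction separate from the network call makes the query easy to inspect and test before any traffic is sent to the server.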

Subjects

Protein Interaction Mapping , Proteins , Protein Interaction Mapping/methods , Databases, Protein , Proteins/genetics , Proteins/metabolism , Genomics , Proteomics , User-Computer Interface

5.

proGenomes3: approaching one million accurately and consistently annotated high-quality prokaryotic genomes.

Fullam, Anthony; Letunic, Ivica; Schmidt, Thomas S B; Ducarmon, Quinten R; Karcher, Nicolai; Khedkar, Supriya; Kuhn, Michael; Larralde, Martin; Maistrenko, Oleksandr M; Malfertheiner, Lukas; Milanese, Alessio; Rodrigues, Joao Frederico Matias; Sanchis-López, Claudia; Schudoma, Christian; Szklarczyk, Damian; Sunagawa, Shinichi; Zeller, Georg; Huerta-Cepas, Jaime; von Mering, Christian; Bork, Peer; Mende, Daniel R.

Nucleic Acids Res ; 51(D1): D760-D766, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36408900

ABSTRACT

The interpretation of genomic, transcriptomic and other microbial 'omics data is highly dependent on the availability of well-annotated genomes. As the number of publicly available microbial genomes continues to increase exponentially, the need for quality control and consistent annotation is becoming critical. We present proGenomes3, a database of 907 388 high-quality genomes containing 4 billion genes that passed stringent criteria and have been consistently annotated using multiple functional and taxonomic databases including mobile genetic elements and biosynthetic gene clusters. proGenomes3 encompasses 41 171 species-level clusters, defined based on universal single copy marker genes, for which pan-genomes and contextual habitat annotations are provided. The database is available at http://progenomes.embl.de/.

Subjects

Genome , Prokaryotic Cells , Databases, Genetic , Genomics , Molecular Sequence Annotation , Bacteria/classification , Bacteria/genetics

6.

Systematic assessment of pathway databases, based on a diverse collection of user-submitted experiments.

Gable, Annika L; Szklarczyk, Damian; Lyon, David; Matias Rodrigues, João F; von Mering, Christian.

Brief Bioinform ; 23(5)2022 09 20.

Article in English | MEDLINE | ID: mdl-36088548

ABSTRACT

A knowledge-based grouping of genes into pathways or functional units is essential for describing and understanding cellular complexity. However, it is not always clear a priori how and at what level of specificity functionally interconnected genes should be partitioned into pathways, for a given application. Here, we assess and compare nine existing and two conceptually novel functional classification systems, with respect to their discovery power and generality in gene set enrichment testing. We base our assessment on a collection of nearly 2000 functional genomics datasets provided by users of the STRING database. With these real-life and diverse queries, we assess which systems typically provide the most specific and complete enrichment results. We find many structural and performance differences between classification systems. Overall, the well-established, hierarchically organized pathway annotation systems yield the best enrichment performance, despite covering substantial parts of the human genome in general terms only. On the other hand, the more recent unsupervised annotation systems perform strongest in understudied areas and organisms, and in detecting more specific pathways, albeit with less informative labels.
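To make the notion of gene set enrichment testing concrete, here is a minimal sketch of a generic over-representation test based on the hypergeometric distribution. This is a common textbook approach and is an assumption for illustration only; the abstract does not state which statistical test the authors used, and the function name and example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background_size, pathway_size, query_size, overlap):
    """Probability of seeing at least `overlap` pathway genes in a random query.

    Models drawing `query_size` genes without replacement from a background of
    `background_size` genes, of which `pathway_size` belong to the pathway.
    """
    upper = min(pathway_size, query_size)
    total = comb(background_size, query_size)
    # Sum the upper tail P(X = k) for k = overlap .. upper.
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 background genes, a 3-gene pathway,
# and a 3-gene query that contains all 3 pathway genes.
p_value = hypergeom_enrichment_p(10, 3, 3, 3)
print(p_value)
```

In practice one such test is run per pathway, so the resulting p-values would normally be corrected for multiple testing before being compared across classification systems.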

Subjects

Genomics , Software , Databases, Factual , Databases, Genetic , Genomics/methods , Humans

7.

Cytoscape stringApp 2.0: Analysis and Visualization of Heterogeneous Biological Networks.

Doncheva, Nadezhda T; Morris, John H; Holze, Henrietta; Kirsch, Rebecca; Nastou, Katerina C; Cuesta-Astroz, Yesid; Rattei, Thomas; Szklarczyk, Damian; von Mering, Christian; Jensen, Lars J.

J Proteome Res ; 22(2): 637-646, 2023 02 03.

Article in English | MEDLINE | ID: mdl-36512705

ABSTRACT

Biological networks are often used to represent complex biological systems, which can contain several types of entities. Analysis and visualization of such networks is supported by the Cytoscape software tool and its many apps. While earlier versions of stringApp focused on providing intraspecies protein-protein interactions from the STRING database, the new stringApp 2.0 greatly improves the support for heterogeneous networks. Here, we highlight new functionality that makes it possible to create networks that contain proteins and interactions from STRING as well as other biological entities and associations from other sources. We exemplify this by complementing a published SARS-CoV-2 interactome with interactions from STRING. We have also extended stringApp with new data and query functionality for protein-protein interactions between eukaryotic parasites and their hosts. We show how this can be used to retrieve and visualize a cross-species network for a malaria parasite, its host, and its vector. Finally, the latest stringApp version has an improved user interface, allows retrieval of both functional associations and physical interactions, and supports group-wise enrichment analysis of different parts of a network to aid biological interpretation. stringApp is freely available at https://apps.cytoscape.org/apps/stringapp.

Subjects

COVID-19 , Humans , SARS-CoV-2 , Software , Proteins , Eukaryota

8.

The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets.

Szklarczyk, Damian; Gable, Annika L; Nastou, Katerina C; Lyon, David; Kirsch, Rebecca; Pyysalo, Sampo; Doncheva, Nadezhda T; Legeay, Marc; Fang, Tao; Bork, Peer; Jensen, Lars J; von Mering, Christian.

Nucleic Acids Res ; 49(D1): D605-D612, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33237311

ABSTRACT

Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein-protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions as well as functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring-mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide, experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online, at https://string-db.org/.

Subjects

Databases, Protein , Protein Interaction Mapping , Proteins/genetics , User-Computer Interface

9.

The Quest for Orthologs benchmark service and consensus calls in 2020.

Altenhoff, Adrian M; Garrayo-Ventas, Javier; Cosentino, Salvatore; Emms, David; Glover, Natasha M; Hernández-Plaza, Ana; Nevers, Yannis; Sundesha, Vicky; Szklarczyk, Damian; Fernández, José M; Codó, Laia; The Quest for Orthologs Consortium; Gelpi, Josep Ll; Huerta-Cepas, Jaime; Iwasaki, Wataru; Kelly, Steven; Lecompte, Odile; Muffato, Matthieu; Martin, Maria J; Capella-Gutierrez, Salvador; Thomas, Paul D; Sonnhammer, Erik; Dessimoz, Christophe.

Nucleic Acids Res ; 48(W1): W538-W545, 2020 07 02.

Article in English | MEDLINE | ID: mdl-32374845

ABSTRACT

The identification of orthologs (genes in different species which descended from the same gene in their last common ancestor) is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.benchmarkservice.org). Furthermore, consensus ortholog calls derived from public benchmark submissions are provided on the Alliance of Genome Resources website, the joint portal of NIH-funded model organism databases.

Subjects

Multigene Family , Proteome , Software , Animals , Benchmarking , Consensus , Genomics , Humans , Mice , Phylogeny , Rats

10.

STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.

Szklarczyk, Damian; Gable, Annika L; Lyon, David; Junge, Alexander; Wyder, Stefan; Huerta-Cepas, Jaime; Simonovic, Milan; Doncheva, Nadezhda T; Morris, John H; Bork, Peer; Jensen, Lars J; von Mering, Christian.

Nucleic Acids Res ; 47(D1): D607-D613, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30476243

ABSTRACT

Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.

Subjects

Genomics/methods , Protein Interaction Mapping/methods , Software , Animals , Databases, Genetic , Gene Ontology , Humans

11.

eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses.

Huerta-Cepas, Jaime; Szklarczyk, Damian; Heller, Davide; Hernández-Plaza, Ana; Forslund, Sofia K; Cook, Helen; Mende, Daniel R; Letunic, Ivica; Rattei, Thomas; Jensen, Lars J; von Mering, Christian; Bork, Peer.

Nucleic Acids Res ; 47(D1): D309-D314, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30418610

ABSTRACT

eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral proteomes that were selected for diversity and filtered by genome quality. In total, 4.4M orthologous groups (OGs) distributed across 379 taxonomic levels were computed together with their associated sequence alignments, phylogenies, HMM models and functional descriptors. Precomputed evolutionary analysis provides fine-grained resolution of duplication/speciation events within each OG. Our benchmarks show that, despite doubling the amount of genomes, the quality of orthology assignments and functional annotations (80% coverage) has persisted without significant changes across this update. Finally, we improved eggNOG online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets. All precomputed data are publicly available for downloading or via API queries at http://eggnog.embl.de.

Subjects

Conserved Sequence , Databases, Genetic , Evolution, Molecular , Phylogeny , Sequence Homology , Animals , Classification , Eukaryota/genetics , Gene Duplication , Gene Ontology , Genes, Viral , Genome , Humans , Molecular Sequence Annotation , Proteome , Sequence Alignment , Structure-Activity Relationship

12.

Tree reconciliation combined with subsampling improves large scale inference of orthologous group hierarchies.

Heller, Davide; Szklarczyk, Damian; von Mering, Christian.

BMC Bioinformatics ; 20(1): 228, 2019 May 06.

Article in English | MEDLINE | ID: mdl-31060495

ABSTRACT

BACKGROUND: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. This can be due to confounding genetic signal or algorithmic limitations. Importantly, inconsistencies limit the potential use of OGs for functional annotation and third-party applications. RESULTS: Here we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sampling. Differently from previous approaches, by subsampling the protein space, we avoid the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method into a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance. CONCLUSION: The presented consistency pipeline shows that, contrary to previous limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show comparable or greater performance to previous pipelines. The code is available on GitHub at: https://github.com/meringlab/og_consistency_pipeline .

Subjects

Databases, Protein/standards , Phylogeny

13.

Standardized benchmarking in the quest for orthologs.

Altenhoff, Adrian M; Boeckmann, Brigitte; Capella-Gutierrez, Salvador; Dalquen, Daniel A; DeLuca, Todd; Forslund, Kristoffer; Huerta-Cepas, Jaime; Linard, Benjamin; Pereira, Cécile; Pryszcz, Leszek P; Schreiber, Fabian; da Silva, Alan Sousa; Szklarczyk, Damian; Train, Clément-Marie; Bork, Peer; Lecompte, Odile; von Mering, Christian; Xenarios, Ioannis; Sjölander, Kimmen; Jensen, Lars Juhl; Martin, Maria J; Muffato, Matthieu; Gabaldón, Toni; Lewis, Suzanna E; Thomas, Paul D; Sonnhammer, Erik; Dessimoz, Christophe.

Nat Methods ; 13(5): 425-30, 2016 05.

Article in English | MEDLINE | ID: mdl-27043882

ABSTRACT

Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.
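The precision-recall trade-off mentioned in this abstract can be illustrated with a short sketch that scores a set of predicted ortholog pairs against a reference set. The gene names, the reference pairs and the function name are hypothetical, and a trusted reference set is assumed to exist, which the abstract itself notes is generally not the case; the benchmarks in the paper are considerably more elaborate.

```python
def precision_recall(predicted_pairs, reference_pairs):
    """Score predicted ortholog pairs against a reference set of pairs.

    Pairs are treated as unordered, so (a, b) and (b, a) count as the same pair.
    """
    predicted = {frozenset(pair) for pair in predicted_pairs}
    reference = {frozenset(pair) for pair in reference_pairs}
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical predictions and reference pairs (not real benchmark data).
predicted = [
    ("geneA_human", "geneA_mouse"),
    ("geneB_human", "geneC_mouse"),   # not in the reference: a false positive
    ("geneD_mouse", "geneD_human"),   # same pair as in the reference, reversed order
]
reference = [
    ("geneA_human", "geneA_mouse"),
    ("geneD_human", "geneD_mouse"),
    ("geneB_human", "geneB_mouse"),
    ("geneE_human", "geneE_mouse"),
]
precision, recall = precision_recall(predicted, reference)
print(precision, recall)
```

A method that predicts fewer, more confident pairs tends to raise precision at the expense of recall, which is why different applications may favor different methods.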

Subjects

Computational Biology/standards , Genomics/standards , Phylogeny , Proteomics/standards , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Eukaryota/classification , Eukaryota/genetics , Gene Ontology , Genomics/methods , Models, Genetic , Proteomics/methods , Sequence Analysis, Protein , Sequence Homology , Species Specificity

14.

Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse.

Orlando, Ludovic; Ginolhac, Aurélien; Zhang, Guojie; Froese, Duane; Albrechtsen, Anders; Stiller, Mathias; Schubert, Mikkel; Cappellini, Enrico; Petersen, Bent; Moltke, Ida; Johnson, Philip L F; Fumagalli, Matteo; Vilstrup, Julia T; Raghavan, Maanasa; Korneliussen, Thorfinn; Malaspinas, Anna-Sapfo; Vogt, Josef; Szklarczyk, Damian; Kelstrup, Christian D; Vinther, Jakob; Dolocan, Andrei; Stenderup, Jesper; Velazquez, Amhed M V; Cahill, James; Rasmussen, Morten; Wang, Xiaoli; Min, Jiumeng; Zazula, Grant D; Seguin-Orlando, Andaine; Mortensen, Cecilie; Magnussen, Kim; Thompson, John F; Weinstock, Jacobo; Gregersen, Kristian; Røed, Knut H; Eisenmann, Véra; Rubin, Carl J; Miller, Donald C; Antczak, Douglas F; Bertelsen, Mads F; Brunak, Søren; Al-Rasheid, Khaled A S; Ryder, Oliver; Andersson, Leif; Mundy, John; Krogh, Anders; Gilbert, M Thomas P; Kjær, Kurt; Sicheritz-Ponten, Thomas; Jensen, Lars Juhl.

Nature ; 499(7456): 74-8, 2013 Jul 04.

Article in English | MEDLINE | ID: mdl-23803765

ABSTRACT

The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560-780 thousand years before present (kyr BP). Our data represent the oldest full genome sequence determined so far by almost an order of magnitude. For comparison, we sequenced the genome of a Late Pleistocene horse (43 kyr BP), and modern genomes of five domestic horse breeds (Equus ferus caballus), a Przewalski's horse (E. f. przewalskii) and a donkey (E. asinus). Our analyses suggest that the Equus lineage giving rise to all contemporary horses, zebras and donkeys originated 4.0-4.5 million years before present (Myr BP), twice the conventionally accepted time to the most recent common ancestor of the genus Equus. We also find that horse population size fluctuated multiple times over the past 2 Myr, particularly during periods of severe climatic changes. We estimate that the Przewalski's and domestic horse populations diverged 38-72 kyr BP, and find no evidence of recent admixture between the domestic horse breeds and the Przewalski's horse investigated. This supports the contention that Przewalski's horses represent the last surviving wild horse population. We find similar levels of genetic variation among Przewalski's and domestic populations, indicating that the former are genetically viable and worthy of conservation efforts. We also find evidence for continuous selection on the immune system and olfaction throughout horse evolution. Finally, we identify 29 genomic regions among horse breeds that deviate from neutrality and show low levels of genetic variation compared to the Przewalski's horse. Such regions could correspond to loci selected early during domestication.

Subjects

Evolution, Molecular , Genome/genetics , Horses/genetics , Phylogeny , Animals , Conservation of Natural Resources , DNA/analysis , DNA/genetics , Endangered Species , Equidae/classification , Equidae/genetics , Fossils , Genetic Variation/genetics , History, Ancient , Horses/classification , Proteins/analysis , Proteins/chemistry , Proteins/genetics , Yukon

15.

The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible.

Szklarczyk, Damian; Morris, John H; Cook, Helen; Kuhn, Michael; Wyder, Stefan; Simonovic, Milan; Santos, Alberto; Doncheva, Nadezhda T; Roth, Alexander; Bork, Peer; Jensen, Lars J; von Mering, Christian.

Nucleic Acids Res ; 45(D1): D362-D368, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27924014

ABSTRACT

A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer of interaction knowledge between organisms based on gene orthology. In the latest version 10.5 of STRING, the biggest changes are concerned with data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework. Further improvements include automated background analysis of user inputs for functional enrichments, and streamlined download options. The STRING resource is available online, at http://string-db.org/.

Subjects

Computational Biology/methods , Databases, Protein , Software , Models, Molecular , Protein Binding , Protein Conformation , Protein Interaction Mapping , Protein Interaction Maps , Proteins/chemistry , Proteins/metabolism , Structure-Activity Relationship , User-Computer Interface , Web Browser

16.

Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper.

Huerta-Cepas, Jaime; Forslund, Kristoffer; Coelho, Luis Pedro; Szklarczyk, Damian; Jensen, Lars Juhl; von Mering, Christian; Bork, Peer.

Mol Biol Evol ; 34(8): 2115-2122, 2017 08 01.

Article in English | MEDLINE | ID: mdl-28460117

ABSTRACT

Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We, therefore, developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11%, and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. EggNOG-mapper predictions scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, yielding better performance than InterProScan. eggNOG-mapper runs ~15× faster than BLAST and at least 2.5× faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.

Subjects

Sequence Alignment/methods, Protein Sequence Analysis/methods, Algorithms, Computer Simulation, Genetic Databases, Protein Databases, Gene Ontology, Genome/genetics, Phylogeny, Sequence Alignment/statistics & numerical data, Software

17.

STITCH 5: augmenting protein-chemical interaction networks with tissue and affinity data.

Szklarczyk, Damian; Santos, Alberto; von Mering, Christian; Jensen, Lars Juhl; Bork, Peer; Kuhn, Michael.

Nucleic Acids Res ; 44(D1): D380-4, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26590256

ABSTRACT

Interactions between proteins and small molecules are an integral part of biological processes in living organisms. Information on these interactions is dispersed over many databases, texts and prediction methods, which makes it difficult to get a comprehensive overview of the available evidence. To address this, we have developed STITCH ('Search Tool for Interacting Chemicals') that integrates these disparate data sources for 430 000 chemicals into a single, easy-to-use resource. In addition to the increased scope of the database, we have implemented a new network view that gives the user the ability to view binding affinities of chemicals in the interaction network. This enables the user to get a quick overview of the potential effects of the chemical on its interaction partners. For each organism, STITCH provides a global network; however, not all proteins have the same pattern of spatial expression. Therefore, only a certain subset of interactions can occur simultaneously. In the new, fifth release of STITCH, we have implemented functionality to filter out the proteins and chemicals not associated with a given tissue. The STITCH database can be downloaded in full, accessed programmatically via an extensive API, or searched via a redesigned web interface at http://stitch.embl.de.

Subjects

Pharmaceutical Databases, Drug Discovery, Proteins/metabolism, Animals, Humans, Organ Specificity, Protein Binding, Proteins/drug effects

18.

WeGET: predicting new genes for molecular systems by weighted co-expression.

Szklarczyk, Radek; Megchelenbrink, Wout; Cizek, Pavel; Ledent, Marie; Velemans, Gonny; Szklarczyk, Damian; Huynen, Martijn A.

Nucleic Acids Res ; 44(D1): D567-73, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26582928

ABSTRACT

We have developed the Weighted Gene Expression Tool and database (WeGET, http://weget.cmbi.umcn.nl) for the prediction of new genes of a molecular system by correlated gene expression. WeGET utilizes a compendium of 465 human and 560 murine gene expression datasets that have been collected from multiple tissues under a wide range of experimental conditions. It exploits this abundance of expression data by assigning a high weight to datasets in which the known genes of a molecular system are harmoniously up- and down-regulated. WeGET ranks new candidate genes by calculating their weighted co-expression with that system. A weighted rank is calculated for human genes and their mouse orthologs. Then, an integrated gene rank and p-value is computed using a rank-order statistic. We applied our method to predict novel genes that have a high degree of co-expression with Gene Ontology terms and pathways from KEGG and Reactome. For each query set we provide a list of predicted novel genes, computed weights for transcription datasets used and cell and tissue types that contributed to the final predictions. The performance for each query set is assessed by 10-fold cross-validation. Finally, users can use the WeGET to predict novel genes that co-express with a custom query set.

Subjects

Genetic Databases, Gene Expression Profiling, Animals, Humans, Mice, Neuralgia/genetics, Software

19.

Correction to 'The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets'.

Szklarczyk, Damian; Gable, Annika L; Nastou, Katerina C; Lyon, David; Kirsch, Rebecca; Pyysalo, Sampo; Doncheva, Nadezhda T; Legeay, Marc; Fang, Tao; Bork, Peer; Jensen, Lars J; von Mering, Christian.

Nucleic Acids Res ; 49(18): 10800, 2021 Oct 11.

Article in English | MEDLINE | ID: mdl-34530444

20.

eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences.

Huerta-Cepas, Jaime; Szklarczyk, Damian; Forslund, Kristoffer; Cook, Helen; Heller, Davide; Walter, Mathias C; Rattei, Thomas; Mende, Daniel R; Sunagawa, Shinichi; Kuhn, Michael; Jensen, Lars Juhl; von Mering, Christian; Bork, Peer.

Nucleic Acids Res ; 44(D1): D286-93, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26582926

ABSTRACT

eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows for a better propagation of functional terms across nested OGs and led to the novel annotation of 95 890 previously uncharacterized OGs, increasing overall annotation coverage from 67% to 72%. The functional annotations of OGs have been expanded to also provide Gene Ontology terms, KEGG pathways and SMART/Pfam domains for each group. Moreover, eggNOG now provides pairwise orthology relationships within OGs based on analysis of phylogenetic trees. We have also incorporated a framework for quickly mapping novel sequences to OGs based on precomputed HMM profiles. Finally, eggNOG version 4.5 incorporates a novel data set spanning 2605 viral OGs, covering 5228 proteins from 352 viral proteomes. All data are accessible for bulk downloading, as a web-service, and through a completely redesigned web interface. The new access points provide faster searches and a number of new browsing and visualization capabilities, facilitating the needs of both experts and less experienced users. eggNOG v4.5 is available at http://eggnog.embl.de.

Subjects

Protein Databases, Molecular Sequence Annotation, Protein Sequence Analysis, Algorithms, Archaeal Proteins/chemistry, Bacterial Proteins/chemistry, Eukaryota, Phylogeny, Proteome/chemistry, Viral Proteins/chemistry
