Search | BVS Nicaragua

1.

Enhancing coevolutionary signals in protein-protein interaction prediction through clade-wise alignment integration.

Fang, Tao; Szklarczyk, Damian; Hachilif, Radja; von Mering, Christian.

Sci Rep ; 14(1): 6009, 2024 03 12.

Article in English | MEDLINE | ID: mdl-38472223

ABSTRACT

Protein-protein interactions (PPIs) play essential roles in most biological processes. The binding interfaces between interacting proteins impose evolutionary constraints that have successfully been employed to predict PPIs from multiple sequence alignments (MSAs). To construct MSAs, critical choices have to be made: how to ensure the reliable identification of orthologs, and how to optimally balance the need for large alignments versus sufficient alignment quality. Here, we propose a divide-and-conquer strategy for MSA generation: instead of building a single, large alignment for each protein, multiple distinct alignments are constructed under distinct clades in the tree of life. Coevolutionary signals are searched separately within these clades, and are only subsequently integrated using machine learning techniques. We find that this strategy markedly improves overall prediction performance, concomitant with better alignment quality. Using the popular DCA algorithm to systematically search pairs of such alignments, a genome-wide all-against-all interaction scan in a bacterial genome is demonstrated. Given the recent successes of AlphaFold in predicting direct PPIs at atomic detail, a discover-and-refine approach is proposed: our method could provide a fast and accurate strategy for pre-screening the entire genome, submitting to AlphaFold only promising interaction candidates, thus reducing false positives as well as computation time.

Subject(s)

Algorithms , Proteins , Sequence Alignment , Proteins/genetics , Biological Evolution , Phylogeny , Computational Biology/methods

2.

PaxDb 5.0: Curated Protein Quantification Data Suggests Adaptive Proteome Changes in Yeasts.

Huang, Qingyao; Szklarczyk, Damian; Wang, Mingcong; Simonovic, Milan; von Mering, Christian.

Mol Cell Proteomics ; 22(10): 100640, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37659604

ABSTRACT

The "Protein Abundances Across Organisms" database (PaxDb) is an integrative metaresource dedicated to protein abundance levels, in tissue-specific or whole-organism proteomes. PaxDb focuses on computing best-estimate abundances for proteins in normal/healthy contexts and expresses abundance values for each protein in "parts per million" in relation to all other protein molecules in the cell. The uniform data reprocessing, quality scoring, and integrated orthology relations have made PaxDb one of the preferred tools for comparisons between individual datasets, tissues, or organisms. In describing the latest version 5.0 of PaxDb, we particularly emphasize the data integration from various types of raw data and how we expanded the number of organisms and tissue groups as well as the proteome coverage. The current collection of PaxDb includes 831 original datasets from 170 species, including 22 Archaea, 81 Bacteria, and 67 Eukaryota. Apart from detailing the data update, we also present a comparative analysis of the human proteome subset of PaxDb against the two most widely used human proteome data resources: Human Protein Atlas and Genotype-Tissue Expression. Lastly, through our protein abundance data, we reveal an evolutionary trend in the usage of sulfur-containing amino acids in the proteomes of Fungi.
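The "parts per million" convention described in the PaxDb abstract can be illustrated with a small calculation. The following is a minimal sketch, not PaxDb's actual pipeline: the function name `to_ppm` and the toy abundance values are hypothetical, and the only assumption is that ppm means each protein's share of the total abundance scaled to one million.

```python
def to_ppm(raw_abundances):
    """Rescale raw abundance estimates so that they sum to one million."""
    total = sum(raw_abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    # Each protein's ppm value is its fraction of all protein molecules, times 1e6.
    return {protein: 1e6 * value / total for protein, value in raw_abundances.items()}


# Hypothetical toy proteome with arbitrary raw abundance units.
raw = {"protA": 30.0, "protB": 10.0, "protC": 60.0}
ppm = to_ppm(raw)
print(ppm)
```

Because every dataset is rescaled to the same total, values from different experiments or organisms become directly comparable, which is the property the abstract emphasizes.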

3.

CanIsoNet: a database to study the functional impact of isoform switching events in diseases.

Karakulak, Tülay; Szklarczyk, Damian; Saylan, Cemil Can; Moch, Holger; von Mering, Christian; Kahraman, Abdullah.

Bioinform Adv ; 3(1): vbad050, 2023.

Article in English | MEDLINE | ID: mdl-37123454

ABSTRACT

Motivation: Alternative splicing, as an essential regulatory mechanism in normal mammalian cells, is frequently disturbed in cancer and other diseases. Switches in the expression of most dominant alternative isoforms can alter protein interaction networks of associated genes giving rise to disease and disease progression. Here, we present CanIsoNet, a database to view, browse and search isoform switching events in diseases. CanIsoNet is the first webserver that incorporates isoform expression data with STRING interaction networks and ClinVar annotations to predict the pathogenic impact of isoform switching events in various diseases. Results: Data in CanIsoNet can be browsed by disease or searched by genes or isoforms in annotation-rich data tables. Various annotations for 11 811 isoforms and 14 357 unique isoform switching events across 31 different disease types are available. The network density score for each disease-specific isoform, PFAM domain IDs of disrupted interactions, domain structure visualization of transcripts and expression data of switched isoforms for each sample are given. Additionally, the genes annotated in ClinVar are highlighted in interactive interaction networks. Availability and implementation: CanIsoNet is freely available at https://www.caniso.net. The source codes can be found under a Creative Commons License at https://github.com/kahramanlab/CanIsoNet_Web. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

4.

Cytoscape stringApp 2.0: Analysis and Visualization of Heterogeneous Biological Networks.

Doncheva, Nadezhda T; Morris, John H; Holze, Henrietta; Kirsch, Rebecca; Nastou, Katerina C; Cuesta-Astroz, Yesid; Rattei, Thomas; Szklarczyk, Damian; von Mering, Christian; Jensen, Lars J.

J Proteome Res ; 22(2): 637-646, 2023 02 03.

Article in English | MEDLINE | ID: mdl-36512705

ABSTRACT

Biological networks are often used to represent complex biological systems, which can contain several types of entities. Analysis and visualization of such networks is supported by the Cytoscape software tool and its many apps. While earlier versions of stringApp focused on providing intraspecies protein-protein interactions from the STRING database, the new stringApp 2.0 greatly improves the support for heterogeneous networks. Here, we highlight new functionality that makes it possible to create networks that contain proteins and interactions from STRING as well as other biological entities and associations from other sources. We exemplify this by complementing a published SARS-CoV-2 interactome with interactions from STRING. We have also extended stringApp with new data and query functionality for protein-protein interactions between eukaryotic parasites and their hosts. We show how this can be used to retrieve and visualize a cross-species network for a malaria parasite, its host, and its vector. Finally, the latest stringApp version has an improved user interface, allows retrieval of both functional associations and physical interactions, and supports group-wise enrichment analysis of different parts of a network to aid biological interpretation. stringApp is freely available at https://apps.cytoscape.org/apps/stringapp.

Subject(s)

COVID-19 , Humans , SARS-CoV-2 , Software , Proteins , Eukaryota

5.

proGenomes3: approaching one million accurately and consistently annotated high-quality prokaryotic genomes.

Fullam, Anthony; Letunic, Ivica; Schmidt, Thomas S B; Ducarmon, Quinten R; Karcher, Nicolai; Khedkar, Supriya; Kuhn, Michael; Larralde, Martin; Maistrenko, Oleksandr M; Malfertheiner, Lukas; Milanese, Alessio; Rodrigues, Joao Frederico Matias; Sanchis-López, Claudia; Schudoma, Christian; Szklarczyk, Damian; Sunagawa, Shinichi; Zeller, Georg; Huerta-Cepas, Jaime; von Mering, Christian; Bork, Peer; Mende, Daniel R.

Nucleic Acids Res ; 51(D1): D760-D766, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36408900

ABSTRACT

The interpretation of genomic, transcriptomic and other microbial 'omics data is highly dependent on the availability of well-annotated genomes. As the number of publicly available microbial genomes continues to increase exponentially, the need for quality control and consistent annotation is becoming critical. We present proGenomes3, a database of 907 388 high-quality genomes containing 4 billion genes that passed stringent criteria and have been consistently annotated using multiple functional and taxonomic databases including mobile genetic elements and biosynthetic gene clusters. proGenomes3 encompasses 41 171 species-level clusters, defined based on universal single copy marker genes, for which pan-genomes and contextual habitat annotations are provided. The database is available at http://progenomes.embl.de/.

Subject(s)

Genome , Prokaryotic Cells , Databases, Genetic , Genomics , Molecular Sequence Annotation , Bacteria/classification , Bacteria/genetics

6.

eggNOG 6.0: enabling comparative genomics across 12 535 organisms.

Hernández-Plaza, Ana; Szklarczyk, Damian; Botas, Jorge; Cantalapiedra, Carlos P; Giner-Lamia, Joaquín; Mende, Daniel R; Kirsch, Rebecca; Rattei, Thomas; Letunic, Ivica; Jensen, Lars J; Bork, Peer; von Mering, Christian; Huerta-Cepas, Jaime.

Nucleic Acids Res ; 51(D1): D389-D394, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36399505

ABSTRACT

The eggNOG (evolutionary gene genealogy Non-supervised Orthologous Groups) database is a bioinformatics resource providing orthology data and comprehensive functional information for organisms from all domains of life. Here, we present a major update of the database and website (version 6.0), which increases the number of covered organisms to 12 535 reference species, expands functional annotations, and implements new functionality. In total, eggNOG 6.0 provides a hierarchy of over 17M orthologous groups (OGs) computed at 1601 taxonomic levels, spanning 10 756 bacterial, 457 archaeal and 1322 eukaryotic organisms. OGs have been thoroughly annotated using recent knowledge from functional databases, including KEGG, Gene Ontology, UniProtKB, BiGG, CAZy, CARD, PFAM and SMART. eggNOG also offers phylogenetic trees for all OGs, maximising utility and versatility for end users while allowing researchers to investigate the evolutionary history of speciation and duplication events as well as the phylogenetic distribution of functional terms within each OG. Furthermore, the eggNOG 6.0 website contains new functionality to mine orthology and functional data with ease, including the possibility of generating phylogenetic profiles for multiple OGs across species or identifying single-copy OGs at custom taxonomic levels. eggNOG 6.0 is available at http://eggnog6.embl.de.

Subject(s)

Databases, Genetic , Genomics , Phylogeny , Computational Biology , Eukaryota/genetics

7.

The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest.

Szklarczyk, Damian; Kirsch, Rebecca; Koutrouli, Mikaela; Nastou, Katerina; Mehryary, Farrokh; Hachilif, Radja; Gable, Annika L; Fang, Tao; Doncheva, Nadezhda T; Pyysalo, Sampo; Bork, Peer; Jensen, Lars J; von Mering, Christian.

Nucleic Acids Res ; 51(D1): D638-D646, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36370105

ABSTRACT

Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein-protein interactions, both physical interactions and functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.

Subject(s)

Protein Interaction Mapping , Proteins , Protein Interaction Mapping/methods , Databases, Protein , Proteins/genetics , Proteins/metabolism , Genomics , Proteomics , User-Computer Interface
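The STRING 2023 abstract above notes that the data can be accessed programmatically. As a hedged illustration, the sketch below only builds a request URL and sends nothing. The URL layout (`/api/<format>/network`), the parameter names (`identifiers`, `species`, `required_score`, `caller_identity`) and the carriage-return separator between identifiers are assumptions based on the publicly documented STRING REST interface and should be checked against the current API documentation. The helper `build_network_url` and the caller name `example_script` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the STRING REST API; verify against the current docs.
STRING_API_BASE = "https://string-db.org/api"


def build_network_url(identifiers, species, required_score=400, output_format="tsv"):
    """Build (but do not send) a request URL for a STRING network query."""
    params = {
        # Multiple identifiers are assumed to be separated by carriage returns.
        "identifiers": "\r".join(identifiers),
        "species": species,                    # NCBI taxonomy ID, e.g. 9606 for human
        "required_score": required_score,      # confidence threshold on a 0-1000 scale
        "caller_identity": "example_script",   # hypothetical caller name
    }
    return f"{STRING_API_BASE}/{output_format}/network?{urlencode(params)}"


url = build_network_url(["TP53", "MDM2"], species=9606, required_score=700)
print(url)
```

The resulting URL could then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`. Separating URL construction from the request keeps the example testable without network access.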

8.

Systematic assessment of pathway databases, based on a diverse collection of user-submitted experiments.

Gable, Annika L; Szklarczyk, Damian; Lyon, David; Matias Rodrigues, João F; von Mering, Christian.

Brief Bioinform ; 23(5), 2022 09 20.

Article in English | MEDLINE | ID: mdl-36088548

ABSTRACT

A knowledge-based grouping of genes into pathways or functional units is essential for describing and understanding cellular complexity. However, it is not always clear a priori how and at what level of specificity functionally interconnected genes should be partitioned into pathways, for a given application. Here, we assess and compare nine existing and two conceptually novel functional classification systems, with respect to their discovery power and generality in gene set enrichment testing. We base our assessment on a collection of nearly 2000 functional genomics datasets provided by users of the STRING database. With these real-life and diverse queries, we assess which systems typically provide the most specific and complete enrichment results. We find many structural and performance differences between classification systems. Overall, the well-established, hierarchically organized pathway annotation systems yield the best enrichment performance, despite covering substantial parts of the human genome in general terms only. On the other hand, the more recent unsupervised annotation systems perform strongest in understudied areas and organisms, and in detecting more specific pathways, albeit with less informative labels.

Subject(s)

Genomics , Software , Databases, Factual , Databases, Genetic , Genomics/methods , Humans
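The pathway-database assessment above rests on gene set enrichment testing. One common form of such a test is over-representation analysis with a one-sided hypergeometric test, sketched below. This is a generic textbook formulation and is not claimed to be the statistic used in the study. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Return P(X >= overlap) for X ~ Hypergeometric(population, annotated, sample).

    population: number of genes in the background
    annotated:  background genes that belong to the pathway
    sample:     number of genes in the user's query set
    overlap:    query genes that belong to the pathway
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probabilities of observing `overlap` or more pathway genes by chance.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20,000 background genes, a 100-gene pathway,
# and a 200-gene query that contains 8 pathway members.
p_value = hypergeom_enrichment_p(20000, 100, 200, 8)
print(p_value)
```

In practice the p-values for all tested pathways would also need a multiple-testing correction, which this sketch omits.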

9.

Correction to 'The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets'.

Szklarczyk, Damian; Gable, Annika L; Nastou, Katerina C; Lyon, David; Kirsch, Rebecca; Pyysalo, Sampo; Doncheva, Nadezhda T; Legeay, Marc; Fang, Tao; Bork, Peer; Jensen, Lars J; von Mering, Christian.

Nucleic Acids Res ; 49(18): 10800, 2021 Oct 11.

Article in English | MEDLINE | ID: mdl-34530444

10.

GUNC: detection of chimerism and contamination in prokaryotic genomes.

Orakov, Askarbek; Fullam, Anthony; Coelho, Luis Pedro; Khedkar, Supriya; Szklarczyk, Damian; Mende, Daniel R; Schmidt, Thomas S B; Bork, Peer.

Genome Biol ; 22(1): 178, 2021 06 13.

Article in English | MEDLINE | ID: mdl-34120611

ABSTRACT

Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genome assemblies remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome's full complement of genes. GUNC complements existing approaches by targeting previously underdetected types of contamination: we conservatively estimate that 5.7% of genomes in GenBank, 5.2% in RefSeq, and 15-30% of pre-filtered "high-quality" metagenome-assembled genomes in recent studies are undetected chimeras. GUNC provides a fast and robust tool to substantially improve prokaryotic genome quality.

Subject(s)

Chimerism , Computational Biology/methods , Genome, Bacterial , Metagenome , Proteobacteria/genetics , Software , Contig Mapping , Metagenomics/methods , Phylogeny , Prokaryotic Cells/cytology , Prokaryotic Cells/metabolism

11.

The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets.

Szklarczyk, Damian; Gable, Annika L; Nastou, Katerina C; Lyon, David; Kirsch, Rebecca; Pyysalo, Sampo; Doncheva, Nadezhda T; Legeay, Marc; Fang, Tao; Bork, Peer; Jensen, Lars J; von Mering, Christian.

Nucleic Acids Res ; 49(D1): D605-D612, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33237311

ABSTRACT

Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein-protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions as well as functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring-mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide, experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online, at https://string-db.org/.

Subject(s)

Databases, Protein , Protein Interaction Mapping , Proteins/genetics , User-Computer Interface
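The STRING 2021 abstract above describes collecting and scoring evidence from several sources. How per-channel scores might be merged into a single confidence value can be sketched with a noisy-OR style combination that includes a prior correction. This is an illustrative assumption, not a formula taken from the abstract: the function name is hypothetical, and the default prior of 0.041 is an assumed value that should be verified against STRING's own documentation.

```python
def combine_channel_scores(scores, prior=0.041):
    """Combine per-channel probabilities, noisy-OR style, with a prior correction.

    scores: iterable of channel probabilities in [0, 1]
    prior:  assumed probability of an interaction between two random proteins
    """
    def remove_prior(score):
        # Strip the prior so it is not counted once per channel.
        return max(0.0, (score - prior) / (1.0 - prior))

    no_evidence = 1.0
    for score in scores:
        # Probability that this channel does NOT support the interaction.
        no_evidence *= 1.0 - remove_prior(score)

    combined_without_prior = 1.0 - no_evidence
    # Add the prior back exactly once.
    return combined_without_prior * (1.0 - prior) + prior


# Hypothetical channel scores, e.g. text mining, experiments, co-expression.
print(combine_channel_scores([0.6, 0.4, 0.2]))
```

With this scheme a single channel's score is returned unchanged, while several independent channels reinforce each other, so the combined value is never lower than the strongest individual channel.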

12.

Pathogenic impact of transcript isoform switching in 1,209 cancer samples covering 27 cancer types using an isoform-specific interaction network.

Kahraman, Abdullah; Karakulak, Tülay; Szklarczyk, Damian; von Mering, Christian.

Sci Rep ; 10(1): 14453, 2020 09 02.

Article in English | MEDLINE | ID: mdl-32879328

ABSTRACT

Under normal conditions, cells of almost all tissue types express the same predominant canonical transcript isoform at each gene locus. In cancer, however, splicing regulation is often disturbed, leading to cancer-specific switches in the most dominant transcripts (MDT). To address the pathogenic impact of these switches, we have analyzed isoform-specific protein-protein interaction disruptions in 1,209 cancer samples covering 27 different cancer types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project of the International Cancer Genomics Consortium (ICGC). Our study revealed large variations in the number of cancer-specific MDT (cMDT) with the highest frequency in cancers of female reproductive organs. Interestingly, in contrast to the mutational load, cancers arising from the same primary tissue had a similar number of cMDT. Some cMDT were found in 100% of all samples in a cancer type, making them candidates for diagnostic biomarkers. cMDT tend to be located at densely populated network regions where they disrupted protein interactions in the proximity of pathogenic cancer genes. A gene ontology enrichment analysis showed that these disruptions occurred mostly in protein translation and RNA splicing pathways. Interestingly, samples with mutations in the spliceosomal complex tend to have a higher number of cMDT, while other transcript expressions correlated with mutations in non-coding splice-site and promoter regions of their genes. This work demonstrates for the first time the large extent of cancer-specific alterations in alternative splicing for 27 different cancer types. It highlights distinct and common patterns of cMDT and suggests novel pathogenic transcripts and markers that induce large network disruptions in cancers.

Subject(s)

Genomics , Neoplasm Proteins/genetics , Neoplasms/genetics , Protein Isoforms/genetics , Alternative Splicing/genetics , Female , Gene Expression Regulation, Neoplastic/genetics , Genitalia, Female/metabolism , Genitalia, Female/pathology , Humans , Male , Mutation , Neoplasms/pathology , RNA Splicing/genetics , Signal Transduction/genetics , Spliceosomes/genetics , Transcription, Genetic/genetics

13.

The Quest for Orthologs benchmark service and consensus calls in 2020.

Altenhoff, Adrian M; Garrayo-Ventas, Javier; Cosentino, Salvatore; Emms, David; Glover, Natasha M; Hernández-Plaza, Ana; Nevers, Yannis; Sundesha, Vicky; Szklarczyk, Damian; Fernández, José M; Codó, Laia; The Quest for Orthologs Consortium; Gelpi, Josep Ll; Huerta-Cepas, Jaime; Iwasaki, Wataru; Kelly, Steven; Lecompte, Odile; Muffato, Matthieu; Martin, Maria J; Capella-Gutierrez, Salvador; Thomas, Paul D; Sonnhammer, Erik; Dessimoz, Christophe.

Nucleic Acids Res ; 48(W1): W538-W545, 2020 07 02.

Article in English | MEDLINE | ID: mdl-32374845

ABSTRACT

The identification of orthologs (genes in different species which descended from the same gene in their last common ancestor) is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.benchmarkservice.org). Furthermore, consensus ortholog calls derived from public benchmark submissions are provided on the Alliance of Genome Resources website, the joint portal of NIH-funded model organism databases.

Subject(s)

Multigene Family , Proteome , Software , Animals , Benchmarking , Consensus , Genomics , Humans , Mice , Phylogeny , Rats

14.

Analysis of the Human Kinome and Phosphatome by Mass Cytometry Reveals Overexpression-Induced Effects on Cancer-Related Signaling.

Lun, Xiao-Kang; Szklarczyk, Damian; Gábor, Attila; Dobberstein, Nadine; Zanotelli, Vito Riccardo Tomaso; Saez-Rodriguez, Julio; von Mering, Christian; Bodenmiller, Bernd.

Mol Cell ; 74(5): 1086-1102.e5, 2019 06 06.

Article in English | MEDLINE | ID: mdl-31101498

ABSTRACT

Kinase and phosphatase overexpression drives tumorigenesis and drug resistance. We previously developed a mass-cytometry-based single-cell proteomics approach that enables quantitative assessment of overexpression effects on cell signaling. Here, we applied this approach in a human kinome- and phosphatome-wide study to assess how 649 individually overexpressed proteins modulated cancer-related signaling in HEK293T cells in an abundance-dependent manner. Based on these data, we expanded the functional classification of human kinases and phosphatases and showed that the overexpression effects include non-catalytic roles. We detected 208 previously unreported signaling relationships. The signaling dynamics analysis indicated that the overexpression of ERK-specific phosphatases sustains proliferative signaling. This suggests a phosphatase-driven mechanism of cancer progression. Moreover, our analysis revealed a drug-resistant mechanism through which overexpression of tyrosine kinases, including SRC, FES, YES1, and BLK, induced MEK-independent ERK activation in melanoma A375 cells. These proteins could predict drug sensitivity to BRAF-MEK concurrent inhibition in cells carrying BRAF mutations.

Subject(s)

Carcinogenesis/genetics , Melanoma/genetics , Phosphoric Monoester Hydrolases/genetics , Phosphotransferases/genetics , Proto-Oncogene Proteins B-raf/genetics , Cell Proliferation/genetics , Drug Resistance, Neoplasm/genetics , Gene Expression Regulation, Neoplastic/drug effects , HEK293 Cells , Humans , Melanoma/enzymology , Melanoma/pathology , Mutation , Phosphorylation/genetics , Protein Kinase Inhibitors/pharmacology , Proteomics , Signal Transduction/drug effects

15.

Tree reconciliation combined with subsampling improves large scale inference of orthologous group hierarchies.

Heller, Davide; Szklarczyk, Damian; von Mering, Christian.

BMC Bioinformatics ; 20(1): 228, 2019 May 06.

Article in English | MEDLINE | ID: mdl-31060495

ABSTRACT

BACKGROUND: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. This can be due to confounding genetic signal or algorithmic limitations. Importantly, inconsistencies limit the potential use of OGs for functional annotation and third-party applications. RESULTS: Here we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sampling. Unlike previous approaches, by subsampling the protein space, we avoid the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method into a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance. CONCLUSION: The presented consistency pipeline shows that, contrary to previous limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show comparable or greater performance to previous pipelines. The code is available on GitHub at: https://github.com/meringlab/og_consistency_pipeline.

Subject(s)

Databases, Protein/standards , Phylogeny

16.

STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.

Szklarczyk, Damian; Gable, Annika L; Lyon, David; Junge, Alexander; Wyder, Stefan; Huerta-Cepas, Jaime; Simonovic, Milan; Doncheva, Nadezhda T; Morris, John H; Bork, Peer; Jensen, Lars J; von Mering, Christian.

Nucleic Acids Res ; 47(D1): D607-D613, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30476243

ABSTRACT

Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.

Subject(s)

Genomics/methods , Protein Interaction Mapping/methods , Software , Animals , Databases, Genetic , Gene Ontology , Humans

17.

eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses.

Huerta-Cepas, Jaime; Szklarczyk, Damian; Heller, Davide; Hernández-Plaza, Ana; Forslund, Sofia K; Cook, Helen; Mende, Daniel R; Letunic, Ivica; Rattei, Thomas; Jensen, Lars J; von Mering, Christian; Bork, Peer.

Nucleic Acids Res ; 47(D1): D309-D314, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30418610

ABSTRACT

eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral proteomes that were selected for diversity and filtered by genome quality. In total, 4.4M orthologous groups (OGs) distributed across 379 taxonomic levels were computed together with their associated sequence alignments, phylogenies, HMM models and functional descriptors. Precomputed evolutionary analysis provides fine-grained resolution of duplication/speciation events within each OG. Our benchmarks show that, despite doubling the number of genomes, the quality of orthology assignments and functional annotations (80% coverage) has persisted without significant changes across this update. Finally, we improved eggNOG online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets. All precomputed data are publicly available for downloading or via API queries at http://eggnog.embl.de.

Subject(s)

Conserved Sequence , Databases, Genetic , Evolution, Molecular , Phylogeny , Sequence Homology , Animals , Classification , Eukaryota/genetics , Gene Duplication , Gene Ontology , Genes, Viral , Genome , Humans , Molecular Sequence Annotation , Proteome , Sequence Alignment , Structure-Activity Relationship

18.

Viruses.STRING: A Virus-Host Protein-Protein Interaction Database.

Cook, Helen Victoria; Doncheva, Nadezhda Tsankova; Szklarczyk, Damian; von Mering, Christian; Jensen, Lars Juhl.

Viruses ; 10(10), 2018 09 23.

Article in English | MEDLINE | ID: mdl-30249048

ABSTRACT

As viruses continue to pose risks to global health, having a better understanding of virus-host protein-protein interactions aids in the development of treatments and vaccines. Here, we introduce Viruses.STRING, a protein-protein interaction database specifically catering to virus-virus and virus-host interactions. This database combines evidence from experimental and text-mining channels to provide combined probabilities for interactions between viral and host proteins. The database contains 177,425 interactions between 239 viruses and 319 hosts. The database is publicly available at viruses.string-db.org, and the interaction data can also be accessed through the latest version of the Cytoscape STRING app.

Subject(s)

Protein Databases, Host-Pathogen Interactions, Protein Interaction Mapping, Proteins/metabolism, Viruses/metabolism, Animals, Gene Ontology, Humans, Probability, Protein Binding, Protein Interaction Maps, Software Design

19.

Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper.

Huerta-Cepas, Jaime; Forslund, Kristoffer; Coelho, Luis Pedro; Szklarczyk, Damian; Jensen, Lars Juhl; von Mering, Christian; Bork, Peer.

Mol Biol Evol ; 34(8): 2115-2122, 2017 08 01.

Article in English | MEDLINE | ID: mdl-28460117

ABSTRACT

Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We, therefore, developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11%, and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. EggNOG-mapper predictions scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, yielding better performance than InterProScan. eggNOG-mapper runs ~15× faster than BLAST and at least 2.5× faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.

Subject(s)

Sequence Alignment/methods, Protein Sequence Analysis/methods, Algorithms, Computer Simulation, Genetic Databases, Protein Databases, Gene Ontology, Genome/genetics, Phylogeny, Sequence Alignment/statistics & numerical data, Software

20.

The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible.

Szklarczyk, Damian; Morris, John H; Cook, Helen; Kuhn, Michael; Wyder, Stefan; Simonovic, Milan; Santos, Alberto; Doncheva, Nadezhda T; Roth, Alexander; Bork, Peer; Jensen, Lars J; von Mering, Christian.

Nucleic Acids Res ; 45(D1): D362-D368, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27924014

ABSTRACT

A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer of interaction knowledge between organisms based on gene orthology. In the latest version 10.5 of STRING, the biggest changes are concerned with data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework. Further improvements include automated background analysis of user inputs for functional enrichments, and streamlined download options. The STRING resource is available online, at http://string-db.org/.

Subject(s)

Computational Biology/methods, Protein Databases, Software, Molecular Models, Protein Binding, Protein Conformation, Protein Interaction Mapping, Protein Interaction Maps, Proteins/chemistry, Proteins/metabolism, Structure-Activity Relationship, User-Computer Interface, Web Browser
