ABSTRACT
Rare genetic diseases affect millions, and identifying causal DNA variants is essential for patient care. Therefore, it is imperative to estimate the effect of each independent variant and improve their pathogenicity classification. Our study of 140 214 unrelated UK Biobank (UKB) participants found that each of them carries a median of 7 variants previously reported as pathogenic or likely pathogenic. We focused on 967 diagnostic-grade gene (DGG) variants for rare bleeding, thrombotic, and platelet disorders (BTPDs) observed in 12 367 UKB participants. By association analysis, for a subset of these variants, we estimated effect sizes for platelet count and volume, and odds ratios for bleeding and thrombosis. Variants causal of some autosomal recessive platelet disorders revealed phenotypic consequences in carriers. Loss-of-function variants in MPL, which cause chronic amegakaryocytic thrombocytopenia if biallelic, were unexpectedly associated with increased platelet counts in carriers. We also demonstrated that common variants identified by genome-wide association studies (GWAS) for platelet count or thrombosis risk may influence the penetrance of rare variants in BTPD DGGs on their associated hemostasis disorders. Network-propagation analysis applied to an interactome of 18 410 nodes and 571 917 edges showed that GWAS variants with large effect sizes are enriched in DGGs and their first-order interactors. Finally, we illustrate the modifying effect of polygenic scores for platelet count and thrombosis risk on disease severity in participants carrying rare variants in TUBB1 or PROC and PROS1, respectively. Our findings demonstrate the power of association analyses using large population datasets in improving pathogenicity classifications of rare variants.
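As an illustration of the odds ratios mentioned above, the following minimal sketch computes an odds ratio and a Wald 95% confidence interval from a 2×2 carrier-by-outcome table. The function name and the counts in the usage note are hypothetical and are not taken from the study, which may well have used a regression-based estimate instead.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(carrier_cases, carrier_controls,
                       noncarrier_cases, noncarrier_controls, z=1.96):
    """Return (odds_ratio, lower, upper) for a 2x2 table.

    The interval is the standard Wald interval on the log odds ratio;
    z=1.96 corresponds to roughly 95% coverage.
    """
    a, b, c, d = (carrier_cases, carrier_controls,
                  noncarrier_cases, noncarrier_controls)
    if min(a, b, c, d) <= 0:
        # The plain formula is undefined when any cell is empty.
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * standard_error)
    upper = exp(log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper
```

With made-up counts of 20 affected and 80 unaffected carriers against 10 affected and 90 unaffected non-carriers, `odds_ratio_with_ci(20, 80, 10, 90)` gives an odds ratio of 2.25 with an interval that brackets it.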
Subject(s)
Genome-Wide Association Study , Thrombosis , Humans , Biological Specimen Banks , Hemostasis , Hemorrhage/genetics , Rare Diseases
ABSTRACT
Autism spectrum disorder (ASD) comprises a large group of neurodevelopmental conditions featuring, over a wide range of severity and combinations, a core set of manifestations (restricted sociality, stereotyped behavior and language impairment) alongside various comorbidities. Common and rare variants in several hundreds of genes and regulatory regions have been implicated in the molecular pathogenesis of ASD along a range of causation evidence strength. Despite significant progress in elucidating the impact of a few paradigmatic individual loci, the sheer complexity of the genetic architecture underlying ASD as a whole has hampered the identification of convergent actionable hubs hypothesized to relay between the vastness of risk alleles and the core phenotypes. In turn, this has limited the development of strategies that can revert or ameliorate this condition, calling for a systems-level approach to probe the cross-talk of cooperating genes in terms of causal interaction networks in order to make convergences experimentally tractable and reveal their clinical actionability. As a first step in this direction, we have captured from the scientific literature information on the causal links between the genes whose variants have been associated with ASD and the whole human proteome. This information has been annotated in a computer-readable format in the SIGNOR database and is made freely available on the resource website. To link this information to cell functions and phenotypes, we have developed graph algorithms that estimate the functional distance of any protein in the SIGNOR causal interactome to phenotypes and pathways. The main novelty of our approach resides in the possibility to explore the mechanistic links connecting the suggested gene-phenotype relations.
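One way to picture a "functional distance" in a causal interactome is as the cost of the cheapest directed path from a protein to a phenotype node, with the net effect given by the product of the edge signs along that path. The sketch below uses Dijkstra's algorithm on a toy network. The node names, the edge costs and the choice of Dijkstra are assumptions for illustration, and the abstract does not specify the authors' actual algorithm.

```python
import heapq

# Toy causal network: source -> list of (target, sign, cost).
# sign is +1 (up-regulates) or -1 (down-regulates); cost is a hypothetical
# non-negative edge weight (lower means a better-supported interaction).
TOY_EDGES = {
    "GENE_A": [("KINASE_B", +1, 0.2), ("PHOSPHATASE_C", -1, 0.5)],
    "KINASE_B": [("TF_D", +1, 0.3)],
    "PHOSPHATASE_C": [("TF_D", -1, 0.1)],
    "TF_D": [("PHENOTYPE_X", -1, 0.4)],
}


def functional_distance(edges, source, target):
    """Return (distance, net_sign, path) for the cheapest directed path.

    Returns None when the target cannot be reached from the source.
    """
    queue = [(0.0, source, +1, [source])]
    settled = set()
    while queue:
        dist, node, sign, path = heapq.heappop(queue)
        if node == target:
            return dist, sign, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, edge_sign, cost in edges.get(node, []):
            if nxt not in settled:
                heapq.heappush(
                    queue, (dist + cost, nxt, sign * edge_sign, path + [nxt])
                )
    return None
```

Calling `functional_distance(TOY_EDGES, "GENE_A", "PHENOTYPE_X")` follows the route through `KINASE_B` and `TF_D` and reports a net sign of -1, meaning that in this toy network activation of `GENE_A` is predicted to down-regulate the phenotype.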
Subject(s)
Autism Spectrum Disorder , Genetic Predisposition to Disease , Neurodevelopmental Disorders , Phenotype , Humans , Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease/genetics , Neurodevelopmental Disorders/genetics , Gene Regulatory Networks/genetics , Autistic Disorder/genetics , Genetic Association Studies/methods , Proteome/genetics
ABSTRACT
The SIGnaling Network Open Resource (SIGNOR 3.0, https://signor.uniroma2.it) is a public repository that captures causal information and represents it according to an 'activity-flow' model. SIGNOR provides freely accessible static maps of causal interactions that can be tailored, pruned and refined to build dynamic and predictive models. Each signaling relationship is annotated with an effect (up/down-regulation) and with the mechanism (e.g. binding, phosphorylation, transcriptional activation) causing the regulation of the target entity. Since its previous release, SIGNOR has undergone a significant upgrade including: (i) a new website that offers an improved user experience and novel advanced search and graph tools; (ii) significant content growth, adding up to a total of approx. 33,000 manually annotated causal relationships between more than 8900 biological entities; (iii) an increase in the number of manually annotated pathways, currently including pathways deregulated by SARS-CoV-2 infection or involved in neurodevelopment, synaptic transmission and metabolism, among others; (iv) additional features such as a new model to represent metabolic reactions and a new confidence score assigned to each interaction.
Subject(s)
Databases, Protein , Humans , COVID-19 , Phosphorylation , SARS-CoV-2/genetics , Signal Transduction , Gene Expression Regulation
ABSTRACT
The complexity and heterogeneity of Parkinson's disease (PD) necessitate advanced diagnostic and prognostic tools to elucidate its molecular mechanisms accurately. In this study, we addressed this challenge by conducting a pilot phospho-proteomic analysis of peripheral blood mononuclear cells (PBMCs) from idiopathic PD patients at varying disease stages to delineate the functional alterations occurring in these cells throughout the disease course and identify key molecules and pathways contributing to PD progression. By integrating clinical data with phospho-proteomic profiles across various PD stages, we identified potential stage-specific molecular signatures indicative of disease progression. This integrative approach allows for the discernment of distinct disease states and enhances our understanding of PD heterogeneity.
Subject(s)
Disease Progression , Leukocytes, Mononuclear , Parkinson Disease , Proteome , Proteomics , Humans , Parkinson Disease/metabolism , Parkinson Disease/blood , Parkinson Disease/pathology , Leukocytes, Mononuclear/metabolism , Proteome/metabolism , Male , Female , Middle Aged , Proteomics/methods , Aged , Phosphoproteins/metabolism
ABSTRACT
The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database of macromolecular complexes with known function from a range of model organisms. It summarizes complex composition, topology and function along with links to a large range of domain-specific resources (e.g. wwPDB, EMDB and Reactome). Since the last update in 2019, we have produced a first draft complexome for Escherichia coli, maintained and updated that of Saccharomyces cerevisiae, added over 40 coronavirus complexes and increased the human complexome to over 1100 complexes, including approximately 200 complexes that act as targets for viral proteins or are part of the immune system. The display of protein features in ComplexViewer has been improved and the participant table is now colour-coordinated with the nodes in ComplexViewer. Community collaboration has expanded, for example by contributing to an analysis of putative transcription cofactors and providing data accessible to semantic web tools through Wikidata, which is now populated with manually curated Complex Portal content through a new bot. Our data license is now CC0 to encourage data reuse. Users are encouraged to get in touch, provide us with feedback and send curation requests through the 'Support' link.
Subject(s)
Data Curation/methods , Databases, Protein , Multiprotein Complexes/chemistry , Coronavirus/chemistry , Data Visualization , Databases, Chemical , Enzymes/chemistry , Enzymes/metabolism , Escherichia coli/chemistry , Humans , International Cooperation , Molecular Sequence Annotation , Multiprotein Complexes/metabolism , User-Computer Interface
ABSTRACT
The IntAct molecular interaction database (https://www.ebi.ac.uk/intact) is a curated resource of molecular interactions, derived from the scientific literature and from direct data depositions. As of August 2021, IntAct provides more than one million binary interactions, curated by twelve global partners of the International Molecular Exchange consortium, for which the IntAct database provides a shared curation and dissemination platform. The IMEx curation policy has always emphasised a fine-grained data and curation model, aiming to capture the relevant experimental detail essential for the interpretation of the provided molecular interaction data. Here, we present recent curation focus and progress, as well as a completely redeveloped website which presents IntAct data in a much more user-friendly and detailed way.
Subject(s)
Databases, Protein , Protein Interaction Maps/genetics , Software , Humans , Protein Interaction Mapping/methods
ABSTRACT
Cyclin-dependent kinase 1 (CDK1), a serine/threonine protein kinase, is more than the cell cycle regulator it was originally identified as. During the last decade, it has been shown to carry out versatile functions. From cell cycle control to gene expression regulation and apoptosis, CDK1 is intimately involved in many cellular events that are vital for cell survival. Here, we provide a comprehensive catalogue of the CDK1 upstream regulators and substrates, describing how this kinase is implicated in the control of key 'cell cycle-unrelated' biological processes. Finally, we describe how deregulation of CDK1 expression and activation has been closely associated with cancer progression and drug resistance.
Subject(s)
CDC2 Protein Kinase , Protein Serine-Threonine Kinases , Humans , CDC2 Protein Kinase/genetics , CDC2 Protein Kinase/metabolism , Protein Serine-Threonine Kinases/genetics , Genes, cdc , Cell Cycle , Cell Division
ABSTRACT
SUMMARY: SIGNORApp is a Cytoscape 3 (3.8 and later) application that provides access to causal interactions annotated in the SIGNOR resource. The application builds networks that can be represented as weighted, signed, directed graphs, where nodes are interacting biological entities and edges represent causal interactions captured by expert curators from experiments reported in peer-reviewed journals. Users can (i) query the SIGNOR dataset with single or multiple entity name(s) or identifier(s), optionally requesting that their interacting partners be included in the output network, (ii) browse pathways that are annotated in the SIGNOR resource and (iii) extract the entire causal interactome. The app offers two visualization modes: one displaying only entity interactions and a second emphasizing the post-translational modifications occurring as a consequence of the interaction. In addition, users can click on nodes and edges to access entity and interaction annotations. Causal information is available for three model organisms: Homo sapiens, Mus musculus and Rattus norvegicus. AVAILABILITY AND IMPLEMENTATION: SIGNORApp has been developed for Cytoscape 3 (3.8 and later) in the Java programming language. The latest source code and the plugin can be found at: https://github.com/SIGNORcysAPP/signor-app and https://apps.cytoscape.org/apps/signorapp, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Protein Processing, Post-Translational , Software , Mice , Humans , Animals , Rats
ABSTRACT
The EMBL-EBI Complex Portal is a knowledgebase of macromolecular complexes providing persistent stable identifiers. Entries are linked to literature evidence and provide details of complex membership, function, structure and complex-specific Gene Ontology annotations. Data are freely available and downloadable in HUPO-PSI community standards and missing entries can be requested for curation. In collaboration with Saccharomyces Genome Database and UniProt, the yeast complexome, a compendium of all known heteromeric assemblies from the model organism Saccharomyces cerevisiae, was curated. This expansion of knowledge and scope has led to a 50% increase in curated complexes compared to the previously published dataset, CYC2008. The yeast complexome is used as a reference resource for the analysis of complexes from large-scale experiments. Our analysis showed that genes coding for proteins in complexes tend to have more genetic interactions, are co-expressed with more genes, are more multifunctional, localize more often in the nucleus, and are more often involved in nucleic acid-related metabolic processes and processes where large machineries are the predominant functional drivers. A comparison to genetic interactions showed that about 40% of expanded co-complex pairs also have genetic interactions, suggesting strong functional links between complex members.
Subject(s)
Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Datasets as Topic , Gene Ontology , Knowledge Bases , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics
ABSTRACT
MOTIVATION: A large variety of molecular interactions occurs between biomolecular components in cells. When a molecular interaction results in a regulatory effect, exerted by one component onto a downstream component, a so-called 'causal interaction' takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g. gene regulation) need to be described with a careful appreciation of the underlying molecular reactions. A proper description of this information enables archiving, sharing and reuse by humans and for automated computational processing. Various representations of causal relationships between biological components are currently used in a variety of resources. RESULTS: Here, we propose a checklist that accommodates current representations, called the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist defines both the required core information, as well as a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. The MI2CAST checklist can be used as reporting guidelines when annotating and curating causal statements, while fostering uniformity and interoperability of the data across resources. AVAILABILITY AND IMPLEMENTATION: The checklist together with examples is accessible at https://github.com/MI2CAST/MI2CAST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Software , Causality , Humans
ABSTRACT
The SIGnaling Network Open Resource 2.0 (SIGNOR 2.0) is a public repository that stores signaling information as binary causal relationships between biological entities. The captured information is represented graphically as a signed directed graph. Each signaling relationship is associated with an effect (up/down-regulation) and with the mechanism (e.g. binding, phosphorylation, transcriptional activation) causing the up/down-regulation of the target entity. Since its first release, SIGNOR has undergone a significant content increase and the number of annotated causal interactions has almost doubled. SIGNOR 2.0 now stores almost 23 000 manually annotated causal relationships between proteins and other biologically relevant entities: chemicals, phenotypes, complexes, etc. We describe here significant changes in curation policy and a new confidence score, which is assigned to each interaction. We have also improved compliance with the FAIR data principles by providing (i) SIGNOR stable identifiers, (ii) programmatic access through REST APIs, (iii) bioschemas and (iv) downloadable data in standard-compliant formats, such as PSI-MI CausalTAB and GMT. The data are freely accessible and downloadable at https://signor.uniroma2.it/.
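For readers unfamiliar with the GMT format mentioned above, it is a plain tab-separated layout in which each line holds a set name, a description and then the set members. The sketch below parses such text into a dictionary. The function name and the sample content in the usage note are hypothetical, and the exact contents of SIGNOR's own GMT downloads are not described in the abstract.

```python
def parse_gmt(text):
    """Map each set name to (description, list of members).

    Assumes the generic GMT layout: name<TAB>description<TAB>member...
    Blank lines are skipped; empty trailing fields are ignored.
    """
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"malformed GMT line: {line!r}")
        name, description, *members = fields
        gene_sets[name] = (description, [m for m in members if m])
    return gene_sets
```

For example, `parse_gmt("PATHWAY_1\tillustrative pathway\tGENE_A\tGENE_B\n")` returns `{"PATHWAY_1": ("illustrative pathway", ["GENE_A", "GENE_B"])}`.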
Subject(s)
Databases, Factual , Signal Transduction , Software , Animals , Humans , Protein Interaction Maps
ABSTRACT
CancerGeneNet (https://signor.uniroma2.it/CancerGeneNet/) is a resource that links genes that are frequently mutated in cancers to cancer phenotypes. The resource takes advantage of a curation effort aimed at embedding a large fraction of the gene products that are found altered in cancer cells into a network of causal protein relationships. Graph algorithms, in turn, make it possible to infer likely paths of causal interactions linking cancer-associated genes to cancer phenotypes, thus offering a rational framework for the design of strategies to revert disease phenotypes. CancerGeneNet bridges two interaction layers by connecting proteins whose activities are affected by cancer drivers to proteins that impact on the 'hallmarks of cancer'. In addition, CancerGeneNet annotates curated pathways that help rationalize the pathological consequences of cancer driver mutations in selected common cancers, and 'MiniPathways' illustrating regulatory circuits that are frequently altered in different cancers.
Subject(s)
Databases, Genetic , Neoplasms/genetics , Proteins/genetics , Algorithms , Antineoplastic Agents/pharmacology , Computer Graphics , Humans , Molecular Targeted Therapy , Neoplasms/drug therapy , Phenotype , User-Computer Interface
ABSTRACT
The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database that collates and summarizes information on stable, macromolecular complexes of known function. It captures complex composition, topology and function and links out to a large range of domain-specific resources that hold more detailed data, such as PDB or Reactome. We have made several significant improvements since our last update, including improving compliance with the FAIR data principles by providing complex-specific, stable identifiers that include versioning. Protein complexes are now available from 20 species for download in standards-compliant formats such as PSI-XML, MI-JSON and ComplexTAB or can be accessed via an improved REST API. A component-based JS front-end framework has been implemented to drive a new website and this has allowed the use of APIs from linked services to import and visualize information such as the 3D structure of protein complexes, their role in reactions and pathways and the co-expression of complex components in the tissues of multi-cellular organisms. A first draft of the complete complexome of Saccharomyces cerevisiae is now available to browse and download.
Subject(s)
Databases, Protein , Multiprotein Complexes/chemistry , Animals , Computer Graphics , Humans , Macromolecular Substances/chemistry , Mice , Multiprotein Complexes/metabolism , Nucleic Acids/chemistry , Protein Conformation
ABSTRACT
DISNOR is a new resource that aims at exploiting the explosion of data on the identification of disease-associated genes to assemble inferred disease pathways. This may help dissect the signaling events whose disruption causes the pathological phenotypes and may contribute to building a platform for precision medicine. To this end, we combine the gene-disease association (GDA) data annotated in the DisGeNET resource with a new curation effort aimed at populating the SIGNOR database with causal interactions related to disease genes with the highest possible coverage. DISNOR can be freely accessed at http://DISNOR.uniroma2.it/ where >3700 disease-networks, linking ~2600 disease genes, can be explored. For each disease curated in DisGeNET, DISNOR links disease genes by manually annotated causal relationships and offers an intuitive visualization of the inferred 'patho-pathways' at different complexity levels. User-defined gene lists are also accepted in the query pipeline. In addition, for each list of query genes (either annotated in DisGeNET or user-defined), DISNOR performs a gene set enrichment analysis on KEGG-defined pathways or on the lists of proteins associated with the inferred disease pathways. This function offers additional information on disease-associated cellular pathways and disease similarity.
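A common way to score the enrichment of a query gene list in a pathway is a one-sided hypergeometric (over-representation) test. The sketch below is only an illustration of that general technique: the function name and the numbers in the usage note are hypothetical, and the abstract does not state which statistic DISNOR actually applies.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, query_size, overlap):
    """Return P(X >= overlap) under the hypergeometric null.

    X is the number of pathway genes among query_size genes drawn without
    replacement from universe_size genes, pathway_size of which belong to
    the pathway.
    """
    max_overlap = min(pathway_size, query_size)
    total = comb(universe_size, query_size)
    # math.comb returns 0 when k > n, so impossible splits contribute nothing.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

With a toy universe of 10 genes, a pathway of 4 genes and a query list of 3 genes of which 2 fall in the pathway, `hypergeom_enrichment_p(10, 4, 3, 2)` evaluates to 40/120, that is one third. In practice such p-values are usually corrected for the number of pathways tested.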
Subject(s)
Databases, Genetic , Disease/genetics , Data Curation , Gene Regulatory Networks , Genetic Association Studies , Humans , Internet , Mutation , Polymorphism, Single Nucleotide , Search Engine , Signal Transduction/genetics , Software , User-Computer Interface
ABSTRACT
Protein phosphorylation modulates many biological processes. However, the characterization of the complex regulatory circuits underlying cell response to external and internal stimuli is still limited by our inability to describe the phosphorylation network on a global scale. Modern MS-based phosphoproteomics allows monitoring tens of thousands of phosphorylation sites in multiple conditions, making the approach ideal to explore signaling pathways mediated by phosphorylation. Here, we review recent advances in phosphoproteomics and discuss some of the computational approaches developed to facilitate extraction of signaling information from these datasets. Finally, this review focuses on approaches that integrate prior literature information with unbiased phosphoproteomics experiments.
Subject(s)
Phosphoproteins/analysis , Protein Interaction Maps , Protein Processing, Post-Translational , Proteomics/methods , Datasets as Topic , Humans , Phosphorylation
ABSTRACT
Assembly of large biochemical networks can be achieved by confronting new cell-specific experimental data with an interaction subspace constrained by prior literature evidence. The SIGnaling Network Open Resource, SIGNOR (available online at http://signor.uniroma2.it), was developed to support such a strategy by providing a scaffold of prior experimental evidence of causal relationships between biological entities. The core of SIGNOR is a collection of approximately 12,000 manually annotated causal relationships between over 2800 human proteins participating in signal transduction. Other entities annotated in SIGNOR are complexes, chemicals, phenotypes and stimuli. The information captured in SIGNOR can be represented as a signed directed graph illustrating the activation/inactivation relationships between signalling entities. Each entry is associated with the post-translational modifications that cause the activation/inactivation of the target proteins. More than 4900 modified residues causing a change in protein concentration or activity have been curated and linked to the modifying enzymes (about 351 human kinases and 94 phosphatases). Additional modifications such as ubiquitinations, sumoylations, acetylations and their effect on the modified target proteins are also annotated. This wealth of structured information can support experimental approaches based on multi-parametric analysis of cell systems after physiological or pathological perturbations, as well as the assembly of large logic models.
Subject(s)
Databases, Protein , Signal Transduction , Humans , Internet , Intracellular Signaling Peptides and Proteins/chemistry , Phosphoprotein Phosphatases/chemistry , Phosphoprotein Phosphatases/metabolism , Protein Kinases/chemistry , Protein Kinases/metabolism
ABSTRACT
The SPla/Ryanodine receptor (SPRY)/B30.2 domain is one of the most common folds in higher eukaryotes. The human genome encodes 103 SPRY/B30.2 domains, several of which are involved in the immune response. Approximately 45% of human SPRY/B30.2-containing proteins are E3 ligases. The role and function of the majority of SPRY/B30.2 domains are still poorly understood; however, in several cases mutations in this domain have been linked to congenital disorders. The recent characterization of SPRY/B30.2-mediated protein interactions has provided evidence for a role of this domain as an adaptor module to assemble macromolecular complexes, analogous to Src homology (SH)2, SH3, and WW domains. However, functional and structural evidence suggests that SPRY/B30.2 is a more versatile fold, allowing a wide range of binding modes.
Subject(s)
Adaptor Proteins, Signal Transducing/metabolism , Carrier Proteins/metabolism , Macromolecular Substances/metabolism , Membrane Proteins/metabolism , Protein Conformation , Adaptor Proteins, Signal Transducing/chemistry , Animals , Carrier Proteins/chemistry , Humans , Macromolecular Substances/chemistry , Membrane Proteins/chemistry , Protein Structure, Tertiary
ABSTRACT
IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. IntAct has developed a sophisticated web-based curation tool, capable of supporting both IMEx- and MIMIx-level curation. This tool is now utilized by multiple additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize the curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium (http://www.imexconsortium.org).
Subject(s)
Databases, Protein , Protein Interaction Mapping , Internet , Software
ABSTRACT
The International Molecular Exchange (IMEx) consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website (http://www.imexconsortium.org/). Common curation rules have been developed, and a central registry is used to manage the selection of articles to enter into the dataset. We discuss the advantages of such a service to the user, our quality-control measures and our data-distribution practices.