ABSTRACT
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities1,2. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database3. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
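The abstract names graph-based clustering but does not specify the algorithm. The following is a minimal sketch only, assuming that pairwise similarity hits between sequences have already been computed and that a cluster is simply a connected component of the resulting graph. The function name `cluster_by_similarity`, the `min_size` parameter and the sequence identifiers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def cluster_by_similarity(edges, min_size=1):
    """Group sequence IDs into connected components of a similarity graph.

    edges: iterable of (id_a, id_b) pairs, one per similarity hit (assumed input).
    min_size: discard clusters with fewer members than this (hypothetical filter).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)

    clusters = [sorted(g) for g in groups.values() if len(g) >= min_size]
    # Largest clusters first, ties broken alphabetically.
    return sorted(clusters, key=lambda g: (-len(g), g))


# Toy example with made-up identifiers.
edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
clusters = cluster_by_similarity(edges, min_size=2)
print(clusters)
```

Connected components correspond to single-linkage clustering, which is the simplest graph-based choice; a production pipeline at the scale described would likely use a more selective and distributed method.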
Subject(s)
Metagenome , Metagenomics , Microbiology , Proteins , Cluster Analysis , Metagenome/genetics , Metagenomics/methods , Proteins/chemistry , Proteins/classification , Proteins/genetics , Databases, Protein , Protein Conformation
ABSTRACT
Research on tumor-associated neutrophils (TAN) currently surges because of the well-documented strong clinical relevance of tumor-infiltrating neutrophils. This relevance is illustrated by strong correlations between high frequencies of intratumoral neutrophils and poor outcome in the majority of human cancers. Recent high-dimensional analysis of murine neutrophils provides evidence for unexpected plasticity of neutrophils in murine models of cancer and other inflammatory non-malignant diseases. New analysis tools enable deeper insight into the process of neutrophil differentiation and maturation. These technological and scientific developments led to the description of an ever-increasing number of distinct transcriptional states and associated phenotypes in murine models of disease and more recently also in humans. At present, functional validation of these different transcriptional states and potential phenotypes in cancer is lacking. Current functional concepts on neutrophils in cancer rely mainly on the myeloid-derived suppressor cell (MDSC) concept and the dichotomous and simple N1-N2 paradigm. In this manuscript, we review the historic development of those concepts, critically evaluate these concepts against the background of our own work and provide suggestions for a refinement of current concepts in order to facilitate the transition of TAN research from experimental insight to clinical translation.
Subject(s)
Myeloid-Derived Suppressor Cells , Neoplasms , Humans , Animals , Mice , Neutrophils , Neoplasms/therapy , Neoplasms/pathology , Phenotype
ABSTRACT
Public-domain availability for bioinformatics software resources is a key requirement that ensures long-term permanence and methodological reproducibility for research and development across the life sciences. These issues are particularly critical for widely used, efficient, and well-proven methods, especially those developed in research settings that often face funding discontinuities. We re-launch a range of established software components for computational genomics, as legacy version 1.0.1, suitable for sequence matching, masking, searching, clustering and visualization for protein family discovery, annotation and functional characterization on a genome scale. These applications are made available online as open source and include MagicMatch, GeneCAST, support scripts for CoGenT-like sequence collections, GeneRAGE and DifFuse, supported by centrally administered bioinformatics infrastructure funding. The toolkit may also be conceived as a flexible genome comparison software pipeline that supports research in this domain. We illustrate basic use by examples and pictorial representations of the registered tools, which are further described with appropriate documentation files in the corresponding GitHub release.
Subject(s)
Genomics , Software , Reproducibility of Results , Genomics/methods , Computational Biology/methods , Genome
ABSTRACT
Rheumatoid arthritis (RA) is characterized by autoimmune joint destruction with debilitating consequences. Despite treatment advancements with biologic therapies, a significant proportion of RA patients show an inadequate clinical response, and restoration of immune self-tolerance represents an unmet therapeutic need. We have previously described a tolerogenic phenotype of plasmacytoid dendritic cells (pDCs) in RA patients responding to anti-TNF-α agents. However, the molecular mechanisms involved in tolerogenic reprogramming of pDCs in RA remain elusive. In this study, guided by transcriptomic analysis of CD303+CD123+ pDCs from RA patients in remission, we revealed enhanced expression of IL-6R and its downstream signaling compared with healthy pDCs. Functional assessment demonstrated that IL-6R engagement resulted in marked reduction of TNF-α secretion by pDCs whereas intracellular TNF-α was significantly increased. Accordingly, pharmacologic inhibition of IL-6R signaling restored TNF-α secretion levels by pDCs. Mechanistic analysis demonstrated impaired activity and decreased lysosomal degradation of ADAM17 (a disintegrin and metalloproteinase 17) sheddase in pDCs, which is essential for TNF-α cleavage. Importantly, reduction of TNF-α secretion by IL-6-treated pDCs attenuated the inflammatory potential of RA patient-derived synovial fibroblasts. Collectively, these findings position pDCs as an important source of TNF-α in RA pathogenesis and unravel an anti-inflammatory mechanism of IL-6 by limiting the pDC-derived TNF-α secretion.
Subject(s)
Arthritis, Rheumatoid , Interleukin-6 , Humans , Tumor Necrosis Factor Inhibitors , Dendritic Cells , Signal Transduction , Tumor Necrosis Factor-alpha
ABSTRACT
Bottom-up proteomics analyses have proved over recent years to be a powerful tool for characterizing the proteome and are crucial for understanding cellular and organism behaviour. Through differential proteomic analysis, researchers can shed light on groups of proteins or individual proteins that play key roles in certain normal or pathological conditions. However, several tools for the analysis of such complex datasets are powerful but hard to use, with steep learning curves, while others are easy to use but weak in analytical power. Previously, we introduced ProteoSign, a powerful yet user-friendly open-source online platform for protein differential expression/abundance analysis designed with the end-proteomics user in mind. Part of ProteoSign's power stems from its use of the well-established Linear Models For Microarray Data (LIMMA) methodology. Here, we present a substantial upgrade of this computational resource, called ProteoSign v2, where we introduce major improvements, also based on user feedback. The new version offers more plot options, supports additional experimental designs, analyzes updated input datasets and performs a gene enrichment analysis of the differentially expressed proteins. We also introduce the deployment of the Docker technology and significantly increase the speed of a full analysis. ProteoSign v2 is available at http://bioinformatics.med.uoc.gr/ProteoSign.
Subject(s)
Proteomics/methods , Software , Data Interpretation, Statistical , Internet , Mass Spectrometry , Proteins/genetics , Proteins/metabolism
ABSTRACT
PURPOSE/AIM OF THE STUDY: The impairment of neurocognitive functions occurs in all subtypes of multiple sclerosis, even from the earliest stages of the disease. Commonly reported manifestations of cognitive impairment include deficits in attention, conceptual reasoning, processing efficiency, information processing speed, memory (episodic and working), verbal fluency (language), and executive functions. Multiple sclerosis patients also suffer from social cognition impairment, which affects their social functioning. The objective of the current paper is to assess the effect of neurocognitive impairment and its potential correlation with social cognition performance and impairment in multiple sclerosis patients. MATERIALS AND METHODS: An overview of the literature available to date on neurocognitive impairment and social cognition performance in multiple sclerosis patients by disease subtype was performed. RESULTS: It is not clear if social cognition impairment occurs independently or secondarily to neurocognitive impairment. There are associations of variable strength between neurocognitive and social cognition deficits, and their neural basis is increasingly investigated. CONCLUSIONS: Prompt detection of neurocognitive predictors of social cognition impairment that may be applicable to all multiple sclerosis subtypes, and intervention, are crucial to prevent further neural and social cognition decline in multiple sclerosis patients.
Subject(s)
Cognitive Dysfunction , Multiple Sclerosis , Humans , Multiple Sclerosis/complications , Multiple Sclerosis/psychology , Social Cognition , Executive Function , Cognition , Cognitive Dysfunction/etiology , Neuropsychological Tests
ABSTRACT
Tissue-specific gene methylation events are key to the pathogenesis of several diseases and can be utilized for diagnosis and monitoring. Here, we established an in silico pipeline to analyze high-throughput methylome datasets to identify specific methylation fingerprints in three pathological entities of major burden, i.e., breast cancer (BrCa), osteoarthritis (OA) and diabetes mellitus (DM). Differential methylation analysis was conducted to compare tissues/cells related to the pathology and different types of healthy tissues, revealing Differentially Methylated Genes (DMGs). High-performing biosignatures with few features were built with automated machine learning, including: (1) a five-gene biosignature discriminating BrCa tissue from healthy tissues (AUC 0.987 and precision 0.987), (2) three equivalent OA cartilage-specific biosignatures containing four genes each (AUC 0.978 and precision 0.986) and (3) a four-gene pancreatic β-cell-specific biosignature (AUC 0.984 and precision 0.995). Next, the BrCa biosignature was validated using an independent ccfDNA dataset showing an AUC and precision of 1.000, verifying the biosignature's applicability in liquid biopsy. Functional and protein interaction prediction analysis revealed that most DMGs identified are involved in pathways known to be related to the studied diseases or pointed to new ones. Overall, our data-driven approach contributes to the maximum exploitation of high-throughput methylome readings, helping to establish specific disease profiles to be applied in clinical practice and to understand human pathology.
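The biosignatures above are reported in terms of AUC. As a reminder of what that number measures, here is a minimal sketch that computes the area under the ROC curve as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and have no connection to the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for the disease class, 0 for the healthy class (assumed encoding).
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up toy data: three of the four positive/negative pairs are ordered correctly.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 1.000, as reported for the ccfDNA validation set, means every positive sample scored above every negative one in that dataset.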
Subject(s)
Breast Neoplasms , Osteoarthritis , Breast Neoplasms/metabolism , DNA Methylation , Epigenome , Female , Humans , Osteoarthritis/metabolism
ABSTRACT
Protein-protein interactions (PPIs) are of key importance for understanding how cells and organisms function. Thus, in recent decades, many approaches have been developed for the identification and discovery of such interactions. These approaches have addressed the problem of PPI identification from either an experimental or a computational point of view. Here, we present an updated version of UniReD, a computational prediction tool that takes advantage of the biomedical literature to extract documented, already published protein associations and predict undocumented ones. The usefulness of this computational tool has been previously evaluated by experimentally validating predicted interactions and by benchmarking it against public databases of experimentally validated PPIs. In its updated form, UniReD allows the user to provide a list of proteins of known implication in, e.g., a particular disease, as well as another list of proteins that are potentially associated with the proteins of the first list. UniReD then automatically analyzes both lists and ranks the proteins of the second list by their association with the proteins of the first list, thus serving as a potential biomarker discovery/validation tool.
Subject(s)
Protein Interaction Mapping , Proteins , Biomarkers , Computational Biology , Proteins/metabolism
ABSTRACT
We provide the first high-throughput analysis of the properties and functional role of Low Complexity Regions (LCRs) in more than 1500 prokaryotic and phage proteomes. We observe that, contrary to a widespread belief based on older and sparse data, LCRs actually have a significant, persistent and highly conserved presence and role in many and diverse prokaryotes. Their specific amino acid content is linked to proteins with certain molecular functions, such as the binding of RNA, DNA, metal ions and polysaccharides. In addition, LCRs have been repeatedly identified in very ancient and usually highly expressed proteins of the translation machinery. Finally, based on the amino acid content enriched in certain categories, we have developed a neural network web server to identify LCRs and accurately predict whether they can bind nucleic acids or metal ions, or are involved in chaperone functions. An evaluation of the tool showed that it is highly accurate for eukaryotic proteins as well.
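To make the notion of a low complexity region concrete, the sketch below flags stretches of a protein sequence whose amino acid composition has low Shannon entropy within a sliding window. This is a generic illustration and not the detection method or neural network used by the authors; the function names, the window length, the entropy threshold and the example sequence are all assumptions chosen for the demonstration.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a sequence segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(sequence, window=12, max_entropy=1.0):
    """Return merged (start, end) half-open intervals of low-entropy windows.

    window and max_entropy are illustrative defaults, not published values.
    """
    regions = []
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) <= max_entropy:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Made-up sequence with a 15-residue poly-Q run at positions 10..24.
seq = "MKTAYIAKQR" + "Q" * 15 + "GSLVDHWEFN"
for start, end in low_complexity_windows(seq, window=12, max_entropy=1.0):
    print(start, end, seq[start:end])
```

A homopolymeric window has entropy 0, while a window of twelve distinct residues reaches about 3.58 bits, so the threshold controls how biased a stretch must be before it is reported.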
Subject(s)
Evolution, Molecular , High-Throughput Screening Assays/methods , Proteome/genetics , RNA/genetics , Amino Acids/genetics , DNA/genetics , Eukaryotic Cells/metabolism , Prokaryotic Cells/metabolism , Protein Domains/genetics , Proteins/genetics , RNA/chemistry , Sequence Alignment
ABSTRACT
Profiling of proteome dynamics is crucial for understanding cellular behavior in response to intrinsic and extrinsic stimuli and maintenance of homeostasis. Over the last 20 years, mass spectrometry (MS) has emerged as the most powerful tool for large-scale identification and characterization of proteins. Bottom-up proteomics, the most common MS-based proteomics approach, has always been challenging in terms of data management, processing, analysis and visualization, with modern instruments capable of producing several gigabytes of data out of a single experiment. Here, we present ProteoSign, a freely available web application dedicated to allowing users to perform proteomics differential expression/abundance analysis in a user-friendly and self-explanatory way. Although several non-commercial standalone tools have been developed for post-quantification statistical analysis of proteomics data, most of them do not appeal to end users, as they often require very stringent installation of programming environments, third-party software packages and sometimes further scripting or computer programming. To avoid this bottleneck, we have developed a user-friendly software platform accessible via a web interface in order to enable proteomics laboratories and core facilities to statistically analyse quantitative proteomics data sets in a resource-efficient manner. ProteoSign is available at http://bioinformatics.med.uoc.gr/ProteoSign and the source code at https://github.com/yorgodillo/ProteoSign.
Subject(s)
Proteomics/methods , Software , Data Interpretation, Statistical , Internet , Mass Spectrometry
Subject(s)
Cerebral Veins , Intracranial Thrombosis , Sinus Thrombosis, Intracranial , Venous Thrombosis , Cerebral Veins/diagnostic imaging , Humans , Intracranial Thrombosis/diagnosis , Intracranial Thrombosis/diagnostic imaging , Syndrome , Venous Thrombosis/diagnosis , Venous Thrombosis/diagnostic imaging
ABSTRACT
BACKGROUND: Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of biomedical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated to mining other types of repositories such as chemical databases. RESULTS: Herein, we apply a text mining approach to the DrugBank database in order to explore drug associations based on the DrugBank "Description", "Indication", "Pharmacodynamics" and "Mechanism of Action" text fields. We apply Named Entity Recognition (NER) techniques to these fields to identify chemicals, proteins, genes, pathways and diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible reasons why these records are clustered together. Different views, such as chemicals clustered by their textual information and tag clouds consisting of Significant Terms along with the terms that were used for clustering, are delivered to the user through a user-friendly web interface. CONCLUSIONS: DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest.
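Grouping records "based on their common terms" presupposes some measure of term overlap. The sketch below uses Jaccard similarity between term sets as one plausible choice; the abstract does not say which similarity measures DrugQuest actually uses, so this is an assumption. The record identifiers, the terms and the function names are invented for illustration and do not come from DrugBank.

```python
from itertools import combinations


def jaccard(terms_a, terms_b):
    """Jaccard similarity: shared terms divided by all distinct terms."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(records, threshold=0.5):
    """Return (id_a, id_b, score) for record pairs at or above the threshold.

    records: dict mapping a record ID to its set of extracted terms (assumed input).
    """
    pairs = []
    for id_a, id_b in combinations(sorted(records), 2):
        score = jaccard(records[id_a], records[id_b])
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Hypothetical records and terms, not real DrugBank entries.
records = {
    "drug_A": {"kinase", "inhibitor", "leukemia"},
    "drug_B": {"kinase", "inhibitor", "melanoma"},
    "drug_C": {"antibiotic", "ribosome"},
}
print(similar_pairs(records, threshold=0.5))
```

The resulting pairs can serve as edges for any downstream clustering step; the threshold trades off cluster purity against coverage.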
Subject(s)
Drug Discovery , User-Computer Interface , Algorithms , Cluster Analysis , Databases, Factual , Humans , Internet , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism
ABSTRACT
More than a decade ago, a number of methods were proposed for the inference of protein interactions, using whole-genome information from gene clusters, gene fusions and phylogenetic profiles. This structural and evolutionary view of entire genomes has provided a valuable approach for the functional characterization of proteins, especially those without sequence similarity to proteins of known function. Furthermore, this view has raised the real possibility of detecting functional associations of genes and their corresponding proteins for any entire genome sequence. Yet, despite these exciting developments, there have been relatively few cases of real use of these methods outside the computational biology field, as reflected by citation analysis. These methods have the potential to be used in high-throughput experimental settings in functional genomics and proteomics to validate results with very high accuracy and good coverage. In this critical survey, we provide a comprehensive overview of the 30 most prominent examples of single pairwise protein interaction cases in small-scale studies, where protein interactions have either been detected by gene fusion or yielded additional, corroborating evidence from biochemical observations. Our conclusion is that with the derivation of a validated gold-standard corpus and better data integration with big experiments, gene fusion detection can truly become a valuable tool for large-scale experimental biology.
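The idea behind gene fusion detection can be sketched as follows: a composite protein in one genome is a candidate fusion if two different proteins from another genome align to non-overlapping parts of it. The code is a simplified illustration under stated assumptions, namely that alignment hits are already available as tuples and that coordinates are 1-based and inclusive. The function name, the `max_overlap` parameter and the identifiers are hypothetical, and published methods apply further checks (for example, that the two component proteins are not similar to each other) that are omitted here.

```python
from itertools import combinations


def candidate_fusions(hits, max_overlap=0):
    """Find composite proteins matched by two components on non-overlapping regions.

    hits: list of (composite_id, component_id, start, end), where start and end
          are 1-based inclusive coordinates on the composite protein (assumed format).
    max_overlap: number of residues by which the two aligned regions may overlap.
    """
    by_composite = {}
    for composite, component, start, end in hits:
        by_composite.setdefault(composite, []).append((component, start, end))

    fusions = []
    for composite, parts in sorted(by_composite.items()):
        for (c1, s1, e1), (c2, s2, e2) in combinations(sorted(parts), 2):
            if c1 == c2:
                continue  # same component hitting twice is not a fusion signal
            overlap = min(e1, e2) - max(s1, s2) + 1  # <= 0 means disjoint regions
            if overlap <= max_overlap:
                fusions.append((composite, c1, c2))
    return fusions


# Toy hits with made-up identifiers: X looks like a fusion of a and b,
# while the two hits on Y overlap by 81 residues and are rejected.
hits = [
    ("X", "a", 1, 200),
    ("X", "b", 210, 450),
    ("Y", "a", 1, 180),
    ("Y", "c", 100, 300),
]
print(candidate_fusions(hits))
```

Each reported triple suggests a possible functional association between the two component proteins, which is the kind of prediction the survey discusses validating experimentally.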
Subject(s)
Computational Biology/methods , Gene Fusion , Animals , Genes, Fungal , Genome, Human , Genomics , Humans , Phylogeny , Protein Interaction Mapping/statistics & numerical data , Proteomics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
ABSTRACT
It is beyond any doubt that proteins and their interactions play an essential role in most complex biological processes. Understanding their function, both individually and as part of protein complexes, is of great importance. Nowadays, despite the plethora of various high-throughput experimental approaches for detecting protein-protein interactions, many computational methods aiming to predict new interactions have appeared and gained interest. In this review, we focus on text-mining-based computational methodologies that aim to extract information on proteins and their interactions from public repositories such as the literature and various biological databases. We discuss their strengths, their weaknesses and how they complement existing experimental techniques, while also commenting on the biological databases that hold such information and the benchmark datasets that can be used for evaluating new tools.
Subject(s)
Data Mining/methods , Databases, Protein , Protein Interaction Mapping/methods , Animals , Data Mining/trends , Databases, Protein/trends , Forecasting , Humans , Protein Interaction Mapping/trends
ABSTRACT
The R2TP is a recently identified Hsp90 co-chaperone, composed of four proteins as follows: Pih1D1, RPAP3, and the AAA(+)-ATPases RUVBL1 and RUVBL2. In mammals, the R2TP is involved in the biogenesis of cellular machineries such as RNA polymerases, small nucleolar ribonucleoparticles and phosphatidylinositol 3-kinase-related kinases. Here, we characterize the spaghetti (spag) gene of Drosophila, the homolog of human RPAP3. This gene plays an essential function during Drosophila development. We show that Spag protein binds Drosophila orthologs of R2TP components and Hsp90, like its yeast counterpart. Unexpectedly, Spag also interacts with and stimulates the chaperone activity of Hsp70. Using null mutants and flies with inducible RNAi, we show that spaghetti is necessary for the stabilization of snoRNP core proteins and target of rapamycin activity and likely the assembly of RNA polymerase II. This work highlights the strong conservation of both the HSP90/R2TP system and its clients and further shows that Spag, unlike Saccharomyces cerevisiae Tah1, performs essential functions in metazoans. Interaction of Spag with both Hsp70 and Hsp90 suggests a model whereby R2TP would accompany clients from Hsp70 to Hsp90 to facilitate their assembly into macromolecular complexes.
Subject(s)
Drosophila Proteins/metabolism , HSP70 Heat-Shock Proteins/metabolism , Heat-Shock Proteins/metabolism , Models, Biological , Molecular Chaperones/metabolism , Ribonucleoproteins, Small Nucleolar/metabolism , Animals , Anti-Bacterial Agents/pharmacology , Apoptosis Regulatory Proteins , Carrier Proteins/genetics , Carrier Proteins/metabolism , Drosophila Proteins/genetics , Drosophila melanogaster , HSP70 Heat-Shock Proteins/genetics , Heat-Shock Proteins/genetics , Humans , Molecular Chaperones/genetics , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , Ribonucleoproteins, Small Nucleolar/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Sirolimus/pharmacology
ABSTRACT
SUMMARY: The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed® and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances over its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. AVAILABILITY: The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest.
CONTACT: g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Data Mining/methods , Software , Authorship , Cluster Analysis , Disease/genetics , Genes , Humans , Internet , Medical Subject Headings , Proteins , PubMed , Publications
ABSTRACT
In the majority of downstream analysis pipelines for single-cell RNA sequencing (scRNA-seq), techniques like dimensionality reduction and feature selection are employed to address the high-dimensional nature of the data. These approaches involve mapping the data onto a lower-dimensional space, eliminating less informative genes, and pinpointing the most pertinent features. This process ultimately leads to a reduction in the number of dimensions used for downstream analysis, which in turn speeds up the computation of large-scale scRNA-seq data. Most approaches aim to isolate, from the biological background, the genes characterizing different cells and/or the condition under study by establishing lists of differentially expressed or coexpressed genes. Herein, we present scRNA-Explorer, an open-source online tool for simplified and rapid scRNA-seq analysis designed with the end user in mind. scRNA-Explorer offers: (i) interactive filtering of uninformative cells via a web interface, (ii) gene correlation analysis coupled with an extra step of evaluating the biological importance of these correlations, and (iii) gene enrichment analysis of correlated genes in order to find gene implication in specific functions. We developed a pipeline to address the above problem. The scRNA-Explorer pipeline allows users to interrogate scRNA-seq data sets interactively and to explore, via gene expression correlations, the possible function(s) of a gene of interest. scRNA-Explorer can be accessed at https://bioinformatics.med.uoc.gr/shinyapps/app/scrnaexplorer.
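The gene correlation step can be illustrated with a small sketch that ranks genes by the Pearson correlation of their expression with a gene of interest across cells. This is not the scRNA-Explorer implementation: the abstract does not state which correlation measure or filtering the tool applies, and the function names, gene labels and expression values below are made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant gene carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def rank_correlated_genes(expression, target):
    """Rank all other genes by absolute correlation with the target gene.

    expression: dict mapping gene name to per-cell expression values (assumed layout).
    """
    reference = expression[target]
    scores = {
        gene: pearson(reference, values)
        for gene, values in expression.items()
        if gene != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented expression values for four genes across four cells.
expression = {
    "GENE_A": [1, 2, 3, 4],
    "GENE_B": [2, 4, 6, 8],   # perfectly correlated with GENE_A
    "GENE_C": [4, 3, 2, 1],   # perfectly anti-correlated with GENE_A
    "GENE_D": [1, 3, 2, 2],   # weakly correlated with GENE_A
}
for gene, r in rank_correlated_genes(expression, "GENE_A"):
    print(gene, round(r, 3))
```

The top-ranked genes from such a list are what an enrichment analysis would then take as input to suggest possible functions for the gene of interest.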
Subject(s)
RNA-Seq , Sequence Analysis, RNA , Single-Cell Analysis , Software , Single-Cell Analysis/methods , RNA-Seq/methods , Sequence Analysis, RNA/methods , Humans , Computational Biology/methods , Gene Expression Profiling/methods , Internet
ABSTRACT
Schizophrenia (SCZ) is a chronic, severe, and complex psychiatric disorder that affects all aspects of personal functioning. While SCZ has a very strong biological component, there are still no objective diagnostic tests. Lately, special attention has been given to epigenetic biomarkers in SCZ. In this study, we introduce a three-step, automated machine learning (AutoML)-based, data-driven, biomarker discovery pipeline approach, using genome-wide DNA methylation datasets and laboratory validation, to deliver a highly performing, blood-based epigenetic biosignature of diagnostic clinical value in SCZ. Publicly available blood methylomes from SCZ patients and healthy individuals were analyzed via AutoML to identify SCZ-specific biomarkers. The methylation of the identified genes was then analyzed by targeted qMSP assays in blood gDNA of 30 first-episode drug-naïve SCZ patients and 30 healthy controls (CTRL). Finally, AutoML was used to produce an optimized disease-specific biosignature based on patient methylation data combined with demographics. AutoML identified an SCZ-specific set of novel gene methylation biomarkers including IGF2BP1, CENPI, and PSME4. Functional analysis investigated correlations with SCZ pathology. Methylation levels of IGF2BP1 and PSME4, but not CENPI, were found to differ, IGF2BP1 being higher and PSME4 lower in the SCZ group as compared to the CTRL group. Additional AutoML classification analysis of our experimental patient data led to a five-feature biosignature including all three genes, as well as age and sex, that discriminated SCZ patients from healthy individuals [AUC 0.755 (0.636, 0.862) and average precision 0.758 (0.690, 0.825)]. In conclusion, this three-step pipeline enabled the discovery of three novel genes and an epigenetic biosignature bearing potential value as promising SCZ blood-based diagnostics.