ABSTRACT
NLRs constitute a large, highly conserved family of cytosolic pattern recognition receptors that are central to health and disease, making them key therapeutic targets. NLRC5 is an enigmatic NLR with mutations associated with inflammatory and infectious diseases, but little is known about its function as an innate immune sensor and cell death regulator. Therefore, we screened for NLRC5's role in response to infections, PAMPs, DAMPs, and cytokines. We identified that NLRC5 acts as an innate immune sensor to drive inflammatory cell death, PANoptosis, in response to specific ligands, including PAMP/heme and heme/cytokine combinations. NLRC5 interacted with NLRP12 and PANoptosome components to form a cell death complex, suggesting an NLR network forms similar to those in plants. Mechanistically, TLR signaling and NAD+ levels regulated NLRC5 expression and ROS production to control cell death. Furthermore, NLRC5-deficient mice were protected in hemolytic and inflammatory models, suggesting that NLRC5 could be a potential therapeutic target.
Subject(s)
Inflammation , Intracellular Signaling Peptides and Proteins , NAD , Animals , Mice , Inflammation/metabolism , Intracellular Signaling Peptides and Proteins/metabolism , Intracellular Signaling Peptides and Proteins/genetics , NAD/metabolism , Humans , Immunity, Innate , Mice, Inbred C57BL , Reactive Oxygen Species/metabolism , Mice, Knockout , Signal Transduction , HEK293 Cells , Inflammasomes/metabolism , Apoptosis Regulatory Proteins/metabolism , Apoptosis Regulatory Proteins/genetics , Toll-Like Receptors/metabolism , Male , Cytokines/metabolism , Calcium-Binding Proteins
ABSTRACT
Cytosolic innate immune sensors are critical for host defense and form complexes, such as inflammasomes and PANoptosomes, that induce inflammatory cell death. The sensor NLRP12 is associated with infectious and inflammatory diseases, but its activating triggers and roles in cell death and inflammation remain unclear. Here, we discovered that NLRP12 drives inflammasome and PANoptosome activation, cell death, and inflammation in response to heme plus PAMPs or TNF. TLR2/4-mediated signaling through IRF1 induced Nlrp12 expression, which led to inflammasome formation to induce maturation of IL-1β and IL-18. The inflammasome also served as an integral component of a larger NLRP12-PANoptosome that drove inflammatory cell death through caspase-8/RIPK3. Deletion of Nlrp12 protected mice from acute kidney injury and lethality in a hemolytic model. Overall, we identified NLRP12 as an essential cytosolic sensor for heme plus PAMPs-mediated PANoptosis, inflammation, and pathology, suggesting that NLRP12 and molecules in this pathway are potential drug targets for hemolytic and inflammatory diseases.
Subject(s)
Inflammasomes , Pathogen-Associated Molecular Pattern Molecules , Animals , Mice , Inflammasomes/metabolism , Heme , Inflammation , Pyroptosis , Intracellular Signaling Peptides and Proteins
ABSTRACT
Peptide- and protein-based therapeutics are becoming a promising treatment regimen for myriad diseases. Toxicity of proteins is the primary hurdle for protein-based therapies. Thus, there is an urgent need for accurate in silico methods for determining toxic proteins to filter the pool of potential candidates. At the same time, it is imperative to precisely identify non-toxic proteins to expand the possibilities for protein-based biologics. To address this challenge, we proposed an ensemble framework, called VISH-Pred, comprising models built by fine-tuning ESM2 transformer models on a large, experimentally validated, curated dataset of protein and peptide toxicities. The primary steps in the VISH-Pred framework are to efficiently estimate protein toxicities taking just the protein sequence as input, employing an undersampling technique to handle the severe class imbalance in the data, and learning representations from fine-tuned ESM2 protein language models, which are then fed to machine learning techniques such as LightGBM and XGBoost. The VISH-Pred framework is able to correctly identify both peptides/proteins with potential toxicity and non-toxic proteins, achieving a Matthews correlation coefficient of 0.737, 0.716 and 0.322 and F1-score of 0.759, 0.696 and 0.713 on three non-redundant blind tests, respectively, outperforming other methods by over 10% on these quality metrics. Moreover, VISH-Pred achieved the best accuracy and area under the receiver operating characteristic curve scores on these independent test sets, highlighting the robustness and generalization capability of the framework. By making VISH-Pred available as an easy-to-use web server, we expect it to serve as a valuable asset for future endeavors aimed at discerning the toxicity of peptides and enabling efficient protein-based therapeutics.
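The two quality metrics quoted in this abstract, the Matthews correlation coefficient (MCC) and the F1-score, can both be computed from a binary confusion matrix. The following is a minimal sketch of the standard definitions; the function name `mcc_and_f1` and the example counts are hypothetical and are not taken from the VISH-Pred evaluation.

```python
import math


def mcc_and_f1(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (MCC, F1) for a binary confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives (positive = "toxic" here).
    """
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    return mcc, f1


# Hypothetical, imbalanced example: 100 toxic and 900 non-toxic proteins.
mcc, f1 = mcc_and_f1(tp=80, fp=30, tn=870, fn=20)
print(f"MCC={mcc:.3f}  F1={f1:.3f}")
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside F1 when the classes are strongly imbalanced.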
Subject(s)
Proteins , Proteins/metabolism , Proteins/chemistry , Machine Learning , Databases, Protein , Computational Biology/methods , Humans , Peptides/toxicity , Peptides/chemistry , Computer Simulation , Algorithms , Software
ABSTRACT
BACKGROUND: The innate immune system serves as the first line of host defense. Transforming growth factor-β-activated kinase 1 (TAK1) is a key regulator of innate immunity, cell survival, and cellular homeostasis. Because of its importance in immunity, several pathogens have evolved to carry TAK1 inhibitors. In response, hosts have evolved to sense TAK1 inhibition and induce robust lytic cell death, PANoptosis, mediated by the RIPK1-PANoptosome. PANoptosis is a unique innate immune inflammatory lytic cell death pathway initiated by an innate immune sensor and driven by caspases and RIPKs. While PANoptosis can be beneficial to clear pathogens, excess activation is linked to pathology. Therefore, understanding the molecular mechanisms regulating TAK1 inhibitor (TAK1i)-induced PANoptosis is central to our understanding of RIPK1 in health and disease. RESULTS: In this study, by analyzing results from a cell death-based CRISPR screen, we identified protein phosphatase 6 (PP6) holoenzyme components as regulators of TAK1i-induced PANoptosis. Loss of the PP6 enzymatic component, PPP6C, significantly reduced TAK1i-induced PANoptosis. Additionally, the PP6 regulatory subunits PPP6R1, PPP6R2, and PPP6R3 had redundant roles in regulating TAK1i-induced PANoptosis, and their combined depletion was required to block TAK1i-induced cell death. Mechanistically, PPP6C and its regulatory subunits promoted the pro-death S166 auto-phosphorylation of RIPK1 and led to a reduction in the pro-survival S321 phosphorylation. CONCLUSIONS: Overall, our findings demonstrate a key requirement for the phosphatase PP6 complex in the activation of TAK1i-induced, RIPK1-dependent PANoptosis, suggesting this complex could be therapeutically targeted in inflammatory conditions.
Subject(s)
Phosphoprotein Phosphatases , Receptor-Interacting Protein Serine-Threonine Kinases , Receptor-Interacting Protein Serine-Threonine Kinases/metabolism , Receptor-Interacting Protein Serine-Threonine Kinases/genetics , Humans , Phosphoprotein Phosphatases/metabolism , Phosphoprotein Phosphatases/genetics , MAP Kinase Kinase Kinases/metabolism , MAP Kinase Kinase Kinases/genetics , Necroptosis , Immunity, Innate
ABSTRACT
BACKGROUND: Predictive biomarkers of immune checkpoint inhibitor (ICI) efficacy are currently lacking for non-small cell lung cancer (NSCLC). Here, we describe the results from the Anti-PD-1 Response Prediction DREAM Challenge, a crowdsourced initiative that enabled the assessment of predictive models by using data from two randomized controlled clinical trials (RCTs) of ICIs in first-line metastatic NSCLC. METHODS: Participants developed and trained models using public resources. These were evaluated with data from the CheckMate 026 trial (NCT02041533), according to the model-to-data paradigm to maintain patient confidentiality. The generalizability of the models with the best predictive performance was assessed using data from the CheckMate 227 trial (NCT02477826). Both trials were phase III RCTs with a chemotherapy control arm, which supported the differentiation between predictive and prognostic models. Isolated model containers were evaluated using a bespoke strategy that considered the challenges of handling transcriptome data from clinical trials. RESULTS: A total of 59 teams participated, with 417 models submitted. Multiple predictive models, as opposed to a prognostic model, were generated for predicting overall survival, progression-free survival, and progressive disease status with ICIs. Variables within the models submitted by participants included tumor mutational burden (TMB), programmed death ligand 1 (PD-L1) expression, and gene-expression-based signatures. The best-performing models showed improved predictive power over reference variables, including TMB or PD-L1. CONCLUSIONS: This DREAM Challenge is the first successful attempt to use protected phase III clinical data for a crowdsourced effort towards generating predictive models for ICI clinical outcomes and could serve as a blueprint for similar efforts in other tumor types and disease states, setting a benchmark for future studies aiming to identify biomarkers predictive of ICI efficacy. 
TRIAL REGISTRATION: CheckMate 026; NCT02041533, registered January 22, 2014. CheckMate 227; NCT02477826, registered June 23, 2015.
Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Carcinoma, Non-Small-Cell Lung/drug therapy , Carcinoma, Non-Small-Cell Lung/genetics , Immune Checkpoint Inhibitors/therapeutic use , Lung Neoplasms/pathology , B7-H1 Antigen , Biomarkers, Tumor
ABSTRACT
Chromosomal translocations that generate in-frame oncogenic gene fusions are notable examples of the success of targeted cancer therapies. We have previously described gene fusions of FGFR3-TACC3 (F3-T3) in 3% of human glioblastoma cases. Subsequent studies have reported similar frequencies of F3-T3 in many other cancers, indicating that F3-T3 is a commonly occurring fusion across all tumour types. F3-T3 fusions are potent oncogenes that confer sensitivity to FGFR inhibitors, but the downstream oncogenic signalling pathways remain unknown. Here we show that human tumours with F3-T3 fusions cluster within transcriptional subgroups that are characterized by the activation of mitochondrial functions. F3-T3 activates oxidative phosphorylation and mitochondrial biogenesis and induces sensitivity to inhibitors of oxidative metabolism. Phosphorylation of the phosphopeptide PIN4 is an intermediate step in the signalling pathway of the activation of mitochondrial metabolism. The F3-T3-PIN4 axis triggers the biogenesis of peroxisomes and the synthesis of new proteins. The anabolic response converges on the PGC1α coactivator through the production of intracellular reactive oxygen species, which enables mitochondrial respiration and tumour growth. These data illustrate the oncogenic circuit engaged by F3-T3 and show that F3-T3-positive tumours rely on mitochondrial respiration, highlighting this pathway as a therapeutic opportunity for the treatment of tumours with F3-T3 fusions. We also provide insights into the genetic alterations that initiate the chain of metabolic responses that drive mitochondrial metabolism in cancer.
Subject(s)
Cell Respiration , Microtubule-Associated Proteins/genetics , Mitochondria/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Oncogene Proteins, Fusion/genetics , Receptor, Fibroblast Growth Factor, Type 3/genetics , Animals , Brain/drug effects , Brain/metabolism , Brain/pathology , Cell Line, Tumor , Cell Respiration/drug effects , Cell Transformation, Neoplastic/drug effects , Female , Glioblastoma/drug therapy , Glioblastoma/genetics , Glioblastoma/metabolism , Glioblastoma/pathology , Humans , Male , Mice , Mitochondria/drug effects , Mitochondria/genetics , NIMA-Interacting Peptidylprolyl Isomerase/chemistry , NIMA-Interacting Peptidylprolyl Isomerase/metabolism , Neoplasms/drug therapy , Neoplasms/pathology , Organelle Biogenesis , Oxidative Phosphorylation/drug effects , Peroxisome Proliferator-Activated Receptor Gamma Coactivator 1-alpha/metabolism , Peroxisomes/drug effects , Peroxisomes/metabolism , Phosphorylation , Protein Biosynthesis , Reactive Oxygen Species/metabolism , Receptors, Estrogen/metabolism , Transcription, Genetic , Xenograft Model Antitumor Assays
ABSTRACT
A cancer immune phenotype characterized by an active T-helper 1 (Th1)/cytotoxic response is associated with responsiveness to immunotherapy and favorable prognosis across different tumors. However, in some cancers, such an intratumoral immune activation does not confer protection from progression or relapse. Defining mechanisms associated with immune evasion is imperative to refine stratification algorithms, to guide treatment decisions and to identify candidates for immune-targeted therapy. Molecular alterations governing mechanisms for immune exclusion are still largely unknown. The availability of large genomic datasets offers an opportunity to ascertain key determinants of differential intratumoral immune response. We follow a network-based protocol to identify transcription regulators (TRs) associated with poor immunologic antitumor activity. We use a consensus of four different pipelines consisting of two state-of-the-art gene regulatory network inference techniques, regularized gradient boosting machines and ARACNE to determine TR regulons, and three separate enrichment techniques, including fast gene set enrichment analysis, gene set variation analysis and virtual inference of protein activity by enriched regulon analysis to identify the most important TRs affecting immunologic antitumor activity. These TRs, referred to as master regulators (MRs), are unique to immune-silent and immune-active tumors, respectively. We validated the MRs coherently associated with the immune-silent phenotype across cancers in The Cancer Genome Atlas and a series of additional datasets in the Prediction of Clinical Outcomes from Genomic Profiles repository. A downstream analysis of MRs specific to the immune-silent phenotype resulted in the identification of several enriched candidate pathways, including NOTCH1, TGF-β, Interleukin-1 and TNF-α signaling pathways. TGFB1I1 emerged as one of the main negative immune modulators preventing the favorable effects of a Th1/cytotoxic response.
Subject(s)
Biomarkers, Tumor , Disease Susceptibility , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Neoplasms/etiology , Neoplasms/metabolism , Phenotype , Computational Biology/methods , Databases, Genetic , Disease Susceptibility/immunology , Gene Expression Profiling/methods , Humans , Immunophenotyping , Reproducibility of Results , Signal Transduction , Transcriptome
ABSTRACT
BACKGROUND: Tumor invasiveness reflects numerous biological changes, including tumorigenesis, progression, and metastasis. To decipher the role of transcriptional regulators (TRs) involved in tumor invasiveness, we performed a systematic network-based pan-cancer assessment of master regulators of cancer invasiveness. MATERIALS AND METHODS: We stratified patients in The Cancer Genome Atlas (TCGA) into invasiveness high (INV-H) and low (INV-L) groups using consensus clustering based on an established robust 24-gene signature to determine the prognostic association of invasiveness with overall survival (OS) across 32 different cancers. We devised a network-based protocol to identify TRs as master regulators (MRs) unique to INV-H and INV-L phenotypes. We validated the activity of MRs coherently associated with INV-H phenotype and worse OS across cancers in TCGA on a series of additional datasets in the Prediction of Clinical Outcomes from the Genomic Profiles (PRECOG) repository. RESULTS: Based on the 24-gene signature, we defined the invasiveness score for each patient sample and stratified patients into INV-H and INV-L clusters. We observed that invasiveness was associated with worse survival outcomes in almost all cancers and had a significant association with OS in ten out of 32 cancers. Our network-based framework identified common invasiveness-associated MRs specific to INV-H and INV-L groups across the ten prognostic cancers, including COL1A1, which is also part of the 24-gene signature, thus acting as a positive control. Downstream pathway analysis of MRs specific to INV-H phenotype resulted in the identification of several enriched pathways, including epithelial-to-mesenchymal transition, TGF-β signaling pathway, regulation of Toll-like receptors, cytokines, and inflammatory response, and selective expression of chemokine receptors during T-cell polarization. Most of these pathways have connotations of inflammatory immune response and feasibility for metastasis. CONCLUSION: Our pan-cancer study provides a comprehensive master regulator analysis of tumor invasiveness and can suggest more precise therapeutic strategies by targeting the identified MRs and downstream enriched pathways for patients across multiple cancers.
Subject(s)
Neoplasms , Humans , Neoplasms/genetics , Carcinogenesis , Cell Transformation, Neoplastic , Cluster Analysis , Cytokines
ABSTRACT
OBJECTIVES: To examine the hypothesis that obesity complicated by the metabolic syndrome, compared to uncomplicated obesity, has distinct molecular signatures and metabolic pathways. METHODS: We analyzed a cohort of 39 participants with obesity that included 21 with metabolic syndrome, age-matched to 18 without metabolic complications. We measured in whole blood samples 754 human microRNAs (miRNAs), 704 metabolites using unbiased mass spectrometry metabolomics, and 25,682 transcripts, which include both protein coding genes (PCGs) as well as non-coding transcripts. We then identified differentially expressed miRNAs, PCGs, and metabolites and integrated them using databases such as mirDIP (mapping between miRNA-PCG network), Human Metabolome Database (mapping between metabolite-PCG network) and tools like MetaboAnalyst (mapping between metabolite-metabolic pathway network) to determine dysregulated metabolic pathways in obesity with metabolic complications. RESULTS: We identified 8 significantly enriched metabolic pathways comprising 8 metabolites, 25 protein coding genes and 9 microRNAs which are each differentially expressed between the subjects with obesity and those with obesity and metabolic syndrome. By performing unsupervised hierarchical clustering on the enrichment matrix of the 8 metabolic pathways, we could approximately segregate the uncomplicated obesity strata from that of obesity with metabolic syndrome. CONCLUSIONS: The data suggest that at least 8 metabolic pathways, along with their various dysregulated elements, identified via our integrative bioinformatics pipeline, can potentially differentiate those with obesity from those with obesity and metabolic complications.
Subject(s)
Metabolic Syndrome , MicroRNAs , Humans , Metabolic Syndrome/complications , Metabolic Syndrome/genetics , Multiomics , Case-Control Studies , Obesity/complications , Obesity/genetics , MicroRNAs/genetics
ABSTRACT
Genotype imputation is widely used in genome-wide association studies to boost variant density, allowing increased power in association testing. Many studies currently include pedigree data due to increasing interest in rare variants coupled with the availability of appropriate analysis tools. The performance of population-based (subjects are unrelated) imputation methods is well established. However, the performance of family- and population-based imputation methods on family data has been subject to much less scrutiny. Here, we extensively compare several family- and population-based imputation methods on family data of large pedigrees with both European and African ancestry. Our comparison includes many widely used family- and population-based tools and another method, Ped_Pop, which combines family- and population-based imputation results. We also compare four subject selection strategies for full sequencing to serve as the reference panel for imputation: GIGI-Pick, ExomePicks, PRIMUS, and random selection. Moreover, we compare two imputation accuracy metrics: the Imputation Quality Score and Pearson's correlation R² for predicting power of association analysis using imputation results. Our results show that (1) GIGI outperforms Merlin; (2) family-based imputation outperforms population-based imputation for rare variants but not for common ones; (3) combining family- and population-based imputation outperforms all imputation approaches for all minor allele frequencies; (4) GIGI-Pick gives the best selection strategy based on the R² criterion; and (5) R² is the best measure of imputation accuracy. Our study is the first to extensively evaluate the imputation performance of many available family- and population-based tools on the same family data and provides guidelines for future studies.
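As an illustration of the R² accuracy measure named in this abstract, the squared Pearson correlation between true genotypes and imputed dosages at one variant can be sketched as follows. The function name `squared_pearson` and the example values are hypothetical; coding genotypes as 0/1/2 copies of the minor allele is an assumption of this sketch, and the study's own evaluation pipeline is not reproduced here.

```python
def squared_pearson(truth: list[float], imputed: list[float]) -> float:
    """Squared Pearson correlation between two equal-length numeric vectors."""
    n = len(truth)
    if n != len(imputed) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")

    mean_t = sum(truth) / n
    mean_i = sum(imputed) / n

    # Centered cross-product and sums of squares
    s_ti = sum((t - mean_t) * (i - mean_i) for t, i in zip(truth, imputed))
    s_tt = sum((t - mean_t) ** 2 for t in truth)
    s_ii = sum((i - mean_i) ** 2 for i in imputed)

    # A monomorphic variant has zero variance, so the correlation is
    # undefined; this sketch reports 0.0 in that case.
    if s_tt == 0 or s_ii == 0:
        return 0.0
    return (s_ti * s_ti) / (s_tt * s_ii)


# Hypothetical variant: true genotypes vs. imputed dosages for six subjects.
true_genotypes = [0, 1, 2, 0, 1, 0]
imputed_dosages = [0.1, 0.9, 1.8, 0.0, 1.2, 0.3]
print(f"R^2 = {squared_pearson(true_genotypes, imputed_dosages):.3f}")
```

For rare variants most subjects carry genotype 0, so a handful of carriers dominates the correlation; this is one reason accuracy is usually reported by minor allele frequency bin.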
Subject(s)
Black People/genetics , Family , Genome, Human , White People/genetics , Female , Humans , Male
ABSTRACT
MOTIVATION: A global effort is underway to identify compounds for the treatment of COVID-19. Since de novo compound design is an extremely long, time-consuming and expensive process, efforts are underway to discover existing compounds that can be repurposed for COVID-19 and new viral diseases. We propose a machine learning representation framework that uses deep learning induced vector embeddings of compounds and viral proteins as features to predict compound-viral protein activity. The prediction model in turn uses a consensus framework to rank approved compounds against viral proteins of interest. RESULTS: Our consensus framework achieves a high mean Pearson correlation of 0.916, mean R² of 0.840 and a low mean squared error of 0.313 for the task of compound-viral protein activity prediction on an independent test set. As a use case, we identify a ranked list of 47 compounds common to three main proteins of SARS-CoV-2 virus (PL-PRO, 3CL-PRO and Spike protein) as potential targets including 21 antivirals, 15 anticancer, 5 antibiotics and 6 other investigational human compounds. We perform additional molecular docking simulations to demonstrate that the majority of these compounds have low binding energies and thus high binding affinity with the potential to be effective against the SARS-CoV-2 virus. AVAILABILITY AND IMPLEMENTATION: All the source code and data are available at: https://github.com/raghvendra5688/Drug-Repurposing and https://dx.doi.org/10.17632/8rrwnbcgmx.3. We also implemented a web-server at: https://machinelearning-protein.qcri.org/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
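One simple way to build a consensus ranking of compounds from several predictors is to average each compound's predicted activity across models and sort by that mean. The sketch below illustrates this idea only. The abstract does not state how its consensus is formed, so mean aggregation, the function name `consensus_rank`, the model and compound names, and the convention that a higher score means stronger predicted activity are all assumptions.

```python
from statistics import mean


def consensus_rank(
    predictions: dict[str, dict[str, float]],
) -> list[tuple[str, float]]:
    """Rank compounds by mean predicted activity across models.

    predictions maps model name -> {compound name: predicted activity}.
    Only compounds scored by every model are ranked (assumption).
    Ties are broken alphabetically so the output is deterministic.
    """
    if not predictions:
        return []

    per_model = list(predictions.values())
    shared = set.intersection(*(set(p) for p in per_model))
    scores = {c: mean(p[c] for p in per_model) for c in shared}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical predictions from two models for three compounds.
preds = {
    "model_a": {"compound_1": 1.0, "compound_2": 3.0},
    "model_b": {"compound_1": 2.0, "compound_2": 2.0, "compound_3": 9.0},
}
for name, score in consensus_rank(preds):
    print(name, score)
```

Averaging dampens the effect of a single model's outlier prediction; rank-based aggregation would be an alternative when the models' score scales differ.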
ABSTRACT
BACKGROUND: Advances in our understanding of the tumor microenvironment have radically changed the cancer field, highlighting the emerging need for biomarkers of an active, favorable tumor immune phenotype to aid treatment stratification and clinical prognostication. Numerous immune-related gene signatures have been defined; however, their prognostic value is often limited to one or few cancer types. Moreover, the area of non-coding RNA as biomarkers remains largely unexplored although their number and biological roles are rapidly expanding. METHODS: We developed a multi-step process to identify immune-related long non-coding RNA signatures with prognostic connotation in multiple TCGA solid cancer datasets. RESULTS: Using the breast cancer dataset as a discovery cohort we found 2988 differentially expressed lncRNAs between immune favorable and unfavorable tumors, as defined by the immunologic constant of rejection (ICR) gene signature. Mapping of the lncRNAs to a coding-non-coding network identified 127 proxy protein-coding genes that are enriched in immune-related diseases and functions. Next, we defined two distinct 20-lncRNA prognostic signatures that show a stronger effect on overall survival than the ICR signature in multiple solid cancers. Furthermore, we found a 3 lncRNA signature that demonstrated prognostic significance across 5 solid cancer types with a stronger association with clinical outcome than ICR. Moreover, this 3 lncRNA signature showed additional prognostic significance in uterine corpus endometrial carcinoma and cervical squamous cell carcinoma and endocervical adenocarcinoma as compared to ICR. CONCLUSION: We identified an immune-related 3-lncRNA signature with prognostic connotation in multiple solid cancer types which performed equally well and in some cases better than the 20-gene ICR signature, indicating that it could be used as a minimal informative signature for clinical implementation.
Subject(s)
Carcinoma, Squamous Cell , RNA, Long Noncoding , Uterine Cervical Neoplasms , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Carcinoma, Squamous Cell/genetics , Female , Gene Expression Regulation, Neoplastic , Humans , Kaplan-Meier Estimate , Prognosis , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Tumor Microenvironment , Uterine Cervical Neoplasms/genetics
ABSTRACT
Background: Obesity coexists with variable features of metabolic syndrome, which is associated with dysregulated metabolic pathways. We assessed potential associations between serum metabolites and features of metabolic syndrome in Arabic subjects with obesity. Methods: We analyzed a dataset of 39 subjects with obesity only (OBO, n = 18) age-matched to subjects with obesity and metabolic syndrome (OBM, n = 21). We measured 1069 serum metabolites and correlated them to clinical features. Results: A total of 83 metabolites, mostly lipids, were significantly different (p < 0.05) between the two groups. Among lipids, 22 sphingomyelins were decreased in OBM compared to OBO. Among non-lipids, quinolinate, kynurenine, and tryptophan were also decreased in OBM compared to OBO. Sphingomyelin is negatively correlated with glucose, HbA1C, insulin, and triglycerides but positively correlated with HDL, LDL, and cholesterol. Differentially enriched pathways include lysine degradation, amino sugar and nucleotide sugar metabolism, arginine and proline metabolism, fructose and mannose metabolism, and galactose metabolism. Conclusions: Metabolites and pathways associated with chronic inflammation are differentially expressed in subjects with obesity and metabolic syndrome compared to subjects with obesity but without the clinical features of metabolic syndrome.
Subject(s)
Insulin Resistance , Metabolic Syndrome , Humans , Metabolic Networks and Pathways , Obesity/complications , Triglycerides
ABSTRACT
MOTIVATION: X-ray crystallography has facilitated the majority of protein structures determined to date. Sequence-based predictors that can accurately estimate protein crystallization propensities would be highly beneficial to overcome the high expenditure and large attrition rate, and to reduce the trial-and-error settings required for crystallization. RESULTS: In this study, we present a novel model, BCrystal, which uses an optimized gradient boosting machine (XGBoost) on sequence, structural and physicochemical features extracted from the proteins of interest. BCrystal also provides explanations, highlighting the most important features for the predicted crystallization propensity of an individual protein using the SHAP algorithm. On three independent test sets, BCrystal outperforms state-of-the-art sequence-based methods by more than 12.5% in accuracy, 18% in recall and 0.253 in Matthews correlation coefficient, with an average accuracy of 93.7%, recall of 96.63% and Matthews correlation coefficient of 0.868. We observed that higher relative solvent accessibility of exposed residues associates positively with protein crystallizability, whereas the number of disordered regions, the fraction of coils and tripeptide stretches that contain multiple histidines associate negatively with crystallizability. The higher accuracy of BCrystal enables it to accurately screen for sequence variants with enhanced crystallizability. AVAILABILITY AND IMPLEMENTATION: Our BCrystal webserver is at https://machinelearning-protein.qcri.org/ and source code is available at https://github.com/raghvendra5688/BCrystal. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Computational Biology , Proteins , Crystallization , Crystallography, X-Ray , Software
ABSTRACT
MOTIVATION: Protein structure determination has primarily been performed using X-ray crystallography. To overcome the high cost, high attrition rate and series of trial-and-error settings, many in silico methods have been developed to predict crystallization propensities of proteins based on their sequences. However, the majority of these methods build their predictors by extracting features from protein sequences, which is computationally expensive and can explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins which can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from sequence. Our model is based on convolutional neural networks, which can exploit frequently occurring k-mers and sets of k-mers from the protein sequences to distinguish proteins that will result in diffraction-quality crystals from those that will not. RESULTS: Our model surpasses previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and Matthews correlation coefficient (MCC) on three independent test sets. DeepCrystal achieves average improvements in recall of 1.4% and 12.1% over its closest competitors, Crysalis II and Crysf, respectively. In addition, DeepCrystal attains average improvements of 2.1% and 6.0% for F-score, 1.9% and 3.9% for accuracy, and 3.8% and 7.0% for MCC with respect to Crysalis II and Crysf on independent test sets. AVAILABILITY AND IMPLEMENTATION: The standalone source code and models are available at https://github.com/elbasir/DeepCrystal and a web-server is also available at https://deeplearning-protein.qcri.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
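To make the notion of a k-mer concrete: a k-mer is any contiguous substring of length k in the protein sequence. The sketch below counts them explicitly with a sliding window. It is only an illustration of the concept; a convolutional network learns filters over such windows rather than counting them, and the function name `kmer_counts` and the example sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in an amino-acid sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping windows;
    # range() is empty when the sequence is shorter than k.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical short sequence; list its 2-mers from most to least frequent.
counts = kmer_counts("MKVLAAKV", k=2)
for kmer, n in counts.most_common():
    print(kmer, n)
```

A 1D convolution with filter width k over a one-hot or embedded sequence scores each of these windows, which is how such a model can pick up recurring k-mer patterns without a hand-built count table.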
Subject(s)
Deep Learning , Amino Acid Sequence , Computational Biology , Crystallization , Proteins
ABSTRACT
We propose a generic framework for gene regulatory network (GRN) inference approached as a feature selection problem. GRNs obtained using machine learning (ML) techniques are often dense, whereas real GRNs are rather sparse. We use a Tikhonov regularization inspired optimal L-curve criterion that utilizes the edge weight distribution for a given target gene to determine the optimal set of transcription factors (TFs) associated with it. Our proposed framework allows incorporation of a mechanistic active binding network based on cis-regulatory motif analysis. We evaluate our regularization framework in conjunction with two non-linear ML techniques, namely gradient boosting machines (GBM) and random forests (GENIE), resulting in regularized feature selection based methods called RGBM and RGENIE, respectively. RGBM has been used to identify the main transcription factors that are causally involved as master regulators of the gene expression signature activated in the FGFR3-TACC3-positive glioblastoma. Here, we illustrate that RGBM identifies the main regulators of the molecular subtypes of brain tumors. Our analysis reveals the identity and corresponding biological activities of the master regulators characterizing the difference between G-CIMP-high and G-CIMP-low subtypes and between PA-like and LGm6-GBM, thus providing a clue to the yet undetermined nature of the transcriptional events among these subtypes.
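The idea of sparsifying a dense network by cutting each target gene's ranked edge weights at the corner of an L-shaped curve can be sketched with a generic knee heuristic: sort the weights in decreasing order, normalize both axes, and cut at the point farthest below the chord joining the first and last points. This is an illustrative stand-in and not the published RGBM criterion; the function name `select_tfs_by_knee` and the TF names and weights are hypothetical.

```python
def select_tfs_by_knee(weights: dict[str, float]) -> list[str]:
    """Keep the TFs ranked before the knee of the sorted weight curve."""
    ranked = sorted(weights.items(), key=lambda kv: -kv[1])
    n = len(ranked)
    top, bottom = ranked[0][1], ranked[-1][1]

    # Too few points, or a flat curve, has no knee: keep everything.
    if n < 3 or top == bottom:
        return [tf for tf, _ in ranked]

    best_index, best_gap = 0, -1.0
    for i, (_, w) in enumerate(ranked):
        x = i / (n - 1)                    # rank, scaled to [0, 1]
        y = (w - bottom) / (top - bottom)  # weight, scaled to [0, 1]
        gap = 1.0 - x - y                  # how far the point lies below the chord
        if gap > best_gap:
            best_index, best_gap = i, gap

    # The knee is the first point on the flat tail, so it is excluded.
    return [tf for tf, _ in ranked[:best_index]]


# Hypothetical edge weights for one target gene.
edge_weights = {
    "TF_A": 0.50, "TF_B": 0.30, "TF_C": 0.02,
    "TF_D": 0.01, "TF_E": 0.01, "TF_F": 0.00,
}
print(select_tfs_by_knee(edge_weights))
```

Because the cut is chosen per target gene from its own weight distribution, genes with a few dominant regulators keep few edges while genes with a more gradual decay keep more.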
Subject(s)
Gene Regulatory Networks/genetics , Glioma/genetics , Nucleotide Motifs/genetics , Transcription Factors/genetics , Algorithms , Gene Expression Regulation, Neoplastic/genetics , Glioma/classification , Glioma/pathology , Humans , Machine Learning , Microtubule-Associated Proteins/genetics , Receptor, Fibroblast Growth Factor, Type 3/genetics
ABSTRACT
Motivation: Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors. In this work, we propose DeepSol, a novel deep learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence. Results: DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods and attained an accuracy of 0.77 and a Matthews correlation coefficient of 0.55. The superior prediction accuracy of DeepSol makes it possible to screen for sequences with enhanced production capacity and to predict the solubility of novel proteins more reliably. Availability and implementation: DeepSol's best performing models and results are publicly deposited at https://doi.org/10.5281/zenodo.1162886 (Khurana and Mall, 2018). Supplementary information: Supplementary data are available at Bioinformatics online.
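Because the abstract above reports both accuracy and the Matthews correlation coefficient (MCC), a short worked sketch of how the two metrics differ may help. The formula is the standard one for binary classification; the confusion-matrix counts below are hypothetical and are not taken from the DeepSol evaluation.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:  # conventional value when a row or column is empty
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical balanced classifier: accuracy 0.8, MCC 0.6
balanced = (40, 40, 10, 10)
# Hypothetical classifier that always predicts "insoluble" on a 90/10 split:
# accuracy is still 0.9, but MCC is 0 because nothing is actually discriminated
majority = (0, 90, 0, 10)

for tp, tn, fp, fn in (balanced, majority):
    print(accuracy(tp, tn, fp, fn), mcc(tp, tn, fp, fn))
```

The second case shows why solubility predictors are usually compared on MCC as well as accuracy: MCC stays near zero for a predictor that ignores the minority class.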
Subject(s)
Computational Biology/methods , Deep Learning , Proteins/chemistry , Amino Acid Sequence , Computer Simulation , Solubility
ABSTRACT
Motivation: Protein solubility can be a decisive factor in both research and production efficiency, and in silico sequence-based predictors that can accurately estimate solubility outcomes are highly sought. Results: In this study, we present a novel approach termed PRotein SolubIlity Predictor (PaRSnIP), which uses a gradient boosting machine algorithm as well as an approximation of sequence and structural features of the protein of interest. Based on an independent test set, PaRSnIP outperformed other state-of-the-art sequence-based methods by more than 9% in accuracy and 0.17 in Matthews correlation coefficient, with an overall accuracy of 74% and a Matthews correlation coefficient of 0.48. Additionally, PaRSnIP provides importance scores for all features used in training. We observed that higher fractions of exposed residues associate positively with protein solubility and that tripeptide stretches with multiple histidines associate negatively with solubility. The improved prediction accuracy of PaRSnIP should enable it to predict protein solubility with greater reliability and to screen for sequence variants with enhanced manufacturability. Availability and implementation: PaRSnIP software is available for download on GitHub (https://github.com/RedaRawi/PaRSnIP). Contact: gwo-yu.chuang@nih.gov. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Algorithms , Computational Biology/methods , Reproducibility of Results , Solubility
ABSTRACT
Correction for 'Exploring new approaches towards the formability of mixed-ion perovskites by DFT and machine learning' by Heesoo Park et al., Phys. Chem. Chem. Phys., 2019, DOI: 10.1039/c8cp06528d.
ABSTRACT
Recent years have witnessed a growing effort in engineering and tuning the properties of hybrid halide perovskites as light absorbers. These efforts have led to the successful enhancement of their stability, a feature that is often counterbalanced by a reduction of their power-conversion efficiency. In order to provide a systematic analysis of the structure-property relationships of this class of compounds, we have performed density functional theory calculations exploring fully inorganic ABC3 chalcogenide (I-V-VI3), halide (I-II-VII3) and hybrid perovskites. Special attention has been given to structures featuring three-dimensional BC6 octahedral networks because of their efficient carrier transport properties. In particular, we have carefully analyzed the role of BC6 octahedral deformations, rotations and tilts in the thermodynamic stability and optical properties of the compounds. By using machine learning algorithms, we have estimated the relations between the octahedral deformation and the bandgap, and established a similarity map among all the calculated compounds.