1.
Nat Commun ; 15(1): 4055, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38744843

ABSTRACT

We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest.


Subject(s)
Algorithms, Computer Simulation, Gene Regulatory Networks, RNA-Seq, Single-Cell Analysis, Single-Cell Analysis/methods, RNA-Seq/methods, Humans, Transcription Factors/metabolism, Transcription Factors/genetics, Computational Biology/methods, Benchmarking, RNA Sequence Analysis/methods, Single-Cell Gene Expression Analysis
2.
Clin Proteomics ; 20(1): 44, 2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37875801

ABSTRACT

The quest for understanding and managing the long-term effects of COVID-19, often referred to as Long COVID or post-COVID-19 condition (PCC), remains an active research area. Recent findings highlighted angiopoietin-1 (ANG-1) and p-selectin (P-SEL) as potential diagnostic markers, but validation is essential, given the inconsistency in COVID-19 biomarker studies. Leveraging the biobanque québécoise de la COVID-19 (BQC19) biobank, we analyzed the data of 249 participants. Both ANG-1 and P-SEL levels were significantly higher in participants with PCC than in control subjects at 3 months, as assessed with the Mann-Whitney U test. We reproduced and validated the earlier findings, emphasizing the importance of collaborative biobanking efforts in enhancing the reproducibility and credibility of Long COVID research outcomes.
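
The group comparison named in this abstract, the Mann-Whitney U test, can be illustrated with a minimal sketch. The function name `mann_whitney_u` and the numeric values are hypothetical and are not data from the study; the sketch computes only the U statistics (with midranks for ties) and leaves out the p-value calculation.

```python
from itertools import chain


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for tied values."""
    # Pool both samples, tagging each value with its group (0 = x, 1 = y).
    combined = sorted(chain(((v, 0) for v in x), ((v, 1) for v in y)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


# Made-up marker levels, for illustration only (not BQC19 data).
pcc_levels = [5.1, 6.3, 7.2, 8.0]
control_levels = [3.9, 4.4, 5.0, 6.0]
u_pcc, u_control = mann_whitney_u(pcc_levels, control_levels)
```

U_x counts the (x, y) pairs in which the x value is larger, with ties counted as one half, so a U_x close to `len(x) * len(y)` indicates that the first group tends to have higher values.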

3.
Bioinformatics ; 39(6), 2023 06 01.
Article in English | MEDLINE | ID: mdl-37326960

ABSTRACT

MOTIVATION: Interpretable deep learning (DL) models that can provide biological insights, in addition to accurate predictions, are of great interest to the biomedical community. Recently, interpretable DL models that incorporate signaling pathways have been proposed for drug response prediction (DRP). While these models improve interpretability, it is unclear whether this comes at the cost of less accurate DRPs, or a prediction improvement can also be obtained. RESULTS: We comprehensively and systematically assessed four state-of-the-art interpretable DL models using three pathway collections to evaluate their ability to make accurate predictions on unseen samples from the same dataset, as well as their generalizability to an independent dataset. Our results showed that models that explicitly incorporate pathway information in the form of a latent layer perform worse than models that incorporate this information implicitly. However, in most evaluation setups, the best performance was achieved using a black-box multilayer perceptron, and the performance of a random forest baseline was comparable to that of the interpretable models. Replacing the signaling pathways with randomly generated pathways showed a comparable performance for the majority of the models. Finally, the performance of all models deteriorated when applied to an independent dataset. These results highlight the importance of systematic evaluation of newly proposed models using carefully selected baselines. We provide different evaluation setups and baseline models that can be used to achieve this goal. AVAILABILITY AND IMPLEMENTATION: Implemented models and datasets are provided at https://doi.org/10.5281/zenodo.7787178 and https://doi.org/10.5281/zenodo.7101665, respectively.


Subject(s)
Deep Learning, Computer Neural Networks, Random Forest
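
The randomly generated pathway control mentioned in this abstract can be sketched as follows. The abstract does not say how the random pathways were constructed, so the size-matched sampling below is an assumption, and `randomize_pathways`, the pathway names, and the gene names are all hypothetical.

```python
import random


def randomize_pathways(pathways, gene_universe, seed=0):
    """Replace each pathway by a random gene set of the same size.

    Assumption: random pathways keep the original pathway sizes and are
    drawn uniformly from the gene universe; the study may have done otherwise.
    """
    rng = random.Random(seed)
    genes = sorted(gene_universe)  # sorted so results are reproducible
    return {
        name: set(rng.sample(genes, len(members)))
        for name, members in pathways.items()
    }


# Toy pathway collection with hypothetical gene names.
toy_pathways = {"pathway_A": {"g1", "g2", "g3"}, "pathway_B": {"g2", "g4"}}
toy_universe = {"g1", "g2", "g3", "g4", "g5", "g6"}
random_pathways = randomize_pathways(toy_pathways, toy_universe, seed=1)
```

If a model performs about as well with `random_pathways` as with the real collection, the pathway structure itself is probably contributing little to predictive accuracy, which is the kind of comparison the abstract reports.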
4.
Bioinformatics ; 39(4), 2023 04 03.
Article in English | MEDLINE | ID: mdl-37021933

ABSTRACT

MOTIVATION: Combination therapies have emerged as a treatment strategy for cancers to reduce the probability of drug resistance and to improve outcomes. Large databases curating the results of many drug screening studies on preclinical cancer cell lines have been developed, capturing the synergistic and antagonistic effects of drug combinations in different cell lines. However, due to the high cost of drug screening experiments and the sheer number of possible drug combinations, these databases are quite sparse. This necessitates the development of transductive computational models to accurately impute these missing values. RESULTS: Here, we developed MARSY, a deep-learning multitask model that incorporates information on the gene expression profile of cancer cell lines, as well as the differential expression signature induced by each drug to predict drug-pair synergy scores. By utilizing two encoders to capture the interplay between the drug pairs, as well as the drug pairs and cell lines, and by adding auxiliary tasks in the predictor, MARSY learns latent embeddings that improve the prediction performance compared to state-of-the-art and traditional machine-learning models. Using MARSY, we then predicted the synergy scores of 133 722 new drug-pair cell line combinations, which we have made available to the community as part of this study. Moreover, we validated various insights obtained from these novel predictions using independent studies, confirming the ability of MARSY to make accurate novel predictions. AVAILABILITY AND IMPLEMENTATION: An implementation of the algorithms in Python and cleaned input datasets are provided at https://github.com/Emad-COMBINE-lab/MARSY.


Subject(s)
Deep Learning, Computational Biology/methods, Drug Combinations, Algorithms, Tumor Cell Line
5.
Front Med (Lausanne) ; 10: 1122328, 2023.
Article in English | MEDLINE | ID: mdl-36993805

ABSTRACT

Background: Human glomerulonephritis (GN), including membranous nephropathy (MN), focal segmental glomerulosclerosis (FSGS) and IgA nephropathy (IgAN), as well as diabetic nephropathy (DN), are leading causes of chronic kidney disease. In these glomerulopathies, distinct stimuli disrupt metabolic pathways in glomerular cells. Other pathways, including the endoplasmic reticulum (ER) unfolded protein response (UPR) and autophagy, are activated in parallel to attenuate cell injury or promote repair. Methods: We used publicly available datasets to examine gene transcriptional pathways in glomeruli of human GN and DN and to identify drugs. Results: We demonstrate that there are many common genes upregulated in MN, FSGS, IgAN, and DN. Furthermore, these glomerulopathies were associated with increased expression of ER/UPR and autophagy genes, a significant number of which were shared. Several candidate drugs for treatment of glomerulopathies were identified by relating gene expression signatures of distinct drugs in cell culture with the ER/UPR and autophagy genes upregulated in the glomerulopathies ("connectivity mapping"). Using a glomerular cell culture assay that correlates with glomerular damage in vivo, we showed that one candidate drug, neratinib (an epidermal growth factor receptor inhibitor), is cytoprotective. Conclusion: The UPR and autophagy are activated in multiple types of glomerular injury. Connectivity mapping identified candidate drugs that shared common signatures with ER/UPR and autophagy genes upregulated in glomerulopathies, and one of these drugs attenuated injury of glomerular cells. The present study opens the possibility for modulating the UPR or autophagy pharmacologically as therapy for GN.

6.
Genomics Proteomics Bioinformatics ; 21(3): 535-550, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36775056

ABSTRACT

Prediction of the response of cancer patients to different treatments and identification of biomarkers of drug response are two major goals of individualized medicine. Here, we developed a deep learning framework called TINDL, completely trained on preclinical cancer cell lines (CCLs), to predict the response of cancer patients to different treatments. TINDL utilizes a tissue-informed normalization to account for the tissue type and cancer type of the tumors and to reduce the statistical discrepancies between CCLs and patient tumors. Moreover, by making the deep learning black box interpretable, this model identifies a small set of genes whose expression levels are predictive of drug response in the trained model, enabling identification of biomarkers of drug response. Using data from two large databases of CCLs and cancer tumors, we showed that this model can distinguish between sensitive and resistant tumors for 10 (out of 14) drugs, outperforming various other machine learning models. In addition, our small interfering RNA (siRNA) knockdown experiments on 10 genes identified by this model for one of the drugs (tamoxifen) confirmed that tamoxifen sensitivity is substantially influenced by all of these genes in MCF7 cells, and seven of these genes in T47D cells. Furthermore, genes implicated for multiple drugs pointed to shared mechanisms of action among drugs and suggested several important signaling pathways. In summary, this study provides a powerful deep learning framework for prediction of drug response and identification of biomarkers of drug response in cancer. The code can be accessed at https://github.com/ddhostallero/tindl.


Subject(s)
Antineoplastic Agents, Neoplasms, Humans, Antineoplastic Agents/pharmacology, Antineoplastic Agents/therapeutic use, Tamoxifen/pharmacology, Tamoxifen/therapeutic use, Biomarkers, Neoplasms/drug therapy, Neoplasms/genetics, Neoplasms/pathology, Machine Learning
7.
Bioinformatics ; 38(16): 3958-3967, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35771595

ABSTRACT

MOTIVATION: Computational methods for the prediction of protein-protein interactions (PPIs), while important tools for researchers, are plagued by challenges in generalizing to unseen proteins. Datasets used for modelling protein-protein predictions are particularly predisposed to information leakage and sampling biases. RESULTS: In this study, we introduce RAPPPID, a method for the Regularized Automatic Prediction of Protein-Protein Interactions using Deep Learning. RAPPPID is a twin Averaged Weight-Dropped Long Short-Term Memory network which employs multiple regularization methods during training time to learn generalized weights. Testing on stringent interaction datasets composed of proteins not seen during training, RAPPPID outperforms state-of-the-art methods. Further experiments show that RAPPPID's performance holds regardless of the particular proteins in the testing set and its performance is higher for experimentally supported edges. This study serves to demonstrate that appropriate regularization is an important component of overcoming the challenges of creating models for PPI prediction that generalize to unseen proteins. Additionally, as part of this study, we provide datasets corresponding to several data splits of various strictness, in order to facilitate assessment of PPI reconstruction methods by others in the future. AVAILABILITY AND IMPLEMENTATION: Code and datasets are freely available at https://github.com/jszym/rapppid and Zenodo.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computer Neural Networks, Proteins, Proteins/metabolism, Cell Communication
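
The idea of testing only on proteins not seen during training can be sketched as a protein-level split. This is one plausible reading of a strict split and is not taken from the RAPPPID code; the function name `strict_split`, the `test_fraction` parameter, and the protein identifiers are hypothetical.

```python
import random


def strict_split(pairs, test_fraction=0.2, seed=0):
    """Split interaction pairs so that no test protein appears in training.

    Assumption: pairs with one protein on each side of the split are
    discarded, which is what prevents leakage between the two sets.
    """
    proteins = sorted({p for pair in pairs for p in pair})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_test = max(1, int(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])

    train, test, dropped = [], [], []
    for a, b in pairs:
        n_in_test = (a in test_proteins) + (b in test_proteins)
        if n_in_test == 0:
            train.append((a, b))
        elif n_in_test == 2:
            test.append((a, b))
        else:
            dropped.append((a, b))  # mixed pair: would leak a test protein
    return train, test, dropped


# Toy interaction list with hypothetical protein identifiers.
toy_pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P5", "P6"), ("P1", "P6")]
train_pairs, test_pairs, dropped_pairs = strict_split(toy_pairs, test_fraction=0.5)
```

Splitting at the level of pairs instead would let the same protein occur in both sets, which is the information leakage the abstract warns about.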
8.
Bioinformatics ; 38(14): 3609-3620, 2022 07 11.
Article in English | MEDLINE | ID: mdl-35674359

ABSTRACT

MOTIVATION: The increasing number of publicly available databases containing drugs' chemical structures, their response in cell lines, and molecular profiles of the cell lines has garnered attention to the problem of drug response prediction. However, many existing methods do not fully leverage the information that is shared among cell lines and drugs with similar structure. As such, drug similarities in terms of cell line responses and chemical structures could prove to be useful in forming drug representations to improve drug response prediction accuracy. RESULTS: We present two deep learning approaches, BiG-DRP and BiG-DRP+, for drug response prediction. Our models take advantage of the drugs' chemical structure and the underlying relationships of drugs and cell lines through a bipartite graph and a heterogeneous graph convolutional network that incorporate sensitive and resistant cell line information in forming drug representations. Evaluation of our methods and other state-of-the-art models in different scenarios shows that incorporating this bipartite graph significantly improves the prediction performance. In addition, genes that contribute significantly to the performance of our models also point to important biological processes and signaling pathways. Analysis of predicted drug response of patients' tumors using our model revealed important associations between mutations and drug sensitivity, illustrating the utility of our model in pharmacogenomics studies. AVAILABILITY AND IMPLEMENTATION: An implementation of the algorithms in Python is provided in https://github.com/ddhostallero/BiG-DRP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Biological Phenomena, Humans
9.
Sci Rep ; 11(1): 23928, 2021 12 14.
Article in English | MEDLINE | ID: mdl-34907210

ABSTRACT

Identification of transcriptional regulatory mechanisms and signaling networks involved in the response of host cells to infection by SARS-CoV-2 is a powerful approach that provides a systems biology view of gene expression programs involved in COVID-19 and may enable the identification of novel therapeutic targets and strategies to mitigate the impact of this disease. In this study, our goal was to identify a transcriptional regulatory network that is associated with gene expression changes between samples infected by SARS-CoV-2 and those that are infected by other respiratory viruses, to narrow the results to those enriched in or specific to SARS-CoV-2. We combined a series of recently developed computational tools to identify transcriptional regulatory mechanisms involved in the response of epithelial cells to infection by SARS-CoV-2, and particularly regulatory mechanisms that are specific to this virus when compared to other viruses. In addition, using network-guided analyses, we identified kinases associated with this network. The results identified pathways associated with regulation of inflammation (MAPK14) and immunity (BTK, MBX) that may contribute to exacerbating organ damage linked with complications of COVID-19. The regulatory network identified herein reflects a combination of known hits and novel candidate pathways, supporting the computational pipeline presented herein as a way to quickly narrow down promising avenues of investigation when facing an emerging and novel disease such as COVID-19.


Subject(s)
COVID-19/genetics, Gene Expression Profiling/methods, SARS-CoV-2/pathogenicity, RNA Sequence Analysis/methods, A549 Cells, Cell Line, Epithelial Cells/chemistry, Epithelial Cells/cytology, Epithelial Cells/virology, Gene Expression Regulation, Humans, Biological Models, Systems Biology
10.
PLoS Comput Biol ; 17(3): e1008810, 2021 03.
Article in English | MEDLINE | ID: mdl-33684134

ABSTRACT

Abnormal coagulation and an increased risk of thrombosis are features of severe COVID-19, with parallels proposed with hemophagocytic lymphohistiocytosis (HLH), a life-threatening condition associated with hyperinflammation. The presence of HLH was described in severely ill patients during the H1N1 influenza epidemic, presenting with pulmonary vascular thrombosis. We tested the hypothesis that genes causing primary HLH regulate pathways linking pulmonary thromboembolism to the presence of SARS-CoV-2 using novel network-informed computational algorithms. This approach led to the identification of neutrophil extracellular traps (NETs) as plausible mediators of vascular thrombosis in severe COVID-19 in children and adults. Taken together, the network-informed analysis led us to propose the following model: the release of NETs in response to inflammatory signals, acting in concert with SARS-CoV-2, damages the endothelium and directs platelet activation, promoting abnormal coagulation that leads to serious complications of COVID-19. The underlying hypothesis is that genetic and/or environmental conditions that favor the release of NETs may predispose individuals to thrombotic complications of COVID-19 due to an increased risk of abnormal coagulation. This would be a common pathogenic mechanism in conditions including autoimmune/infectious diseases, hematologic and metabolic disorders.


Subject(s)
COVID-19/complications, COVID-19/genetics, Extracellular Traps/genetics, Hemophagocytic Lymphohistiocytosis/complications, Hemophagocytic Lymphohistiocytosis/genetics, Biological Models, SARS-CoV-2/genetics, Thrombosis/etiology, Thrombosis/genetics, Algorithms, Cell Degranulation/genetics, Computational Biology, Gene Expression Regulation, Gene Regulatory Networks, Genetic Predisposition to Disease, Humans, Pandemics, Protein Interaction Maps, Pulmonary Embolism/etiology, Pulmonary Embolism/genetics, Viral Proteins/genetics
11.
NPJ Syst Biol Appl ; 7(1): 9, 2021 02 08.
Article in English | MEDLINE | ID: mdl-33558504

ABSTRACT

Reconstruction of transcriptional regulatory networks (TRNs) is a powerful approach to unravel the gene expression programs involved in healthy and disease states of a cell. However, these networks are usually reconstructed independent of the phenotypic (or clinical) properties of the samples. Therefore, they may confound regulatory mechanisms that are specifically related to a phenotypic property with more general mechanisms underlying the full complement of the analyzed samples. In this study, we develop a method called InPheRNo to identify "phenotype-relevant" TRNs. This method is based on a probabilistic graphical model that models the simultaneous effects of multiple transcription factors (TFs) on their target genes and the statistical relationship between the target genes' expression and the phenotype. Extensive comparison of InPheRNo with related approaches using primary tumor samples of 18 cancer types from The Cancer Genome Atlas reveals that InPheRNo can accurately reconstruct cancer type-relevant TRNs and identify cancer driver TFs. In addition, survival analysis reveals that the activity level of TFs with many target genes could distinguish patients with poor prognosis from those with better prognosis.


Subject(s)
Computational Biology/methods, Gene Regulatory Networks/genetics, Transcriptional Regulatory Elements/genetics, Algorithms, Gene Expression/genetics, Gene Expression Regulation/genetics, Humans, Statistical Models, Neoplasms/genetics, Phenotype, Software, Systems Biology/methods, Transcription Factors/genetics
12.
Sci Rep ; 10(1): 17682, 2020 10 19.
Article in English | MEDLINE | ID: mdl-33077880

ABSTRACT

The biological processes involved in a drug's mechanisms of action are oftentimes dynamic, complex and difficult to discern. Time-course gene expression data is a rich source of information that can be used to unravel these complex processes, identify biomarkers of drug sensitivity and predict the response to a drug. However, the majority of previous work has not fully utilized this temporal dimension. In these studies, the gene expression data is either considered at one time-point (before the administration of the drug) or two time-points (before and after the administration of the drug). This is clearly inadequate in modeling dynamic gene-drug interactions, especially for applications such as long-term drug therapy. In this work, we present a novel REcursive Prediction (REP) framework for drug response prediction by taking advantage of time-course gene expression data. Our goal is to predict drug response values at every stage of a long-term treatment, given the expression levels of genes collected in the previous time-points. To this end, REP employs a built-in recursive structure that exploits the intrinsic time-course nature of the data and integrates past values of drug responses for subsequent predictions. It also incorporates tensor completion that can not only alleviate the impact of noise and missing data, but also predict unseen gene expression levels (GEXs). These advantages enable REP to estimate drug response at any stage of a given treatment from some GEXs measured in the beginning of the treatment. Extensive experiments on two datasets corresponding to multiple sclerosis patients treated with interferon are included to showcase the effectiveness of REP.


Subject(s)
Drug Resistance/genetics, Theoretical Models, Pharmacology, Algorithms, Biomarkers/metabolism, Gene Expression, Humans
13.
Vaccines (Basel) ; 8(3), 2020 Aug 05.
Article in English | MEDLINE | ID: mdl-32764349

ABSTRACT

Innate responses provide the first line of defense against viral infections, including the influenza virus at mucosal surfaces. Communication and interaction between different host cells at the early stage of viral infections determine the quality and magnitude of immune responses against the invading virus. The release of membrane-encapsulated extracellular vesicles (EVs), from host cells, is defined as a refined system of cell-to-cell communication. EVs contain a diverse array of biomolecules, including microRNAs (miRNAs). We hypothesized that the activation of the tracheal cells with different stimuli impacts the cellular and EV miRNA profiles. Chicken tracheal rings were stimulated with polyI:C and LPS from Escherichia coli O26:B6 or infected with low pathogenic avian influenza virus H4N6. Subsequently, miRNAs were isolated from chicken tracheal cells or from EVs released from chicken tracheal cells. Differentially expressed (DE) miRNAs were identified in treated groups when compared to the control group. Our results demonstrated that there were 67 up-regulated and 157 down-regulated miRNAs across all cellular and EV samples. In the next step, several genes or pathways targeted by DE miRNAs were predicted. Overall, this study presented a global miRNA expression profile in chicken tracheas in response to avian influenza viruses (AIV) and toll-like receptor (TLR) ligands. The results presented predicted the possible roles of some DE miRNAs in the induction of antiviral responses. The DE candidate miRNAs, including miR-146a, miR-146b, miR-205a, miR-205b and miR-449, can be investigated further in functional validation studies and as novel prophylactic and therapeutic targets for tailoring or enhancing antiviral responses against AIV.

14.
Breast Cancer Res ; 22(1): 74, 2020 07 08.
Article in English | MEDLINE | ID: mdl-32641077

ABSTRACT

BACKGROUND: Cancer cells are known to display varying degrees of metastatic propensity, but the molecular basis underlying such heterogeneity remains unclear. Our aims in this study were to (i) elucidate prognostic subtypes in primary tumors based on an epithelial-to-mesenchymal-to-amoeboid transition (EMAT) continuum that captures the heterogeneity of metastatic propensity and (ii) more comprehensively define biologically informed subtypes predictive of breast cancer metastasis and survival in lymph node-negative (LNN) patients. METHODS: We constructed a novel metastasis biology-based gene signature (EMAT) derived exclusively from cancer cells induced to undergo either epithelial-to-mesenchymal transition (EMT) or mesenchymal-to-amoeboid transition (MAT) to gauge their metastatic potential. Genome-wide gene expression data obtained from 913 primary tumors of lymph node-negative breast cancer (LNNBC) patients were analyzed. EMAT gene signature-based prognostic stratification of patients was performed to identify biologically relevant subtypes associated with distinct metastatic propensity. RESULTS: Delineated EMAT subtypes display a biologic range from less stem-like to more stem-like cell states and from less invasive to more invasive modes of cancer progression. Consideration of EMAT subtypes in combination with standard clinical parameters significantly improved survival prediction. EMAT subtypes outperformed receptor- or PAM50-based BC intrinsic subtypes in prognostic accuracy, even after adjusting for treatment variables, in three independent LNNBC cohorts including a treatment-naïve patient cohort. CONCLUSIONS: EMAT classification is a biologically informed method that provides prognostic information beyond that which can be provided by traditional cancer staging or PAM50 molecular subtype status and may improve metastasis risk assessment in early-stage LNNBC patients, who may otherwise be perceived to be at low metastasis risk.


Subject(s)
Breast Neoplasms/genetics, Breast Neoplasms/pathology, Epithelial-Mesenchymal Transition/genetics, Tumor Biomarkers/genetics, Breast Neoplasms/metabolism, Female, Follow-Up Studies, Humans, Middle Aged, Neoplasm Metastasis, Prognosis, ErbB-2 Receptor/metabolism, Estrogen Receptors/metabolism, Progesterone Receptors/metabolism, Risk Assessment/methods, Survival Rate, Transcriptome
15.
PLoS Comput Biol ; 16(1): e1007607, 2020 01.
Article in English | MEDLINE | ID: mdl-31967990

ABSTRACT

Prediction of clinical drug response (CDR) of cancer patients, based on their clinical and molecular profiles obtained prior to administration of the drug, can play a significant role in individualized medicine. Machine learning models have the potential to address this issue but training them requires data from a large number of patients treated with each drug, limiting their feasibility. While large databases of drug response and molecular profiles of preclinical in-vitro cancer cell lines (CCLs) exist for many drugs, it is unclear whether preclinical samples can be used to predict CDR of real patients. We designed a systematic approach to evaluate how well different algorithms, trained on gene expression and drug response of CCLs, can predict CDR of patients. Using data from two large databases, we evaluated various linear and non-linear algorithms, some of which utilized information on gene interactions. Then, we developed a new algorithm called TG-LASSO that explicitly integrates information on samples' tissue of origin with gene expression profiles to improve prediction performance. Our results showed that regularized regression methods provide better prediction performance. However, including the network information or common methods of including information on the tissue of origin did not improve the results. On the other hand, TG-LASSO improved the predictions and distinguished resistant and sensitive patients for 7 out of 13 drugs. Additionally, TG-LASSO identified genes associated with the drug response, including known targets and pathways involved in the drugs' mechanism of action. Moreover, genes identified by TG-LASSO for multiple drugs in a tissue were associated with patient survival. In summary, our analysis suggests that preclinical samples can be used to predict CDR of patients and identify biomarkers of drug sensitivity and survival.


Subject(s)
Antineoplastic Agents, Statistical Models, Neoplasms, Transcriptome/drug effects, Algorithms, Antineoplastic Agents/pharmacology, Antineoplastic Agents/therapeutic use, Computational Biology/methods, Genetic Databases, Gene Expression Profiling/methods, Humans, Machine Learning, Neoplasms/drug therapy, Neoplasms/metabolism, Neoplasms/pathology, Precision Medicine
16.
PLoS Biol ; 18(1): e3000583, 2020 01.
Article in English | MEDLINE | ID: mdl-31971940

ABSTRACT

We present Knowledge Engine for Genomics (KnowEnG), a free-to-use computational system for analysis of genomics data sets, designed to accelerate biomedical discovery. It includes tools for popular bioinformatics tasks such as gene prioritization, sample clustering, gene set analysis, and expression signature analysis. The system specializes in "knowledge-guided" data mining and machine learning algorithms, in which user-provided data are analyzed in light of prior information about genes, aggregated from numerous knowledge bases and encoded in a massive "Knowledge Network." KnowEnG adheres to "FAIR" principles (findable, accessible, interoperable, and reusable): its tools are easily portable to diverse computing environments, run on the cloud for scalable and cost-effective execution, and are interoperable with other computing platforms. The analysis tools are made available through multiple access modes, including a web portal with specialized visualization modules. We demonstrate the KnowEnG system's potential value in democratization of advanced tools for the modern genomics era through several case studies that use its tools to recreate and expand upon the published analysis of cancer data sets.


Subject(s)
Algorithms, Cloud Computing, Data Mining/methods, Genomics/methods, Software, Cluster Analysis, Computational Biology/methods, Data Analysis, Datasets as Topic, High-Throughput Nucleotide Sequencing/methods, Humans, Knowledge, Machine Learning, Metabolomics/methods
17.
Sci Rep ; 8(1): 6620, 2018 04 26.
Article in English | MEDLINE | ID: mdl-29700343

ABSTRACT

Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption does not hold when samples are obtained from different experimental conditions and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of a test set from the training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.


Subject(s)
Gene Regulatory Networks , Genomics , Biological Models , Algorithms , Computational Biology/methods , Gene Expression Profiling , Genomics/methods , Humans , Neoplasms/genetics , Reproducibility of Results
18.
Genome Biol ; 18(1): 153, 2017 Aug 11.
Article in English | MEDLINE | ID: mdl-28800781

ABSTRACT

BACKGROUND: Identification of genes whose basal mRNA expression predicts the sensitivity of tumor cells to cytotoxic treatments can play an important role in individualized cancer medicine. It enables detailed characterization of the mechanism of action of drugs. Furthermore, screening the expression of these genes in the tumor tissue may suggest the best course of chemotherapy or a combination of drugs to overcome drug resistance. RESULTS: We developed a computational method called ProGENI to identify genes most associated with the variation of drug response across different individuals, based on gene expression data. In contrast to existing methods, ProGENI also utilizes prior knowledge of protein-protein and genetic interactions, using random walk techniques. Analysis of two relatively new and large datasets including gene expression data on hundreds of cell lines and their cytotoxic responses to a large compendium of drugs reveals a significant improvement in prediction of drug sensitivity using genes identified by ProGENI compared to other methods. Our siRNA knockdown experiments on ProGENI-identified genes confirmed the role of many new genes in sensitivity to three chemotherapy drugs: cisplatin, docetaxel, and doxorubicin. Based on such experiments and extensive literature survey, we demonstrate that about 73% of our top predicted genes modulate drug response in selected cancer cell lines. In addition, global analysis of genes associated with groups of drugs uncovered pathways of cytotoxic response shared by each group. CONCLUSIONS: Our results suggest that knowledge-guided prioritization of genes using ProGENI gives new insight into mechanisms of drug resistance and identifies genes that may be targeted to overcome this phenomenon.
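
The abstract states that ProGENI uses random walk techniques over protein-protein and genetic interaction networks but gives no further detail. As a hedged illustration of that general family of methods, the sketch below implements a generic random walk with restart on a small undirected network. It is not the ProGENI procedure: the function name, the `restart` parameter value, and the four-gene chain network with gene names A to D are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic random walk with restart (illustrative, not ProGENI itself).

    adjacency: dict mapping each node to a list of its neighbours.
    seeds: set of nodes the walker jumps back to with probability `restart`.
    Returns a dict of stationary visiting probabilities.
    """
    nodes = sorted(adjacency)
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart_vec)
    for _ in range(max_iter):
        # Restart step: a fraction of the mass always returns to the seeds.
        new_scores = {n: restart * restart_vec[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds.
                for m in nodes:
                    new_scores[m] += (1 - restart) * scores[n] * restart_vec[m]
                continue
            # Walk step: spread the remaining mass evenly over the neighbours.
            share = (1 - restart) * scores[n] / len(neighbours)
            for m in neighbours:
                new_scores[m] += share
        delta = sum(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tol:
            break
    return scores


# Hypothetical chain network A - B - C - D, with gene A as the only seed.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(network, seeds={"A"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

Genes closer to the seed in the network receive higher scores, which is the basic way such methods let prior interaction knowledge influence a gene ranking.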


Subject(s)
Computational Biology/methods , Antineoplastic Drug Resistance/genetics , Genetic Association Studies/methods , Algorithms , Antineoplastic Agents/pharmacology , Tumor Biomarkers , Cluster Analysis , Genetic Epistasis , Neoplastic Gene Expression Regulation/drug effects , Gene Regulatory Networks , Humans , Phenotype , Reproducibility of Results
19.
Bioinformatics ; 32(24): 3717-3728, 2016 Dec 15.
Article in English | MEDLINE | ID: mdl-27540270

ABSTRACT

MOTIVATION: Cancer genomes exhibit a large number of different alterations that affect many genes in a diverse manner. An improved understanding of the generative mechanisms behind the mutation rules and their influence on gene community behavior is of great importance for the study of cancer. RESULTS: To expand our capability to analyze combinatorial patterns of cancer alterations, we developed a rigorous methodology for cancer mutation pattern discovery based on a new, constrained form of correlation clustering. Our new algorithm, named C3 (Cancer Correlation Clustering), leverages mutual exclusivity of mutations, patient coverage and driver network concentration principles. To test C3, we performed a detailed analysis on TCGA breast cancer and glioblastoma data and showed that our algorithm outperforms the state-of-the-art CoMEt method in terms of discovering mutually exclusive gene modules and identifying biologically relevant driver genes. The proposed agnostic clustering method represents a unique tool for efficient and reliable identification of mutation patterns and driver pathways in large-scale cancer genomics studies, and it may also be used for other clustering problems on biological graphs. AVAILABILITY AND IMPLEMENTATION: The source code for the C3 method can be found at https://github.com/jackhou2/C3 CONTACTS: jianma@cs.cmu.edu or milenkov@illinois.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
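
Two of the principles named in this abstract, patient coverage and mutual exclusivity, can be made concrete with a small sketch. The definitions below are simple textbook-style measures chosen for illustration and are an assumption; they are not the objective that C3 optimizes, and the patient identifiers and mutation calls are made-up toy data.

```python
def coverage(mutations, genes):
    """Fraction of patients with at least one mutated gene from `genes`."""
    covered = [p for p, mutated in mutations.items() if mutated & genes]
    return len(covered) / len(mutations)


def exclusivity(mutations, genes):
    """Among covered patients, fraction with exactly one mutated gene from `genes`."""
    hits = [len(mutated & genes) for mutated in mutations.values()]
    covered = [h for h in hits if h > 0]
    if not covered:
        return 0.0
    return sum(1 for h in covered if h == 1) / len(covered)


# Hypothetical toy mutation calls: patient -> set of mutated genes.
mutations = {
    "p1": {"TP53"},
    "p2": {"PTEN"},
    "p3": {"TP53", "PTEN"},
    "p4": {"EGFR"},
    "p5": set(),
}

for module in ({"TP53", "PTEN"}, {"TP53", "EGFR"}):
    print(
        sorted(module),
        "coverage:", coverage(mutations, module),
        "exclusivity:", round(exclusivity(mutations, module), 3),
    )
```

A candidate gene module is more interesting as a putative driver set when both numbers are high: many patients carry a mutation in the module, and few carry more than one.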


Subject(s)
Algorithms , Breast Neoplasms/genetics , Cluster Analysis , Computational Biology/methods , DNA Mutational Analysis/methods , Glioblastoma/genetics , Female , Gene Regulatory Networks , Humans , Mutation
20.
PLoS One ; 9(3): e90781, 2014.
Article in English | MEDLINE | ID: mdl-24622336

ABSTRACT

We introduce a novel algorithm for inference of causal gene interactions, termed CaSPIAN (Causal Subspace Pursuit for Inference and Analysis of Networks), which is based on coupling compressive sensing and Granger causality techniques. The core of the approach is to discover sparse linear dependencies between shifted time series of gene expressions using a sequential list-version of the subspace pursuit reconstruction algorithm and to estimate the direction of gene interactions via Granger-type elimination. The method is conceptually simple and computationally efficient, and it allows for dealing with noisy measurements. Its performance as a stand-alone platform without biological side information was tested on simulated networks, on the synthetic IRMA network in Saccharomyces cerevisiae, and on data pertaining to the human HeLa cell network and the SOS network in E. coli. The results produced by CaSPIAN are compared to the results of several related algorithms, demonstrating significant improvements in inference accuracy of documented interactions. These findings highlight the importance of Granger causality techniques for reducing the number of false positives, as well as the influence of noise and sampling period on the accuracy of the estimates. In addition, the performance of the method was tested in conjunction with biological side information in the form of sparse "scaffold networks", to which new edges were added using available RNA-seq or microarray data. These biological priors aid in increasing the sensitivity and precision of the algorithm in the small sample regime.
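
The Granger-type idea mentioned in this abstract (using shifted time series to decide the direction of an interaction) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses ordinary least squares with a single lag and no intercept, it does not implement subspace pursuit or any part of CaSPIAN, and the function names and the simulated series `x` and `y` are hypothetical.

```python
import random


def least_squares_rss(target, predictors):
    """Residual sum of squares of an OLS fit (no intercept) of target on predictors."""
    k = len(predictors)
    # Normal equations: (X^T X) beta = X^T y.
    gram = [
        [sum(a * b for a, b in zip(predictors[i], predictors[j])) for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(a * b for a, b in zip(predictors[i], target)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (fine for a few predictors).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(gram[r][col]))
        gram[col], gram[pivot] = gram[pivot], gram[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(k):
            if row != col:
                factor = gram[row][col] / gram[col][col]
                gram[row] = [g - factor * p for g, p in zip(gram[row], gram[col])]
                rhs[row] -= factor * rhs[col]
    beta = [rhs[i] / gram[i][i] for i in range(k)]
    fitted = [
        sum(b * p[t] for b, p in zip(beta, predictors)) for t in range(len(target))
    ]
    return sum((y - f) ** 2 for y, f in zip(target, fitted))


def granger_gain(cause, effect, lag=1):
    """Ratio of restricted to full RSS; values well above 1 suggest cause -> effect."""
    target = effect[lag:]
    own_past = effect[:-lag]
    cause_past = cause[:-lag]
    restricted = least_squares_rss(target, [own_past])
    full = least_squares_rss(target, [own_past, cause_past])
    return restricted / full


# Simulated expression series in which x drives y with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]

print("gain x -> y:", round(granger_gain(x, y), 2))
print("gain y -> x:", round(granger_gain(y, x), 2))
```

The past of `x` sharply reduces the error in predicting `y`, while the past of `y` adds almost nothing for predicting `x`; that asymmetry is what a Granger-type criterion uses to orient an edge.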


Subject(s)
Algorithms , Computational Biology/methods , Gene Regulatory Networks , Escherichia coli/genetics , Gene Expression Profiling , HeLa Cells , Humans , Saccharomyces cerevisiae/genetics