Results 1 - 20 of 38
1.
Cancers (Basel) ; 16(3)2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38339281

ABSTRACT

It is well known that cancers of the same histology type can respond differently to a treatment. Thus, computational drug response prediction is of paramount importance for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data need to be generated through screening experiments and used as input to train the prediction models. In this study, we investigate various active learning strategies of selecting experiments to generate response data for the purposes of (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. Here, we focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including random, greedy, uncertainty, diversity, combination of greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approaches. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, that is, selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data of selected experiments. The analysis was conducted for 57 drugs and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
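
As a rough illustration of two of the selection strategies named in this abstract (greedy and uncertainty), the sketch below ranks candidate cell lines from the predictions of a model ensemble. It is not the authors' implementation. The function name `select_batch`, the use of an ensemble, and the convention that a lower predicted value means a stronger response are all assumptions made for the example.

```python
from statistics import mean, pstdev


def select_batch(ensemble_preds, k, strategy="greedy"):
    """Pick k candidate cell lines to screen next.

    ensemble_preds maps a candidate name to the list of responses predicted
    by the members of a model ensemble (hypothetical input format).
    Assumption: a lower predicted value means a more responsive cell line.
    """
    if strategy == "greedy":
        # Prefer candidates predicted to respond best (lowest mean prediction).
        score = {c: -mean(p) for c, p in ensemble_preds.items()}
    elif strategy == "uncertainty":
        # Prefer candidates on which the ensemble members disagree the most.
        score = {c: pstdev(p) for c, p in ensemble_preds.items()}
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    ranked = sorted(ensemble_preds, key=lambda c: score[c], reverse=True)
    return ranked[:k]


# Toy predictions for four hypothetical cell lines.
preds = {
    "A": [0.20, 0.20, 0.20],
    "B": [0.10, 0.90, 0.50],
    "C": [0.60, 0.70, 0.65],
    "D": [0.30, 0.35, 0.40],
}
greedy_pick = select_batch(preds, 2, "greedy")
uncertain_pick = select_batch(preds, 1, "uncertainty")
```

In an active learning loop, the selected candidates would be screened, added to the training data, and the ensemble retrained before the next selection round.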

2.
Sci Rep ; 13(1): 22091, 2023 12 13.
Article in English | MEDLINE | ID: mdl-38086905

ABSTRACT

Chronic kidney disease (CKD) is a progressive loss in kidney function. Early detection of patients who will progress to late-stage CKD is of paramount importance for patient care. To address this, we develop a pipeline to process longitudinal electronic health records (EHRs) and construct recurrent neural network (RNN) models to predict CKD progression from stages II/III to stages IV/V. The RNN model generates predictions based on time-series records of patients, including repeated lab tests and other clinical variables. Our investigation reveals that using a single variable, the recorded estimated glomerular filtration rate (eGFR) over time, the RNN model achieves an average area under the receiver operating characteristic curve (AUROC) of 0.957 for predicting future CKD progression. When additional clinical variables, such as demographics, vital information, lab test results, and health behaviors, are incorporated, the average AUROC increases to 0.967. In both scenarios, the standard deviation of the AUROC across cross-validation trials is less than 0.01, indicating a stable and high prediction accuracy. Our analysis results demonstrate that the proposed RNN model outperforms existing standard approaches, including static and dynamic Cox proportional hazards models, random forest, and LightGBM. The utilization of the RNN model and the time-series data of previous eGFR measurements underscores its potential as a straightforward and effective tool for assessing the clinical risk of CKD patients concerning their disease progression.
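
For readers unfamiliar with the AUROC metric reported in this abstract, the sketch below computes it from its pairwise-ranking definition: the fraction of (progressor, non-progressor) pairs in which the progressor receives the higher score, with ties counted as one half. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a positive case (e.g. a patient who progressed), 0 otherwise.
    scores: predicted risk, higher meaning more likely positive.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count pairs ranked correctly; a tie contributes half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: two positives, two negatives.
toy_auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

This quadratic-time form is meant for clarity; a sort-based computation is preferable for large cohorts.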


Subjects
Electronic Health Records; Renal Insufficiency, Chronic; Humans; Renal Insufficiency, Chronic/diagnosis; Glomerular Filtration Rate; Neural Networks, Computer; Time Factors; Disease Progression
3.
Front Med (Lausanne) ; 10: 1086097, 2023.
Article in English | MEDLINE | ID: mdl-36873878

ABSTRACT

Cancer claims millions of lives yearly worldwide. While many therapies have been made available in recent years, by and large cancer remains unsolved. Exploiting computational predictive models to study and treat cancer holds great promise in improving drug development and personalized design of treatment plans, ultimately suppressing tumors, alleviating suffering, and prolonging lives of patients. A wave of recent papers demonstrates promising results in predicting cancer response to drug treatments while utilizing deep learning methods. These papers investigate diverse data representations, neural network architectures, learning methodologies, and evaluation schemes. However, deciphering promising predominant and emerging trends is difficult due to the variety of explored methods and the lack of a standardized framework for comparing drug response prediction models. To obtain a comprehensive landscape of deep learning methods, we conducted an extensive search and analysis of deep learning models that predict the response to single drug treatments. A total of 61 deep learning-based models have been curated, and summary plots were generated. Based on the analysis, observable patterns and prevalence of methods have been revealed. This review helps readers better understand the current state of the field and identify major challenges and promising solution paths.

4.
Front Med (Lausanne) ; 10: 1058919, 2023.
Article in English | MEDLINE | ID: mdl-36960342

ABSTRACT

Patient-derived xenografts (PDXs) are an appealing platform for preclinical drug studies. A primary challenge in modeling drug response prediction (DRP) with PDXs and neural networks (NNs) is the limited number of drug response samples. We investigate a multimodal neural network (MM-Net) and data augmentation for DRP in PDXs. The MM-Net learns to predict response using drug descriptors, gene expressions (GE), and histology whole-slide images (WSIs). We explore whether combining WSIs with GE improves predictions as compared with models that use GE alone. We propose two data augmentation methods which allow us to train multimodal and unimodal NNs on a single larger dataset without changing architectures: 1) combine single-drug and drug-pair treatments by homogenizing drug representations, and 2) augment drug-pairs, which doubles the sample size of all drug-pair samples. Unimodal NNs which use GE are compared to assess the contribution of data augmentation. The NN that uses the original and the augmented drug-pair treatments as well as single-drug treatments outperforms NNs that ignore either the augmented drug-pairs or the single-drug treatments. In assessing the multimodal learning based on the MCC metric, MM-Net outperforms all the baselines. Our results show that data augmentation and integration of histology images with GE can improve prediction performance of drug response in PDXs.
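
One plausible reading of the second augmentation method in this abstract is that each drug-pair sample is duplicated with the order of the two drugs swapped, which doubles the number of drug-pair samples. The sketch below shows only that idea. The tuple layout and the function name `augment_drug_pairs` are assumptions for illustration, and the homogenization of single-drug samples is not covered.

```python
def augment_drug_pairs(samples):
    """Return the original drug-pair samples plus order-swapped copies.

    samples: list of (drug_a, drug_b, response) tuples (hypothetical layout).
    Assumption: the response to a pair does not depend on drug order.
    """
    augmented = list(samples)
    for drug_a, drug_b, response in samples:
        augmented.append((drug_b, drug_a, response))
    return augmented


# Two toy drug-pair samples become four after augmentation.
pairs = [("d1", "d2", 0.3), ("d3", "d4", 0.8)]
augmented_pairs = augment_drug_pairs(pairs)
```

In practice the drug identifiers would be replaced by descriptor vectors, and the swap would exchange the two descriptor inputs of the network.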

5.
Cancers (Basel) ; 16(1)2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38201477

ABSTRACT

Cancer is a heterogeneous disease in that tumors of the same histology type can respond differently to a treatment. Anti-cancer drug response prediction is of paramount importance for both drug development and patient treatment design. Although various computational methods and data have been used to develop drug response prediction models, it remains a challenging problem due to the complexities of cancer mechanisms and cancer-drug interactions. To better characterize the interaction between cancer and drugs, we investigate the feasibility of integrating computationally derived features of molecular mechanisms of action into prediction models. Specifically, we add docking scores of drug molecules and target proteins in combination with cancer gene expressions and molecular drug descriptors for building response models. The results demonstrate a marginal improvement in drug response prediction performance when adding docking scores as additional features, through tests on large drug screening data. We discuss the limitations of the current approach and provide the research community with a baseline dataset of the large-scale computational docking for anti-cancer drugs.

6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34524425

ABSTRACT

To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: National Cancer Institute 60, Cancer Therapeutics Response Portal (CTRP), Genomics of Drug Sensitivity in Cancer, Cancer Cell Line Encyclopedia and Genentech Cell Line Screening Initiative (gCSI). Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.


Subjects
Neoplasms; Algorithms; Cell Line; Humans; Machine Learning; Neoplasms/drug therapy; Neoplasms/genetics; Neural Networks, Computer
8.
Sci Rep ; 11(1): 11325, 2021 05 31.
Article in English | MEDLINE | ID: mdl-34059739

ABSTRACT

Convolutional neural networks (CNNs) have been successfully used in many applications where important information about data is embedded in the order of features, such as speech and imaging. However, most tabular data do not assume a spatial relationship between features, and thus are unsuitable for modeling using CNNs. To meet this challenge, we develop a novel algorithm, image generator for tabular data (IGTD), to transform tabular data into images by assigning features to pixel positions so that similar features are close to each other in the image. The algorithm searches for an optimized assignment by minimizing the difference between the ranking of distances between features and the ranking of distances between their assigned pixels in the image. We apply IGTD to transform gene expression profiles of cancer cell lines (CCLs) and molecular descriptors of drugs into their respective image representations. Compared with existing transformation methods, IGTD generates compact image representations with better preservation of feature neighborhood structure. Evaluated on benchmark drug screening datasets, CNNs trained on IGTD image representations of CCLs and drugs exhibit a better performance of predicting anti-cancer drug response than both CNNs trained on alternative image representations and prediction models trained on the original tabular data.
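
The objective described in this abstract, the mismatch between the ranking of feature-to-feature distances and the ranking of pixel-to-pixel distances, can be sketched in a few lines. The code below only evaluates that mismatch for a given assignment of features to pixels. It is not the IGTD search procedure itself, and the helper names `rank_pairs` and `assignment_error`, the Euclidean distance, and the absolute-difference error are assumptions chosen for the illustration.

```python
from itertools import combinations
from math import dist


def rank_pairs(points):
    """Map each index pair (i, j), i < j, to the rank of its distance (0 = closest)."""
    pairs = list(combinations(range(len(points)), 2))
    # sorted() is stable, so tied distances keep their original pair order.
    ordered = sorted(pairs, key=lambda p: dist(points[p[0]], points[p[1]]))
    return {pair: rank for rank, pair in enumerate(ordered)}


def assignment_error(feature_vectors, pixel_coords, assignment):
    """Sum of absolute rank differences between feature pairs and their pixels.

    assignment[i] is the index of the pixel that holds feature i; it must be
    a permutation of the pixel indices.
    """
    f_rank = rank_pairs(feature_vectors)
    p_rank = rank_pairs(pixel_coords)
    total = 0
    for (i, j), rank in f_rank.items():
        a, b = sorted((assignment[i], assignment[j]))
        total += abs(rank - p_rank[(a, b)])
    return total


# Three toy features placed on a 1 x 3 pixel row.
features = [(0.0,), (1.0,), (3.0,)]
pixels = [(0, 0), (0, 1), (0, 2)]
error_identity = assignment_error(features, pixels, [0, 1, 2])
error_swapped = assignment_error(features, pixels, [0, 2, 1])
```

A search over assignments, for example by repeatedly swapping two features and keeping swaps that lower the error, would then look for a layout in which similar features sit close together.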


Subjects
Deep Learning; Image Processing, Computer-Assisted; Software; Cell Line, Tumor; Humans
9.
BMC Bioinformatics ; 22(1): 252, 2021 May 17.
Article in English | MEDLINE | ID: mdl-34001007

ABSTRACT

BACKGROUND: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. METHODS: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. RESULTS: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. 
CONCLUSIONS: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
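
To make the idea of a fitted power law learning curve concrete, the sketch below fits error = a * m^b to (training size, error) points by ordinary least squares in log-log space and then extrapolates to a larger training size. This is a simplified form chosen for illustration: it assumes no irreducible-error offset term, which the study's own parameterization may include, and the names `fit_power_law` and `predict_error` are hypothetical.

```python
from math import exp, log


def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**b by linear regression on log-transformed data."""
    xs = [log(m) for m in sizes]
    ys = [log(e) for e in errors]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope of the log-log line is the power law exponent b.
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    # Intercept of the log-log line is log(a).
    a = exp(y_bar - b * x_bar)
    return a, b


def predict_error(a, b, size):
    """Extrapolate the fitted curve to a new training set size."""
    return a * size**b


# Synthetic learning curve generated from a = 2, b = -0.5.
train_sizes = [100, 1000, 10000]
observed = [2 * m**-0.5 for m in train_sizes]
a_hat, b_hat = fit_power_law(train_sizes, observed)
projected = predict_error(a_hat, b_hat, 100000)
```

A negative fitted exponent indicates that the error is still falling as data are added, which is the kind of forward-looking signal the abstract refers to.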


Subjects
Neoplasms; Pharmaceutical Preparations; Cell Line; Learning Curve; Machine Learning; Neoplasms/drug therapy; Neoplasms/genetics; Prospective Studies
10.
Sci Rep ; 10(1): 18040, 2020 10 22.
Article in English | MEDLINE | ID: mdl-33093487

ABSTRACT

Transfer learning, which transfers patterns learned on a source dataset to a related target dataset for constructing prediction models, has been shown effective in many applications. In this paper, we investigate whether transfer learning can be used to improve the performance of anti-cancer drug response prediction models. Previous transfer learning studies for drug response prediction focused on building models to predict the response of tumor cells to a specific drug treatment. We target the more challenging task of building general prediction models that can make predictions for both new tumor cells and new drugs. Uniquely, we investigate the power of transfer learning for three drug response prediction applications including drug repurposing, precision oncology, and new drug development, through different data partition schemes in cross-validation. We extend the classic transfer learning framework through ensemble and demonstrate its general utility with three representative prediction algorithms including a gradient boosting model and two deep neural networks. The ensemble transfer learning framework is tested on benchmark in vitro drug screening datasets. The results demonstrate that our framework broadly improves the prediction performance in all three drug response prediction applications with all three prediction algorithms.


Subjects
Antineoplastic Agents/pharmacology; Datasets as Topic; Deep Learning; Drug Screening Assays, Antitumor; Neoplasms/drug therapy; Neoplasms/pathology; Algorithms; Antineoplastic Agents/therapeutic use; Drug Development; Drug Repositioning; Humans; Models, Biological; Neural Networks, Computer; Precision Medicine
11.
J Comput Graph Stat ; 29(1): 53-65, 2020.
Article in English | MEDLINE | ID: mdl-32982129

ABSTRACT

We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach makes inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating data sets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and competitive classification performance relative to other widely used competing classifiers. Supplementary materials for this article are available online.

12.
Genes (Basel) ; 11(9)2020 09 11.
Article in English | MEDLINE | ID: mdl-32933072

ABSTRACT

The co-expression extrapolation (COXEN) method has been successfully used in multiple studies to select genes for predicting the response of tumor cells to a specific drug treatment. Here, we enhance the COXEN method to select genes that are predictive of the efficacies of multiple drugs for building general drug response prediction models that are not specific to a particular drug. The enhanced COXEN method first ranks the genes according to their prediction power for each individual drug and then takes a union of top predictive genes of all the drugs, among which the algorithm further selects genes whose co-expression patterns are well preserved between cancer cases for building prediction models. We apply the proposed method on benchmark in vitro drug screening datasets and compare the performance of prediction models built based on the genes selected by the enhanced COXEN method to that of models built on genes selected by the original COXEN method and randomly picked genes. Models built with the enhanced COXEN method always present a statistically significantly improved prediction performance (adjusted p-value ≤ 0.05). Our results demonstrate the enhanced COXEN method can dramatically increase the power of gene expression data for predicting drug response.


Subjects
Antineoplastic Agents/pharmacology; Biomarkers, Tumor/genetics; Drug Screening Assays, Antitumor/methods; Gene Expression Profiling/methods; Models, Statistical; Neoplasms/drug therapy; Neoplasms/genetics; Algorithms; Humans
13.
Cancer Imaging ; 19(1): 48, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31307537

ABSTRACT

BACKGROUND: Imaging techniques can provide information about the tumor non-invasively and have been shown to provide information about the underlying genetic makeup. Correlating image-based phenotypes (radiomics) with genomic analyses is an emerging area of research commonly referred to as "radiogenomics" or "imaging-genomics". The purpose of this study was to assess the potential for using an automated, quantitative radiomics platform on magnetic resonance (MR) breast imaging for inferring underlying activity of clinically relevant gene pathways derived from RNA sequencing of invasive breast cancers prior to therapy. METHODS: We performed quantitative radiomic analysis on 47 invasive breast cancers based on dynamic contrast enhanced 3 Tesla MR images acquired before surgery and obtained gene expression data by performing total RNA sequencing on corresponding fresh frozen tissue samples. We used gene set enrichment analysis to identify significant associations between the 186 gene pathways and the 38 image-based features that have previously been validated. RESULTS: All radiomic size features were positively associated with multiple replication and proliferation pathways and were negatively associated with the apoptosis pathway. Gene pathways related to immune system regulation and extracellular signaling had the highest number of significant radiomic feature associations, with an average of 18.9 and 16 features per pathway, respectively. Tumors with upregulation of immune signaling pathways such as T-cell receptor signaling and chemokine signaling as well as extracellular signaling pathways such as cell adhesion molecule and cytokine-cytokine interactions were smaller, more spherical, and had a more heterogeneous texture upon contrast enhancement. Tumors with higher expression levels of JAK/STAT and VEGF pathways had more intratumor heterogeneity in image enhancement texture. 
Other pathways with robust associations to image-based features include metabolic and catabolic pathways. CONCLUSIONS: We provide further evidence that MR imaging of breast tumors can infer underlying gene expression by using RNA sequencing. Size and shape features were appropriately correlated with proliferative and apoptotic pathways. Given the high number of radiomic feature associations with immune pathways, our results raise the possibility of using MR imaging to distinguish tumors that are more immunologically active, although further studies are necessary to confirm this observation.


Subjects
Breast Neoplasms/diagnostic imaging; Gene Expression Profiling/methods; Genomics/methods; Magnetic Resonance Imaging/methods; Aged; Apoptosis; Breast Neoplasms/genetics; Female; Humans; Phenotype
14.
Hum Gene Ther ; 30(9): 1117-1132, 2019 09.
Article in English | MEDLINE | ID: mdl-31126191

ABSTRACT

In an effort to develop a new therapy for cancer and to improve anti-programmed death inhibitor-1 (anti-PD-1) and anti-cytotoxic T lymphocyte-associated protein (anti-CTLA-4) responses, we have created a telomerase reverse transcriptase promoter-regulated oncolytic adenovirus rAd.sT containing a soluble transforming growth factor receptor II fused with human IgG Fc fragment (sTGFβRIIFc) gene. Infection of breast and renal tumor cells with rAd.sT produced sTGFβRIIFc protein with dose-dependent cytotoxicity. In immunocompetent mouse 4T1 breast tumor model, intratumoral delivery of rAd.sT inhibited both tumor growth and lung metastases. rAd.sT downregulated the expression of several transforming growth factor β (TGFβ) target genes involved in tumor growth and metastases, inhibited Th2 cytokine expression, and induced Th1 cytokines and chemokines, and granzyme B and perforin expression. rAd.sT treatment also increased the percentage of CD8+ T lymphocytes, promoted the generation of CD4+ T memory cells, reduced regulatory T lymphocytes (Tregs), and reduced bone marrow-derived suppressor cells. Importantly, rAd.sT treatment increased the percentage of CD4+ T lymphocytes, and promoted differentiation and maturation of antigen-presenting dendritic cells in the spleen. In the immunocompetent mouse Renca renal tumor model, similar therapeutic effects and immune activation results were observed. In the 4T1 mammary tumor model, rAd.sT improved the inhibition of tumor growth and lung and liver metastases by anti-PD-1 and anti-CTLA-4 antibodies. Analysis of the human breast and kidney tumors showed that a significant number of tumor tissues expressed high levels of TGFβ and TGFβ-inducible genes. Therefore, rAd.sT could be a potential enhancer of anti-PD-1 and anti-CTLA-4 therapy for treating breast and kidney cancers.


Subjects
Adenoviridae/genetics; Genetic Vectors/genetics; Immunity; Oncolytic Virotherapy; Oncolytic Viruses/genetics; Transforming Growth Factor beta/genetics; Animals; Antineoplastic Agents, Immunological/pharmacology; CTLA-4 Antigen/antagonists & inhibitors; Cell Line, Tumor; Combined Modality Therapy; Cytokines/metabolism; Disease Models, Animal; Gene Transfer Techniques; Humans; Immunomodulation; Mice; Neoplasms/genetics; Neoplasms/immunology; Neoplasms/metabolism; Neoplasms/therapy; Programmed Cell Death 1 Receptor/antagonists & inhibitors; Signal Transduction; T-Lymphocyte Subsets/immunology; T-Lymphocyte Subsets/metabolism; Transduction, Genetic; Transforming Growth Factor beta/metabolism; Virus Replication; Xenograft Model Antitumor Assays
15.
JCO Clin Cancer Inform ; 3: 1-9, 2019 02.
Article in English | MEDLINE | ID: mdl-30730765

ABSTRACT

PURPOSE: Recent data suggest that imaging radiomic features of a tumor could be indicative of important genomic biomarkers. Understanding the relationship between radiomic and genomic features is important for basic cancer research and future patient care. We performed a comprehensive study to discover the imaging-genomic associations in head and neck squamous cell carcinoma (HNSCC) and explore the potential of predicting tumor genomic alterations using radiomic features. METHODS: Our retrospective study integrated whole-genome multiomics data from The Cancer Genome Atlas with matched computed tomography imaging data from The Cancer Imaging Archive for the same set of 126 patients with HNSCC. Linear regression and gene set enrichment analysis were used to identify statistically significant associations between radiomic imaging and genomic features. A random forest classifier was used to predict the status of two key HNSCC molecular biomarkers, human papillomavirus and disruptive TP53 mutation, on the basis of radiomic features. RESULTS: Widespread and statistically significant associations were discovered between genomic features (including microRNA expression, somatic mutations, and transcriptional activity, copy number variations, and promoter region DNA methylation changes of pathways) and radiomic features characterizing the size, shape, and texture of tumor. Prediction of human papillomavirus and TP53 mutation status using radiomic features achieved areas under the receiver operating characteristic curve of 0.71 and 0.641, respectively. CONCLUSION: Our exploratory study suggests that radiomic features are associated with genomic characteristics at multiple molecular layers in HNSCC and provides justification for continued development of radiomics as biomarkers for relevant genomic alterations in HNSCC.


Subjects
Biomarkers, Tumor; Diagnostic Imaging; Genetic Predisposition to Disease; Genomics; Image Processing, Computer-Assisted; Squamous Cell Carcinoma of Head and Neck/diagnostic imaging; Squamous Cell Carcinoma of Head and Neck/genetics; Aged; Computational Biology/methods; DNA Copy Number Variations; Female; Gene Expression Profiling; Genomics/methods; Humans; Image Interpretation, Computer-Assisted; Male; Middle Aged; Mutation; Neoplasm Staging; Reproducibility of Results; Retrospective Studies; Squamous Cell Carcinoma of Head and Neck/pathology; Tomography, X-Ray Computed; Workflow
16.
Biometrics ; 74(2): 584-594, 2018 06.
Article in English | MEDLINE | ID: mdl-28960246

ABSTRACT

We introduce a marginal version of the nested Dirichlet process to cluster distributions or histograms. We apply the model to cluster genes by patterns of gene-gene interaction. The proposed approach is based on the nested partition that is implied in the original construction of the nested Dirichlet process. It allows simulation exact inference, as opposed to a truncated Dirichlet process approximation. More importantly, the construction highlights the nature of the nested Dirichlet process as a nested partition of experimental units. We apply the proposed model to inference on clustering genes related to DNA mismatch repair (DMR) by the distribution of gene-gene interactions with other genes. Gene-gene interactions are recorded as coefficients in an auto-logistic model for the co-expression of two genes, adjusting for copy number variation, methylation and protein activation. These coefficients are extracted from an online database, called Zodiac, computed based on The Cancer Genome Atlas (TCGA) data. We compare results with a variation of k-means clustering that is set up to cluster distributions, truncated NDP and a hierarchical clustering method. The proposed inference shows favorable performance, under simulated conditions and also in the real data sets.


Subjects
Cluster Analysis; Statistical Distributions; Animals; DNA Mismatch Repair/genetics; Epistasis, Genetic; Gene Expression Profiling; Genes, Neoplasm; Humans
17.
Bioinformatics ; 34(9): 1615-1617, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29272348

ABSTRACT

Motivation: The Cancer Genome Atlas (TCGA) program has produced huge amounts of cancer genomics data providing unprecedented opportunities for research. In 2014, we developed TCGA-Assembler, a software pipeline for retrieval and processing of public TCGA data. In 2016, TCGA data were transferred from the TCGA data portal to the Genomic Data Commons (GDC), which is supported by a different set of data storage and retrieval mechanisms. In addition, new proteomics data of TCGA samples have been generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) program, which were not available for downloading through TCGA-Assembler. It is desirable to acquire and integrate data from both GDC and CPTAC. Results: We develop TCGA-Assembler 2 (TA2) to automatically download and integrate data from GDC and CPTAC. We make substantial improvements to the functionality of TA2 to enhance user experience and software performance. TA2 together with its previous version have helped more than 2000 researchers from 64 countries to access and utilize TCGA and CPTAC data in their research. Availability of TA2 will continue to allow existing and new users to conduct reproducible research based on TCGA and CPTAC data. Availability and implementation: http://www.compgenome.org/TCGA-Assembler/ or https://github.com/compgenome365/TCGA-Assembler-2. Contact: zhuyitan@gmail.com or koaeraser@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Software; Genome; Genomics; Information Storage and Retrieval; Neoplasms; Proteomics
18.
Biometrics ; 74(2): 606-615, 2018 06.
Article in English | MEDLINE | ID: mdl-29023632

ABSTRACT

We develop novel hierarchical reciprocal graphical models to infer gene networks from heterogeneous data. In the case of data that can be naturally divided into known groups, we propose to connect graphs by introducing a hierarchical prior across group-specific graphs, including a correlation on edge strengths across graphs. Thresholding priors are applied to induce sparsity of the estimated networks. In the case of unknown groups, we cluster subjects into subpopulations and jointly estimate cluster-specific gene networks, again using similar hierarchical priors across clusters. We illustrate the proposed approach by simulation studies and three applications with multiplatform genomic data for multiple cancers.


Subjects
Biometry/methods; Cluster Analysis; Gene Regulatory Networks; Computer Simulation; Genomics; Humans; Neoplasms/genetics
19.
Sci Rep ; 6: 38350, 2016 12 06.
Article in English | MEDLINE | ID: mdl-27922124

ABSTRACT

Blind Source Separation (BSS) is a powerful tool for analyzing composite data patterns in many areas, such as computational biology. We introduce a novel BSS method, Convex Analysis of Mixtures (CAM), for separating non-negative well-grounded sources, which learns the mixing matrix by identifying the lateral edges of the convex data scatter plot. We propose and prove a sufficient and necessary condition for identifying the mixing matrix through edge detection in the noise-free case, which enables CAM to identify the mixing matrix not only in the exact-determined and over-determined scenarios, but also in the under-determined scenario. We show the optimality of the edge detection strategy, even for cases where source well-groundedness is not strictly satisfied. The CAM algorithm integrates plug-in noise filtering using sector-based clustering, an efficient geometric convex analysis scheme, and stability-based model order selection. The superior performance of CAM against a panel of benchmark BSS techniques is demonstrated on numerically mixed gene expression data of ovarian cancer subtypes. We apply CAM to dissect dynamic contrast-enhanced magnetic resonance imaging data taken from breast tumors and time-course microarray gene expression data derived from in-vivo muscle regeneration in mice, both producing biologically plausible decomposition results.


Subjects
Algorithms; Breast Neoplasms/diagnostic imaging; Gene Expression Profiling/statistics & numerical data; Image Processing, Computer-Assisted/statistics & numerical data; Muscle Proteins/genetics; Neovascularization, Pathologic/diagnostic imaging; Animals; Breast Neoplasms/blood supply; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Cluster Analysis; Computer Simulation; Datasets as Topic; Female; Gene Expression; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging; Mice; Muscle Proteins/metabolism; Muscle, Skeletal/injuries; Muscle, Skeletal/metabolism; Neovascularization, Pathologic/genetics; Neovascularization, Pathologic/pathology; RNA, Messenger/genetics; RNA, Messenger/metabolism; Regeneration/genetics
20.
Article in English | MEDLINE | ID: mdl-27853751

ABSTRACT

Using quantitative radiomics, we demonstrate that computer-extracted magnetic resonance (MR) image-based tumor phenotypes can be predictive of the molecular classification of invasive breast cancers. Radiomics analysis was performed on 91 MRIs of biopsy-proven invasive breast cancers from National Cancer Institute's multi-institutional TCGA/TCIA. Immunohistochemistry molecular classification was performed including estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and for 84 cases, the molecular subtype (normal-like, luminal A, luminal B, HER2-enriched, and basal-like). Computerized quantitative image analysis included: three-dimensional lesion segmentation, phenotype extraction, and leave-one-case-out cross validation involving stepwise feature selection and linear discriminant analysis. The performance of the classifier model for molecular subtyping was evaluated using receiver operating characteristic analysis. The computer-extracted tumor phenotypes were able to distinguish between molecular prognostic indicators; area under the ROC curve values of 0.89, 0.69, 0.65, and 0.67 in the tasks of distinguishing between ER+ versus ER-, PR+ versus PR-, HER2+ versus HER2-, and triple-negative versus others, respectively. Statistically significant associations between tumor phenotypes and receptor status were observed. More aggressive cancers are likely to be larger in size with more heterogeneity in their contrast enhancement. Even after controlling for tumor size, a statistically significant trend was observed within each size group (P = 0.04 for lesions ≤ 2 cm; P = 0.02 for lesions >2 to ≤5 cm) as with the entire data set (P-value = 0.006) for the relationship between enhancement texture (entropy) and molecular subtypes (normal-like, luminal A, luminal B, HER2-enriched, basal-like). 
In conclusion, computer-extracted image phenotypes show promise for high-throughput discrimination of breast cancer subtypes and may yield a quantitative predictive signature for advancing precision medicine.
