Results 1 - 13 of 13
1.
Brief Bioinform ; 20(5): 1734-1753, 2019 09 27.
Article in English | MEDLINE | ID: mdl-31846027

ABSTRACT

Recent years have seen an increase in the availability of pharmacogenomic databases such as Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) that provide genomic and functional characterization information for multiple cell lines. Studies have alluded to the fact that specific characterizations may be inconsistent between different databases. Analysis of the potential discrepancies in the different databases is highly significant, as these sources are frequently used to analyze and validate methodologies for personalized cancer therapies. In this article, we review the recent developments in investigating the correspondence between different pharmacogenomics databases and discuss the potential factors that require attention when incorporating these sources in any modeling analysis. Furthermore, we explored the consistency among these databases using copulas that can capture nonlinear dependencies between two sets of data.


Subjects
Antineoplastic Agents/therapeutic use; Neoplasms/drug therapy; Neoplasms/genetics; Pharmacogenetics; Cell Line, Tumor; Databases, Genetic; Humans; Neoplasms/pathology
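
The abstract above notes that copulas can capture nonlinear dependencies between two sets of data. As a minimal sketch of why rank-based (copula-level) measures help here, the following compares Pearson correlation with Spearman's rho, which depends only on the copula of the two variables. The sensitivity values and the names `db_a` and `db_b` are hypothetical, and this is not the analysis performed in the article.

```python
import math

def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical sensitivities of the same cell lines in two databases,
# related by a monotone but strongly nonlinear map.
db_a = [0.1, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
db_b = [math.exp(v) for v in db_a]

rho_linear = pearson(db_a, db_b)  # penalized by the nonlinearity
rho_rank = spearman(db_a, db_b)   # unaffected by monotone transforms
```

Because the two hypothetical measurements order the cell lines identically, the rank-based measure reports perfect agreement while the linear correlation does not.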
2.
Bioinformatics ; 35(17): 3143-3145, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30649230

ABSTRACT

SUMMARY: Biological processes are characterized by a variety of different genomic feature sets. However, oftentimes when building models, portions of these features are missing for a subset of the dataset. We provide a modeling framework to effectively integrate this type of heterogeneous data to improve prediction accuracy. To test our methodology, we have stacked data from the Cancer Cell Line Encyclopedia to increase the accuracy of drug sensitivity prediction. The package addresses the dynamic regime of information integration involving sequential addition of features and samples. AVAILABILITY AND IMPLEMENTATION: The framework has been implemented as an R package, Sstack, which can be downloaded from https://cran.r-project.org/web/packages/Sstack/index.html, where further explanation of the package is available. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms; Software; Cell Line, Tumor; Genome; Genomics; Humans
3.
BMC Bioinformatics ; 20(Suppl 12): 317, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31216980

ABSTRACT

BACKGROUND: Clinical studies often track dose-response curves of subjects over time. One can easily model the dose-response curve at each time point with the Hill equation, but such a model fails to capture the temporal evolution of the curves. On the other hand, one can use the Gompertz equation to model the temporal behavior at each dose without capturing the evolution of time curves across dosage. RESULTS: In this article, we propose a parametric model for dose-time responses that approximately follows the Gompertz law in time and the Hill equation across dose. We derive a recursion relation for dose-response curves over time capturing the temporal evolution and then specify a regression model connecting the parameters controlling the dose-time responses with individual-level proteomic data. The resultant joint model allows us to predict the dose-response curves over time for new individuals. CONCLUSION: We have compared the efficacy of our proposed Recursive Hybrid model with individual dose-response predictive models at desired time points. We note that our proposed model exhibits superior performance compared to the individual ones for both synthetic data and actual pharmacological data. For the desired dose-time varying genetic characterization and drug response values, we have used the HMS-LINCS database and demonstrated the effectiveness of our model for all available anticancer compounds.


Subjects
Models, Theoretical; Pharmacology; Computer Simulation; Databases as Topic; Dose-Response Relationship, Drug; Humans; Time Factors
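
For readers unfamiliar with the two building blocks named in the abstract above, the sketch below writes out one common textbook parameterization of each: a four-parameter Hill curve over dose and a Gompertz curve over time. The parameter names (`e0`, `e_inf`, `ec50`, `h`, `a`, `b`, `c`) are illustrative assumptions. The article's recursive hybrid model is not reproduced here, and its parameterization may differ.

```python
import math

def hill(dose, e0, e_inf, ec50, h):
    """Four-parameter Hill curve: response as a function of dose.

    e0 is the response at zero dose, e_inf the response at saturating
    dose, ec50 the half-effect dose, and h the Hill slope.
    """
    return e_inf + (e0 - e_inf) / (1.0 + (dose / ec50) ** h)

def gompertz(t, a, b, c):
    """Gompertz curve over time: a * exp(-b * exp(-c * t)).

    a is the asymptote, b sets the displacement along the time axis,
    and c is the rate parameter.
    """
    return a * math.exp(-b * math.exp(-c * t))

# At dose == ec50 the Hill response is halfway between e0 and e_inf.
mid = hill(2.0, e0=1.0, e_inf=0.1, ec50=2.0, h=1.5)

# For large t the Gompertz curve approaches its asymptote a.
late = gompertz(100.0, a=1.0, b=3.0, c=0.5)
```

Fitting each curve separately gives one Hill curve per time point or one Gompertz curve per dose, which is the limitation the abstract describes.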
4.
Bioinformatics ; 34(8): 1336-1344, 2018 04 15.
Article in English | MEDLINE | ID: mdl-29267851

ABSTRACT

Motivation: Random forest (RF) has become a widely popular prediction-generating mechanism. Its strength lies in its flexibility, interpretability and ability to handle a large number of features, typically larger than the sample size. However, this methodology is of limited use if one wishes to identify statistically significant features. Several ranking schemes are available that provide information on the relative importance of the features, but there is a paucity of general inferential mechanisms, particularly in a multivariate setup. We use the conditional inference tree framework to generate an RF where features are deleted sequentially based on explicit hypothesis testing. The resulting sequential algorithm offers an inferentially justifiable, but model-free, variable selection procedure. Significant features are then used to generate a predictive RF. An added advantage of our methodology is that both variable selection and prediction are based on the conditional inference framework and hence are coherent. Results: We illustrate the performance of our Sequential Multi-Response Feature Selection approach through simulation studies and finally apply this methodology to the Genomics of Drug Sensitivity in Cancer dataset to identify genetic characteristics that significantly impact drug sensitivities. The significant predictors obtained from our method are further validated from a biological perspective. Availability and implementation: https://github.com/jomayer/SMuRF. Contact: souparno.ghosh@ttu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Genomics/methods; Antineoplastic Agents/therapeutic use; Humans; Neoplasms/drug therapy; Neoplasms/genetics
5.
BMC Bioinformatics ; 19(Suppl 3): 71, 2018 03 21.
Article in English | MEDLINE | ID: mdl-29589559

ABSTRACT

BACKGROUND: A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines. Predictive models such as Random Forests have shown promising performance when predicting from individual genomic features such as gene expression. However, the availability of various other data types, including information on multiple tested drugs, necessitates the design of predictive models that incorporate these data types. RESULTS: We explore the predictive performance of model stacking and the effect of stacking on the predictive bias and squared error. In addition, we discuss the analytical underpinnings supporting the advantages of stacking in reducing the squared error and the inherent bias of random forests in the prediction of outliers. The framework is tested on a setup including gene expression, drug target, physical properties and drug response information for a set of drugs and cell lines. CONCLUSION: The performance of individual and stacked models is compared. We note that stacking models built on two heterogeneous datasets provides superior performance to stacking different models built on the same dataset. It is also noted that stacking provides a noticeable reduction in the bias of our predictors when the dominant eigenvalue of the principal axis of variation in the residuals is significantly higher than the remaining eigenvalues.


Subjects
Drug Screening Assays, Antitumor; Models, Biological; Algorithms; Area Under Curve; Bias; Cell Line, Tumor; Deep Learning; Humans; Neoplasms/drug therapy; Precision Medicine
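
The stacking idea in the abstract above can be illustrated with a minimal sketch: predictions from two base models are combined linearly, with the weights chosen by least squares. The prediction vectors `p1` and `p2` are hypothetical stand-ins for models built on two different data types, and the no-intercept linear combiner is an assumption, not necessarily the form used in the article.

```python
def fit_stacking_weights(p1, p2, y):
    """Least-squares weights (w1, w2) for y ~ w1 * p1 + w2 * p2."""
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * t for a, t in zip(p1, y))
    b2 = sum(b * t for b, t in zip(p2, y))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("base model predictions are collinear")
    # Solve the 2x2 normal equations with Cramer's rule.
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2

def mse(pred, y):
    """Mean squared error between predictions and observed responses."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

# Hypothetical observed sensitivities and two base-model predictions.
y = [0.2, 0.4, 0.6, 0.8, 1.0]
p1 = [0.30, 0.35, 0.70, 0.75, 1.10]
p2 = [0.10, 0.45, 0.50, 0.85, 0.90]

w1, w2 = fit_stacking_weights(p1, p2, y)
stacked = [w1 * a + w2 * b for a, b in zip(p1, p2)]

mse_1, mse_2, mse_stacked = mse(p1, y), mse(p2, y), mse(stacked, y)
```

On the data used to fit the weights, the stacked error can never exceed that of either base model, because the weight pairs (1, 0) and (0, 1) are among the candidates. In practice the weights would be fitted on held-out predictions to avoid an optimistic estimate.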
6.
BMC Bioinformatics ; 19(Suppl 17): 497, 2018 Dec 28.
Article in English | MEDLINE | ID: mdl-30591023

ABSTRACT

BACKGROUND: In precision medicine, scarcity of suitable biological data often hinders the design of an appropriate predictive model. In this regard, large-scale pharmacogenomics studies, such as CCLE and GDSC, hold the promise to mitigate the issue. However, one cannot directly employ data from multiple sources together because of the distribution shift between them. One way to solve this problem is to use transfer learning methodologies tailored to this specific context. RESULTS: In this paper, we present two novel approaches for incorporating information from a secondary database to improve the prediction in a target database. The first approach is based on latent variable cost optimization and the second approach considers a polynomial mapping between the two databases. Using the CCLE and GDSC databases, we illustrate that the proposed approaches achieve better prediction of drug sensitivities in different scenarios than the existing approaches. CONCLUSION: We have compared the performance of the proposed predictive models with database-specific individual models as well as existing transfer learning approaches. We note that our proposed approaches exhibit superior performance compared to the above-mentioned alternative techniques for predicting sensitivity to different anti-cancer compounds; in particular, the nonlinear mapping model shows the best overall performance.


Subjects
Algorithms; Antineoplastic Agents/therapeutic use; Neoplasms/drug therapy; Area Under Curve; Databases, Factual; Gene Expression Regulation, Neoplastic; Humans; Neoplasms/genetics
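
To make the notion of a polynomial mapping between two databases concrete, the sketch below fits a quadratic map from sensitivities in a secondary database to sensitivities in a target database for cell lines present in both. The data, the degree, and the helper names are illustrative assumptions. The article's actual mapping and optimization procedure are not described in the abstract and are not reproduced here.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_polynomial(x, y, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... (normal equations)."""
    k = degree + 1
    a = [[sum(v ** (i + j) for v in x) for j in range(k)] for i in range(k)]
    b = [sum(t * v ** i for v, t in zip(x, y)) for i in range(k)]
    return solve_linear(a, b)

def apply_polynomial(coeffs, v):
    """Evaluate the fitted polynomial at v."""
    return sum(c * v ** i for i, c in enumerate(coeffs))

# Hypothetical sensitivities for cell lines common to both databases.
secondary = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
target = [0.1 + 0.5 * v + 0.2 * v * v for v in secondary]

coeffs = fit_polynomial(secondary, target, degree=2)
mapped = apply_polynomial(coeffs, 0.5)  # secondary value moved to target scale
```

Once fitted on the shared cell lines, such a map could be applied to secondary-database samples that are absent from the target database, which is one way extra training data could be brought onto the target scale.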
7.
Bioinformatics ; 33(9): 1407-1410, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28334269

ABSTRACT

Summary: IntegratedMRF is an open-source R implementation for integrating drug response predictions from various genomic characterizations using univariate or multivariate random forests, and it includes various options for error estimation techniques. The integrated framework was developed following the superior performance of random forest based methods in the NCI-DREAM drug sensitivity prediction challenge. The computational framework can be applied to estimate the mean and confidence interval of drug response prediction errors based on ensemble approaches with various combinations of genetic and epigenetic characterizations as inputs. The multivariate random forest implementation included in the package incorporates the correlations between output responses in the modeling and has been shown to perform better than existing approaches when the drug responses are correlated. Detailed analysis of the provided features is included in the Supplementary Material. Availability and Implementation: The framework has been implemented as a package, IntegratedMRF, which can be downloaded from https://cran.r-project.org/web/packages/IntegratedMRF/index.html, where further explanation of the package is available. Contact: ranadip.pal@ttu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Biomarkers, Pharmacological; Genomics/methods; Models, Genetic; Neoplasms/genetics; Software; Statistics as Topic/methods; Antineoplastic Agents/therapeutic use; DNA Methylation; Gene Expression Regulation; Humans; Neoplasms/drug therapy; Neoplasms/metabolism; Polymorphism, Single Nucleotide; Precision Medicine/methods; Transcriptome
8.
Nat Commun ; 11(1): 4391, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32873806

ABSTRACT

Deep learning with Convolutional Neural Networks has shown great promise in image-based classification and enhancement but is often unsuitable for predictive modeling using features without spatial correlations. We present a feature representation approach termed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) to arrange high-dimensional vectors in a compact image form conducive to CNN-based deep learning. We consider the similarities between features to generate a concise feature map in the form of a two-dimensional image by minimizing the pairwise distance values following a Bayesian Metric Multidimensional Scaling approach. We hypothesize that this approach enables embedded feature extraction and, integrated with CNN-based deep learning, can boost the predictive accuracy. We illustrate the superior predictive capabilities of the proposed framework as compared to state-of-the-art methodologies in drug sensitivity prediction scenarios using synthetic datasets, drug chemical descriptors as predictors from NCI60, and both transcriptomic information and drug descriptors as predictors from GDSC.


Subjects
Antineoplastic Agents/pharmacology; Deep Learning; Image Processing, Computer-Assisted/methods; Neoplasms/drug therapy; Antineoplastic Agents/therapeutic use; Bayes Theorem; Biomarkers, Tumor/genetics; Cell Line, Tumor; Cell Proliferation/drug effects; Datasets as Topic; Drug Resistance, Neoplasm; Drug Screening Assays, Antitumor/methods; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/pathology; Oligonucleotide Array Sequence Analysis
9.
Methods Mol Biol ; 1878: 227-241, 2019.
Article in English | MEDLINE | ID: mdl-30378080

ABSTRACT

Accurately predicting the sensitivity of tumor cells to anti-cancer drugs based on genetic characterizations is a significant challenge for personalized cancer therapy. This chapter provides a computational procedure to design predictive models from individual genomic characterizations and combine them to arrive at an integrated predictive model. Integrated modeling employs the complementary information from heterogeneous genetic characterizations to reduce the prediction error as well as narrow the error confidence interval.


Subjects
Antineoplastic Agents/pharmacology; Neoplasms/drug therapy; Neoplasms/genetics; Polymorphism, Single Nucleotide/genetics; Cell Line, Tumor; Genomics/methods; Humans
10.
Sci Rep ; 9(1): 1628, 2019 02 07.
Article in English | MEDLINE | ID: mdl-30733524

ABSTRACT

Drug sensitivity prediction for individual tumors is a significant challenge in personalized medicine. Current modeling approaches consider prediction of a single metric of the drug response curve such as AUC or IC50. However, a single summary metric of a dose-response curve fails to provide the entire drug sensitivity profile, which can be used to design the optimal dose for a patient. In this article, we assess the problem of predicting the complete dose-response curve based on genetic characterizations. We propose an enhancement to the popular ensemble-based Random Forests approach that can directly predict the entire functional profile of a dose-response curve rather than a single summary metric. We design functional regression trees with node costs modified based on dose/response region dependence methodologies and response distribution based approaches. Our results on large pharmacological databases such as CCLE and GDSC show that the proposed functional framework predicts dose-response curves more accurately than univariate or multivariate Random Forests predicting sensitivities at different dose levels. Furthermore, we also considered the problem of predicting functional responses from functional predictors, i.e., estimating the dose-response curves with a model built on dose-dependent expression data. The superior performance of Functional Random Forest using functional data as compared to existing approaches has been shown using the HMS-LINCS dataset. In summary, Functional Random Forest presents an enhanced predictive modeling framework to predict the entire functional response profile considering both static and functional predictors instead of predicting the summary metrics of the response curves.


Subjects
Dose-Response Relationship, Drug; Models, Theoretical; Area Under Curve; Cell Line; Databases, Pharmaceutical; Humans; Multivariate Analysis; Neoplasms/drug therapy; Neoplasms/genetics; Regression Analysis; Reproducibility of Results
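
The abstract's point that a single summary metric cannot represent the whole dose-response profile can be seen in a small example. The sketch below computes the area under the curve with the trapezoidal rule for two hypothetical curves that differ in shape yet share the same AUC. The dose grid and response values are invented for illustration.

```python
def trapezoid_auc(doses, responses):
    """Area under a dose-response curve by the trapezoidal rule."""
    if len(doses) != len(responses) or len(doses) < 2:
        raise ValueError("need at least two matching dose/response points")
    return sum(
        0.5 * (responses[i] + responses[i + 1]) * (doses[i + 1] - doses[i])
        for i in range(len(doses) - 1)
    )

# Hypothetical viability measurements on a shared dose grid.
doses = [0.0, 1.0, 2.0, 3.0]
curve_a = [1.0, 0.5, 0.5, 0.0]  # early partial response, then a plateau
curve_b = [1.0, 1.0, 0.0, 0.0]  # no response, then a sharp drop

auc_a = trapezoid_auc(doses, curve_a)
auc_b = trapezoid_auc(doses, curve_b)
```

A model trained only on AUC would treat these two curves as identical, even though the response at dose 1.0 differs by a factor of two, which is the kind of information a functional prediction of the full curve is meant to retain.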
11.
Sci Rep ; 7(1): 11347, 2017 09 12.
Article in English | MEDLINE | ID: mdl-28900181

ABSTRACT

Samples collected in pharmacogenomics databases typically belong to various cancer types. When designing a drug sensitivity predictive model from such a database, a natural question arises as to whether a model trained on diverse, heterogeneous samples from different tumor types will perform similarly to a predictive model that takes the heterogeneity of the samples into consideration in model training and prediction. We explore this hypothesis and observe that ensemble model predictions obtained when the cancer type is known outperform predictions made when that information is withheld, even when the sample sizes for the former are considerably lower than the combined sample size. To incorporate the heterogeneity idea in the commonly used ensemble-based predictive model of Random Forests, we propose Heterogeneity Aware Random Forests (HARF), which assigns weights to the trees based on the category of the sample. We treat heterogeneity as a latent class allocation problem and present a covariate-free class allocation approach based on the distribution of leaf nodes of the model ensemble. Applications on the CCLE and GDSC databases show that HARF outperforms the traditional Random Forest when the average drug responses of cancer types are different.


Subjects
Drug Resistance; Models, Statistical; Algorithms; Databases, Factual; Humans; Neoplasms/drug therapy; Reproducibility of Results
12.
PLoS One ; 10(12): e0144490, 2015.
Article in English | MEDLINE | ID: mdl-26658256

ABSTRACT

Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble-based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team science based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignore the relationship between different drug sensitivities during model generation. This application motivates the need for multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference between the two empirical copulas. We illustrate that copulas are suitable for capturing the multivariate structure of output responses independent of the marginal distributions and that the copula-based multivariate random forest framework can provide higher-accuracy prediction and improved variable selection. The proposed framework has been validated on the Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia databases.


Subjects
Algorithms; Computer Simulation; Drug Screening Assays, Antitumor/methods; Neoplasms/drug therapy; Genomics/methods; Humans; Multivariate Analysis; Neoplasms/genetics; Precision Medicine/methods; Regression Analysis; Treatment Outcome
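
The abstract above describes a cost based on the difference between two empirical copulas. As a rough sketch of that idea for two output responses, the code below builds the bivariate empirical copula from ranks and averages the absolute difference between two such copulas over a grid. The grid size, the absolute-difference form, and all names are assumptions made for illustration. The article's exact cost criterion is not given in the abstract.

```python
import math

def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def empirical_copula(x, y):
    """Return C(a, b): fraction of points with scaled ranks <= (a, b)."""
    n = len(x)
    u = [r / n for r in ranks(x)]
    v = [r / n for r in ranks(y)]

    def copula(a, b):
        return sum(1 for ui, vi in zip(u, v) if ui <= a and vi <= b) / n

    return copula

def copula_distance(sample_a, sample_b, grid=10):
    """Mean absolute difference of two empirical copulas on a grid."""
    ca = empirical_copula(*sample_a)
    cb = empirical_copula(*sample_b)
    points = [(i / grid, j / grid)
              for i in range(1, grid + 1) for j in range(1, grid + 1)]
    return sum(abs(ca(a, b) - cb(a, b)) for a, b in points) / len(points)

# Hypothetical responses of five cell lines to two drugs.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

# Monotone transforms change the marginals but not the copula.
same_structure = copula_distance(
    (x, y), ([math.exp(v) for v in x], [v ** 3 for v in y]))

# Reversing the dependence changes the copula.
opposite_structure = copula_distance((x, y), (x, [5.0, 4.0, 3.0, 2.0, 1.0]))
```

The first distance is zero because the copula depends only on ranks, which matches the abstract's remark that copulas capture the dependence structure independently of the marginal distributions.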
13.
Cancer Inform ; 14(Suppl 5): 57-73, 2015.
Article in English | MEDLINE | ID: mdl-27081304

ABSTRACT

Random forests consisting of an ensemble of regression trees with equal weights are frequently used for the design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide tighter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees' prediction from the perspectives of a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic and Cancer Cell Line Encyclopedia datasets and illustrated that tree weights can be selected to reduce the average length of the CI without an increase in mean error.
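
The mixture-distribution view mentioned in the abstract above has a simple closed form for the first two moments, sketched below: if tree i predicts a distribution with mean m_i and variance s_i^2 and carries weight w_i, the mixture mean is the weighted sum of the m_i, and the mixture variance is the weighted sum of (s_i^2 + m_i^2) minus the squared mixture mean. The per-tree means, variances, and weights are hypothetical, and the article's weight optimization is not reproduced.

```python
def mixture_mean_variance(weights, means, variances):
    """Mean and variance of a weighted mixture of component distributions."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    w = [value / total for value in weights]  # normalize to sum to 1
    mean = sum(wi * mi for wi, mi in zip(w, means))
    second_moment = sum(
        wi * (vi + mi * mi) for wi, mi, vi in zip(w, means, variances))
    return mean, second_moment - mean * mean

# Two hypothetical probabilistic trees with unit predictive variance.
tree_means = [0.0, 2.0]
tree_variances = [1.0, 1.0]

equal_mean, equal_var = mixture_mean_variance(
    [0.5, 0.5], tree_means, tree_variances)
skewed_mean, skewed_var = mixture_mean_variance(
    [0.9, 0.1], tree_means, tree_variances)
```

Changing the tree weights changes the mixture variance, which is why weight selection can affect the width of an interval built from it. In this toy case the shift in weights also moves the mean, so the trade-off with mean error that the abstract mentions still has to be checked.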
