1.
Nat Commun ; 15(1): 5072, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38871711

ABSTRACT

Quantitative structure-activity relationship (QSAR) modeling is a powerful tool for drug discovery, yet the lack of interpretability of commonly used QSAR models hinders their application in molecular design. We propose a similarity-based regression framework, topological regression (TR), that offers a statistically grounded, computationally fast, and interpretable technique to predict drug responses. We compare the predictive performance of TR on 530 ChEMBL human target activity datasets against the predictive performance of deep-learning-based QSAR models. Our results suggest that our sparse TR model can achieve equal, if not better, performance than the deep learning-based QSAR models and provide better intuitive interpretation by extracting an approximate isometry between the chemical space of the drugs and their activity space.


Subjects
Deep Learning , Quantitative Structure-Activity Relationship , Humans , Drug Discovery/methods , Regression Analysis , Algorithms
2.
Sensors (Basel) ; 24(7)2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38610383

ABSTRACT

Unmanned aerial vehicle (UAV)-based imagery has become widely used to collect time-series agronomic data, which are then incorporated into plant breeding programs to enhance crop improvements. To enable efficient analysis, this study leveraged an aerial photography dataset from a field trial of 233 different inbred lines from the maize diversity panel to develop machine learning methods for obtaining automated tassel counts at the plot level. We employed both an object-based counting-by-detection (CBD) approach and a density-based counting-by-regression (CBR) approach. An image segmentation method that removes most of the pixels not associated with the plant tassels dramatically improved the accuracy of object-based (CBD) detection, with the cross-validation prediction accuracy (r²) peaking at 0.7033 on a detector trained with images with a filter threshold of 90. The CBR approach showed the greatest accuracy when using unfiltered images, with a mean absolute error (MAE) of 7.99. However, when using bootstrapping, images filtered at a threshold of 90 showed a slightly better MAE (8.65) than the unfiltered images (8.90). These methods will allow for accurate estimates of flowering-related traits and help to make breeding decisions for crop improvement.


Subjects
Inflorescence , Zea mays , Plant Breeding , Algorithms , Machine Learning
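
Note on the accuracy measures in record 2: the abstract reports a cross-validation r² and a mean absolute error (MAE). The sketch below shows how such measures are commonly computed, assuming r² denotes the squared Pearson correlation between observed and predicted counts (the abstract does not state this). The function names and the tassel counts are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def squared_pearson_r(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical plot-level tassel counts (illustration only).
observed = [12, 30, 25, 41]
predicted = [10, 33, 25, 38]

print("MAE:", mean_absolute_error(observed, predicted))
print("r^2:", squared_pearson_r(observed, predicted))
```

A lower MAE and an r² closer to 1 both indicate better agreement, but they are not interchangeable: MAE is in the units of the count, while r² is unitless and insensitive to a constant offset.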
3.
Bioinform Adv ; 3(1): vbad036, 2023.
Article in English | MEDLINE | ID: mdl-37033467

ABSTRACT

Summary: Predictive learning from medical data poses additional challenges due to concerns over the privacy and security of personal data. Federated learning, intentionally structured to preserve a high level of privacy, is emerging as an attractive way to generate cross-silo predictions in medical scenarios. However, the impact of severe population-level heterogeneity on federated learners is not well explored. In this article, we propose a methodology to detect the presence of population heterogeneity in federated settings and propose a solution to handle such heterogeneity by developing a federated version of Deep Regression Forests. Additionally, we demonstrate that the recently conceptualized REpresentation of Features as Images with NEighborhood Dependencies CNN framework can be combined with the proposed Federated Deep Regression Forests to provide improved performance as compared to existing approaches. Availability and implementation: The Python source code for reproducing the main results is available on GitHub: https://github.com/DanielNolte/FederatedDeepRegressionForests. Contact: ranadip.pal@ttu.edu. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

4.
Free Radic Biol Med ; 191: 241-248, 2022 10.
Article in English | MEDLINE | ID: mdl-36084790

ABSTRACT

Wide field measurements of nitric oxide (NO) signaling could help understand and diagnose the many physiological processes in which NO plays a key role. Magnetic resonance imaging (MRI) can support particularly powerful approaches for this purpose if equipped with molecular probes sensitized to NO and NO-associated targets. In this review, we discuss the development of MRI-detectable probes that could enable studies of nitrergic signaling in animals and potentially human subjects. Major families of probes include contrast agents designed to capture and report integrated NO levels directly, as well as molecules that respond to or emulate the activity of nitric oxide synthase enzymes. For each group, we outline the relevant molecular mechanisms and discuss results that have been obtained in vitro and in animals. The most promising in vivo data described to date have been acquired using NO capture-based relaxation agents and using engineered nitric oxide synthases that provide hemodynamic readouts of NO signaling pathway activation. These advances establish a beachhead for ongoing efforts to improve the sensitivity, specificity, and clinical applicability of NO-related molecular MRI technology.


Subjects
Contrast Media , Nitric Oxide , Animals , Humans , Magnetic Resonance Imaging/methods , Molecular Probes , Nitric Oxide/metabolism , Nitric Oxide Synthase/genetics , Nitric Oxide Synthase/metabolism
5.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35437577

ABSTRACT

Predicting protein properties from amino acid sequences is an important problem in biology and pharmacology. Protein-protein interactions among SARS-CoV-2 spike protein, human receptors and antibodies are key determinants of the potency of this virus and its ability to evade the human immune response. As a rapidly evolving virus, SARS-CoV-2 has already developed into many variants with considerable variation in virulence among these variants. Utilizing the proteomic data of SARS-CoV-2 to predict its viral characteristics will, therefore, greatly aid in disease control and prevention. In this paper, we review and compare recent successful prediction methods based on long short-term memory (LSTM), transformer, convolutional neural network (CNN) and a similarity-based topological regression (TR) model and offer recommendations about appropriate predictive methodology depending on the similarity between training and test datasets. We compare the effectiveness of these models in predicting the binding affinity and expression of SARS-CoV-2 spike protein sequences. We also explore how effective these predictive methods are when trained on laboratory-created data and are tasked with predicting the binding affinity of the in-the-wild SARS-CoV-2 spike protein sequences obtained from the GISAID datasets. We observe that TR is a better method when the sample size is small and test protein sequences are sufficiently similar to the training sequences. However, when the training sample size is sufficiently large and prediction requires extrapolation, LSTM embedding and CNN-based predictive models show superior performance.


Subjects
COVID-19 , SARS-CoV-2 , Amino Acid Sequence , COVID-19/genetics , Humans , Protein Binding , Proteomics , SARS-CoV-2/genetics , Protein Sequence Analysis , Coronavirus Spike Glycoprotein/metabolism
6.
Nat Neurosci ; 25(3): 390-398, 2022 03.
Article in English | MEDLINE | ID: mdl-35241803

ABSTRACT

The complex connectivity of the mammalian brain underlies its function, but understanding how interconnected brain regions interact in neural processing remains a formidable challenge. Here we address this problem by introducing a genetic probe that permits selective functional imaging of distributed neural populations defined by viral labeling techniques. The probe is an engineered enzyme that transduces cytosolic calcium dynamics of probe-expressing cells into localized hemodynamic responses that can be specifically visualized by functional magnetic resonance imaging. Using a viral vector that undergoes retrograde transport, we apply the probe to characterize a brain-wide network of presynaptic inputs to the striatum activated in a deep brain stimulation paradigm in rats. The results reveal engagement of surprisingly diverse projection sources and inform an integrated model of striatal function relevant to reward behavior and therapeutic neurostimulation approaches. Our work thus establishes a strategy for mechanistic analysis of multiregional neural systems in the mammalian brain.


Subjects
Brain Mapping , Magnetic Resonance Imaging , Animals , Brain/physiology , Corpus Striatum , Magnetic Resonance Imaging/methods , Mammals , Rats , Reward
7.
Bioinformatics ; 37(Suppl_1): i42-i50, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252971

ABSTRACT

MOTIVATION: Anti-cancer drug sensitivity prediction using deep learning models for individual cell lines is a significant challenge in personalized medicine. Recently developed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) CNN (Convolutional Neural Network)-based models have shown promising results in improving drug sensitivity prediction. The primary idea behind REFINED-CNN is representing high dimensional vectors as compact images with spatial correlations that can benefit from CNN architectures. However, the mapping from a high dimensional vector to a compact 2D image depends on the a priori choice of the distance metric and projection scheme with limited empirical procedures guiding these choices. RESULTS: In this article, we consider an ensemble of REFINED-CNN built under different choices of distance metrics and/or projection schemes that can improve upon a single projection based REFINED-CNN model. Results, illustrated using NCI60 and NCI-ALMANAC databases, demonstrate that the ensemble approaches can provide significant improvement in prediction performance as compared to individual models. We also develop the theoretical framework for combining different distance metrics to arrive at a single 2D mapping. Results demonstrated that distance-averaged REFINED-CNN produced performance comparable to that of the stacking REFINED-CNN ensemble but with significantly lower computational cost. AVAILABILITY AND IMPLEMENTATION: The source code, scripts, and data used in the paper have been deposited in GitHub (https://github.com/omidbazgirTTU/IntegratedREFINED). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Antineoplastic Agents , Neoplasms , Humans , Machine Learning , Neoplasms/drug therapy , Computer Neural Networks , Software
8.
Nat Commun ; 11(1): 4391, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32873806

ABSTRACT

Deep learning with Convolutional Neural Networks has shown great promise in image-based classification and enhancement but is often unsuitable for predictive modeling using features without spatial correlations. We present a feature representation approach termed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) to arrange high-dimensional vectors in a compact image form conducive to CNN-based deep learning. We consider the similarities between features to generate a concise feature map in the form of a two-dimensional image by minimizing the pairwise distance values following a Bayesian Metric Multidimensional Scaling Approach. We hypothesize that this approach enables embedded feature extraction and, integrated with CNN-based deep learning, can boost the predictive accuracy. We illustrate the superior predictive capabilities of the proposed framework as compared to state-of-the-art methodologies in drug sensitivity prediction scenarios using synthetic datasets, drug chemical descriptors as predictors from NCI60, and both transcriptomic information and drug descriptors as predictors from GDSC.


Subjects
Antineoplastic Agents/pharmacology , Deep Learning , Computer-Assisted Image Processing/methods , Neoplasms/drug therapy , Antineoplastic Agents/therapeutic use , Bayes Theorem , Tumor Biomarkers/genetics , Tumor Cell Line , Cell Proliferation/drug effects , Datasets as Topic , Neoplasm Drug Resistance , Antitumor Drug Screening Assays/methods , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/pathology , Oligonucleotide Array Sequence Analysis
9.
ACS Sens ; 5(6): 1674-1682, 2020 06 26.
Article in English | MEDLINE | ID: mdl-32436387

ABSTRACT

Detection of nitric oxide (NO) in biological systems is challenging due to both physicochemical properties of NO and limitations of current imaging modalities and probes. Magnetic resonance imaging (MRI) could be applied for studying NO in living tissue with high spatiotemporal resolution, but there is still a need for chemical agents that effectively sensitize MRI to biological NO production. To develop a suitable probe, we studied the interactions between NO and a library of manganese complexes with various oxidation states and molecular structures. Among this set, the manganese(III) complex with N,N'-(1,2-phenylene)bis(5-fluoro-2-hydroxybenzamide) showed favorable changes in longitudinal relaxivity upon addition of NO-releasing chemicals in vitro while also maintaining selectivity against other biologically relevant reactive nitrogen and oxygen species, making it a suitable NO-responsive contrast agent for T1-weighted MRI. When loaded with this compound, cells ectopically expressing nitric oxide synthase (NOS) isoforms showed MRI signal decreases of over 20% compared to control cells and were also responsive to NOS inhibition or calcium-dependent activation. The sensor could also detect endogenous NOS activity in antigen-stimulated macrophages and in a rat model of neuroinflammation in vivo. Given the key role of NO and associated reactive nitrogen species in numerous physiological and pathological processes, MRI approaches based on the new probe could be broadly beneficial for studies of NO-related signaling in living subjects.


Subjects
Nitric Oxide Synthase , Nitric Oxide , Animals , Contrast Media , Magnetic Resonance Imaging , Oxygen , Rats
10.
Phys Rev E ; 102(6-1): 062425, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33466110

ABSTRACT

In recent decades computer-aided technologies have become prevalent in medicine; however, cancer drugs are often tested only on in vitro cell lines from biopsies. We derive a full three-dimensional model of inhomogeneous-anisotropic diffusion in a tumor region coupled to a binary population model, which simulates in vivo scenarios faster than traditional cell-line tests. The diffusion tensors are acquired using diffusion tensor magnetic resonance imaging from a patient diagnosed with glioblastoma multiforme. Then we numerically simulate the full model with finite element methods and produce drug concentration heat maps, apoptosis hotspots, and dose-response curves. Finally, predictions are made about optimal injection locations and volumes, which are presented in a form that can be employed by doctors and oncologists.


Subjects
Neoplasms/pathology , Anisotropy , Diffusion Tensor Imaging , Humans , Neoplasms/diagnostic imaging
11.
Brief Bioinform ; 20(5): 1734-1753, 2019 09 27.
Article in English | MEDLINE | ID: mdl-31846027

ABSTRACT

Recent years have seen an increase in the availability of pharmacogenomic databases such as Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) that provide genomic and functional characterization information for multiple cell lines. Studies have alluded to the fact that specific characterizations may be inconsistent between different databases. Analysis of the potential discrepancies in the different databases is highly significant, as these sources are frequently used to analyze and validate methodologies for personalized cancer therapies. In this article, we review the recent developments in investigating the correspondence between different pharmacogenomics databases and discuss the potential factors that require attention when incorporating these sources in any modeling analysis. Furthermore, we explored the consistency among these databases using copulas that can capture nonlinear dependencies between two sets of data.


Subjects
Antineoplastic Agents/therapeutic use , Neoplasms/drug therapy , Neoplasms/genetics , Pharmacogenetics , Tumor Cell Line , Genetic Databases , Humans , Neoplasms/pathology
12.
BMC Bioinformatics ; 20(Suppl 12): 317, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31216980

ABSTRACT

BACKGROUND: Clinical studies often track dose-response curves of subjects over time. One can easily model the dose-response curve at each time point with the Hill equation, but such a model fails to capture the temporal evolution of the curves. On the other hand, one can use the Gompertz equation to model the temporal behaviors at each dose without capturing the evolution of time curves across dosage. RESULTS: In this article, we propose a parametric model for dose-time responses that approximately follows the Gompertz law in time and the Hill equation across dose. We derive a recursion relation for dose-response curves over time capturing the temporal evolution and then specify a regression model connecting the parameters controlling the dose-time responses with individual level proteomic data. The resultant joint model allows us to predict the dose-response curves over time for new individuals. CONCLUSION: We have compared the efficacy of our proposed Recursive Hybrid model with individual dose-response predictive models at desired time points. We note that our proposed model exhibits a superior performance compared to the individual ones for both synthetic data and actual pharmacological data. For the desired dose-time varying genetic characterization and drug response values, we have used the HMS-LINCS database and demonstrated the effectiveness of our model for all available anticancer compounds.


Subjects
Theoretical Models , Pharmacology , Computer Simulation , Databases as Topic , Drug Dose-Response Relationship , Humans , Time Factors
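
Note on the two curve families named in record 12: the sketch below evaluates a Hill curve across dose and a Gompertz curve across time, using common textbook parameterizations. These forms and all parameter names are assumptions made for illustration; the sketch is not the paper's Recursive Hybrid model, which couples the two through a recursion that the abstract does not spell out.

```python
from math import exp


def hill(dose, e_max, ec50, slope):
    """Hill equation: response rises from 0 toward e_max as dose increases.

    ec50 is the dose giving half of e_max; slope is the Hill coefficient.
    """
    if dose <= 0:
        return 0.0
    return e_max * dose ** slope / (ec50 ** slope + dose ** slope)


def gompertz(t, asymptote, displacement, rate):
    """Gompertz curve: asymptote * exp(-displacement * exp(-rate * t))."""
    return asymptote * exp(-displacement * exp(-rate * t))


# Hypothetical parameters (illustration only).
doses = [0.1, 1.0, 2.0, 10.0]
times = [0.0, 1.0, 5.0, 20.0]

print([round(hill(d, e_max=1.0, ec50=2.0, slope=1.5), 3) for d in doses])
print([round(gompertz(t, asymptote=1.0, displacement=3.0, rate=0.5), 3) for t in times])
```

Fitting the Hill curve separately at each time point, or the Gompertz curve separately at each dose, gives the two one-dimensional models that the abstract describes as insufficient on their own.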
13.
BMC Cancer ; 19(1): 593, 2019 Jun 17.
Article in English | MEDLINE | ID: mdl-31208434

ABSTRACT

BACKGROUND: Cancer patients with advanced disease routinely exhaust available clinical regimens and lack actionable genomic medicine results, leaving a large patient population without effective treatment options when their disease inevitably progresses. To address the unmet clinical need for evidence-based therapy assignment when standard clinical approaches have failed, we have developed a probabilistic computational modeling approach which integrates molecular sequencing data with functional assay data to develop patient-specific combination cancer treatments. METHODS: Tissue taken from a murine model of alveolar rhabdomyosarcoma was used to perform single agent drug screening and DNA/RNA sequencing experiments; results integrated via our computational modeling approach identified a synergistic personalized two-drug combination. Cells derived from the primary murine tumor were allografted into mouse models and used to validate the personalized two-drug combination. Computational modeling of single agent drug screening and RNA sequencing of multiple heterogeneous sites from a single patient's epithelioid sarcoma identified a personalized two-drug combination effective across all tumor regions. The heterogeneity-consensus combination was validated in a xenograft model derived from the patient's primary tumor. Cell cultures derived from human and canine undifferentiated pleomorphic sarcoma were assayed by drug screen; computational modeling identified a resistance-abrogating two-drug combination common to both cell cultures. This combination was validated in vitro via a cell regrowth assay.
RESULTS: Our computational modeling approach addresses three major challenges in personalized cancer therapy: synergistic drug combination predictions (validated in vitro and in vivo in a genetically engineered murine cancer model), identification of unifying therapeutic targets to overcome intra-tumor heterogeneity (validated in vivo in a human cancer xenograft), and mitigation of cancer cell resistance and rewiring mechanisms (validated in vitro in a human and canine cancer model). CONCLUSIONS: These proof-of-concept studies support the use of an integrative functional approach to personalized combination therapy prediction for the population of high-risk cancer patients lacking viable clinical options and without actionable DNA sequencing-based therapy.


Subjects
Computational Biology/methods , Preclinical Drug Evaluation/methods , Combination Drug Therapy/methods , Statistical Models , Precision Medicine/methods , Alveolar Rhabdomyosarcoma/drug therapy , Animals , Tumor Cell Line , Animal Disease Models , Dogs , Drug Synergism , Female , Heterografts , Humans , Kaplan-Meier Estimate , Mice , Inbred NOD Mice
14.
Sci Rep ; 9(1): 1628, 2019 02 07.
Article in English | MEDLINE | ID: mdl-30733524

ABSTRACT

Drug sensitivity prediction for individual tumors is a significant challenge in personalized medicine. Current modeling approaches consider prediction of a single metric of the drug response curve such as AUC or IC50. However, the single summary metric of a dose-response curve fails to provide the entire drug sensitivity profile which can be used to design the optimal dose for a patient. In this article, we assess the problem of predicting the complete dose-response curve based on genetic characterizations. We propose an enhancement to the popular ensemble-based Random Forests approach that can directly predict the entire functional profile of a dose-response curve rather than a single summary metric. We design functional regression trees with node costs modified based on dose/response region dependence methodologies and response distribution based approaches. Our results relative to large pharmacological databases such as CCLE and GDSC show a higher accuracy in predicting dose-response curves of the proposed functional framework in contrast to univariate or multivariate Random Forests predicting sensitivities at different dose levels. Furthermore, we also considered the problem of predicting functional responses from functional predictors, i.e., estimating the dose-response curves with a model built on dose-dependent expression data. The superior performance of Functional Random Forest using functional data as compared to existing approaches has been shown using the HMS-LINCS dataset. In summary, Functional Random Forest presents an enhanced predictive modeling framework to predict the entire functional response profile considering both static and functional predictors instead of predicting the summary metrics of the response curves.


Subjects
Drug Dose-Response Relationship , Theoretical Models , Area Under Curve , Cell Line , Pharmaceutical Databases , Humans , Multivariate Analysis , Neoplasms/drug therapy , Neoplasms/genetics , Regression Analysis , Reproducibility of Results
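
Note on the summary metrics in record 14: the abstract contrasts predicting a whole dose-response curve with predicting a single number such as AUC or IC50. The sketch below shows how those two numbers are often derived from a sampled curve, assuming a trapezoid-rule AUC over log10 dose and a linearly interpolated 50% crossing. The data and function names are hypothetical; databases such as CCLE and GDSC use their own curve-fitting conventions, which are not reproduced here.

```python
def trapezoid_auc(xs, ys):
    """Area under a sampled curve by the trapezoid rule."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))


def interpolate_ic50(xs, ys, level=0.5):
    """First x where the curve crosses `level`, by linear interpolation.

    Returns None if the sampled curve never crosses the level.
    """
    for i in range(len(xs) - 1):
        y0, y1 = ys[i], ys[i + 1]
        if (y0 - level) * (y1 - level) <= 0 and y0 != y1:
            frac = (y0 - level) / (y0 - y1)
            return xs[i] + frac * (xs[i + 1] - xs[i])
    return None


# Hypothetical viability measurements at four log10 doses (illustration only).
log_doses = [-2.0, -1.0, 0.0, 1.0]
viability = [1.0, 0.8, 0.4, 0.1]

print("AUC:", trapezoid_auc(log_doses, viability))
print("log10 IC50:", interpolate_ic50(log_doses, viability))
```

Two curves with quite different shapes can share the same AUC or the same IC50, which is the loss of information that motivates predicting the full functional profile.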
15.
Bioinformatics ; 35(17): 3143-3145, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30649230

ABSTRACT

SUMMARY: Biological processes are characterized by a variety of different genomic feature sets. However, when building models, portions of these features are often missing for a subset of the dataset. We provide a modeling framework to effectively integrate this type of heterogeneous data to improve prediction accuracy. To test our methodology, we have stacked data from the Cancer Cell Line Encyclopedia to increase the accuracy of drug sensitivity prediction. The package addresses the dynamic regime of information integration involving sequential addition of features and samples. AVAILABILITY AND IMPLEMENTATION: The framework has been implemented as an R package Sstack, which can be downloaded from https://cran.r-project.org/web/packages/Sstack/index.html, where further explanation of the package is available. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms , Software , Tumor Cell Line , Genome , Genomics , Humans
16.
BMC Bioinformatics ; 19(Suppl 17): 497, 2018 Dec 28.
Article in English | MEDLINE | ID: mdl-30591023

ABSTRACT

BACKGROUND: In precision medicine, scarcity of suitable biological data often hinders the design of an appropriate predictive model. In this regard, large scale pharmacogenomics studies, such as CCLE and GDSC, hold the promise to mitigate the issue. However, one cannot directly employ data from multiple sources together due to the existing distribution shift in data. One way to solve this problem is to utilize transfer learning methodologies tailored to this specific context. RESULTS: In this paper, we present two novel approaches for incorporating information from a secondary database for improving the prediction in a target database. The first approach is based on latent variable cost optimization and the second approach considers polynomial mapping between the two databases. Utilizing CCLE and GDSC databases, we illustrate that the proposed approaches accomplish a better prediction of drug sensitivities for different scenarios as compared to the existing approaches. CONCLUSION: We have compared the performance of the proposed predictive models with database-specific individual models as well as existing transfer learning approaches. We note that our proposed approaches exhibit superior performance compared to the abovementioned alternative techniques for predicting sensitivity for different anti-cancer compounds; in particular, the nonlinear mapping model shows the best overall performance.


Subjects
Algorithms , Antineoplastic Agents/therapeutic use , Neoplasms/drug therapy , Area Under Curve , Factual Databases , Neoplastic Gene Expression Regulation , Humans , Neoplasms/genetics
17.
Annu Int Conf IEEE Eng Med Biol Soc ; 2018: 279-282, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30440392

ABSTRACT

Integrating multiple databases of similar tasks is a significant problem in biological data analysis. In this paper, we consider whether feature selection in a single database can benefit from incorporating similar databases. We report that by using adaptive multi-task elastic net for feature selection and Random Forest for prediction, the prediction performance can be improved for pharmacogenomics databases. We also present a simulation study to explain the robust feature selection benefit of adaptive multi-task elastic net while dealing with noisy features.


Subjects
Algorithms , Pharmacogenetics , Factual Databases
18.
Curr Opin Neurobiol ; 50: 201-210, 2018 06.
Article in English | MEDLINE | ID: mdl-29649765

ABSTRACT

One of the greatest challenges of modern neuroscience is to incorporate our growing knowledge of molecular and cellular-scale physiology into integrated, organismic-scale models of brain function in behavior and cognition. Molecular-level functional magnetic resonance imaging (molecular fMRI) is a new technology that can help bridge these scales by mapping defined microscopic phenomena over large, optically inaccessible regions of the living brain. In this review, we explain how MRI-detectable imaging probes can be used to sensitize noninvasive imaging to mechanistically significant components of neural processing. We discuss how a combination of innovative probe design, advanced imaging methods, and strategies for brain delivery can make molecular fMRI an increasingly successful approach for spatiotemporally resolved studies of diverse neural phenomena, perhaps eventually in people.


Subjects
Brain Mapping , Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , Molecular Imaging/methods , Animals , Humans
19.
BMC Bioinformatics ; 19(Suppl 3): 71, 2018 03 21.
Article in English | MEDLINE | ID: mdl-29589559

ABSTRACT

BACKGROUND: A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines. Predictive models such as Random Forests have shown promising performance while predicting from individual genomic features such as gene expression. However, the accessibility of various other data types, including information on multiple tested drugs, necessitates examining the design of predictive models that incorporate these data types. RESULTS: We explore the predictive performance of model stacking and the effect of stacking on the predictive bias and squared error. In addition, we discuss the analytical underpinnings supporting the advantages of stacking in reducing squared error and the inherent bias of random forests in prediction of outliers. The framework is tested on a setup including gene expression, drug target, physical properties and drug response information for a set of drugs and cell lines. CONCLUSION: The performance of individual and stacked models is compared. We note that stacking models built on two heterogeneous datasets provides superior performance to stacking different models built on the same dataset. It is also noted that stacking provides a noticeable reduction in the bias of our predictors when the dominant eigenvalue of the principal axis of variation in the residuals is significantly higher than the remaining eigenvalues.


Subjects
Antitumor Drug Screening Assays , Biological Models , Algorithms , Area Under Curve , Bias , Tumor Cell Line , Deep Learning , Humans , Neoplasms/drug therapy , Precision Medicine
20.
Bioinformatics ; 34(8): 1336-1344, 2018 04 15.
Article in English | MEDLINE | ID: mdl-29267851

ABSTRACT

Motivation: Random forest (RF) has become a widely popular prediction generating mechanism. Its strength lies in its flexibility, interpretability and ability to handle a large number of features, typically larger than the sample size. However, this methodology is of limited use if one wishes to identify statistically significant features. Several ranking schemes are available that provide information on the relative importance of the features, but there is a paucity of general inferential mechanisms, particularly in a multivariate setup. We use the conditional inference tree framework to generate an RF where features are deleted sequentially based on explicit hypothesis testing. The resulting sequential algorithm offers an inferentially justifiable, but model-free, variable selection procedure. Significant features are then used to generate a predictive RF. An added advantage of our methodology is that both variable selection and prediction are based on the conditional inference framework and hence are coherent. Results: We illustrate the performance of our Sequential Multi-Response Feature Selection approach through simulation studies and finally apply this methodology to the Genomics of Drug Sensitivity in Cancer dataset to identify genetic characteristics that significantly impact drug sensitivities. The significant predictors obtained from our method are further validated from a biological perspective. Availability and implementation: https://github.com/jomayer/SMuRF. Contact: souparno.ghosh@ttu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Genomics/methods , Antineoplastic Agents/therapeutic use , Humans , Neoplasms/drug therapy , Neoplasms/genetics