1.
BMC Bioinformatics ; 24(1): 258, 2023 Jun 17.
Article in English | MEDLINE | ID: mdl-37330468

ABSTRACT

Capturing the conditional covariances or correlations among the elements of a multivariate response vector based on covariates is important to various fields including neuroscience, epidemiology and biomedicine. We propose a new method called Covariance Regression with Random Forests (CovRegRF) to estimate the covariance matrix of a multivariate response given a set of covariates, using a random forest framework. Random forest trees are built with a splitting rule specially designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the performance of the proposed method and significance test through a simulation study which shows that the proposed method provides accurate covariance matrix estimates and that the Type-1 error is well controlled. An application of the proposed method to thyroid disease data is also presented. CovRegRF is implemented in a freely available R package on CRAN.
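The splitting idea described in this abstract can be illustrated with a minimal sketch. It is not the CovRegRF package code: the criterion used here (Euclidean distance between the upper triangles of the two child covariance matrices, weighted by the square root of the product of the child sizes) and the names `sample_cov`, `split_score`, `best_split` and `min_size` are assumptions made for illustration only.

```python
from math import sqrt


def sample_cov(rows):
    """Sample covariance matrix (list of lists) of equal-length response rows."""
    n, q = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(q)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(q)] for a in range(q)]


def split_score(left, right):
    """Assumed criterion: size-weighted distance between child covariances."""
    cov_l, cov_r = sample_cov(left), sample_cov(right)
    q = len(cov_l)
    # Euclidean distance over the upper triangle, diagonal included.
    dist = sqrt(sum((cov_l[a][b] - cov_r[a][b]) ** 2
                    for a in range(q) for b in range(a, q)))
    return sqrt(len(left) * len(right)) * dist


def best_split(x, y, min_size=3):
    """Scan thresholds on one covariate x; y holds the response vectors."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = (float("-inf"), None)
    for k in range(min_size, len(x) - min_size + 1):
        if x[order[k - 1]] == x[order[k]]:
            continue  # no threshold separates tied covariate values
        left = [y[i] for i in order[:k]]
        right = [y[i] for i in order[k:]]
        score = split_score(left, right)
        if score > best[0]:
            best = (score, (x[order[k - 1]] + x[order[k]]) / 2)
    return best


# Toy data: responses are positively correlated for small x, negatively for large x.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [(0, 0), (1, 1), (2, 2), (3, 3), (0, 3), (1, 2), (2, 1), (3, 0)]
score, threshold = best_split(x, y)
```

On this toy data the highest-scoring threshold falls between the two covariance regimes. A full forest would repeat this search over random covariate subsets and bootstrap samples, which the sketch leaves out.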


Subjects
Statistical Models , Random Forest , Child , Humans , Computer Simulation
2.
Bioinformatics ; 37(17): 2714-2721, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33693547

ABSTRACT

MOTIVATION: Investigating the relationships between two sets of variables helps to understand their interactions and can be done with canonical correlation analysis (CCA). However, the correlation between the two sets can sometimes depend on a third set of covariates, often subject-related ones such as age, gender or other clinical measures. In this case, applying CCA to the whole population is not optimal and methods to estimate conditional CCA, given the covariates, can be useful. RESULTS: We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables given subject-related covariates. The individual trees in the forest are built with a splitting rule specifically designed to partition the data to maximize the canonical correlation heterogeneity between child nodes. We also propose a significance test to detect the global effect of the covariates on the relationship between two sets of variables. The performance of the proposed method and the global significance test is evaluated through simulation studies that show it provides accurate canonical correlation estimations and well-controlled Type-1 error. We also show an application of the proposed method with EEG data. AVAILABILITY AND IMPLEMENTATION: RFCCA is implemented in a freely available R package on CRAN (https://CRAN.R-project.org/package=RFCCA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

3.
Bioinformatics ; 36(2): 629-636, 2020 Jan 15.
Article in English | MEDLINE | ID: mdl-31373350

ABSTRACT

MOTIVATION: Personalized medicine often relies on accurate estimation of a treatment effect for specific subjects. This estimation can be based on the subject's baseline covariates, but additional complications arise for a time-to-event response subject to censoring. In this paper, the treatment effect is measured as the difference between the mean survival time of a treated subject and the mean survival time of a control subject. We propose a new random forest method for estimating the individual treatment effect with survival data. The random forest is formed by individual trees built with a splitting rule specifically designed to partition the data according to the individual treatment effect. For a new subject, the forest provides a set of similar subjects from the training dataset that can be used to compute an estimate of the individual treatment effect with any adequate method. RESULTS: The merits of the proposed method are investigated with a simulation study where it is compared to numerous competitors, including recent state-of-the-art methods. The results indicate that the proposed method performs very well and stably in estimating individual treatment effects. Two application examples, with colon cancer data and breast cancer data, show that the proposed method can detect a treatment effect in a sub-population even when the overall effect is small or nonexistent. AVAILABILITY AND IMPLEMENTATION: The authors are working on an R package implementing the proposed method and it will be available soon. In the meantime, the code can be obtained from the first author at sami.tabib@hec.ca. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Research Design , Survival Rate
4.
Entropy (Basel) ; 22(12), 2020 Nov 30.
Article in English | MEDLINE | ID: mdl-33266340

ABSTRACT

We present an unsupervised method to detect anomalous time series among a collection of time series. To do so, we extend traditional Kernel Density Estimation for estimating probability distributions in Euclidean space to Hilbert spaces. The estimated probability densities we derive can be obtained formally through treating each series as a point in a Hilbert space, placing a kernel at those points, and summing the kernels (a "point approach"), or through using Kernel Density Estimation to approximate the distributions of Fourier mode coefficients to infer a probability density (a "Fourier approach"). We refer to these approaches as Functional Kernel Density Estimation for Anomaly Detection as they both yield functionals that can score a time series for how anomalous it is. Both methods naturally handle missing data and apply to a variety of settings, performing well when compared with an outlyingness score derived from a boxplot method for functional data, with a Principal Component Analysis approach for functional data, and with the Functional Isolation Forest method. We illustrate the use of the proposed methods with aviation safety report data from the International Air Transport Association (IATA).
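A rough sketch of the "point approach" mentioned in this abstract, under strong simplifying assumptions: every series is sampled on the same grid with no missing values, the kernel is an unnormalized Gaussian in the discretized L2 distance, and each series is scored against the others only (leave-one-out). The function names and the `bandwidth` parameter are hypothetical and do not come from the paper.

```python
from math import exp, sqrt


def l2_distance(f, g, dt=1.0):
    """Discretized L2 distance between two series sampled on the same grid."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) * dt)


def density_scores(series, bandwidth):
    """Leave-one-out kernel score per series; a lower score is more anomalous."""
    scores = []
    for i, f in enumerate(series):
        others = [g for j, g in enumerate(series) if j != i]
        # Place a Gaussian kernel at every other series and average the values at f.
        total = sum(exp(-0.5 * (l2_distance(f, g) / bandwidth) ** 2) for g in others)
        scores.append(total / len(others))
    return scores


collection = [
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1],
    [5.0, 5.0, 5.0],  # deliberately unusual series
]
scores = density_scores(collection, bandwidth=1.0)
most_anomalous = scores.index(min(scores))
```

The deliberately unusual last series receives the lowest score. Handling missing data and the Fourier variant, both mentioned in the abstract, would need extra machinery that this sketch omits.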

5.
Lifetime Data Anal ; 23(4): 671-691, 2017 Oct.
Article in English | MEDLINE | ID: mdl-27379423

ABSTRACT

The log-rank test is used as the split function in many commonly used survival tree and forest algorithms. However, the log-rank test may suffer a significant loss of power in some circumstances, especially when the hazard functions or the survival functions of the two compared groups cross each other. We investigate the use of the integrated absolute difference between the survival functions of the two child nodes as the splitting rule. Simulation studies and applications to real data sets show that forests built with this rule produce very good results in general, and that they often outperform forests built with the log-rank splitting rule.
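The quantity named in this abstract, the integrated absolute difference between two child-node survival curves, can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: the survival functions are taken to be Kaplan-Meier estimates, the integration stops at a user-chosen horizon `tau`, and no node-size weighting is applied.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (right-continuous steps)."""
    data = sorted(zip(times, events))
    at_risk, surv, curve = len(data), 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]  # event indicator: 1 = event, 0 = censored
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def step_value(curve, t):
    """Value of the step function at time t (1.0 before the first event)."""
    value = 1.0
    for time, surv in curve:
        if time > t:
            break
        value = surv
    return value


def integrated_abs_diff(left, right, tau):
    """Integral over [0, tau] of |S_left(t) - S_right(t)| for two (times, events) pairs."""
    curve_l, curve_r = kaplan_meier(*left), kaplan_meier(*right)
    knots = sorted({0.0, tau} | {t for t, _ in curve_l + curve_r if t < tau})
    # Both curves are constant between consecutive knots, so the integral is a sum.
    return sum(abs(step_value(curve_l, a) - step_value(curve_r, a)) * (b - a)
               for a, b in zip(knots, knots[1:]))


left_node = ([1, 2], [1, 1])   # events at t = 1 and t = 2
right_node = ([3, 4], [1, 1])  # events at t = 3 and t = 4
area = integrated_abs_diff(left_node, right_node, tau=4.0)
```

Because the absolute value is taken before integrating, curves that cross still accumulate area instead of cancelling out, which is the situation where the abstract says the log-rank test loses power.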


Subjects
Statistical Models , Survival Analysis , Algorithms , Computer Simulation , Factual Databases/statistics & numerical data , Humans , Kaplan-Meier Estimate , Life Tables , Proportional Hazards Models
6.
Stat Methods Med Res ; 31(11): 2217-2236, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35895510

ABSTRACT

Survival data with time-varying covariates are common in practice. If relevant, they can improve the estimation of a survival function. However, the traditional survival forests (conditional inference forest, relative risk forest and random survival forest) have accommodated only time-invariant covariates. We generalize the conditional inference and relative risk forests to allow time-varying covariates. We also propose a general framework for estimation of a survival function in the presence of time-varying covariates. We compare their performance with that of the Cox model and transformation forest, adapted here to accommodate time-varying covariates, through a comprehensive simulation study in which the Kaplan-Meier estimate serves as a benchmark, and performance is compared using the integrated L2 difference between the true and estimated survival functions. In general, the performance of the two proposed forests substantially improves over the Kaplan-Meier estimate. Taking into account all other factors, under the proportional hazard setting, the best method is always one of the two proposed forests, while under the non-proportional hazard setting, it is the adapted transformation forest. K-fold cross-validation is used as an effective tool to choose between the methods in practice.


Subjects
Research Design , Survival Analysis , Proportional Hazards Models , Kaplan-Meier Estimate , Computer Simulation
7.
Stat Methods Med Res ; 29(1): 205-229, 2020 Jan.
Article in English | MEDLINE | ID: mdl-30786820

ABSTRACT

The classical and most commonly used approach to building prediction intervals is the parametric approach. However, its main drawback is that its validity and performance depend heavily on the assumed functional link between the covariates and the response. This research investigates new methods that improve the performance of prediction intervals with random forests. Two aspects are explored: the method used to build the forest and the method used to build the prediction interval. Four methods to build the forest are investigated, three from the classification and regression tree (CART) paradigm and the transformation forest method. For CART forests, in addition to the default least-squares splitting rule, two alternative splitting criteria are investigated. We also present and evaluate the performance of five flexible methods for constructing prediction intervals. This yields 20 distinct method variations. To reliably attain the desired confidence level, we include a calibration procedure performed on the out-of-bag information provided by the forest. The 20 method variations are thoroughly investigated and compared to five alternative methods through simulation studies and in real data settings. The results show that the proposed methods are very competitive. They outperform commonly used methods both in simulation settings and with real data.


Subjects
Statistical Models , Algorithms , Calibration , Computer Simulation , Forecasting , Humans , Machine Learning , Research Design
8.
Stat Methods Med Res ; 29(8): 2217-2237, 2020 Aug.
Article in English | MEDLINE | ID: mdl-31762374

ABSTRACT

We propose a general hurdle methodology to model a response from a homogeneous or a non-homogeneous Poisson process with excess zeros, based on two forests. The first forest in the two-part model is used to estimate the probability of having a zero. The second forest is used to estimate the Poisson parameter(s), using only the observations with at least one event. To build the trees in the second forest, we propose specialized splitting criteria derived from the zero-truncated homogeneous and non-homogeneous Poisson likelihood. The particular case of a homogeneous process is investigated in detail to highlight the advantages of the proposed method over existing ones. Simulation studies show that the proposed methods perform well in hurdle (zero-altered) and zero-inflated settings, for both homogeneous and non-homogeneous processes. We illustrate the use of the new method with real data on the demand for medical care by the elderly.
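For the homogeneous case, a splitting criterion "derived from the zero-truncated Poisson likelihood" can be sketched as a log-likelihood gain. The zero-truncated Poisson probability P(Y = y) = λ^y e^(-λ) / (y! (1 - e^(-λ))) for y ≥ 1 is standard. The gain form (left plus right maximized log-likelihood minus the parent's) and all function names are assumptions for illustration, not the paper's exact criterion.

```python
from math import expm1, lgamma, log


def truncated_mean(lam):
    """Mean of a zero-truncated Poisson with parameter lam > 0."""
    return lam / -expm1(-lam)  # -expm1(-lam) equals 1 - exp(-lam)


def ztp_mle(counts):
    """Maximum-likelihood lambda: solves truncated_mean(lam) = sample mean by bisection."""
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        return 0.0  # boundary case: every count equals 1
    lo, hi = 1e-12, mean  # the truncated mean exceeds lam, so the root lies below mean
    for _ in range(200):
        mid = (lo + hi) / 2
        if truncated_mean(mid) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ztp_loglik(counts, lam):
    """Zero-truncated Poisson log-likelihood of positive counts."""
    if lam == 0.0:
        return 0.0  # degenerate limit: point mass at 1
    return sum(y * log(lam) - lam - log(-expm1(-lam)) - lgamma(y + 1) for y in counts)


def split_gain(left, right):
    """Assumed criterion: log-likelihood gained by fitting the two children separately."""
    parent = left + right
    return (ztp_loglik(left, ztp_mle(left)) + ztp_loglik(right, ztp_mle(right))
            - ztp_loglik(parent, ztp_mle(parent)))


lam_hat = ztp_mle([1, 2, 3, 4])
gain_separated = split_gain([1, 1, 2], [5, 6, 7])
gain_identical = split_gain([1, 2, 3], [1, 2, 3])
```

The gain is near zero when both children have the same count distribution and grows as their fitted rates diverge. Only observations with at least one event enter this step; the zero probability is handled by the separate first forest described in the abstract.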


Subjects
Statistical Models , Research Design , Aged , Computer Simulation , Humans , Poisson Distribution
9.
Stat Methods Med Res ; 28(2): 445-461, 2019 Feb.
Article in English | MEDLINE | ID: mdl-28835170

ABSTRACT

Tree-based methods are very powerful and popular tools for analysing survival data with right-censoring. The existing methods assume that the true time-to-event and the censoring times are independent given the covariates. We propose different ways to build survival forests when dependent censoring is suspected, by using an appropriate estimator of the survival function when aggregating the individual trees and/or by modifying the splitting rule. The appropriate estimator used in this paper is the copula-graphic estimator. We also propose a new method for building survival forests, called p-forest, that may be used not only when dependent censoring is suspected, but also as a new survival forest method in general. The results from a simulation study indicate that these modifications greatly improve the estimation of the survival function in situations of dependent censoring. A real data example illustrates how the proposed methods can be used to perform a sensitivity analysis.


Subjects
Survival Analysis , Algorithms , Analysis of Variance , Computer Simulation , Statistical Data Interpretation , Humans , Liver Cirrhosis/mortality , Liver Cirrhosis/surgery , Randomized Controlled Trials as Topic/statistics & numerical data , Research Design
10.
Stat Methods Med Res ; 26(4): 1867-1880, 2017 Aug.
Article in English | MEDLINE | ID: mdl-26152747

ABSTRACT

Outlier detection covers a wide range of methods that aim to identify observations considered unusual. Novelty detection, on the other hand, seeks observations among newly generated test data that are exceptional compared with previously observed training data. In many applications, the general existence of novelty is of more interest than identifying the individual novel observations. For instance, in high-throughput cancer treatment screening experiments, it is meaningful to test whether any new treatment effects are seen compared with existing compounds. Here, we present hypothesis tests for such global level novelty. The problem is approached through a set of very general assumptions, making it innovative in relation to the current literature. We introduce test statistics capable of detecting novelty. They operate on local neighborhoods and their null distribution is obtained by the permutation principle. We show that they are valid and able to find different types of novelty, e.g. location and scale alternatives. The performance of the methods is assessed with simulations and with applications to real data sets.


Subjects
Nonparametric Statistics , Tumor Cell Line , Datasets as Topic , Antitumor Drug Screening Assays , Flowers/anatomy & histology , Humans , Male , Normal Distribution , Prostatic Neoplasms/drug therapy , Prostatic Neoplasms/pathology , Reproducibility of Results