1.
BMC Bioinformatics ; 24(1): 258, 2023 Jun 17.
Article in English | MEDLINE | ID: mdl-37330468

ABSTRACT

Capturing the conditional covariances or correlations among the elements of a multivariate response vector based on covariates is important in various fields, including neuroscience, epidemiology and biomedicine. We propose a new method called Covariance Regression with Random Forests (CovRegRF) to estimate the covariance matrix of a multivariate response given a set of covariates, using a random forest framework. Random forest trees are built with a splitting rule specially designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the performance of the proposed method and significance test through a simulation study, which shows that the proposed method provides accurate covariance matrix estimates and that the Type-1 error is well controlled. An application of the proposed method to thyroid disease data is also presented. CovRegRF is implemented in a freely available R package on CRAN.
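A minimal Python sketch of a covariance-based splitting rule in the spirit of the one described above. The distance between the children's sample covariance matrices (a Frobenius norm scaled by sqrt(nL * nR)) and the helper names `sample_cov`, `split_score` and `best_split` are illustrative assumptions, not the CovRegRF package's API or its exact criterion.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def sample_cov(rows: Sequence[Sequence[float]]) -> Matrix:
    """Unbiased sample covariance matrix; each row is one observation of the response vector."""
    n, q = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(q)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1) for b in range(q)]
        for a in range(q)
    ]


def split_score(left: Sequence[Sequence[float]], right: Sequence[Sequence[float]]) -> float:
    """Hypothetical score: sqrt(nL * nR) times the Frobenius distance between child covariances."""
    cl, cr = sample_cov(left), sample_cov(right)
    q = len(cl)
    frob = math.sqrt(sum((cl[a][b] - cr[a][b]) ** 2 for a in range(q) for b in range(q)))
    return math.sqrt(len(left) * len(right)) * frob


def best_split(
    x: Sequence[float], y: Sequence[Sequence[float]], min_node: int = 3
) -> Tuple[float, Optional[float]]:
    """Scan every threshold on one covariate x and return (best score, threshold)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best: Tuple[float, Optional[float]] = (-1.0, None)
    for k in range(min_node, len(x) - min_node + 1):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # tied covariate values cannot be separated by a threshold
        score = split_score([y[i] for i in order[:k]], [y[i] for i in order[k:]])
        if score > best[0]:
            best = (score, (lo + hi) / 2)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: responses positively correlated for x < 5, negatively for x >= 5.
    xs = list(range(10))
    ys = [[v, v] for v in (-2, -1, 0, 1, 2)] + [[v, -v] for v in (-2, -1, 0, 1, 2)]
    print(best_split(xs, ys)[1])  # 4.5
```

On this toy data the scan picks the threshold 4.5, the split that separates the two covariance regimes. A forest would repeat such a scan over a random subset of covariates at every node.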


Subject(s)
Models, Statistical , Random Forest , Child , Humans , Computer Simulation
2.
Stat Methods Med Res ; 31(11): 2217-2236, 2022 11.
Article in English | MEDLINE | ID: mdl-35895510

ABSTRACT

Survival data with time-varying covariates are common in practice. If relevant, these covariates can improve the estimation of a survival function. However, the traditional survival forests (conditional inference forest, relative risk forest and random survival forest) have accommodated only time-invariant covariates. We generalize the conditional inference and relative risk forests to allow time-varying covariates. We also propose a general framework for estimating a survival function in the presence of time-varying covariates. We compare their performance with that of the Cox model and the transformation forest, adapted here to accommodate time-varying covariates, through a comprehensive simulation study in which the Kaplan-Meier estimate serves as a benchmark and performance is measured by the integrated L2 difference between the true and estimated survival functions. In general, the two proposed forests perform substantially better than the Kaplan-Meier estimate. Taking all other factors into account, under the proportional hazards setting the best method is always one of the two proposed forests, while under the non-proportional hazards setting it is the adapted transformation forest. K-fold cross-validation is used as an effective tool for choosing between the methods in practice.


Subject(s)
Research Design , Survival Analysis , Proportional Hazards Models , Kaplan-Meier Estimate , Computer Simulation
3.
Bioinformatics ; 37(17): 2714-2721, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33693547

ABSTRACT

MOTIVATION: Investigating the relationships between two sets of variables helps to understand their interactions and can be done with canonical correlation analysis (CCA). However, the correlation between the two sets can sometimes depend on a third set of covariates, often subject-related ones such as age, gender or other clinical measures. In this case, applying CCA to the whole population is not optimal and methods to estimate conditional CCA, given the covariates, can be useful. RESULTS: We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables given subject-related covariates. The individual trees in the forest are built with a splitting rule specifically designed to partition the data to maximize the canonical correlation heterogeneity between child nodes. We also propose a significance test to detect the global effect of the covariates on the relationship between two sets of variables. The performance of the proposed method and the global significance test is evaluated through simulation studies that show it provides accurate canonical correlation estimations and well-controlled Type-1 error. We also show an application of the proposed method with EEG data. AVAILABILITY AND IMPLEMENTATION: RFCCA is implemented in a freely available R package on CRAN (https://CRAN.R-project.org/package=RFCCA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Entropy (Basel) ; 22(12)2020 Nov 30.
Article in English | MEDLINE | ID: mdl-33266340

ABSTRACT

We present an unsupervised method to detect anomalous time series among a collection of time series. To do so, we extend traditional Kernel Density Estimation for estimating probability distributions in Euclidean space to Hilbert spaces. The estimated probability densities we derive can be obtained formally through treating each series as a point in a Hilbert space, placing a kernel at those points, and summing the kernels (a "point approach"), or through using Kernel Density Estimation to approximate the distributions of Fourier mode coefficients to infer a probability density (a "Fourier approach"). We refer to these approaches as Functional Kernel Density Estimation for Anomaly Detection as they both yield functionals that can score a time series for how anomalous it is. Both methods naturally handle missing data and apply to a variety of settings, performing well when compared with an outlyingness score derived from a boxplot method for functional data, with a Principal Component Analysis approach for functional data, and with the Functional Isolation Forest method. We illustrate the use of the proposed methods with aviation safety report data from the International Air Transport Association (IATA).
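The "point approach" can be sketched in a few lines of Python, assuming every series is observed without gaps on the same equally spaced grid (so the L2 distance is approximated by a Riemann sum), a Gaussian kernel, and a user-chosen bandwidth `h`. The function names are hypothetical, and unlike the methods in the abstract this sketch does not handle missing data. Because an infinite-dimensional Hilbert space has no Lebesgue measure, the score is left as an unnormalised kernel sum and is used only to rank series.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float], dt: float = 1.0) -> float:
    """Approximate L2 distance between two curves sampled on the same equally spaced grid."""
    return math.sqrt(dt * sum((u - v) ** 2 for u, v in zip(a, b)))


def point_approach_scores(
    series: Sequence[Sequence[float]], h: float, dt: float = 1.0
) -> List[float]:
    """Leave-one-out Gaussian-kernel sum at each series; lower values = more anomalous.

    The normalising constant is dropped: only the ranking of the series matters here.
    """
    n = len(series)
    scores = []
    for i in range(n):
        # Place a kernel at every other series and sum its contributions at series i.
        total = sum(
            math.exp(-0.5 * (l2_distance(series[i], series[j], dt) / h) ** 2)
            for j in range(n)
            if j != i
        )
        scores.append(total / (n - 1))
    return scores
```

The series with the smallest score, `scores.index(min(scores))`, is the one flagged as most anomalous; in practice the bandwidth `h` would need to be tuned to the scale of the distances.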

5.
Bioinformatics ; 36(2): 629-636, 2020 01 15.
Article in English | MEDLINE | ID: mdl-31373350

ABSTRACT

MOTIVATION: Personalized medicine often relies on accurate estimation of a treatment effect for specific subjects. This estimation can be based on the subject's baseline covariates, but additional complications arise for a time-to-event response subject to censoring. In this paper, the treatment effect is measured as the difference between the mean survival time of a treated subject and the mean survival time of a control subject. We propose a new random forest method for estimating the individual treatment effect with survival data. The random forest is formed by individual trees built with a splitting rule specifically designed to partition the data according to the individual treatment effect. For a new subject, the forest provides a set of similar subjects from the training dataset that can be used to estimate the individual treatment effect with any adequate method. RESULTS: The merits of the proposed method are investigated with a simulation study in which it is compared with numerous competitors, including recent state-of-the-art methods. The results indicate that the proposed method performs very well and stably in estimating individual treatment effects. Two applications, to colon cancer data and to breast cancer data, show that the proposed method can detect a treatment effect in a subpopulation even when the overall effect is small or nonexistent. AVAILABILITY AND IMPLEMENTATION: The authors are working on an R package implementing the proposed method, and it will be available soon. In the meantime, the code can be obtained from the first author at sami.tabib@hec.ca. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Research Design , Survival Rate
6.
Stat Methods Med Res ; 29(1): 205-229, 2020 01.
Article in English | MEDLINE | ID: mdl-30786820

ABSTRACT

The classical and most commonly used approach to building prediction intervals is the parametric approach. However, its main drawback is that its validity and performance depend heavily on the assumed functional link between the covariates and the response. This research investigates new methods that improve the performance of prediction intervals with random forests. Two aspects are explored: the method used to build the forest and the method used to build the prediction interval. Four methods to build the forest are investigated: three from the classification and regression tree (CART) paradigm, and the transformation forest method. For CART forests, in addition to the default least-squares splitting rule, two alternative splitting criteria are investigated. We also present and evaluate the performance of five flexible methods for constructing prediction intervals. This yields 20 distinct method variations. To reliably attain the desired confidence level, we include a calibration procedure performed on the out-of-bag information provided by the forest. The 20 method variations are thoroughly investigated and compared with five alternative methods through simulation studies and in real data settings. The results show that the proposed methods are very competitive: they outperform commonly used methods both in simulation settings and with real data.
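As one simple example of a flexible interval built from out-of-bag information, the sketch below shifts each forest prediction by empirical quantiles of the OOB residuals. It assumes the OOB predictions have already been produced by some random forest implementation; the function names are hypothetical, and the sketch reproduces neither the paper's calibration procedure nor any of its specific interval methods.

```python
import math
from typing import List, Sequence, Tuple


def empirical_quantile(values: Sequence[float], p: float) -> float:
    """Inverse-CDF quantile: smallest value with at least a fraction p of the data at or below it."""
    s = sorted(values)
    return s[min(len(s) - 1, max(0, math.ceil(p * len(s)) - 1))]


def oob_residual_intervals(
    y_oob: Sequence[float],
    pred_oob: Sequence[float],
    pred_new: Sequence[float],
    alpha: float = 0.1,
) -> List[Tuple[float, float]]:
    """Shift each new point prediction by the alpha/2 and 1 - alpha/2 quantiles of the OOB residuals."""
    residuals = [y - f for y, f in zip(y_oob, pred_oob)]
    lower = empirical_quantile(residuals, alpha / 2)
    upper = empirical_quantile(residuals, 1 - alpha / 2)
    return [(f + lower, f + upper) for f in pred_new]


def coverage(y: Sequence[float], intervals: Sequence[Tuple[float, float]]) -> float:
    """Fraction of responses that fall inside their intervals."""
    return sum(a <= v <= b for v, (a, b) in zip(y, intervals)) / len(y)
```

Checking `coverage` on held-out data is one way to see whether a given interval method attains its nominal level; when it does not, a calibration step such as the one the abstract mentions adjusts the level used to build the intervals.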


Subject(s)
Models, Statistical , Algorithms , Calibration , Computer Simulation , Forecasting , Humans , Machine Learning , Research Design
7.
Stat Methods Med Res ; 29(8): 2217-2237, 2020 08.
Article in English | MEDLINE | ID: mdl-31762374

ABSTRACT

We propose a general hurdle methodology, based on two forests, to model a response from a homogeneous or non-homogeneous Poisson process with excess zeros. The first forest in the two-part model is used to estimate the probability of a zero. The second forest is used to estimate the Poisson parameter(s), using only the observations with at least one event. To build the trees in the second forest, we propose specialized splitting criteria derived from the zero-truncated homogeneous and non-homogeneous Poisson likelihoods. The particular case of a homogeneous process is investigated in detail to highlight the advantages of the proposed method over existing ones. Simulation studies show that the proposed methods perform well in hurdle (zero-altered) and zero-inflated settings, for both homogeneous and non-homogeneous processes. We illustrate the use of the new method with real data on the demand for medical care by the elderly.
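The zero-truncated Poisson likelihood behind such splitting criteria can be sketched for the homogeneous case. For counts y >= 1 the log-likelihood is sum[y log(lambda) - lambda - log(y!) - log(1 - exp(-lambda))], and its maximiser solves lambda / (1 - exp(-lambda)) = mean(y). As an assumption, a split is scored here by the gain in maximised log-likelihood when each child node gets its own lambda; the paper's exact criteria may differ, and the names `ztp_mle`, `ztp_max_loglik` and `split_gain` are hypothetical.

```python
import math
from typing import Sequence


def ztp_mle(y: Sequence[int]) -> float:
    """MLE of lambda for zero-truncated Poisson counts (all y >= 1).

    Solves lambda / (1 - exp(-lambda)) = mean(y) by bisection; the left side increases from 1 to infinity.
    """
    ybar = sum(y) / len(y)
    if ybar <= 1.0:
        return 0.0  # every count is 1: the likelihood is maximised only in the limit lambda -> 0
    lo, hi = 1e-12, ybar  # the estimating equation is negative at lo and positive at ybar
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < ybar:  # -expm1(-x) == 1 - exp(-x), computed accurately
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ztp_max_loglik(y: Sequence[int]) -> float:
    """Maximised zero-truncated Poisson log-likelihood of the counts in one node."""
    lam = ztp_mle(y)
    if lam == 0.0:
        return 0.0  # limit of log(lam) - lam - log(1 - exp(-lam)) as lam -> 0, when every y == 1
    return sum(
        k * math.log(lam) - lam - math.lgamma(k + 1) - math.log(-math.expm1(-lam)) for k in y
    )


def split_gain(left: Sequence[int], right: Sequence[int]) -> float:
    """Hypothetical splitting score: log-likelihood gain from fitting separate lambdas to the children."""
    return ztp_max_loglik(left) + ztp_max_loglik(right) - ztp_max_loglik(list(left) + list(right))
```

Only observations with at least one event enter these functions, matching the abstract's description of the second forest; the probability of a zero would be handled separately by the first forest.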


Subject(s)
Models, Statistical , Research Design , Aged , Computer Simulation , Humans , Poisson Distribution
8.
Stat Methods Med Res ; 28(2): 445-461, 2019 02.
Article in English | MEDLINE | ID: mdl-28835170

ABSTRACT

Tree-based methods are very powerful and popular tools for analysing survival data with right-censoring. Existing methods assume that the true time-to-event and the censoring times are independent given the covariates. We propose different ways to build survival forests when dependent censoring is suspected, by using an appropriate estimator of the survival function when aggregating the individual trees and/or by modifying the splitting rule. The estimator used in this paper is the copula-graphic estimator. We also propose a new method for building survival forests, called p-forest, that may be used not only when dependent censoring is suspected but also as a new general-purpose survival forest method. The results from a simulation study indicate that these modifications greatly improve the estimation of the survival function in situations of dependent censoring. A real data example illustrates how the proposed methods can be used to perform a sensitivity analysis.


Subject(s)
Survival Analysis , Algorithms , Analysis of Variance , Computer Simulation , Data Interpretation, Statistical , Humans , Liver Cirrhosis/mortality , Liver Cirrhosis/surgery , Randomized Controlled Trials as Topic/statistics & numerical data , Research Design
9.
Lifetime Data Anal ; 23(4): 671-691, 2017 10.
Article in English | MEDLINE | ID: mdl-27379423

ABSTRACT

The log-rank test is used as the split function in many commonly used survival tree and forest algorithms. However, the log-rank test may lose considerable power in some circumstances, especially when the hazard functions or the survival functions of the two compared groups cross. We investigate the use of the integrated absolute difference between the survival functions of the two child nodes as the splitting rule. Simulation studies and applications to real data sets show that forests built with this rule produce very good results in general, and that they often perform better than forests built with the log-rank splitting rule.
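A minimal Python sketch of the integrated-absolute-difference splitting score, assuming Kaplan-Meier estimates in each child node and integration from 0 up to the smaller of the two nodes' largest observed times (that truncation point is an assumption). Event indicators are 1 for an observed event and 0 for a censored time; all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier jumps as (event time, survival just after it); S(t) = 1 before the first event."""
    data = sorted(zip(times, events))
    at_risk, surv, steps, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, n_t = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times together
            deaths += data[i][1]
            n_t += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= n_t  # events and censorings at t both leave the risk set
    return steps


def surv_at(steps: Sequence[Tuple[float, float]], t: float) -> float:
    """Evaluate the right-continuous Kaplan-Meier step function at time t."""
    s = 1.0
    for time, value in steps:
        if time > t:
            break
        s = value
    return s


def integrated_abs_diff(
    t1: Sequence[float], e1: Sequence[int], t2: Sequence[float], e2: Sequence[int]
) -> float:
    """Integral of |S1(t) - S2(t)| over [0, tau], tau = smaller of the two largest observed times."""
    tau = min(max(t1), max(t2))
    s1, s2 = kaplan_meier(t1, e1), kaplan_meier(t2, e2)
    grid = sorted({0.0, tau} | {t for t, _ in s1 + s2 if t < tau})
    # Both curves are constant on each [a, b) between consecutive grid points, so this sum is exact.
    return sum(abs(surv_at(s1, a) - surv_at(s2, a)) * (b - a) for a, b in zip(grid, grid[1:]))
```

A tree-growing routine would evaluate this score for every candidate split and keep the largest. Because the score measures the area between the two curves, it can remain large when the curves cross, which is the situation where the abstract notes that the log-rank test may lose power.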


Subject(s)
Models, Statistical , Survival Analysis , Algorithms , Computer Simulation , Databases, Factual/statistics & numerical data , Humans , Kaplan-Meier Estimate , Life Tables , Proportional Hazards Models
10.
Stat Methods Med Res ; 26(4): 1867-1880, 2017 Aug.
Article in English | MEDLINE | ID: mdl-26152747

ABSTRACT

Outlier detection covers a wide range of methods aimed at identifying observations that are considered unusual. Novelty detection, on the other hand, seeks observations among newly generated test data that are exceptional compared with previously observed training data. In many applications, the general existence of novelty is of more interest than identifying the individual novel observations. For instance, in high-throughput cancer treatment screening experiments, it is meaningful to test whether any new treatment effects are seen compared with existing compounds. Here, we present hypothesis tests for such global-level novelty. The problem is approached under a set of very general assumptions, which sets this work apart from the current literature. We introduce test statistics capable of detecting novelty. They operate on local neighborhoods, and their null distribution is obtained by the permutation principle. We show that they are valid and able to find different types of novelty, e.g. location and scale alternatives. The performance of the methods is assessed with simulations and with applications to real data sets.
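To illustrate the general recipe of a local-neighborhood statistic with a permutation null, the sketch below uses a simple k-nearest-neighbor statistic: the average fraction of test points among each test point's k nearest neighbors in the pooled sample. This statistic, the defaults for `k`, `n_perm` and `seed`, and the function names are assumptions for illustration; they are not the test statistics proposed in the paper.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def knn_statistic(points: Sequence[Point], is_test: Sequence[bool], k: int) -> float:
    """Mean fraction of test points among the k nearest neighbors of each test point."""
    total, n_test = 0.0, 0
    for i, p in enumerate(points):
        if not is_test[i]:
            continue
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        total += sum(is_test[j] for _, j in nearest) / k
        n_test += 1
    return total / n_test


def novelty_permutation_test(
    train: Sequence[Point], test: Sequence[Point], k: int = 3, n_perm: int = 199, seed: int = 0
) -> Tuple[float, float]:
    """Return (observed statistic, permutation p-value) for H0: test data follow the training distribution."""
    points = list(train) + list(test)
    labels = [False] * len(train) + [True] * len(test)
    observed = knn_statistic(points, labels, k)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # under H0 the train/test labels are exchangeable
        if knn_statistic(points, shuffled, k) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

A small p-value indicates that test points cluster among themselves more than exchangeability would allow, i.e. some global novelty, without identifying which individual test points are novel.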


Subject(s)
Statistics, Nonparametric , Cell Line, Tumor , Datasets as Topic , Drug Screening Assays, Antitumor , Flowers/anatomy & histology , Humans , Male , Normal Distribution , Prostatic Neoplasms/drug therapy , Prostatic Neoplasms/pathology , Reproducibility of Results
11.
Q J Exp Psychol (Hove) ; 65(10): 1872-9, 2012.
Article in English | MEDLINE | ID: mdl-22950838

ABSTRACT

The processing of syllables during the writing of isolated words has been shown to occur either before or during the writing of the word containing them. To demonstrate that this difference is related to graphomotor constraints, participants copied bi- and trisyllabic words three times, in four conditions where graphomotor constraints were gradually increased. As expected, latencies were only affected by syllable number in the low-constraint condition. In all four conditions, interletter intervals at syllable boundaries were longer than intrasyllabic interletter intervals. The difference between inter- and intrasyllabic intervals increased with the level of graphomotor constraint. Taken together, these findings indicate that under low graphomotor-constraint conditions, all the syllable processing takes place prior to the writing of a word, whereas under higher graphomotor-constraint conditions, syllable processing is more sequential, each syllable being processed just before it is written.


Subject(s)
Handwriting , Phonetics , Verbal Learning/physiology , Adult , Female , Humans , Male , Psycholinguistics , Reaction Time , Time Factors , Vocabulary , Young Adult
12.
Can J Exp Psychol ; 65(3): 141-50, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21639610

ABSTRACT

This study investigated the time course of spelling, and its influence on graphomotor execution, in a successive word copy task. According to the cascade model, these two processes may be engaged either sequentially or in parallel, depending on the cognitive demands of spelling. In this experiment, adults were asked to copy a series of words varying in frequency and spelling regularity. A combined analysis of eye and pen movements revealed periods where spelling occurred in parallel with graphomotor execution, but concerned different processing units. The extent of this parallel processing depended on the words' orthographic characteristics. Results also highlighted the specificity of word recognition for copying purposes compared with recognition for reading tasks. The results confirm the validity of the cascade model and clarify the nature of the dependence between spelling and graphomotor processes.


Subject(s)
Psycholinguistics , Reaction Time , Writing , Cognition , Eye Movement Measurements/psychology , Female , Fixation, Ocular/physiology , France , Humans , Male , Reading , Visual Perception , Young Adult