Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 5 de 5
Filtrar
1.
Biometrics ; 72(2): 372-81, 2016 06.
Artículo en Inglés | MEDLINE | ID: mdl-26692376

RESUMEN

Large assembled cohorts with banked biospecimens offer valuable opportunities to identify novel markers for risk prediction. When the outcome of interest is rare, an effective strategy to conserve limited biological resources while maintaining reasonable statistical power is the case cohort (CCH) sampling design, in which expensive markers are measured on a subset of cases and controls. However, the CCH design introduces significant analytical complexity due to outcome-dependent, finite-population sampling. Current methods for analyzing CCH studies focus primarily on the estimation of simple survival models with linear effects; testing and estimation procedures that can efficiently capture complex non-linear marker effects for CCH data remain elusive. In this article, we propose inverse probability weighted (IPW) variance component type tests for identifying important marker sets through a Cox proportional hazards kernel machine (CoxKM) regression framework previously considered for full cohort studies (Cai et al., 2011). The optimal choice of kernel, while vitally important to attain high power, is typically unknown for a given dataset. Thus, we also develop robust testing procedures that adaptively combine information from multiple kernels. The proposed IPW test statistics have complex null distributions that cannot easily be approximated explicitly. Furthermore, due to the correlation induced by CCH sampling, standard resampling methods such as the bootstrap fail to approximate the distribution correctly. We, therefore, propose a novel perturbation resampling scheme that can effectively recover the induced correlation structure. Results from extensive simulation studies suggest that the proposed IPW CoxKM testing procedures work well in finite samples. The proposed methods are further illustrated by application to a Danish CCH study of Apolipoprotein C-III markers on the risk of coronary heart disease.


Asunto(s)
Biomarcadores , Estudios de Cohortes , Modelos de Riesgos Proporcionales , Medición de Riesgo/métodos , Apolipoproteínas C/análisis , Biometría/métodos , Simulación por Computador , Enfermedad Coronaria/diagnóstico , Humanos , Medición de Riesgo/estadística & datos numéricos , Muestreo , Análisis de Supervivencia
2.
J Biomed Inform ; 52: 386-93, 2014 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-25117751

RESUMEN

In this paper we describe an efficient tool based on natural language processing for classifying the detail state of pulmonary embolism (PE) recorded in CT pulmonary angiography reports. The classification tasks include: PE present vs. absent, acute PE vs. others, central PE vs. others, and subsegmental PE vs. others. Statistical learning algorithms were trained with features extracted using the NLP tool and gold standard labels obtained via chart review from two radiologists. The areas under the receiver operating characteristic curves (AUC) for the four tasks were 0.998, 0.945, 0.987, and 0.986, respectively. We compared our classifiers with bag-of-words Naive Bayes classifiers, a standard text mining technology, which gave AUC 0.942, 0.765, 0.766, and 0.712, respectively.


Asunto(s)
Angiografía/métodos , Procesamiento de Lenguaje Natural , Embolia Pulmonar/diagnóstico por imagen , Interpretación de Imagen Radiográfica Asistida por Computador/métodos , Tomografía Computarizada por Rayos X/métodos , Algoritmos , Humanos , Curva ROC
3.
Artículo en Inglés | MEDLINE | ID: mdl-37974910

RESUMEN

Electronic Health Record (EHR) data, a rich source for biomedical research, have been successfully used to gain novel insight into a wide range of diseases. Despite its potential, EHR is currently underutilized for discovery research due to its major limitation in the lack of precise phenotype information. To overcome such difficulties, recent efforts have been devoted to developing supervised algorithms to accurately predict phenotypes based on relatively small training datasets with gold standard labels extracted via chart review. However, supervised methods typically require a sizable training set to yield generalizable algorithms, especially when the number of candidate features, p, is large. In this paper, we propose a semi-supervised (SS) EHR phenotyping method that borrows information from both a small, labeled dataset (where both the label Y and the feature set X are observed) and a much larger, weakly-labeled dataset in which the feature set X is accompanied only by a surrogate label S that is available to all patients. Under a working prior assumption that S is related to X only through Y and allowing it to hold approximately, we propose a prior adaptive semi-supervised (PASS) estimator that incorporates the prior knowledge by shrinking the estimator towards a direction derived under the prior. We derive asymptotic theory for the proposed estimator and justify its efficiency and robustness to prior information of poor quality. We also demonstrate its superiority over existing estimators under various scenarios via simulation studies and on three real-world EHR phenotyping studies at a large tertiary hospital.

4.
Stat Methods Med Res ; 27(4): 1099-1114, 2018 04.
Artículo en Inglés | MEDLINE | ID: mdl-27255336

RESUMEN

In cancer studies, patients often experience two different types of events: a non-terminal event such as recurrence or metastasis, and a terminal event such as cancer-specific death. Identifying pathways and networks of genes associated with one or both of these events is an important step in understanding disease development and targeting new biological processes for potential intervention. These correlated outcomes are commonly dealt with by modeling progression-free survival, where the event time is the minimum between the times of recurrence and death. However, identifying pathways only associated with progression-free survival may miss out on pathways that affect time to recurrence but not death, or vice versa. We propose a combined testing procedure for a pathway's association with both the cause-specific hazard of recurrence and the marginal hazard of death. The dependency between the two outcomes is accounted for through perturbation resampling to approximate the test's null distribution, without any further assumption on the nature of the dependency. Even complex non-linear relationships between pathways and disease progression or death can be uncovered thanks to a flexible kernel machine framework. The superior statistical power of our approach is demonstrated in numerical studies and in a gene expression study of breast cancer.


Asunto(s)
Neoplasias de la Mama/genética , Medición de Riesgo , Algoritmos , Interpretación Estadística de Datos , Femenino , Expresión Génica , Humanos , Medición de Riesgo/estadística & datos numéricos
5.
J Mach Learn Res ; 17(1): 2976-3012, 2016 May.
Artículo en Inglés | MEDLINE | ID: mdl-28503101

RESUMEN

It is known that for a certain class of single index models (SIMs) [Formula: see text], support recovery is impossible when X ~ 𝒩(0, 𝕀 p×p ) and a model complexity adjusted sample size is below a critical threshold. Recently, optimal algorithms based on Sliced Inverse Regression (SIR) were suggested. These algorithms work provably under the assumption that the design X comes from an i.i.d. Gaussian distribution. In the present paper we analyze algorithms based on covariance screening and least squares with L1 penalization (i.e. LASSO) and demonstrate that they can also enjoy optimal (up to a scalar) rescaled sample size in terms of support recovery, albeit under slightly different assumptions on f and ε compared to the SIR based algorithms. Furthermore, we show more generally, that LASSO succeeds in recovering the signed support of ß0 if X ~ 𝒩 (0, Σ), and the covariance Σ satisfies the irrepresentable condition. Our work extends existing results on the support recovery of LASSO for the linear model, to a more general class of SIMs.

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA