Results 1 - 3 of 3
1.
Stat Methods Med Res ; 29(2): 455-465, 2020 02.
Article in English | MEDLINE | ID: mdl-30943854

ABSTRACT

Electronic medical records data are valuable resources for discovery research. They contain detailed phenotypic information on individual patients, opening opportunities for simultaneously studying multiple phenotypes. A useful tool for such simultaneous assessment is the phenome-wide association study, which relates a genomic or biological marker of interest to a wide spectrum of disease phenotypes, typically defined by diagnostic billing codes. One challenge arises when the biomarker of interest is expensive to measure on the entire electronic medical record cohort. Performing a phenome-wide association study based on supervised estimation, using only subjects who have marker measurements, may yield limited power. In this paper, we focus on the setting where the marker is measured on a small fraction of the patients while a few surrogate markers, such as historical measurements of the biomarker, are available on a large number of patients. We propose an efficient semi-supervised estimation procedure to estimate the covariance between the biomarker and the billing code, leveraging the surrogate marker information. We employ surrogate marker values to impute the missing outcome via a two-step semi-non-parametric approach and demonstrate that our proposed estimator is always more efficient than the supervised counterpart without requiring the imputation model to be correct. We illustrate the proposed procedure by assessing the association between C-reactive protein and some inflammatory diseases in an electronic medical record study of inflammatory bowel disease performed with the Partners HealthCare electronic medical record database, where C-reactive protein was measured for only a small fraction of the patients due to budget constraints.
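The idea of imputing the unmeasured marker from a surrogate and then correcting with labeled residuals can be illustrated with a minimal sketch. This is not the authors' two-step semi-non-parametric procedure: it swaps in a plain linear imputation on a single surrogate, and it does not reproduce the efficiency guarantee stated in the abstract. The function names, the toy data generator, and all numeric settings are assumptions made for illustration only.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Least-squares intercept and slope for regressing y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def supervised_cov(y_lab, d_lab):
    """Sample covariance (1/n version) using labeled patients only."""
    my, md = mean(y_lab), mean(d_lab)
    return mean([(y - my) * (d - md) for y, d in zip(y_lab, d_lab)])


def semi_supervised_cov(y_lab, s_lab, d_lab, s_unl, d_unl):
    """Imputation plus residual correction (illustrative assumption).

    y: marker (labeled patients only), s: surrogate, d: billing-code indicator.
    """
    a, b = fit_line(s_lab, y_lab)        # working imputation model, may be wrong
    s_all = s_lab + s_unl
    d_all = d_lab + d_unl
    d_bar = mean(d_all)                  # code prevalence from the full cohort
    # Covariance between the imputed marker and the code, over all patients.
    plug_in = mean([(a + b * s) * (d - d_bar) for s, d in zip(s_all, d_all)])
    # Residual term from labeled patients offsets imputation error.
    correction = mean([(y - (a + b * s)) * (d - d_bar)
                       for y, s, d in zip(y_lab, s_lab, d_lab)])
    return plug_in + correction


def simulate(n_lab=200, n_unl=5000, seed=0):
    """Hypothetical toy cohort; the true covariance is 0.8 * 0.3 * 0.7 = 0.168."""
    rng = random.Random(seed)

    def draw():
        d = 1.0 if rng.random() < 0.3 else 0.0
        y = 1.0 + 0.8 * d + rng.gauss(0.0, 1.0)
        s = y + rng.gauss(0.0, 0.3)      # e.g. a historical measurement
        return y, s, d

    lab = [draw() for _ in range(n_lab)]
    unl = [draw() for _ in range(n_unl)]
    return ([r[0] for r in lab], [r[1] for r in lab], [r[2] for r in lab],
            [r[1] for r in unl], [r[2] for r in unl])


if __name__ == "__main__":
    y_lab, s_lab, d_lab, s_unl, d_unl = simulate()
    print("supervised:     ", supervised_cov(y_lab, d_lab))
    print("semi-supervised:", semi_supervised_cov(y_lab, s_lab, d_lab, s_unl, d_unl))
```

With no unlabeled patients the two estimators coincide, which is a quick sanity check that the residual term removes any bias introduced by the working imputation model.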


Subjects
Statistical Data Interpretation, Electronic Health Records, Genome-Wide Association Study, Algorithms, Bias, Biomarkers, Inflammatory Bowel Diseases
2.
Biometrics ; 76(3): 767-777, 2020 09.
Article in English | MEDLINE | ID: mdl-31797368

ABSTRACT

We consider estimating average treatment effects (ATE) of a binary treatment in observational data when data-driven variable selection is needed to select relevant covariates from a moderately large number of available covariates X. To leverage covariates among X predictive of the outcome for efficiency gain while using regularization to fit a parametric propensity score (PS) model, we consider a dimension reduction of X based on fitting both working PS and outcome models using adaptive LASSO. A novel PS estimator, the Double-index Propensity Score (DiPS), is proposed, in which the treatment status is smoothed over the linear predictors for X from both the initial working models. The ATE is estimated by using the DiPS in a normalized inverse probability weighting estimator, which is found to maintain double robustness and also local semiparametric efficiency with a fixed number of covariates p. Under misspecification of working models, the smoothing step leads to gains in efficiency and robustness over traditional doubly robust estimators. These results are extended to the case where p diverges with sample size and working models are sparse. Simulations show the benefits of the approach in finite samples. We illustrate the method by estimating the ATE of statins on colorectal cancer risk in an electronic medical record study and the effect of smoking on C-reactive protein in the Framingham Offspring Study.
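The two steps named in the abstract, smoothing treatment status over two linear predictors and plugging the result into a normalized inverse probability weighting estimator, can be sketched as follows. This is a rough illustration under stated assumptions, not the published DiPS implementation: the adaptive LASSO fits are omitted and the two linear predictors are taken as given inputs, while the Gaussian product kernel, the single shared bandwidth, and the clipping constant `eps` are choices made here for illustration.

```python
import math


def double_index_ps(index_ps, index_or, t, bandwidth=0.5):
    """Nadaraya-Watson smoothing of treatment status over two indices.

    index_ps / index_or: linear predictors from the working propensity score
    and outcome models (assumed already fitted elsewhere).
    The kernel and bandwidth are illustrative assumptions.
    """
    n = len(t)
    ps = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(n):
            u = (index_ps[i] - index_ps[j]) / bandwidth
            v = (index_or[i] - index_or[j]) / bandwidth
            w = math.exp(-0.5 * (u * u + v * v))   # Gaussian product kernel
            num += w * t[j]
            den += w
        ps.append(num / den)                       # den >= 1 because j == i
    return ps


def normalized_ipw_ate(y, t, ps, eps=1e-6):
    """Normalized (Hajek-type) inverse probability weighting estimate of the ATE.

    Requires at least one treated and one control unit.
    Propensity scores are clipped to [eps, 1 - eps] (an assumption made here
    to avoid division by zero).
    """
    y1_sum = w1_sum = y0_sum = w0_sum = 0.0
    for yi, ti, pi in zip(y, t, ps):
        pi = min(max(pi, eps), 1.0 - eps)
        w1 = ti / pi                    # weight for treated units
        w0 = (1 - ti) / (1.0 - pi)      # weight for control units
        y1_sum += w1 * yi
        w1_sum += w1
        y0_sum += w0 * yi
        w0_sum += w0
    return y1_sum / w1_sum - y0_sum / w0_sum


if __name__ == "__main__":
    # Hypothetical toy inputs.
    t = [1, 1, 0, 0, 1, 0]
    y = [2.0, 3.0, 1.0, 1.5, 2.5, 0.5]
    idx_ps = [0.9, 0.7, -0.4, -0.8, 0.2, -0.1]
    idx_or = [1.1, 0.8, -0.2, -0.6, 0.4, -0.3]
    ps = double_index_ps(idx_ps, idx_or, t)
    print(normalized_ipw_ate(y, t, ps))
```

Normalizing by the sum of the weights within each arm, rather than by the sample size, keeps the estimate within the range of the observed outcomes. With a constant propensity score the estimator reduces to a simple difference in means.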


Subjects
Statistical Models, Smoking, Computer Simulation, Humans, Propensity Score, Sample Size
3.
J Am Med Inform Assoc ; 24(e1): e143-e149, 2017 Apr 01.
Article in English | MEDLINE | ID: mdl-27632993

ABSTRACT

OBJECTIVE: Phenotyping algorithms are capable of accurately identifying patients with specific phenotypes from within electronic medical records systems. However, developing phenotyping algorithms in a scalable way remains a challenge due to the extensive human resources required. This paper introduces a high-throughput unsupervised feature selection method, which improves the robustness and scalability of electronic medical record phenotyping without compromising its accuracy.
METHODS: The proposed Surrogate-Assisted Feature Extraction (SAFE) method selects candidate features from a pool of comprehensive medical concepts found in publicly available knowledge sources. The target phenotype's International Classification of Diseases, Ninth Revision and natural language processing counts, acting as noisy surrogates to the gold-standard labels, are used to create silver-standard labels. Candidate features highly predictive of the silver-standard labels are selected as the final features.
RESULTS: Algorithms were trained to identify patients with coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis using various numbers of labels to compare the performance of features selected by SAFE, a previously published automated feature extraction for phenotyping procedure, and domain experts. The out-of-sample area under the receiver operating characteristic curve and F-score from SAFE algorithms were remarkably higher than those from the other two, especially at small label sizes.
CONCLUSION: SAFE advances high-throughput phenotyping methods by automatically selecting a succinct set of informative features for algorithm training, which in turn reduces overfitting and the needed number of gold-standard labels. SAFE also potentially identifies important features missed by automated feature extraction for phenotyping or experts.
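The surrogate-to-silver-label step described under METHODS can be illustrated with a small sketch. It is not the published SAFE procedure: the count thresholds are hypothetical, and a simple ranking by absolute Pearson correlation is swapped in for whatever model the authors use to judge which features are "highly predictive". All function and feature names are assumptions made for illustration.

```python
import math


def silver_labels(counts, low=0, high=3):
    """Turn noisy surrogate counts (e.g. ICD-9 or NLP mentions) into labels.

    Returns 1 when count >= high, 0 when count <= low, and None for the
    ambiguous middle, which is left unlabeled. Thresholds are hypothetical.
    """
    labels = []
    for c in counts:
        if c >= high:
            labels.append(1)
        elif c <= low:
            labels.append(0)
        else:
            labels.append(None)
    return labels


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def select_features(features, labels, top_k):
    """Keep the top_k features most associated with the silver labels.

    features: dict mapping feature name -> list of per-patient values.
    Only patients with a non-None silver label are used.
    """
    keep = [i for i, lab in enumerate(labels) if lab is not None]
    y = [labels[i] for i in keep]
    scores = {}
    for name, values in features.items():
        x = [values[i] for i in keep]
        scores[name] = abs(pearson(x, y))
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical toy data: five patients, three candidate concept counts.
    icd_counts = [0, 0, 5, 6, 1]
    candidates = {
        "concept_a": [0, 0, 3, 4, 9],
        "concept_b": [1, 0, 1, 0, 1],
        "concept_c": [2, 2, 2, 2, 2],
    }
    labels = silver_labels(icd_counts)
    print(select_features(candidates, labels, top_k=1))
```

Because the silver labels come from the surrogate counts alone, no chart review is needed at this stage; gold-standard labels are only required later, when the final algorithm is trained on the reduced feature set.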


Subjects
Algorithms, Data Mining, Electronic Health Records, Humans, Machine Learning, Natural Language Processing, Phenotype