1.
AMIA Jt Summits Transl Sci Proc ; 2024: 221-229, 2024.
Article in English | MEDLINE | ID: mdl-38827091

ABSTRACT

We recently demonstrated that electronically constructed family pedigrees (e-pedigrees) have great value in epidemiologic research using electronic health record (EHR) data. Even before that work, it was well accepted that family health history is a major predictor for a wide spectrum of diseases, reflecting shared effects of genetics, environment, and lifestyle. With the widespread digitalization of patient data via EHRs, there is an unprecedented opportunity to use machine learning algorithms to better predict disease risk. Although predictive models have previously been constructed for a few important diseases, we currently know very little about how accurately the risk for most diseases can be predicted. It is also unknown whether incorporating e-pedigrees into machine learning can improve the value of these models. In this study, we devised a family pedigree-driven high-throughput machine learning pipeline to simultaneously predict risks for thousands of diagnosis codes using thousands of input features. Models were built to predict future disease risk for three time windows using both Logistic Regression and XGBoost. For example, using XGBoost without e-pedigrees, we achieved average areas under the receiver operating characteristic curve (AUCs) of 0.82, 0.77, and 0.71 for 1, 6, and 24 months, respectively. When e-pedigree features were added to the XGBoost pipeline, AUCs increased to 0.83, 0.79, and 0.74 for the same three time windows, respectively. E-pedigrees similarly improved the predictions when using Logistic Regression. These results emphasize the potential value of incorporating family health history via e-pedigrees into machine learning without additional human effort.
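
For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how an AUC can be computed from risk scores, using its interpretation as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The function name `roc_auc` and the six toy patients are assumptions made for illustration; they are not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up risk scores for six patients (1 = later received the diagnosis).
labels = [1, 1, 1, 0, 0, 0]
scores_baseline = [0.9, 0.4, 0.35, 0.5, 0.3, 0.1]  # hypothetical model without e-pedigree features
scores_pedigree = [0.9, 0.6, 0.35, 0.5, 0.3, 0.1]  # hypothetical model with e-pedigree features

auc_baseline = roc_auc(labels, scores_baseline)  # 7 of 9 positive/negative pairs ordered correctly
auc_pedigree = roc_auc(labels, scores_pedigree)  # 8 of 9 pairs ordered correctly
```

In this toy case, raising the score of a single positive patient moves one more positive/negative pair into the correct order, which is the kind of change an AUC improvement reflects.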

2.
Methods Mol Biol ; 2496: 91-109, 2022.
Article in English | MEDLINE | ID: mdl-35713860

ABSTRACT

Epidemiological studies identifying biological markers of disease state are valuable but can be time-consuming and expensive, and they require extensive intuition and expertise. Furthermore, not all hypothesized markers will be borne out in a study, suggesting that high-quality initial hypotheses are crucial. In this chapter, we describe a high-throughput pipeline to produce a ranked list of high-quality hypothesized biomarkers for diseases. We review an example use of this approach to generate a large number of candidate disease biomarker hypotheses derived from machine learning models, filter and rank them according to their potential novelty using text mining, and corroborate the most promising hypotheses with further statistical modeling. The example applies the pipeline to a large electronic health record dataset and the PubMed corpus to find several promising hypothesized laboratory tests with previously undocumented correlations to particular diseases.


Subjects
Data Mining, Machine Learning, Electronic Health Records, Statistical Models, Publications
3.
AMIA Jt Summits Transl Sci Proc ; 2019: 248-257, 2019.
Article in English | MEDLINE | ID: mdl-31258977

ABSTRACT

We study the problem of privacy-preserving machine learning (PPML) for ensemble methods, focusing our effort on random forests. In collaborative analysis, PPML attempts to solve the conflict between the need for data sharing and privacy. This is especially important in privacy sensitive applications such as learning predictive models for clinical decision support from EHR data from different clinics, where each clinic has a responsibility for its patients' privacy. We propose a new approach for ensemble methods: each entity learns a model, from its own data, and then when a client asks the prediction for a new private instance, the answers from all the locally trained models are used to compute the prediction in such a way that no extra information is revealed. We implement this approach for random forests and we demonstrate its high efficiency and potential accuracy benefit via experiments on real-world datasets, including actual EHR data.
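
The abstract does not say which cryptographic protocol is used, so the following is only a generic illustration of how votes from locally trained models could be combined without any party revealing its own vote. It uses additive secret sharing and assumes honest, non-colluding parties. All names (`share`, `secure_vote_sum`, `ensemble_predict`, `MODULUS`) are hypothetical, and the sketch does not address keeping the client's instance private.

```python
import secrets

MODULUS = 2**61 - 1  # public prime; all share arithmetic is done modulo this value


def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_vote_sum(local_votes):
    """Sum 0/1 votes so that only the total is ever reconstructed."""
    n = len(local_votes)
    # Party i splits its vote and sends share j to party j.
    distributed = [share(v, n) for v in local_votes]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


def ensemble_predict(local_votes):
    """Majority vote over the local models, computed from the aggregated count only."""
    positive = secure_vote_sum(local_votes)
    return 1 if 2 * positive > len(local_votes) else 0


# Five hypothetical clinics, each contributing the 0/1 vote of its own model.
votes = [1, 0, 1, 1, 0]
total = secure_vote_sum(votes)
prediction = ensemble_predict(votes)
```

Each individual share is a uniformly random number, so a party that sees only the shares sent to it learns nothing about another party's vote; only the final count is meaningful.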

4.
AMIA Jt Summits Transl Sci Proc ; 2019: 572-581, 2019.
Article in English | MEDLINE | ID: mdl-31259012

ABSTRACT

Epidemiological studies identifying biological markers of disease state are valuable but can be time-consuming and expensive, and they require extensive intuition and expertise. Furthermore, not all hypothesized markers will be borne out in a study, suggesting that higher-quality initial hypotheses are crucial. In this work, we propose a high-throughput pipeline to produce a ranked list of high-quality hypothesized marker laboratory tests for diagnoses. Our pipeline generates a large number of candidate lab-diagnosis hypotheses derived from machine learning models, filters and ranks them according to their potential novelty using text mining, and corroborates the final hypotheses with logistic regression analysis. We test our approach on a large electronic health record dataset and the PubMed corpus, and find several promising candidate hypotheses.
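
The filter-and-rank step can be pictured with a small sketch. The abstract does not describe the text-mining method, so this version assumes that novelty is approximated by how rarely a laboratory test and a diagnosis are mentioned together in a corpus of abstracts. The function names, placeholder terms, and toy corpus are all hypothetical.

```python
def co_mention_count(lab, diagnosis, abstracts):
    """Number of abstracts mentioning both terms (case-insensitive substring match)."""
    lab, diagnosis = lab.lower(), diagnosis.lower()
    return sum(1 for text in abstracts if lab in text.lower() and diagnosis in text.lower())


def rank_by_novelty(candidates, abstracts):
    """Order (lab, diagnosis, model_score) tuples: fewest co-mentions first,
    then highest model score among equally novel candidates."""
    return sorted(
        candidates,
        key=lambda c: (co_mention_count(c[0], c[1], abstracts), -c[2]),
    )


# Toy corpus and candidates with placeholder names; not real findings.
abstracts = [
    "Lab A is elevated in diagnosis X",
    "Diagnosis X cohort study with lab A",
    "Lab B reference ranges",
    "Diagnosis Y outcomes",
]
candidates = [
    ("lab A", "diagnosis X", 0.9),  # already co-mentioned twice, so least novel
    ("lab B", "diagnosis Y", 0.7),
    ("lab B", "diagnosis X", 0.8),
]
ranked = rank_by_novelty(candidates, abstracts)
```

A production pipeline would need proper term normalization and synonym handling; plain substring matching is used here only to keep the idea visible.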

5.
AMIA Jt Summits Transl Sci Proc ; 2017: 139-146, 2018.
Article in English | MEDLINE | ID: mdl-29888059

ABSTRACT

Calciphylaxis is a disorder that results in necrotic cutaneous lesions with a high rate of mortality. Due to its rarity and complexity, the risk factors for and the disease mechanism of calciphylaxis are not fully understood. This work focuses on the use of machine learning to both predict disease risk and model the contributing factors learned from an electronic health record data set. We present the results of four modeling approaches on several subpopulations of patients with chronic kidney disease (CKD). We find that modeling calciphylaxis risk with random forests learned from binary feature data produces strong models, and in the case of predicting calciphylaxis development among stage 4 CKD patients, we achieve an AUC-ROC of 0.8718. This ability to successfully predict calciphylaxis may provide an excellent opportunity for clinical translation of the predictive models presented in this paper.

6.
Proc Int Conf Mach Learn Appl ; 2018: 40-47, 2018 Dec.
Article in English | MEDLINE | ID: mdl-31799516

ABSTRACT

There is great interest in methods to improve human insight into trained non-linear models. Leading approaches include producing a ranking of the most relevant features, a non-trivial task for non-linear models. We show theoretically and empirically the benefit of a novel version of recursive feature elimination (RFE) as often used with SVMs; the key idea is a simple twist on the kinds of sensitivity testing employed in computational learning theory with membership queries (e.g., [1]). With membership queries, one can check whether changing the value of a feature in an example changes the label. In the real world, we usually cannot get answers to such queries, so our approach instead makes these queries to a trained (imperfect) non-linear model. Because SVMs are widely used in bioinformatics, our empirical results use a real-world cancer genomics problem; because ground truth is not known for this task, we discuss the potential insights provided. We also evaluate on synthetic data where ground truth is known.
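
The query idea described above (change one feature value and see whether the model's label changes) can be sketched as follows. This is not the paper's RFE variant, only an illustration of the sensitivity test itself, assuming binary features. `toy_model` is a hypothetical stand-in for a trained non-linear model.

```python
from itertools import product


def flip_sensitivity(model, examples, n_features):
    """For each binary feature, the fraction of examples whose predicted label
    changes when that feature alone is flipped."""
    counts = [0] * n_features
    for x in examples:
        original = model(x)
        for j in range(n_features):
            flipped = list(x)
            flipped[j] = 1 - flipped[j]
            if model(flipped) != original:
                counts[j] += 1
    return [c / len(examples) for c in counts]


def toy_model(x):
    """Hypothetical trained model: XOR of features 0 and 1; feature 2 is ignored."""
    return x[0] ^ x[1]


examples = [list(bits) for bits in product([0, 1], repeat=3)]
sensitivity = flip_sensitivity(toy_model, examples, n_features=3)
# Features ordered from most to least sensitive (ties keep index order).
ranking = sorted(range(3), key=lambda j: -sensitivity[j])
```

XOR is a convenient toy case because neither relevant feature is correlated with the label on its own, yet flipping either one always changes the prediction, so the query-based score still separates them from the irrelevant feature.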

7.
Med Phys ; 41(4): 042303, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24694153

ABSTRACT

PURPOSE: To develop a new spatio-temporal texture (SpTeT) based method for distinguishing vulnerable versus stable atherosclerotic plaques on DCE-MRI using a rabbit model of atherothrombosis. METHODS: Aortic atherosclerosis was induced in 20 New Zealand White rabbits by cholesterol diet and endothelial denudation. MRI was performed before (pretrigger) and after (posttrigger) inducing plaque disruption with Russell's viper venom and histamine. Of the 30 vascular targets (segments) under histology analysis, 16 contained thrombus (vulnerable) and 14 did not (stable). A total of 352 voxel-wise computerized SpTeT features, including 192 Gabor, 36 Kirsch, 12 Sobel, 52 Haralick, and 60 first-order textural features, were extracted on DCE-MRI to capture subtle texture changes in the plaques over the course of contrast uptake. Different combinations of SpTeT feature sets, in which the features were ranked by a minimum-redundancy-maximum-relevance feature selection technique, were evaluated via a random forest classifier. A 2-fold cross-validation repeated for 500 iterations was performed to discriminate vulnerable from stable atherosclerotic plaques on a per-voxel basis. Four quantitative metrics were used to measure the classification results in separating vulnerable and stable plaques. RESULTS: The quantitative results show that the combination of five classes of SpTeT features can distinguish between vulnerable (disrupted plaques with an overlying thrombus) and stable plaques with a best AUC of 0.9631 ± 0.0088, accuracy of 89.98% ± 0.57%, sensitivity of 83.71% ± 1.71%, and specificity of 94.55% ± 0.48%. CONCLUSIONS: Vulnerable and stable plaques can be distinguished by SpTeT-based features. The SpTeT features, following validation on larger datasets, could be established as effective and reliable imaging biomarkers for noninvasively assessing atherosclerotic risk.


Subjects
Computer-Assisted Image Processing/methods, Magnetic Resonance Imaging/methods, Atherosclerotic Plaque/complications, Atherosclerotic Plaque/diagnosis, Thrombosis/complications, Animals, Arteries/pathology, Animal Disease Models, Rabbits, Spatio-Temporal Analysis
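
As a reminder of how three of the reported metrics relate to per-voxel predictions, here is a minimal sketch that derives accuracy, sensitivity, and specificity from true and predicted labels, assuming vulnerable plaque is coded as 1 and stable plaque as 0. The function name and the ten toy voxels are made up for illustration and are unrelated to the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate).

    Assumes y_true contains at least one positive and one negative label.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Ten made-up voxels: 1 = vulnerable, 0 = stable.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Here three of four vulnerable voxels and five of six stable voxels are classified correctly, so sensitivity and specificity differ even though both feed into the same overall accuracy.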