Results 1 - 10 of 10
1.
Article in English | MEDLINE | ID: mdl-37555559

ABSTRACT

Objective: To assemble and characterize an electronic health record (EHR) dataset for a large cohort of US military Veterans diagnosed with amyotrophic lateral sclerosis (ALS). Methods: An EHR dataset for 19,662 Veterans diagnosed with ALS between January 1, 2000 and December 31, 2020 was compiled from the Veterans Health Administration (VHA) EHR database by querying for the ICD-9 (335.20) or ICD-10 (G12.21) diagnosis code for ALS. Results: The cohort is predominantly male (98.94%) and white (72.37%), with a median age at disease onset of 68 years and a median survival from the date of diagnosis of 590 days. After ALS was designated a compensable illness in 2009, the number of Veterans diagnosed per year in the VHA increased, but median survival did not change. The cohort included a greater-than-expected proportion of individuals whose branch of service at the time of separation was the Army. Conclusions: The composition of the cohort reflects the VHA population at greatest risk for ALS. The greater-than-expected proportion of individuals whose branch of service at the time of separation was the Army suggests the possibility of a branch-specific risk factor for ALS.

2.
Stat Methods Med Res ; 31(12): 2383-2399, 2022 12.
Article in English | MEDLINE | ID: mdl-36039541

ABSTRACT

Continuous-time hidden Markov models are an attractive approach for disease modeling because they are explainable and can handle the irregularly sampled, skewed, and sparse data that arise from real-world medical practice, in particular screening data with extensive follow-up. Most applications in this context consider time-homogeneous models due to their relative computational simplicity. However, the time-homogeneity assumption is too strong to accurately model the natural history of many diseases, including cancer. Moreover, cancer risk across the population is not homogeneous either, since exposure to disease risk factors can vary considerably between individuals. This is important when analyzing longitudinal datasets and different birth cohorts. We model the heterogeneity of disease progression and regression using piecewise-constant intensity functions and model the heterogeneity of risks in the population using a latent mixture structure. Different submodels under the mixture structure employ the same types of Markov states, reflecting disease progression and allowing both clinical interpretation and model parsimony. We also consider flexible observational models that deal with over-dispersion in real data. An efficient, scalable Expectation-Maximization algorithm with a theoretical convergence guarantee is proposed for inference. We demonstrate our method's superior performance compared to other state-of-the-art methods using synthetic data and a real-world cervical cancer screening dataset from the Cancer Registry of Norway. Moreover, we present two model-based risk stratification methods that identify the risk levels of individuals.
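
The piecewise-constant intensity idea in this abstract can be illustrated with a minimal sketch. It assumes a generator matrix that is constant between a set of breakpoints (for example, ages), so that the transition probability matrix over an interval is the ordered product of matrix exponentials, one per piece. The function names, the two-state example, and all rate values are hypothetical and are not taken from the paper; the matrix exponential is a simple pure-Python approximation chosen to keep the example dependency-free.

```python
import math


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=20, squarings=10):
    """Approximate exp(q * t) by scaling and squaring a truncated Taylor series."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[x * scale for x in row] for row in q]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        # term holds a^k / k! after this update
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_matrix(generators, breakpoints, start, end):
    """Transition probabilities from time `start` to time `end`.

    generators[k] applies on [breakpoints[k-1], breakpoints[k]); the first
    piece extends to -inf and the last to +inf (an assumption of this sketch).
    """
    p = identity(len(generators[0]))
    edges = [-math.inf] + list(breakpoints) + [math.inf]
    for k, q in enumerate(generators):
        lo = max(start, edges[k])
        hi = min(end, edges[k + 1])
        if hi > lo:
            # multiply pieces in chronological order
            p = mat_mul(p, mat_exp(q, hi - lo))
    return p


# Hypothetical two-state example: the rates change at age 50.
q_young = [[-0.10, 0.10], [0.05, -0.05]]
q_old = [[-0.30, 0.30], [0.02, -0.02]]
p_48_to_53 = transition_matrix([q_young, q_old], [50.0], 48.0, 53.0)
```

Because each piece is time-homogeneous, the only extra cost over a homogeneous model is one matrix exponential per piece that the observation interval overlaps.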


Subjects
Early Detection of Cancer , Uterine Cervical Neoplasms , Female , Humans , Markov Chains , Models, Statistical , Uterine Cervical Neoplasms/diagnosis , Algorithms , Disease Progression
3.
Cancer Med ; 11(11): 2204-2215, 2022 06.
Article in English | MEDLINE | ID: mdl-35261195

ABSTRACT

BACKGROUND: The interaction between cancer diagnoses and COVID-19 infection and outcomes is unclear. We leveraged a state-wide, multi-institutional database to assess cancer-related risk factors for poor COVID-19 outcomes. METHODS: We conducted a retrospective cohort study using the University of California Health COVID Research Dataset, which includes electronic health data of patients tested for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) at 17 California medical centers. We identified adults tested for SARS-CoV-2 from February 1, 2020 to December 31, 2020 and selected a cohort of patients with cancer. We obtained demographic, clinical, cancer type, and antineoplastic therapy data. The primary outcome was hospitalization within 30 days after the first positive SARS-CoV-2 test. Secondary outcomes were SARS-CoV-2 positivity and severe COVID-19 (intensive care, mechanical ventilation, or death within 30 days after the first positive test). We used multivariable logistic regression to identify cancer-related factors associated with outcomes. RESULTS: We identified 409,462 patients undergoing SARS-CoV-2 testing. Of 49,918 patients with cancer, 1781 (3.6%) tested positive. Patients with cancer were less likely to test positive (RR 0.70, 95% CI: 0.67-0.74, p < 0.001). Among the 1781 SARS-CoV-2-positive patients with cancer, BCR/ABL-negative myeloproliferative neoplasms (RR 2.15, 95% CI: 1.25-3.41, p = 0.007), venetoclax (RR 2.96, 95% CI: 1.14-5.66, p = 0.028), and methotrexate (RR 2.72, 95% CI: 1.10-5.19, p = 0.032) were associated with greater hospitalization risk. Cancer and therapy types were not associated with severe COVID-19. CONCLUSIONS: In this large, diverse cohort, cancer was associated with a decreased risk of SARS-CoV-2 positivity. Patients with BCR/ABL-negative myeloproliferative neoplasm or receiving methotrexate or venetoclax may be at increased risk of hospitalization following SARS-CoV-2 infection. Mechanistic and comparative studies are needed to validate findings.


Subjects
COVID-19 , Neoplasms , Adult , COVID-19/epidemiology , COVID-19 Testing , Hospitalization , Humans , Methotrexate , Neoplasms/epidemiology , Retrospective Studies , SARS-CoV-2
4.
J Am Med Inform Assoc ; 29(5): 864-872, 2022 04 13.
Article in English | MEDLINE | ID: mdl-35137149

ABSTRACT

OBJECTIVE: The study sought to investigate the disease state-dependent risk profiles of patient demographics and medical comorbidities associated with adverse outcomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections. MATERIALS AND METHODS: A covariate-dependent, continuous-time hidden Markov model with 4 states (moderate, severe, discharged, and deceased) was used to model the dynamic progression of COVID-19 during the course of hospitalization. All model parameters were estimated using the electronic health records of 1362 patients from ProMedica Health System admitted between March 20, 2020 and December 29, 2020 with a positive nasopharyngeal PCR test for SARS-CoV-2. Demographic characteristics, comorbidities, vital signs, and laboratory test results were retrospectively evaluated to infer a patient's clinical progression. RESULTS: The association between patient-level covariates and risk of progression was found to be disease state dependent. Specifically, while being male, being Black or having a medical comorbidity were all associated with an increased risk of progressing from the moderate disease state to the severe disease state, these same factors were associated with a decreased risk of progressing from the severe disease state to the deceased state. DISCUSSION: Recent studies have not included analyses of the temporal progression of COVID-19, making the current study a unique modeling-based approach to understand the dynamics of COVID-19 in hospitalized patients. CONCLUSION: Dynamic risk stratification models have the potential to improve clinical outcomes not only in COVID-19, but also in a myriad of other acute and chronic diseases that, to date, have largely been assessed only by static modeling techniques.
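
A covariate-dependent generator for the four states named in this abstract can be sketched as follows. The log-linear form of the intensities, the set of allowed transitions, and every coefficient value below are assumptions made for illustration; the abstract does not specify them. Only the signs of the hypothetical "male" coefficient are chosen to echo the qualitative finding reported above (higher moderate-to-severe intensity, lower severe-to-deceased intensity).

```python
import math

STATES = ["moderate", "severe", "discharged", "deceased"]


def generator(x, params):
    """Build a generator matrix with q_ij(x) = exp(intercept_ij + coefs_ij . x).

    `params` maps an allowed transition (i, j) to (intercept, coefs).
    Transitions not listed have zero intensity, so states with no outgoing
    entries (discharged, deceased) are absorbing.
    """
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for (i, j), (intercept, coefs) in params.items():
        q[i][j] = math.exp(intercept + sum(c * v for c, v in zip(coefs, x)))
    for i in range(n):
        # diagonal makes each row sum to zero
        q[i][i] = -sum(q[i])
    return q


# Hypothetical parameters for a single binary covariate (1.0 = male).
HYPOTHETICAL_PARAMS = {
    (0, 1): (-2.0, [0.4]),   # moderate -> severe
    (1, 0): (-1.5, [0.0]),   # severe -> moderate
    (0, 2): (-1.0, [0.0]),   # moderate -> discharged
    (1, 3): (-3.0, [-0.3]),  # severe -> deceased
}

q_male = generator([1.0], HYPOTHETICAL_PARAMS)
q_female = generator([0.0], HYPOTHETICAL_PARAMS)
```

Giving each transition its own coefficient vector is what allows a covariate to raise one intensity while lowering another, which a single static risk score cannot express.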


Subjects
COVID-19 , Comorbidity , Female , Hospitalization , Humans , Male , Retrospective Studies , Risk Factors , SARS-CoV-2
5.
Sci Rep ; 11(1): 19543, 2021 10 01.
Article in English | MEDLINE | ID: mdl-34599200

ABSTRACT

The combination of machine learning (ML) and electronic health records (EHR) data may be able to improve outcomes of hospitalized COVID-19 patients through improved risk stratification and patient outcome prediction. However, in resource-constrained environments, the clinical utility of such data-driven predictive tools may be limited by the cost or unavailability of certain laboratory tests. We leveraged EHR data to develop an ML-based tool for predicting adverse outcomes that optimizes clinical utility under a given cost structure. We further gained insights into the decision-making process of the ML models through an explainable AI tool. This cohort study was performed using deidentified EHR data from COVID-19 patients from ProMedica Health System in northwest Ohio and southeastern Michigan. We tested the performance of various ML approaches for predicting either increasing ventilatory support or mortality. We performed post hoc analysis to obtain optimal feature sets under various budget constraints. We demonstrate that it is possible to achieve a significant reduction in cost at the expense of a small reduction in predictive performance. For example, when predicting ventilation, it is possible to achieve a 43% reduction in cost with only a 3% reduction in performance. Similarly, when predicting mortality, it is possible to achieve a 50% reduction in cost with only a 1% reduction in performance. This study presents a quick, accurate, and cost-effective method to evaluate risk of deterioration for patients with SARS-CoV-2 infection at the time of clinical evaluation.
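
Choosing a feature set under a budget, as described in this abstract, can be sketched as a search over feature subsets whose total cost does not exceed the budget. The exhaustive search below is an assumption of this sketch, not the procedure used in the study, and it is only practical for a small number of features. The feature names, costs, and the toy additive score are hypothetical; in practice `score` would be something like a cross-validated performance metric of a model trained on the subset.

```python
from itertools import combinations


def best_subset(costs, budget, score):
    """Return the feasible feature subset with the highest score.

    costs:  dict mapping feature name -> cost of obtaining it
    budget: maximum total cost allowed
    score:  callable taking a frozenset of feature names, higher is better
    """
    best = frozenset()
    best_score = score(best)
    features = sorted(costs)
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            if sum(costs[f] for f in subset) <= budget:
                candidate = frozenset(subset)
                candidate_score = score(candidate)
                if candidate_score > best_score:
                    best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical costs and a toy additive score, for illustration only.
toy_costs = {"age": 0, "crp": 10, "d_dimer": 25, "ldh": 15}
toy_gain = {"age": 0.10, "crp": 0.20, "d_dimer": 0.25, "ldh": 0.15}


def toy_score(subset):
    return sum(toy_gain[f] for f in subset)


chosen, chosen_score = best_subset(toy_costs, budget=25, score=toy_score)
```

Repeating the search over a range of budgets traces out a cost versus performance curve, which is the kind of trade-off the abstract reports.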


Subjects
Budgets , COVID-19/pathology , COVID-19/virology , Machine Learning , Outcome Assessment, Health Care , SARS-CoV-2/isolation & purification , Humans
6.
J Biomed Inform ; 117: 103698, 2021 05.
Article in English | MEDLINE | ID: mdl-33617985

ABSTRACT

Advances in the modeling and analysis of electronic health records (EHR) have the potential to improve patient risk stratification, leading to better patient outcomes. The modeling of complex temporal relations across the multiple clinical variables inherent in EHR data is largely unexplored. Existing approaches to modeling EHR data often lack the flexibility to handle time-varying correlations across multiple clinical variables, or they are too complex for clinical interpretation. Therefore, we propose a novel nonstationary multivariate Gaussian process model for EHR data to address these drawbacks of existing methodologies. Our proposed model is able to capture time-varying scale, correlation, and smoothness across multiple clinical variables. We also provide details on two inference approaches: maximum a posteriori and Hamiltonian Monte Carlo. Our model is validated on synthetic data, and we then demonstrate its effectiveness on EHR data from Kaiser Permanente Division of Research (KPDOR). Finally, we use the KPDOR EHR data to investigate the relationships between a clinical patient risk metric and the latent processes of our proposed model and demonstrate statistically significant correlations between these entities.
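
To make "time-varying scale and smoothness" concrete, here is a minimal single-output sketch using the Gibbs nonstationary covariance function, in which both the signal scale and the length-scale are functions of time. This is a swapped-in textbook kernel, not the covariance proposed in the paper, and it does not model the cross-variable correlations the abstract describes. The `scale` and `length` functions and the time points are hypothetical.

```python
import math


def gibbs_kernel(t1, t2, scale, length):
    """Gibbs covariance with time-varying scale(t) and length-scale length(t)."""
    l1, l2 = length(t1), length(t2)
    denom = l1 * l1 + l2 * l2
    prefactor = math.sqrt(2.0 * l1 * l2 / denom)
    return scale(t1) * scale(t2) * prefactor * math.exp(-((t1 - t2) ** 2) / denom)


def covariance_matrix(times, scale, length):
    return [[gibbs_kernel(a, b, scale, length) for b in times] for a in times]


# Hypothetical choices: variability grows and the signal smooths out over time.
def example_scale(t):
    return 1.0 + 0.5 * t


def example_length(t):
    return 0.5 + 0.1 * t


example_times = [0.0, 1.0, 2.5]
example_cov = covariance_matrix(example_times, example_scale, example_length)
```

On the diagonal the kernel reduces to `scale(t) ** 2`, so the marginal variance at each time is controlled directly by the scale function.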


Subjects
Electronic Health Records , Humans , Normal Distribution
7.
PLoS One ; 15(11): e0241225, 2020.
Article in English | MEDLINE | ID: mdl-33196642

ABSTRACT

Oncology is a highly siloed field of research in which sub-disciplinary specialization has limited the amount of information shared between researchers of distinct cancer types. This can be attributed to legitimate differences in the physiology and carcinogenesis of cancers affecting distinct anatomical sites. However, underlying processes that are shared across seemingly disparate cancers probably affect prognosis. The objective of the current study is to investigate whether multitask learning improves 5-year cancer patient survival prediction by leveraging information across anatomically distinct HPV-related cancers. Data were obtained from the Surveillance, Epidemiology, and End Results (SEER) program database. The study cohort consisted of 29,768 primary cancer cases diagnosed in the United States between 2004 and 2015. Ten different cancer diagnoses were selected, all with a known association with HPV risk. In the analysis, the cancer diagnoses were categorized into three distinct topography groups of varying specificity. The most specific topography grouping consisted of the 10 original cancer diagnoses, differentiated by the first two digits of the ICD-O-3 topography code. The second topography grouping consisted of cancer diagnoses categorized into six distinct organ groups. Finally, the third topography grouping consisted of just two groups, head-neck cancers and ano-genital cancers. The tasks were to predict 5-year survival for patients within the different topography groups using 14 predictive features, which were selected from the descriptive variables available in the SEER database. The information from the predictive features was shared between tasks in three different ways, resulting in three distinct predictive models: 1) Information was not shared between patients assigned to different tasks (single task learning); 2) Information was shared between all patients, regardless of task (pooled model); 3) Only relevant information was shared between patients grouped to different tasks (multitask learning). Prediction performance was evaluated with Brier scores. All three models were evaluated against one another on each of the three distinct topography-defined tasks. The results showed that multitask classifiers achieved relative improvement for the majority of the scenarios studied compared to single task learning and pooled baseline methods. In this study, we have demonstrated that sharing information among anatomically distinct cancer types can lead to improved predictive survival models.
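
The abstract above reports prediction performance as Brier scores. As a brief reminder of what that metric measures, the sketch below computes the standard binary Brier score: the mean squared difference between predicted probabilities and observed 0/1 outcomes, where lower is better. The function name and the example values are hypothetical and are not data from the study.

```python
def brier_score(predicted, observed):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


# Hypothetical example: predicted 5-year survival probabilities for four
# patients, with 1 meaning the patient survived 5 years and 0 otherwise.
example_predictions = [0.9, 0.2, 0.5, 0.5]
example_outcomes = [1, 0, 1, 0]
example_brier = brier_score(example_predictions, example_outcomes)
```

A model that always predicts 0.5 scores 0.25 regardless of the outcomes, which is a convenient reference point when comparing classifiers.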


Subjects
Learning , Multitasking Behavior , Neoplasms/mortality , Neoplasms/virology , Papillomavirus Infections/mortality , Adult , Aged , Aged, 80 and over , Algorithms , Cohort Studies , Female , Humans , Male , Middle Aged , SEER Program , Sample Size , Survival Analysis , Young Adult
8.
Stat Med ; 39(25): 3569-3590, 2020 11 10.
Article in English | MEDLINE | ID: mdl-32854166

ABSTRACT

The Cancer Registry of Norway has administered a national cervical cancer screening program since 1992 by coordinating triennial cytology exam screenings for the female population between 25 and 69 years of age. Up to 80% of cancers are prevented through mass screening, but this comes at the expense of considerable screening activity and leads to overtreatment of clinically asymptomatic precancers. In this article, we present a continuous-time, time-inhomogeneous hidden Markov model which was developed to understand the screening process and cervical cancer carcinogenesis in detail. By leveraging the multivariate time series of medical exams performed on 1.7 million individuals over a 25-year period, we simultaneously estimate all model parameters. We show that an age-dependent model reflects the Norwegian screening program by comparing empirical survival curves from observed registry data and data simulated from the proposed model. The model can be generalized to include more detailed individual-level covariates as well as new types of screening exams. By utilizing individual screening histories and covariate data, the proposed model shows potential for improving strategies for cancer screening programs by personalizing recommended screening intervals.


Subjects
Papillomavirus Infections , Uterine Cervical Neoplasms , Cost-Benefit Analysis , Early Detection of Cancer , Female , Humans , Markov Chains , Mass Screening , Norway/epidemiology , Uterine Cervical Neoplasms/diagnosis
9.
BMC Med Res Methodol ; 20(1): 108, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32381039

ABSTRACT

BACKGROUND: Machine learning (ML) has made a significant impact in medicine and cancer research; however, its impact in these areas has been undeniably slower and more limited than in other application domains. A major reason for this has been the lack of availability of patient data to the broader ML research community, in large part due to patient privacy protection concerns. High-quality, realistic, synthetic datasets can be leveraged to accelerate methodological developments in medicine. By and large, medical data is high-dimensional and often categorical. These characteristics pose multiple modeling challenges. METHODS: In this paper, we evaluate three classes of synthetic data generation approaches: probabilistic models, classification-based imputation models, and generative adversarial neural networks. Metrics for evaluating the quality of the generated synthetic datasets are presented and discussed. RESULTS: While the results and discussions are broadly applicable to medical data, for demonstration purposes we generate synthetic datasets for cancer based on the publicly available cancer registry data from the Surveillance, Epidemiology, and End Results (SEER) program. Specifically, our cohort consists of breast, respiratory, and non-solid cancer cases diagnosed between 2010 and 2015, which includes over 360,000 individual cases. CONCLUSIONS: We discuss the trade-offs of the different methods and metrics, providing guidance on considerations for the generation and usage of medical synthetic data.


Subjects
Machine Learning , Neoplasms , Humans , Neoplasms/diagnosis , Neoplasms/epidemiology , Neural Networks, Computer
10.
J Biomed Inform ; 100S: 100059, 2019.
Article in English | MEDLINE | ID: mdl-34384572

ABSTRACT

Multitask learning (MTL) leverages commonalities across related tasks with the aim of improving individual task performance. A key modeling choice in designing MTL models is the structure of the tasks' relatedness, which may not be known. Here we propose a Bayesian multitask learning model that is able to infer the task relationship structure directly from the data. We present two variations of the model that differ in the a priori information they assume about task relatedness. First, a diffuse Wishart prior is placed on a task precision matrix so that all tasks are assumed to be equally related a priori. Second, a Bayesian graphical LASSO prior is used on the task precision matrix to impose sparsity in the task relatedness. Motivated by machine learning applications in the biomedical domain, we emphasize interpretability and uncertainty quantification in our models. To encourage model interpretability, linear mappings from the shared input spaces to task-dependent output spaces are used. To encourage uncertainty quantification, conjugate priors are used so that full posterior inference is possible. Using synthetic data, we show that our model is able to recover the underlying task relationships as well as features jointly relevant for all tasks. We demonstrate the utility of our model on three distinct biomedical applications: Alzheimer's disease progression, Parkinson's disease assessment, and cervical cancer screening compliance. We show that our model outperforms single-task learning (STL) models in terms of predictive performance, and performs better than existing MTL methods for the majority of the scenarios.
