1.
Artif Intell Med ; 154: 102898, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38843691

ABSTRACT

We present a neural network framework for learning a survival model to predict a time-to-event outcome while simultaneously learning a topic model that reveals feature relationships. In particular, we model each subject as a distribution over "topics", where a topic could, for instance, correspond to an age group, a disorder, or a disease. The presence of a topic in a subject means that specific clinical features are more likely to appear for the subject. Topics encode information about related features and are learned in a supervised manner to predict a time-to-event outcome. Our framework supports combining many different topic and survival models; training the resulting joint survival-topic model readily scales to large datasets using standard neural net optimizers with minibatch gradient descent. For example, a special case is to combine LDA with a Cox model, in which case a subject's distribution over topics serves as the input feature vector to the Cox model. We explain how to address practical implementation issues that arise when applying these neural survival-supervised topic models to clinical data, including how to visualize results to assist clinical interpretation. We study the effectiveness of our proposed framework on seven clinical datasets on predicting time until death as well as hospital ICU length of stay, where we find that neural survival-supervised topic models achieve competitive accuracy with existing approaches while yielding interpretable clinical topics that explain feature relationships. Our code is available at: https://github.com/georgehc/survival-topics.
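The LDA-plus-Cox special case mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the topic proportions are already given and only evaluates the standard Cox negative log partial likelihood with those proportions as the feature vector. The function name and argument layout are hypothetical.

```python
import math

def cox_neg_log_partial_likelihood(theta, times, events, beta):
    """Negative Cox log partial likelihood with topic proportions as features.

    theta:  per-subject topic distributions (assumed given; each sums to 1)
    times:  observed times (event or censoring)
    events: 1 if the event was observed, 0 if censored
    beta:   one Cox coefficient per topic
    """
    # Linear risk score per subject: dot(beta, theta_i).
    scores = [sum(b * t for b, t in zip(beta, th)) for th in theta]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects enter only through risk sets
        # Risk set: everyone still under observation at time t_i.
        denom = sum(math.exp(scores[j])
                    for j, t_j in enumerate(times) if t_j >= t_i)
        nll -= scores[i] - math.log(denom)
    return nll
```

In a joint model of the kind described, a loss like this would be added to the topic model's loss and both minimized together by minibatch gradient descent; the sketch only shows the survival term.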

2.
Proc Mach Learn Res ; 219: 94-109, 2023 Aug.
Article in English | MEDLINE | ID: mdl-38476630

ABSTRACT

Reliable extraction of temporal relations from clinical notes is a growing need in many clinical research domains. Our work introduces typed markers to the task of clinical temporal relation extraction. We demonstrate that adding medical entity information to clinical text as tags, and feeding the tagged text with its context sentences to a transformer-based architecture, can outperform more complex systems that require feature engineering and temporal reasoning. We propose several strategies of typed marker creation that incorporate entity type information at different granularities, with extensive experiments to test their effectiveness. Our system establishes the best result on I2B2, a clinical benchmark dataset for temporal relation extraction, with an F1 of 83.5%, a substantial 3.3% improvement over the previous best system.
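As a rough illustration of what typed markers might look like, the sketch below wraps two entity spans in tags that carry the entity type. The marker syntax (`<e1:TYPE> ... </e1:TYPE>`), the function name, and the type labels in the usage example are assumptions made for illustration; the abstract does not specify the exact format.

```python
def add_typed_markers(tokens, e1, e2):
    """Wrap two non-overlapping entity spans in type-bearing markers.

    tokens: list of word tokens
    e1, e2: (start, end, entity_type) with end exclusive
    """
    opens, closes = {}, {}
    for name, (start, end, etype) in (("e1", e1), ("e2", e2)):
        opens[start] = f"<{name}:{etype}>"
        closes[end] = f"</{name}:{etype}>"
    out = []
    for i, tok in enumerate(tokens):
        if i in closes:          # close a span that ended before this token
            out.append(closes[i])
        if i in opens:           # open a span that starts at this token
            out.append(opens[i])
        out.append(tok)
    if len(tokens) in closes:    # span running to the end of the sentence
        out.append(closes[len(tokens)])
    return " ".join(out)


tokens = "She was admitted on 05/12 for chest pain".split()
marked = add_typed_markers(tokens, (2, 3, "OCCURRENCE"), (4, 5, "DATE"))
```

The marked string would then be tokenized and passed to the transformer classifier in place of the raw sentence.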

3.
Proc Mach Learn Res ; 225: 403-427, 2023.
Article in English | MEDLINE | ID: mdl-38550276

ABSTRACT

We consider the problem of predicting how the likelihood of an outcome of interest for a patient changes over time as we observe more of the patient's data. To solve this problem, we propose a supervised contrastive learning framework that learns an embedding representation for each time step of a patient time series. Our framework learns the embedding space to have the following properties: (1) nearby points in the embedding space have similar predicted class probabilities, (2) adjacent time steps of the same time series map to nearby points in the embedding space, and (3) time steps with very different raw feature vectors map to far apart regions of the embedding space. To achieve property (3), we employ a nearest neighbor pairing mechanism in the raw feature space. This mechanism also serves as an alternative to "data augmentation", a key ingredient of contrastive learning, which lacks a standard procedure that is adequately realistic for clinical tabular data, to our knowledge. We demonstrate that our approach outperforms state-of-the-art baselines in predicting mortality of septic patients (MIMIC-III dataset) and tracking progression of cognitive impairment (ADNI dataset). Our method also consistently recovers the correct synthetic dataset embedding structure across experiments, a feat not achieved by baselines. Our ablation experiments show the pivotal role of our nearest neighbor pairing.

4.
Lancet Digit Health ; 4(6): e455-e465, 2022 06.
Article in English | MEDLINE | ID: mdl-35623798

ABSTRACT

BACKGROUND: Little is known about whether machine-learning algorithms developed to predict opioid overdose using earlier years and from a single state will perform as well when applied to other populations. We aimed to develop a machine-learning algorithm to predict 3-month risk of opioid overdose using Pennsylvania Medicaid data and externally validated it in two data sources (ie, later years of Pennsylvania Medicaid data and data from a different state). METHODS: This prognostic modelling study developed and validated a machine-learning algorithm to predict overdose in Medicaid beneficiaries with one or more opioid prescription in Pennsylvania and Arizona, USA. To predict risk of hospital or emergency department visits for overdose in the subsequent 3 months, we measured 284 potential predictors from pharmaceutical and health-care encounter claims data in 3-month periods, starting 3 months before the first opioid prescription and continuing until loss to follow-up or study end. We developed and internally validated a gradient-boosting machine algorithm to predict overdose using 2013-16 Pennsylvania Medicaid data (n=639 693). We externally validated the model using (1) 2017-18 Pennsylvania Medicaid data (n=318 585) and (2) 2015-17 Arizona Medicaid data (n=391 959). We reported several prediction performance metrics (eg, C-statistic, positive predictive value). Beneficiaries were stratified into risk-score subgroups to support clinical use. FINDINGS: A total of 8641 (1·35%) 2013-16 Pennsylvania Medicaid beneficiaries, 2705 (0·85%) 2017-18 Pennsylvania Medicaid beneficiaries, and 2410 (0·61%) 2015-17 Arizona beneficiaries had one or more overdose during the study period. 
C-statistics for the algorithm predicting 3-month overdoses developed from the 2013-16 Pennsylvania training dataset and validated on the 2013-16 Pennsylvania internal validation dataset, 2017-18 Pennsylvania external validation dataset, and 2015-17 Arizona external validation dataset were 0·841 (95% CI 0·835-0·847), 0·828 (0·822-0·834), and 0·817 (0·807-0·826), respectively. In external validation datasets, 71 361 (22·4%) of 318 585 2017-18 Pennsylvania beneficiaries were in high-risk subgroups (positive predictive value of 0·38-4·08%; capturing 73% of overdoses in the subsequent 3 months) and 40 041 (10%) of 391 959 2015-17 Arizona beneficiaries were in high-risk subgroups (positive predictive value of 0·19-1·97%; capturing 55% of overdoses). Lower risk subgroups in both validation datasets had few individuals (≤0·2%) with an overdose. INTERPRETATION: A machine-learning algorithm predicting opioid overdose derived from Pennsylvania Medicaid data performed well in external validation with more recent Pennsylvania data and with Arizona Medicaid data. The algorithm might be valuable for overdose risk prediction and stratification in Medicaid beneficiaries. FUNDING: National Institute of Health, National Institute on Drug Abuse, National Institute on Aging.
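The two headline metrics reported above, the C-statistic and the positive predictive value, can be computed for a binary outcome as in the sketch below. It is a plain quadratic-time illustration with hypothetical function names, not the study's evaluation code.

```python
def c_statistic(scores, labels):
    """Probability that a random case outscores a random non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    concordant = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return concordant / (len(pos) * len(neg))


def positive_predictive_value(scores, labels, threshold):
    """Share of individuals flagged at or above the threshold who had the outcome."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(flagged) / len(flagged)
```

With an outcome as rare as the roughly 1% overdose rates reported here, a high C-statistic can coexist with a low positive predictive value, which is the pattern the findings describe.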


Subjects
Drug Overdose , Opiate Overdose , Algorithms , Analgesics, Opioid , Humans , Machine Learning , Medicaid , Prognosis , United States
5.
Addiction ; 117(8): 2254-2263, 2022 08.
Article in English | MEDLINE | ID: mdl-35315173

ABSTRACT

BACKGROUND AND AIMS: The time lag encountered when accessing health-care data is one major barrier to implementing opioid overdose prediction measures in practice. Little is known regarding how one's opioid overdose risk changes over time. We aimed to identify longitudinal patterns of individual predicted overdose risks among Medicaid beneficiaries after initiation of opioid prescriptions. DESIGN, SETTING AND PARTICIPANTS: A retrospective cohort study in Pennsylvania, USA among Pennsylvania Medicaid beneficiaries aged 18-64 years who initiated opioid prescriptions between July 2017 and September 2018 (318 585 eligible beneficiaries; mean age = 39 ± 12 years, female = 65.7%, White = 62.2% and Black = 24.9%). MEASUREMENTS: We first applied a previously developed and validated machine-learning algorithm to obtain risk scores for opioid overdose emergency room or hospital visits in 3-month intervals for each beneficiary who initiated opioid therapy, until disenrollment from Medicaid, death or the end of observation (December 2018). We performed group-based trajectory modeling to identify trajectories of these predicted overdose risk scores over time. FINDINGS: Among eligible beneficiaries, 0.61% had one or more occurrences of opioid overdose in a median follow-up of 15 months. We identified five unique opioid overdose risk trajectories: three trajectories (accounting for 92% of the cohort) had consistent overdose risk over time, including consistent low-risk (63%), consistent medium-risk (25%) and consistent high-risk (4%) groups; another two trajectories (accounting for 8%) had overdose risks that substantially changed over time, including a group that transitioned from high- to medium-risk (3%) and another group that increased from medium- to high-risk over time (5%). CONCLUSIONS: More than 90% of Medicaid beneficiaries in Pennsylvania, USA with one or more opioid prescriptions had consistent, predicted opioid overdose risks over 15 months. Applying opioid prediction algorithms developed from historical data may not be a major barrier to implementation in practice for the large majority of individuals.


Subjects
Drug Overdose , Opiate Overdose , Adult , Analgesics, Opioid/therapeutic use , Drug Overdose/drug therapy , Drug Overdose/epidemiology , Female , Humans , Medicaid , Middle Aged , Opiate Overdose/epidemiology , Retrospective Studies
6.
AMIA Annu Symp Proc ; 2022: 1257-1266, 2022.
Article in English | MEDLINE | ID: mdl-37128459

ABSTRACT

With COVID-19 now pervasive, identification of high-risk individuals is crucial. Using data from a major healthcare provider in Southwestern Pennsylvania, we develop survival models predicting severe COVID-19 progression. In this endeavor, we face a tradeoff between more accurate models relying on many features and less accurate models relying on a few features aligned with clinician intuition. Complicating matters, many EHR features tend to be under-coded, degrading the accuracy of smaller models. In this study we develop two sets of high-performance risk scores: (i) an unconstrained model built from all available features; and (ii) a pipeline that learns a small set of clinical concepts before training a risk predictor. Learned concepts boost performance over the corresponding features (C-index 0.858 vs. 0.844) and demonstrate improvements over (i) when evaluated out-of-sample (subsequent time periods). Our models outperform previous works (C-index 0.844-0.872 vs. 0.598-0.810).
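For survival models such as these, the C-index is usually computed over comparable pairs in the presence of censoring. The sketch below assumes Harrell's definition, which the abstract does not name explicitly, and uses a hypothetical function name.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's observed time. The pair is concordant
    when the model assigned i the higher risk; ties count one half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the 0.844-0.872 range reported above.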


Subjects
COVID-19 , Humans , Machine Learning , Risk Factors , Pennsylvania
7.
Proc Conf Empir Methods Nat Lang Process ; 2022(SD): 109-120, 2022 Dec.
Article in English | MEDLINE | ID: mdl-38476318

ABSTRACT

Diagnostic coding, or ICD coding, is the task of assigning diagnosis codes defined by the ICD (International Classification of Diseases) standard to patient visits based on clinical notes. The current process of manual ICD coding is time-consuming and often error-prone, which suggests the need for automatic ICD coding. However, despite the long history of automatic ICD coding, there have been no standardized frameworks for benchmarking ICD coding models. We open-source an easy-to-use tool named AnEMIC, which provides a streamlined pipeline for preprocessing, training, and evaluating for automatic ICD coding. We correct errors in preprocessing by existing works, and provide key models and weights trained on the correctly preprocessed datasets. We also provide an interactive demo performing real-time inference from custom inputs, and visualizations drawn from explainable AI to analyze the models. We hope the framework helps move the research of ICD coding forward and helps professionals explore the potential of ICD coding. The framework and the associated code are available here.

8.
Proc Mach Learn Res ; 174: 234-247, 2022 Apr.
Article in English | MEDLINE | ID: mdl-38665367

ABSTRACT

Spelling correction is a particularly important problem in clinical natural language processing because of the abundant occurrence of misspellings in medical records. However, the scarcity of labeled datasets in a clinical context makes it hard to build a machine learning system for such clinical spelling correction. In this work, we present a probabilistic model of correcting misspellings based on a simple conditional independence assumption, which leads to a modular decomposition into a language model and a corruption model. With a deep character-level language model trained on a large clinical corpus, and a simple edit-based corruption model, we can build a spelling correction model with small or no real data. Experimental results show that our model significantly outperforms baselines on two healthcare spelling correction datasets.
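The decomposition described above scores each candidate correction by a language-model term plus a corruption-model term. The toy sketch below substitutes a word-probability table for the deep character-level language model and an exponential penalty on Levenshtein distance for the edit-based corruption model; both substitutions, the penalty value, and the function names are assumptions for illustration only.

```python
import math

def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def correct(word, lm_log_prob, edit_penalty=4.0):
    """Pick argmax over candidates c of log P(c) + log P(word | c).

    lm_log_prob:  candidate -> log probability (stand-in language model)
    edit_penalty: log-probability cost per edit (stand-in corruption model)
    """
    def score(candidate):
        return lm_log_prob[candidate] - edit_penalty * edit_distance(word, candidate)
    return max(lm_log_prob, key=score)


lm = {"patient": math.log(0.6), "patent": math.log(0.1), "potent": math.log(0.3)}
```

Because the two terms are independent modules, the corruption model needs no labeled misspellings, which matches the abstract's point about building a corrector with little or no real data.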

9.
J Gen Intern Med ; 36(4): 908-915, 2021 04.
Article in English | MEDLINE | ID: mdl-33481168

ABSTRACT

BACKGROUND: Survivors of opioid overdose have substantially increased mortality risk, although this risk is not evenly distributed across individuals. No study has focused on predicting an individual's risk of death after a nonfatal opioid overdose. OBJECTIVE: To predict risk of death after a nonfatal opioid overdose. DESIGN AND PARTICIPANTS: This retrospective cohort study included 9686 Pennsylvania Medicaid beneficiaries with an emergency department or inpatient claim for nonfatal opioid overdose in 2014-2016. The index date was the first overdose claim during this period. EXPOSURES, MAIN OUTCOME, AND MEASURES: Predictor candidates were measured in the 180 days before the index overdose. Primary outcome was 180-day all-cause mortality. Using a gradient boosting machine model, we classified beneficiaries into six subgroups according to their risk of mortality (< 25th percentile of the risk score, 25th to < 50th, 50th to < 75th, 75th to < 90th, 90th to < 98th, ≥ 98th). We then measured receipt of medication for opioid use disorder (OUD), risk mitigation interventions (e.g., prescriptions for naloxone), and prescription opioids filled in the 180 days after the index overdose, by risk subgroup. KEY RESULTS: Of eligible beneficiaries, 347 (3.6%) died within 180 days after the index overdose. The C-statistic of the mortality prediction model was 0.71. In the highest risk subgroup, the observed 180-day mortality rate was 20.3%, while in the lowest risk subgroup, it was 1.5%. Medication for OUD and risk mitigation interventions after overdose were more commonly seen in lower risk groups, while opioid prescriptions were more likely to be used in higher risk groups (both p trends < .001). CONCLUSIONS: A risk prediction model performed well for classifying mortality risk after a nonfatal opioid overdose. This prediction score can identify high-risk subgroups to target interventions to improve outcomes among overdose survivors.
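The six percentile-based subgroups listed above can be assigned as in the sketch below. The cutoffs come from the abstract; the percentile-rank convention (share of the cohort with a strictly lower score) and the function names are assumptions.

```python
from bisect import bisect_left, bisect_right

# Percentile cutoffs from the abstract: <25, 25-<50, 50-<75, 75-<90, 90-<98, >=98.
CUTOFFS = [25, 50, 75, 90, 98]


def percentile_ranks(scores):
    """Percentage of the cohort with a strictly lower risk score, per subject."""
    ordered = sorted(scores)
    n = len(scores)
    return [100.0 * bisect_left(ordered, s) / n for s in scores]


def risk_subgroup(percentile):
    """Subgroup index 0 (lowest risk) to 5 (highest risk)."""
    return bisect_right(CUTOFFS, percentile)


def subgroup_sizes(scores):
    sizes = [0] * (len(CUTOFFS) + 1)
    for p in percentile_ranks(scores):
        sizes[risk_subgroup(p)] += 1
    return sizes
```

The unequal bin widths concentrate resolution at the top of the distribution, so the highest-risk subgroup contains only about 2% of the cohort.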


Subjects
Drug Overdose , Opiate Overdose , Opioid-Related Disorders , Analgesics, Opioid/therapeutic use , Drug Overdose/drug therapy , Emergency Service, Hospital , Hospitals , Humans , Opioid-Related Disorders/drug therapy , Pennsylvania/epidemiology , Retrospective Studies , United States/epidemiology
10.
AMIA Annu Symp Proc ; 2021: 285-294, 2021.
Article in English | MEDLINE | ID: mdl-35308980

ABSTRACT

Since the COVID-19 pandemic began, the United States's case fatality rate (CFR) has plummeted. Using national and Florida data, we unpack the drop in CFR between April and December 2020, accounting for such confounders as expanded testing, age distribution shift, and detection-to-death lags. Guided by the insight that treatment improvements in this period should correspond to decreases in hospitalization fatality rate (HFR), and using a block-bootstrapping procedure to quantify uncertainty, we find that although treatment improvements do not follow the same trajectory in Florida and nationally (with Florida undergoing a comparatively severe second peak), by December, significant improvements are observed both in Florida and nationally (at least 17% and 55% respectively). These estimates paint a more realistic picture of improvements than the drop in aggregate CFR (70.8%-91.1%). We publish a website where users can apply our analyses to selected demographics, regions, and dates of interest.
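A block bootstrap resamples contiguous runs of days so that short-range dependence in the series is preserved. The sketch below is a generic moving-block version with a percentile interval; the block length, the daily `(deaths, hospitalizations)` layout, and all function names are assumptions, since the abstract does not describe the procedure in detail.

```python
import random


def block_bootstrap_sample(series, block_len, rng):
    """Resample a series by concatenating randomly chosen contiguous blocks."""
    n = len(series)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]


def bootstrap_interval(series, stat, block_len=7, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval for stat(series) under block resampling."""
    rng = random.Random(seed)
    stats = sorted(stat(block_bootstrap_sample(series, block_len, rng))
                   for _ in range(n_boot))
    lo = stats[round(n_boot * alpha / 2)]
    hi = stats[round(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


def hospitalization_fatality_rate(days):
    """HFR over a window of hypothetical (deaths, hospitalizations) daily pairs."""
    deaths = sum(d for d, _ in days)
    admissions = sum(h for _, h in days)
    return deaths / admissions
```

Calling `bootstrap_interval(daily_pairs, hospitalization_fatality_rate)` on two periods and comparing the intervals is one way such an uncertainty estimate could be used.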


Subjects
COVID-19 , Age Distribution , COVID-19/epidemiology , Florida/epidemiology , Hospitalization , Humans , Pandemics
11.
PLoS One ; 15(7): e0235981, 2020.
Article in English | MEDLINE | ID: mdl-32678860

ABSTRACT

OBJECTIVE: To develop and validate a machine-learning algorithm to improve prediction of incident OUD diagnosis among Medicare beneficiaries with ≥1 opioid prescriptions. METHODS: This prognostic study included 361,527 fee-for-service Medicare beneficiaries, without cancer, filling ≥1 opioid prescriptions from 2011-2016. We randomly divided beneficiaries into training, testing, and validation samples. We measured 269 potential predictors including socio-demographics, health status, patterns of opioid use, and provider-level and regional-level factors in 3-month periods, starting from three months before initiating opioids until development of OUD, loss of follow-up or end of 2016. The primary outcome was a recorded OUD diagnosis or initiating methadone or buprenorphine for OUD as proxy of incident OUD. We applied elastic net, random forests, gradient boosting machine, and deep neural network to predict OUD in the subsequent three months. We assessed prediction performance using C-statistics and other metrics (e.g., number needed to evaluate to identify an individual with OUD [NNE]). Beneficiaries were stratified into subgroups by risk-score decile. RESULTS: The training (n = 120,474), testing (n = 120,556), and validation (n = 120,497) samples had similar characteristics (age ≥65 years = 81.1%; female = 61.3%; white = 83.5%; with disability eligibility = 25.5%; 1.5% had incident OUD). In the validation sample, the four approaches had similar prediction performances (C-statistic ranged from 0.874 to 0.882); elastic net required the fewest predictors (n = 48). Using the elastic net algorithm, individuals in the top decile of risk (15.8% [n = 19,047] of validation cohort) had a positive predictive value of 0.96%, negative predictive value of 99.7%, and NNE of 104. Nearly 70% of individuals with incident OUD were in the top two deciles (n = 37,078), having highest incident OUD (36 to 301 per 10,000 beneficiaries). 
Individuals in the bottom eight deciles (n = 83,419) had minimal incident OUD (3 to 28 per 10,000). CONCLUSIONS: Machine-learning algorithms improve risk prediction and risk stratification of incident OUD in Medicare beneficiaries.
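The number needed to evaluate is conventionally the reciprocal of the positive predictive value, and the figures quoted above are consistent with that reading. A short check, with a hypothetical function name:

```python
def number_needed_to_evaluate(ppv):
    """NNE = 1 / PPV: individuals flagged per true case identified."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be in (0, 1]")
    return 1.0 / ppv


# Top-decile positive predictive value of 0.96% reported for the elastic net.
nne_top_decile = number_needed_to_evaluate(0.0096)
```

The result is about 104.2, in line with the reported NNE of 104.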


Subjects
Computational Biology/methods , Fee-for-Service Plans/statistics & numerical data , Machine Learning , Medicare/statistics & numerical data , Opioid-Related Disorders/diagnosis , Opioid-Related Disorders/epidemiology , Risk Assessment/methods , Aged , Female , Humans , Male , Middle Aged , Opioid-Related Disorders/complications , Prognosis , United States
13.
JAMA ; 321(20): 2003-2017, 2019 05 28.
Article in English | MEDLINE | ID: mdl-31104070

ABSTRACT

Importance: Sepsis is a heterogeneous syndrome. Identification of distinct clinical phenotypes may allow more precise therapy and improve care. Objective: To derive sepsis phenotypes from clinical data, determine their reproducibility and correlation with host-response biomarkers and clinical outcomes, and assess the potential causal relationship with results from randomized clinical trials (RCTs). Design, Settings, and Participants: Retrospective analysis of data sets using statistical, machine learning, and simulation tools. Phenotypes were derived among 20 189 total patients (16 552 unique patients) who met Sepsis-3 criteria within 6 hours of hospital presentation at 12 Pennsylvania hospitals (2010-2012) using consensus k means clustering applied to 29 variables. Reproducibility and correlation with biological parameters and clinical outcomes were assessed in a second database (2013-2014; n = 43 086 total patients and n = 31 160 unique patients), in a prospective cohort study of sepsis due to pneumonia (n = 583), and in 3 sepsis RCTs (n = 4737). Exposures: All clinical and laboratory variables in the electronic health record. Main Outcomes and Measures: Derived phenotype (α, β, γ, and δ) frequency, host-response biomarkers, 28-day and 365-day mortality, and RCT simulation outputs. Results: The derivation cohort included 20 189 patients with sepsis (mean age, 64 [SD, 17] years; 10 022 [50%] male; mean maximum 24-hour Sequential Organ Failure Assessment [SOFA] score, 3.9 [SD, 2.4]). The validation cohort included 43 086 patients (mean age, 67 [SD, 17] years; 21 993 [51%] male; mean maximum 24-hour SOFA score, 3.6 [SD, 2.0]). Of the 4 derived phenotypes, the α phenotype was the most common (n = 6625; 33%) and included patients with the lowest administration of a vasopressor; in the β phenotype (n = 5512; 27%), patients were older and had more chronic illness and renal dysfunction; in the γ phenotype (n = 5385; 27%), patients had more inflammation and pulmonary dysfunction; and in the δ phenotype (n = 2667; 13%), patients had more liver dysfunction and septic shock. Phenotype distributions were similar in the validation cohort. There were consistent differences in biomarker patterns by phenotype. In the derivation cohort, cumulative 28-day mortality was 287 deaths of 5691 unique patients (5%) for the α phenotype; 561 of 4420 (13%) for the β phenotype; 1031 of 4318 (24%) for the γ phenotype; and 897 of 2223 (40%) for the δ phenotype. Across all cohorts and trials, 28-day and 365-day mortality were highest among the δ phenotype vs the other 3 phenotypes (P < .001). In simulation models, the proportion of RCTs reporting benefit, harm, or no effect changed considerably (eg, varying the phenotype frequencies within an RCT of early goal-directed therapy changed the results from >33% chance of benefit to >60% chance of harm). Conclusions and Relevance: In this retrospective analysis of data sets from patients with sepsis, 4 clinical phenotypes were identified that correlated with host-response patterns and clinical outcomes, and simulations suggested these phenotypes may help in understanding heterogeneity of treatment effects. Further research is needed to determine the utility of these phenotypes in clinical care and for informing trial design and interpretation.


Subjects
Sepsis/classification , Algorithms , Biomarkers/blood , Cluster Analysis , Datasets as Topic , Hospital Mortality , Humans , Machine Learning , Organ Dysfunction Scores , Phenotype , Reproducibility of Results , Retrospective Studies , Sepsis/mortality , Sepsis/therapy
14.
JAMA Netw Open ; 2(3): e190968, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30901048

ABSTRACT

Importance: Current approaches to identifying individuals at high risk for opioid overdose target many patients who are not truly at high risk. Objective: To develop and validate a machine-learning algorithm to predict opioid overdose risk among Medicare beneficiaries with at least 1 opioid prescription. Design, Setting, and Participants: A prognostic study was conducted between September 1, 2017, and December 31, 2018. Participants (n = 560 057) included fee-for-service Medicare beneficiaries without cancer who filled 1 or more opioid prescriptions from January 1, 2011, to December 31, 2015. Beneficiaries were randomly and equally divided into training, testing, and validation samples. Exposures: Potential predictors (n = 268), including sociodemographics, health status, patterns of opioid use, and practitioner-level and regional-level factors, were measured in 3-month windows, starting 3 months before initiating opioids until loss of follow-up or the end of observation. Main Outcomes and Measures: Opioid overdose episodes from inpatient and emergency department claims were identified. Multivariate logistic regression (MLR), least absolute shrinkage and selection operator-type regression (LASSO), random forest (RF), gradient boosting machine (GBM), and deep neural network (DNN) were applied to predict overdose risk in the subsequent 3 months after initiation of treatment with prescription opioids. Prediction performance was assessed using the C statistic and other metrics (eg, sensitivity, specificity, and number needed to evaluate [NNE] to identify one overdose). The Youden index was used to identify the optimized threshold of predicted score that balanced sensitivity and specificity. 
Results: Beneficiaries in the training (n = 186 686), testing (n = 186 685), and validation (n = 186 686) samples had similar characteristics (mean [SD] age of 68.0 [14.5] years, and approximately 63% were female, 82% were white, 35% had disabilities, 41% were dual eligible, and 0.60% had at least 1 overdose episode). In the validation sample, the DNN (C statistic = 0.91; 95% CI, 0.88-0.93) and GBM (C statistic = 0.90; 95% CI, 0.87-0.94) algorithms outperformed the LASSO (C statistic = 0.84; 95% CI, 0.80-0.89), RF (C statistic = 0.80; 95% CI, 0.75-0.84), and MLR (C statistic = 0.75; 95% CI, 0.69-0.80) methods for predicting opioid overdose. At the optimized sensitivity and specificity, DNN had a sensitivity of 92.3%, specificity of 75.7%, NNE of 542, positive predictive value of 0.18%, and negative predictive value of 99.9%. The DNN classified patients into low-risk (76.2% [142 180] of the cohort), medium-risk (18.6% [34 579] of the cohort), and high-risk (5.2% [9747] of the cohort) subgroups, with only 1 in 10 000 in the low-risk subgroup having an overdose episode. More than 90% of overdose episodes occurred in the high-risk and medium-risk subgroups, although positive predictive values were low, given the rare overdose outcome. Conclusions and Relevance: Machine-learning algorithms appear to perform well for risk prediction and stratification of opioid overdose, especially in identifying low-risk subgroups that have minimal risk of overdose.
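The Youden index used above to pick the operating threshold is J = sensitivity + specificity - 1, maximized over candidate cutoffs. A minimal sketch with a hypothetical function name and toy data, not the study's code:

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A subject is classified positive when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


threshold, j_stat = youden_threshold(
    [0.1, 0.2, 0.4, 0.5, 0.6, 0.7, 0.9],
    [0, 0, 0, 1, 0, 1, 1],
)
```

Because J weights sensitivity and specificity equally and ignores prevalence, the chosen threshold can still yield a very low positive predictive value for a rare outcome, as the reported 0.18% shows.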


Subjects
Algorithms , Analgesics, Opioid/adverse effects , Drug Overdose/epidemiology , Machine Learning , Risk Assessment/methods , Aged , Aged, 80 and over , Cohort Studies , Female , Humans , Male , Medical Informatics Applications , Medicare , Middle Aged , Prescriptions , United States
15.
AMIA Annu Symp Proc ; 2019: 408-417, 2019.
Article in English | MEDLINE | ID: mdl-32308834

ABSTRACT

We consider the task of producing a useful clustering of healthcare providers from their clinical action signature: their drug, procedure, and billing codes. Because high-dimensional sparse count vectors are challenging to cluster, we develop a novel autoencoder framework to address this task. Our solution creates a low-dimensional embedded representation of the high-dimensional space that preserves angular relationships and assigns examples to clusters while optimizing the quality of this clustering. Our method is able to find a better clustering than under a two-step alternative, e.g., projected K means/medoids, where a representation is learned and then clustering is applied to the representation. We demonstrate our method's characteristics through quantitative and qualitative analysis of real and simulated data, including in several real-world healthcare case studies. Finally, we develop a tool to enhance exploratory analysis of providers based on their clinical behaviors.


Subjects
Cluster Analysis , Computer Simulation , Health Personnel , Medicare , Aged , Algorithms , Humans , Pattern Recognition, Automated , United States
16.
Adv Neural Inf Process Syst ; 2012: 467-475, 2012.
Article in English | MEDLINE | ID: mdl-25284967

ABSTRACT

Learning temporal dependencies between variables over continuous time is an important and challenging task. Continuous-time Bayesian networks effectively model such processes but are limited by the number of conditional intensity matrices, which grows exponentially in the number of parents per variable. We develop a partition-based representation using regression trees and forests whose parameter spaces grow linearly in the number of node splits. Using a multiplicative assumption we show how to update the forest likelihood in closed form, producing efficient model updates. Our results show multiplicative forests can be learned from few temporal trajectories with large gains in performance and scalability.

17.
Proc Innov Appl Artif Intell Conf ; 2012: 2341-2347, 2012.
Article in English | MEDLINE | ID: mdl-25360347

ABSTRACT

Electronic health records (EHRs) are an emerging relational domain with large potential to improve clinical outcomes. We apply two statistical relational learning (SRL) algorithms to the task of predicting primary myocardial infarction. We show that one SRL algorithm, relational functional gradient boosting, outperforms propositional learners particularly in the medically-relevant high recall region. We observe that both SRL algorithms predict outcomes better than their propositional analogs and suggest how our methods can augment current epidemiological practices.

18.
Int J Health Geogr ; 6: 12, 2007 Mar 16.
Article in English | MEDLINE | ID: mdl-17367520

ABSTRACT

BACKGROUND: Geocoding methods vary among spatial epidemiology studies. Errors in the geocoding process and differential match rates may reduce study validity. We compared two geocoding methods using 8,157 Washington State addresses. The multi-stage geocoding method implemented by the state health department used a sequence of local and national reference files. The single-stage method used a single national reference file. For each address geocoded by both methods, we measured the distance between the locations assigned by each method. Area-level characteristics were collected from census data, and modeled as predictors of the discordance between geocoded address coordinates. RESULTS: The multi-stage method had a higher match rate than the single-stage method: 99% versus 95%. Of the 7,686 addresses geocoded by both methods, 96% were geocoded to the same census tract by both methods and 98% were geocoded to locations within 1 km of each other by the two methods. The distance between geocoded coordinates for the same address was higher in sparsely populated and low-poverty areas, and counties with local reference files. CONCLUSION: The multi-stage geocoding method had a higher match rate than the single-stage method. An examination of differences in the location assigned to the same address suggested that study results may be most sensitive to the choice of geocoding method in sparsely populated or low-poverty areas.
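The distance between the two locations assigned to one address can be measured as a great-circle distance. The sketch below assumes the haversine formula on a spherical Earth; the abstract does not say which distance computation the study used, and the function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def share_within(pairs, limit_km=1.0):
    """Fraction of address pairs geocoded within limit_km of each other.

    pairs: list of ((lat_a, lon_a), (lat_b, lon_b)), one entry per address.
    """
    close = sum(1 for (a, b) in pairs if haversine_km(*a, *b) <= limit_km)
    return close / len(pairs)
```

Applied to every address matched by both methods, `share_within` would produce a figure comparable to the 98% within 1 km reported above.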


Subjects
Epidemiologic Methods , Geographic Information Systems/statistics & numerical data , Geographic Information Systems/standards , Topography, Medical/statistics & numerical data , Topography, Medical/standards , Censuses , Poverty Areas , Reproducibility of Results , Washington