Results 1-16 of 16
1.
Genet Epidemiol ; 44(7): 778-784, 2020 10.
Article in English | MEDLINE | ID: mdl-32677164

ABSTRACT

Family history and body mass index (BMI) are well-known risk factors for colorectal cancer (CRC); however, their joint effects are not well described. Using linked genealogy data, self-reported height and weight from driver's licenses, and the Utah Surveillance, Epidemiology, and End Results (SEER) cancer registry, we found that an increasing number of first-degree relatives (FDRs) with CRC is associated with a higher standardized incidence ratio (SIR) for overweight/obese probands but not for under/normal weight probands. For probands with two CRC-affected FDRs, the SIR was 1.91 (95% CI [0.52, 4.89]) for under/normal weight probands and 4.31 (95% CI [2.46, 7.00]) for overweight/obese probands. In the absence of CRC-affected FDRs, any number of CRC-affected second-degree relatives (SDRs) did not significantly increase CRC risk for under/normal weight probands, whereas for overweight/obese probands with at least three CRC-affected SDRs the SIR was 2.68 (95% CI [1.29, 4.93]). In the absence of CRC-affected FDRs and SDRs, any number of CRC-affected third-degree relatives (TDRs) did not increase risk in under/normal weight probands, but significantly elevated risk was observed for overweight/obese probands with at least two CRC-affected TDRs (SIR = 1.32, 95% CI [1.04, 1.65]). For nonsyndromic CRC, maximum midlife BMI modifies the risk associated with family history and should be taken into account in CRC risk communication when possible.
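The SIR compares the number of CRC cases observed among probands with the number expected from reference rates (SIR = observed / expected). The abstract does not state how its 95% CIs were computed; a common choice, assumed here, is the exact (Garwood) Poisson interval on the observed count, divided by the expected count. The function names are illustrative and the counts in the usage note are made up.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu); adequate for the small counts typical of SIR cells
    (math.exp(-mu) underflows for mu above roughly 700)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _solve(g, target, lo, hi, increasing, iters=200):
    """Bisection for g(mu) == target, where g is monotone on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (g(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sir_with_exact_ci(observed: int, expected: float, alpha: float = 0.05):
    """SIR = observed / expected, with an exact (Garwood) Poisson interval for the observed count."""
    if expected <= 0 or observed < 0:
        raise ValueError("expected must be positive and observed non-negative")
    upper_search = observed + 10 * math.sqrt(observed + 1) + 10  # generous bracket for the root
    if observed == 0:
        mu_low = 0.0
    else:
        # Smallest Poisson mean for which P(X >= observed) equals alpha/2.
        mu_low = _solve(lambda mu: 1 - poisson_cdf(observed - 1, mu), alpha / 2, 0.0, upper_search, True)
    # Largest Poisson mean for which P(X <= observed) equals alpha/2.
    mu_high = _solve(lambda mu: poisson_cdf(observed, mu), alpha / 2, 0.0, upper_search, False)
    return observed / expected, mu_low / expected, mu_high / expected
```

For example, `sir_with_exact_ci(10, 5.0)` gives an SIR of 2.0 with an interval of roughly 0.96 to 3.68. Exact intervals are wide when counts are small, which is how an SIR near 2 can still have a CI that spans 1.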


Subject(s)
Body Mass Index, Colorectal Neoplasms/epidemiology, Medical History Taking, Obesity/epidemiology, Pedigree, Adult, Aged, Colorectal Neoplasms/pathology, Family, Humans, Incidence, Male, Middle Aged, Registries, Risk Factors, Utah/epidemiology
2.
J Biomed Inform ; 111: 103565, 2020 11.
Article in English | MEDLINE | ID: mdl-32980530

ABSTRACT

OBJECTIVE: To develop an effective and scalable method for predicting individual-level patient costs by automatically learning hidden temporal patterns from multivariate time series data in patient insurance claims using a convolutional neural network (CNN) architecture. METHODS: We used three years of medical and pharmacy claims data from 2013 to 2016 from a healthcare insurer; data from the first two years were used to build the model, which predicted costs in the third year. The data consisted of multivariate time series of cost, visit and medical features, shaped as images of patients' health status (i.e., matrices with time windows on one dimension and the medical, visit and cost features on the other). Patients' multivariate time series images were given as input to a CNN with a proposed architecture. After hyper-parameter tuning, the proposed architecture consisted of three building blocks of convolution and pooling layers with a leaky ReLU (LReLU) activation function and a kernel size customized for healthcare data at each layer. The temporal patterns learned by the proposed CNN were fed to a fully connected layer. We benchmarked the proposed method against three other approaches: (1) a spike temporal pattern detection method, the most accurate method for healthcare cost prediction described to date in the literature; (2) a symbolic temporal pattern detection method, the most common approach for leveraging healthcare temporal data; and (3) the most commonly used CNN architectures for image pattern detection (AlexNet, VGGNet and ResNet), applied via transfer learning. Moreover, we assessed the contribution of each type of data (cost, visit and medical). Finally, we externally validated the proposed method on a separate cohort of patients. All prediction performance was measured in terms of mean absolute percentage error (MAPE).
RESULTS: The proposed CNN configuration outperformed the spike temporal pattern detection and symbolic temporal pattern detection methods, with a MAPE of 1.67 versus 2.02 and 3.66, respectively (p < 0.01). It also outperformed ResNet, AlexNet and VGGNet, which had MAPEs of 4.59, 4.85 and 5.06, respectively (p < 0.01). Removing medical, visit or cost features increased the MAPE to 1.98, 1.91 and 2.04, respectively (p < 0.01). CONCLUSIONS: Feature learning through the proposed CNN configuration significantly improved individual-level healthcare cost prediction. The proposed CNN outperformed temporal pattern detection methods that look for a predefined set of pattern shapes because it can extract a variable number of patterns with various shapes. Temporal patterns learned from medical, visit and cost data all contributed significantly to prediction performance. Hyper-parameter tuning showed that three-month data patterns yield the highest prediction accuracy. Our results show that patient images extracted from multivariate time series data differ from regular images and hence require purpose-built CNN architectures. The proposed method for converting patients' multivariate time series data into images and tuning them for convolutional learning could be applied in many other healthcare applications with multivariate time series data.
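A minimal sketch of the "patient image" idea: each patient's claims are aggregated into a matrix with one row per feature and one column per time window, which a CNN can then treat like a single-channel image. The event format, feature names, window length and use of per-window sums are assumptions for illustration; the abstract does not specify them.

```python
def patient_image(events, feature_names, n_days, window_days):
    """Shape one patient's claims into a features x time-windows matrix.

    events: iterable of (day_index, feature_name, value) tuples, with day_index
    counted from the start of the observation period.
    Each cell holds the sum of that feature's values within that window.
    """
    n_windows = -(-n_days // window_days)  # ceiling division
    row_of = {name: i for i, name in enumerate(feature_names)}
    image = [[0.0] * n_windows for _ in feature_names]
    for day, feature, value in events:
        if not 0 <= day < n_days:
            raise ValueError(f"day {day} is outside the observation period")
        image[row_of[feature]][day // window_days] += value
    return image
```

With `n_days=730` and `window_days=30`, each patient yields a matrix with 25 columns; stacking the matrices of many patients gives the batch a convolutional model would consume.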


Subject(s)
Health Care Costs, Neural Networks, Computer, Cohort Studies, Humans
3.
J Biomed Inform ; 89: 1-10, 2019 01.
Article in English | MEDLINE | ID: mdl-30468912

ABSTRACT

OBJECTIVES: Finding recent clinical studies that warrant changes in clinical practice ("high impact" clinical studies) in a timely manner is very challenging. We investigated a machine learning approach to find recent studies with high clinical impact to support clinical decision making and literature surveillance. METHODS: To identify recent studies, we developed our classification model using time-agnostic features that are available as soon as an article is indexed in PubMed®, such as journal impact factor, author count, and study sample size. Using a gold standard of 541 high impact treatment studies referenced in 11 disease management guidelines, we tested the following null hypotheses: (1) the high impact classifier with time-agnostic features (HI-TA) performs equivalently to PubMed's Best Match sort and a MeSH-based Naïve Bayes classifier; and (2) HI-TA performs equivalently to the high impact classifier with both time-agnostic and time-sensitive features (HI-TS) enabled in a previous study. The primary outcome for both hypotheses was mean top 20 precision. RESULTS: The differences in mean top 20 precision between HI-TA and three baselines (PubMed's Best Match, a MeSH-based Naïve Bayes classifier, and HI-TS) were not statistically significant (12% vs. 3%, p = 0.101; 12% vs. 11%, p = 0.720; 12% vs. 25%, p = 0.094, respectively). Recall of HI-TA was low (7%). CONCLUSION: HI-TA had equivalent performance to state-of-the-art approaches that depend on time-sensitive features. With the advantage of relying only on time-agnostic features, the proposed approach can be used as an adjunct to help clinicians identify recent high impact clinical studies to support clinical decision-making. However, low recall limits the use of HI-TA for literature surveillance.
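The primary outcome, mean top 20 precision, can be sketched as precision@20 computed per query and then averaged. Treating each guideline topic as one query, and dividing by 20 even when fewer articles are returned, are assumptions; the abstract does not define these details.

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top-k ranked articles that are in the gold standard (denominator fixed at k)."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k=20):
    """Average precision@k over queries; runs is a list of (ranked_ids, relevant_ids) pairs."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(precision_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

Because the denominator is fixed, a low-recall classifier that surfaces only a few high impact studies per topic is capped at a low top 20 precision even when every article it returns is relevant.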


Subject(s)
Clinical Decision-Making, Machine Learning, PubMed, Publications/classification, Bayes Theorem
4.
J Biomed Inform ; 91: 103113, 2019 03.
Article in English | MEDLINE | ID: mdl-30738188

ABSTRACT

OBJECTIVE: To design and assess a method to leverage individuals' temporal data for predicting their healthcare cost. To achieve this goal, we first used patients' temporal data in their fine-grain form as opposed to coarse-grain form. Second, we devised novel spike detection features to extract temporal patterns that improve the performance of cost prediction. Third, we evaluated the effectiveness of different types of temporal features based on cost information, visit information and medical information for the prediction task. MATERIALS AND METHODS: We used three years of medical and pharmacy claims data from 2013 to 2016 from a healthcare insurer, where the first two years were used to build the model to predict the costs in the third year. To prepare the data for modeling and prediction, the time series data of cost, visit and medical information were extracted in the form of fine-grain features (i.e., segmenting each time series into a sequence of consecutive windows and representing each window by various statistics such as sum). Then, temporal patterns of the time series were extracted and added to fine-grain features using a novel set of spike detection features (i.e., the fluctuation of data points). Gradient Boosting was applied on the final set of extracted features. Moreover, the contribution of each type of data (i.e., cost, visit and medical) was assessed. We benchmarked the proposed predictors against extant methods including those that used coarse-grain features which represent each time series with various statistics such as sum and the most recent portion of the values in the entire series. All prediction performances were measured in terms of Mean Absolute Percentage Error (MAPE). RESULTS: Gradient Boosting applied on fine-grain predictors outperformed coarse-grain predictors with a MAPE of 3.02 versus 8.14 (p < 0.01). 
Enhancing the fine-grain features with the temporal pattern extraction features (i.e., spike detection features) further improved the MAPE to 2.04 (p < 0.01). Removing cost, visit and medical status data resulted in MAPEs of 10.24, 2.22 and 2.07 respectively (p < 0.01 for the first two comparisons and p = 0.63 for the third comparison). CONCLUSIONS: Leveraging fine-grain temporal patterns for healthcare cost prediction significantly improves prediction performance. Enhancing fine-grain features with extraction of temporal cost and visit patterns significantly improved the performance. However, medical features did not have a significant effect on prediction performance. Gradient Boosting outperformed all other prediction models.
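The fine-grain features described above (one statistic per consecutive window) and the MAPE metric can be sketched as follows. The abstract characterizes the spike detection features only as capturing "the fluctuation of data points," so `spike_features` is a hypothetical stand-in (a z-score spike count and the mean absolute change), not the authors' definition. The abstract also does not say whether MAPE is reported as a fraction or a percentage.

```python
import statistics


def window_sums(series, window):
    """Fine-grain features: one sum per consecutive, non-overlapping window."""
    return [sum(series[i:i + window]) for i in range(0, len(series), window)]


def spike_features(series, z=2.0):
    """Illustrative fluctuation features (hypothetical; not the paper's definition)."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    # Count points lying more than z population standard deviations above the mean.
    n_spikes = sum(1 for x in series if sd > 0 and x > mean + z * sd)
    changes = [abs(b - a) for a, b in zip(series, series[1:])]
    mean_abs_change = statistics.fmean(changes) if changes else 0.0
    return {"n_spikes": n_spikes, "mean_abs_change": mean_abs_change}


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (multiply by 100 for percent)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
```

MAPE is undefined for patients whose actual cost is zero; how the study handled such patients is not stated in the abstract.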


Subject(s)
Health Care Costs/trends, Adolescent, Adult, Aged, Aged, 80 and over, Algorithms, Child, Child, Preschool, Female, Humans, Infant, Infant, Newborn, Male, Middle Aged, United States, Young Adult
5.
BMC Med Inform Decis Mak ; 14: 41, 2014 May 27.
Article in English | MEDLINE | ID: mdl-24886637

ABSTRACT

BACKGROUND: The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and data that changed over time. METHODS: Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. Moreover, the dataset was divided into a validation dataset and derivation datasets, which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, the models were developed by combining various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as strong, regular, or weak. RESULTS: The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators.
The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier that averaged the results of multinomial logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%. CONCLUSION: The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.
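The pre-processing step and the two missing-data adjustments named in the results (complete-case analysis and mean-based imputation) can be sketched on records stored as dictionaries. The record layout and variable names are hypothetical; only the >50% missingness cut-off comes from the abstract.

```python
def drop_sparse_columns(records, max_missing=0.5):
    """Keep variables missing (absent or None) in at most max_missing of the records."""
    columns = sorted({key for record in records for key in record})
    n = len(records)
    return [c for c in columns
            if sum(record.get(c) is None for record in records) / n <= max_missing]


def complete_cases(records, columns):
    """Complete-case analysis: keep only records with every column observed."""
    return [r for r in records if all(r.get(c) is not None for c in columns)]


def mean_impute(records, columns):
    """Replace missing numeric values with the column mean of the observed values."""
    means = {}
    for c in columns:
        observed = [r[c] for r in records if r.get(c) is not None]
        means[c] = sum(observed) / len(observed)
    return [{c: r[c] if r.get(c) is not None else means[c] for c in columns} for r in records]
```

Complete-case analysis shrinks the sample but leaves observed values untouched, while mean imputation keeps every hospitalization at the cost of pulling missing values toward the center; the abstract reports that the best models came from the complete-case derivation dataset.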


Subject(s)
Heart Failure/therapy, Hospitalization, Models, Statistical, Academic Medical Centers, Humans, Patient Readmission, Predictive Value of Tests, Reproducibility of Results, Risk Factors, Tertiary Healthcare
6.
Res Sq ; 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38883755

ABSTRACT

Introduction: Clinical notes, biomarkers, and neuroimaging have proven valuable in dementia prediction models. Whether commonly available structured clinical data can predict dementia is an emerging area of research. We aimed to predict Alzheimer's disease (AD) and Alzheimer's disease related dementias (ADRD) in a well-phenotyped, population-based cohort using a machine learning approach. Methods: Administrative healthcare data (k = 163 diagnostic features), in addition to Census/vital record sociodemographic data (k = 6 features), were linked to the Cache County Study (CCS, 1995-2008). Results: Among successfully linked UPDB-CCS participants (n = 4206), 522 (12.4%) had incident AD/ADRD according to the CCS "gold standard" assessments. Random Forest models with a 1-year prediction window achieved the best performance, with an Area Under the Curve (AUC) of 0.67. Accuracy declined for dementia subtypes: AD/ADRD (AUC = 0.65); ADRD (AUC = 0.49). Discussion: Commonly available structured clinical data (without labs, notes, or prescription information) demonstrate a modest ability to predict AD/ADRD, consistent with prior research.
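The AUC values reported above equal the probability that a randomly chosen case receives a higher model score than a randomly chosen non-case (the Mann-Whitney formulation), so an AUC of 0.5 is chance level and an AUC of 0.49 is essentially uninformative. A minimal, quadratic-time sketch with ties counted as one half; the labels and scores in the usage are made up:

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

For instance, with labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four case/non-case pairs are ordered correctly, so the AUC is 0.75. Production code would use a sort-based O(n log n) version for cohorts of this size.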

7.
PLoS One ; 18(5): e0284622, 2023.
Article in English | MEDLINE | ID: mdl-37200277

ABSTRACT

Sudden death related to hypoglycemia is thought to be due to cardiac arrhythmias. A clearer understanding of the cardiac changes associated with hypoglycemia is needed to reduce mortality. The objective of this work was to identify distinct patterns of electrocardiogram heartbeat changes that correlated with glycemic level, diabetes status, and mortality using a rodent model. Electrocardiogram and glucose measurements were collected from 54 diabetic and 37 non-diabetic rats undergoing insulin-induced hypoglycemic clamps. Shape-based unsupervised clustering was performed to identify distinct clusters of electrocardiogram heartbeats, and clustering performance was assessed using internal evaluation metrics. Clusters were evaluated by the experimental conditions of diabetes status, glycemic level, and death status. Overall, shape-based unsupervised clustering identified 10 clusters of ECG heartbeats across multiple internal evaluation metrics. Several clusters demonstrating normal ECG morphology were specific to hypoglycemia conditions (Clusters 3, 5, and 8) or to non-diabetic rats (Cluster 4), or were generalized among all experimental conditions (Cluster 1). In contrast, clusters demonstrating QT prolongation alone or a combination of QT, PR, and QRS prolongation were specific to severe hypoglycemia experimental conditions and stratified heartbeats by non-diabetic (Clusters 2 and 6) or diabetic status (Clusters 9 and 10). One cluster demonstrated an arrhythmogenic waveform with premature ventricular contractions and was specific to heartbeats from severe hypoglycemia conditions (Cluster 7). Overall, this study provides the first data-driven characterization of ECG heartbeats in a rodent model of diabetes during hypoglycemia.


Subject(s)
Diabetes Mellitus, Type 1, Hypoglycemia, Ventricular Premature Complexes, Rats, Animals, Diabetes Mellitus, Type 1/complications, Rodentia, Hypoglycemia/chemically induced, Electrocardiography, Cluster Analysis
8.
Innov Aging ; 7(3): igad023, 2023.
Article in English | MEDLINE | ID: mdl-37179657

ABSTRACT

Background and Objectives: Older adult multimorbidity trajectories are helpful for understanding the current and future health patterns of aging populations. The construction of multimorbidity trajectories from comorbidity index scores will help inform public health and clinical interventions targeting those individuals that are on unhealthy trajectories. Investigators have used many different techniques when creating multimorbidity trajectories in prior literature, and no standard way has emerged. This study compares and contrasts multimorbidity trajectories constructed from various methods. Research Design and Methods: We describe the difference between aging trajectories constructed with the Charlson Comorbidity Index (CCI) and Elixhauser Comorbidity Index (ECI). We also explore the differences between acute (single-year) and chronic (cumulative) derivations of CCI and ECI scores. Social determinants of health can affect disease burden over time; thus, our models include income, race/ethnicity, and sex differences. Results: We use group-based trajectory modeling (GBTM) to estimate multimorbidity trajectories for 86,909 individuals aged 66-75 in 1992 using Medicare claims data collected over the following 21 years. We identify low-chronic disease and high-chronic disease trajectories in all 8 generated trajectory models. Additionally, all 8 models satisfied prior established statistical diagnostic criteria for well-performing GBTM models. Discussion and Implications: Clinicians may use these trajectories to identify patients on an unhealthy path and prompt a possible intervention that may shift the patient to a healthier trajectory.
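The acute versus chronic distinction can be sketched as follows, assuming "chronic" means that a condition, once coded, keeps contributing its weight in every later year. The condition keys are hypothetical labels, and the weights are a small subset of the original Charlson weights shown for illustration only; the study's actual code mappings are not given in the abstract.

```python
# Illustrative subset of the original Charlson weights; a real implementation needs the full
# ICD-code mapping and its hierarchy rules (e.g., not double-counting diabetes categories).
CHARLSON_WEIGHTS_SUBSET = {
    "congestive_heart_failure": 1,
    "chronic_pulmonary_disease": 1,
    "diabetes_with_end_organ_damage": 2,
    "metastatic_solid_tumor": 6,
}


def acute_and_chronic_scores(conditions_by_year, weights):
    """Acute: score from conditions coded in that year only.
    Chronic: score from every condition coded in that year or any earlier year."""
    acute, chronic = {}, {}
    seen = set()
    for year in sorted(conditions_by_year):
        current = {c for c in conditions_by_year[year] if c in weights}  # ignore unmapped codes
        seen |= current
        acute[year] = sum(weights[c] for c in current)
        chronic[year] = sum(weights[c] for c in seen)
    return acute, chronic
```

Under this reading, a chronic score can never decrease over follow-up, while an acute score falls back to zero in any year without a qualifying claim; the per-year scores are the inputs a trajectory model such as GBTM would be fitted to.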

9.
J Am Med Inform Assoc ; 29(5): 891-899, 2022 04 13.
Article in English | MEDLINE | ID: mdl-34990507

ABSTRACT

OBJECTIVE: To evaluate the potential for machine learning to predict medication alerts that might be ignored by a user, and intelligently filter out those alerts from the user's view. MATERIALS AND METHODS: We identified features (eg, patient and provider characteristics) proposed in the literature to modulate user responses to medication alerts; these features were then refined through expert review. Models were developed using rule-based and machine learning techniques (logistic regression, random forest, support vector machine, neural network, and LightGBM). We collected log data on alerts shown to users throughout 2019 at University of Utah Health. We sought to maximize precision while maintaining a false-negative rate <0.01, a threshold predefined through discussion with physicians and pharmacists; accordingly, models were developed while maintaining a sensitivity of 0.99. Two null hypotheses were developed: H1-there is no difference in precision among prediction models; and H2-the removal of any feature category does not change precision. RESULTS: A total of 3,481,634 medication alerts with 751 features were evaluated. With sensitivity fixed at 0.99, LightGBM achieved the highest precision (0.192) while keeping the false-negative rate below the maximum of 0.01 predefined by subject-matter experts (H1) (P < 0.001). This model could reduce alert volume by 54.1%. We removed different combinations of features (H2) and found that not all features contributed significantly to precision. Removing medication order features (eg, dosage) decreased precision the most (-0.147, P = 0.001). CONCLUSIONS: Machine learning potentially enables the intelligent filtering of medication alerts.
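One way to read the operating point above: treat an alert the user acts on as the positive class, pick the highest score threshold that keeps sensitivity at or above 0.99 (false-negative rate at most 0.01), suppress alerts below it, and report precision among the alerts still shown and the fraction suppressed. This interpretation of the positive class, the function names and the toy scores are assumptions, not details from the paper.

```python
import math


def threshold_for_sensitivity(labels, scores, target=0.99):
    """Highest score cut-off that still keeps at least `target` of positive alerts visible."""
    positive_scores = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    if not positive_scores:
        raise ValueError("need at least one positive alert")
    # Small tolerance guards against floating-point error in target * n.
    needed = math.ceil(target * len(positive_scores) - 1e-9)
    return positive_scores[needed - 1]


def evaluate_filter(labels, scores, threshold):
    """Show alerts scoring >= threshold; suppress the rest."""
    shown = [y for y, s in zip(labels, scores) if s >= threshold]
    true_positives = sum(shown)
    return {
        "sensitivity": true_positives / sum(labels),
        "precision": true_positives / len(shown),
        "alert_reduction": 1 - len(shown) / len(labels),
    }
```

Fixing sensitivity first and then comparing precision, as the abstract describes, makes models comparable at the same clinical safety level; "alert_reduction" here corresponds to the reported reduction in alert volume.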


Subject(s)
Decision Support Systems, Clinical, Medical Order Entry Systems, Humans, Machine Learning, Medication Errors/prevention & control, Pharmacists
10.
J Biomed Inform ; 44(6): 1068-75, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21856440

ABSTRACT

Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which could be interpreted by different groups of production rules and consequently result in two or more parse trees. One possible solution, which has not been extensively explored previously, is to augment productions in medical sublanguage grammars with probabilities to resolve the ambiguity. In this study, we associated probabilities with production rules in a semantic-based grammar for medication findings and evaluated its performance on reducing parsing ambiguity. Using the existing data set from 2009 i2b2 NLP (Natural Language Processing) challenge for medication extraction, we developed a semantic-based CFG (Context Free Grammar) for parsing medication sentences and manually created a Treebank of 4564 medication sentences from discharge summaries. Using the Treebank, we derived a semantic-based PCFG (Probabilistic Context Free Grammar) for parsing medication sentences. Our evaluation using a 10-fold cross validation showed that the PCFG parser dramatically improved parsing performance when compared to the CFG parser.
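Deriving a PCFG from a treebank is usually done by relative-frequency (maximum-likelihood) estimation, and an ambiguous sentence is resolved by choosing the parse whose rule probabilities have the largest product. The sketch below assumes trees stored as nested tuples and uses made-up semantic labels (`MED_SENT`, `DRUG`, `DOSE`, `FREQ`); the abstract does not describe the grammar's categories or whether smoothing was used.

```python
from collections import Counter


def _rhs(children):
    """Right-hand side of a rule: child labels for subtrees, the token itself for leaves."""
    return tuple(c if isinstance(c, str) else c[0] for c in children)


def count_rules(tree, counts):
    """Add one count for every production used in the tree."""
    label, *children = tree
    counts[(label, _rhs(children))] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter()
    for tree in treebank:
        count_rules(tree, rule_counts)
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


def tree_probability(tree, probs):
    """Product of rule probabilities; an ambiguous sentence takes its most probable parse."""
    label, *children = tree
    p = probs.get((label, _rhs(children)), 0.0)
    for child in children:
        if not isinstance(child, str):
            p *= tree_probability(child, probs)
    return p
```

With a two-sentence treebank in which half the sentences include a frequency, the estimate gives `MED_SENT -> DRUG DOSE` a probability of 0.5; a plain CFG parser has no such numbers and cannot rank competing parse trees.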


Subject(s)
Medication Systems, Hospital, Natural Language Processing, Semantics, Databases, Factual, Probability, Terminology as Topic
11.
Fed Pract ; 38(1): 15-19, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33574644

ABSTRACT

INTRODUCTION: Recently, numerous studies have linked social determinants of health (SDoH) with clinical outcomes. While this association is well known, the interfacility variability of these risk factors within the Veterans Health Administration (VHA) is not known. Such information could be useful to the VHA for resource and funding allocation. The aim of this study is to explore the interfacility variability of 5 SDoH within the VHA. METHODS: In a cohort of patients (aged ≥ 65 years) hospitalized at VHA acute care facilities with either acute myocardial infarction (AMI), heart failure (HF), or pneumonia in 2012, we assessed (1) the proportion of patients with any of the following five documented SDoH: lives alone, marginal housing, alcohol use disorder, substance use disorder, and use of substance use services, using administrative diagnosis codes and clinic stop codes; and (2) the documented facility-level variability of these SDoH. To examine whether variability was due to regional coding differences, we assessed the variation of living alone using a validated natural language processing (NLP) algorithm. RESULTS: The proportion of veterans admitted for AMI, HF, and pneumonia with SDoH was low. Across all 3 conditions, lives alone was the most common SDoH (2.2% [interquartile range (IQR), 0.7-4.7]), followed by substance use disorder (1.3% [IQR, 0.5-2.1]) and use of substance use services (1.2% [IQR, 0.6-1.8]). Using NLP, the proportion of hospitalized veterans who lived alone was higher for HF (14.4% vs 2.0%, P < .01), pneumonia (11% vs 1.9%, P < .01), and AMI (10.2% vs 1.4%, P < .01) compared with International Classification of Diseases, Ninth Edition codes. Interfacility variability was noted with both administrative and NLP extraction methods. CONCLUSIONS: The presence of SDoH in administrative data among patients hospitalized for common medical issues is low and variable across VHA facilities.
Significant facility-level variation of 5 SDoH was present regardless of extraction method.
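The facility-level summaries above appear to be medians with interquartile ranges of per-facility proportions. A sketch under that assumption, with hypothetical facility IDs; quartiles depend on the interpolation method, and Python's default (`'exclusive'`) is used here.

```python
import statistics
from collections import Counter


def facility_proportions(patients):
    """patients: iterable of (facility_id, has_sdoh) pairs -> {facility_id: proportion flagged}."""
    totals, flagged = Counter(), Counter()
    for facility, has_sdoh in patients:
        totals[facility] += 1
        flagged[facility] += bool(has_sdoh)
    return {f: flagged[f] / totals[f] for f in totals}


def median_and_iqr(values):
    """Median and interquartile range (Python's default 'exclusive' quartile method)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return median, (q1, q3)
```

Running the same two functions on flags from administrative codes and on flags from an NLP extractor gives directly comparable facility-level distributions, which is the kind of comparison the study uses to check whether variability reflects coding differences.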

12.
J Clin Transl Sci ; 5(1): e42, 2020 Sep 04.
Article in English | MEDLINE | ID: mdl-33948264

ABSTRACT

INTRODUCTION: Lack of participation in clinical trials (CTs) is a major barrier to the evaluation of new pharmaceuticals and devices. Here we report the results of an analysis of a dataset from ResearchMatch, an online clinical registry, using supervised machine learning approaches and a deep learning approach to discover characteristics of individuals more likely to show an interest in participating in CTs. METHODS: We trained six supervised machine learning classifiers (Logistic Regression (LR), Decision Tree (DT), Gaussian Naïve Bayes (GNB), K-Nearest Neighbor Classifier (KNC), Adaboost Classifier (ABC) and Random Forest Classifier (RFC)), as well as a deep learning method, Convolutional Neural Network (CNN), using a dataset of 841,377 instances and 20 features, including demographic data, geographic constraints, medical conditions and ResearchMatch visit history. Our outcome variable consisted of responses showing specific participant interest when presented with specific clinical trial opportunity invitations ('yes' or 'no'). Furthermore, we created four subsets from this dataset based on top self-reported medical conditions and gender, which were analysed separately. RESULTS: The deep learning model outperformed the machine learning classifiers, achieving an area under the curve (AUC) of 0.8105. CONCLUSIONS: The results show sufficient evidence of meaningful correlations between the predictor variables and the outcome variable in the datasets analysed using the supervised machine learning classifiers. These approaches show promise in identifying individuals who may be more likely to participate when offered an opportunity for a clinical trial.

13.
JMIR Med Inform ; 8(3): e14272, 2020 Mar 17.
Article in English | MEDLINE | ID: mdl-32181753

ABSTRACT

BACKGROUND: More than 20% of patients admitted to the intensive care unit (ICU) develop an adverse event (AE). No previous study has leveraged patients' data to extract the temporal features using their structural temporal patterns, that is, trends. OBJECTIVE: This study aimed to improve AE prediction methods by using structural temporal pattern detection that captures global and local temporal trends and to demonstrate these improvements in the detection of acute kidney injury (AKI). METHODS: Using the Medical Information Mart for Intensive Care dataset, containing 22,542 patients, we extracted both global and local trends using structural pattern detection methods to predict AKI (ie, binary prediction). Classifiers were built on 17 input features consisting of vital signs and laboratory test results using state-of-the-art models; the optimal classifier was selected for comparisons with previous approaches. The classifier with structural pattern detection features was compared with two baseline classifiers that used different temporal feature extraction approaches commonly used in the literature: (1) symbolic temporal pattern detection, which is the most common approach for multivariate time series classification; and (2) the last recorded value before the prediction point, which is the most common approach to extract temporal data in the AKI prediction literature. Moreover, we assessed the individual contribution of global and local trends. Classifier performance was measured in terms of accuracy (primary outcome), area under the curve, and F-measure. For all experiments, we employed 20-fold cross-validation. RESULTS: Random forest was the best classifier using structural temporal pattern detection. The accuracy of the classifier with local and global trend features was significantly higher than that while using symbolic temporal pattern detection and the last recorded value (81.3% vs 70.6% vs 58.1%; P<.001). 
Excluding local or global features reduced the accuracy to 74.4% or 78.1%, respectively (P<.001). CONCLUSIONS: Classifiers using features obtained from structural temporal pattern detection significantly improved the prediction of AKI onset in ICU patients over two baselines based on common previous approaches. The proposed method is a generalizable approach to predict AEs in critical care that may be used to help clinicians intervene in a timely manner to prevent or mitigate AEs.
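As one concrete, assumed reading of "global and local trends," the sketch below fits a least-squares slope over the whole observation window (global) and over consecutive sub-windows (local), alongside the last-recorded-value baseline. The abstract does not specify the structural pattern detection method, so this is an illustration rather than the authors' feature extractor; the segment count and the creatinine values are invented.

```python
def ols_slope(times, values):
    """Least-squares slope of values against times (0.0 if undefined)."""
    n = len(times)
    if n < 2:
        return 0.0
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        return 0.0
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def trend_features(times, values, n_segments=2):
    """Global slope, per-segment local slopes, and the last-recorded-value baseline.
    Assumes measurements are sorted by time."""
    features = {"global_slope": ols_slope(times, values), "last_value": values[-1]}
    size = -(-len(times) // n_segments)  # ceiling division
    for k in range(n_segments):
        seg = slice(k * size, (k + 1) * size)
        features[f"local_slope_{k}"] = ols_slope(times[seg], values[seg])
    return features
```

For a creatinine series that is flat for 12 hours and then rises, the first local slope is 0 while the second captures the rise, information that the last recorded value alone does not convey.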

14.
J Biomed Semantics ; 10(1): 6, 2019 04 11.
Article in English | MEDLINE | ID: mdl-30975223

ABSTRACT

BACKGROUND: Social risk factors are important dimensions of health and are linked to access to care, quality of life, health outcomes and life expectancy. However, in the Electronic Health Record, data related to many social risk factors are primarily recorded in free-text clinical notes, rather than as more readily computable structured data, and hence cannot currently be easily incorporated into automated assessments of health. In this paper, we present Moonstone, a new, highly configurable rule-based clinical natural language processing system designed to automatically extract information that requires inferencing from clinical notes. Our initial use case for the tool is focused on the automatic extraction of social risk factor information - in this case, housing situation, living alone, and social support - from clinical notes. Nursing notes, social work notes, emergency room physician notes, primary care notes, hospital admission notes, and discharge summaries, all derived from the Veterans Health Administration, were used for algorithm development and evaluation. RESULTS: An evaluation of Moonstone demonstrated that the system is highly accurate in extracting and classifying the three variables of interest (housing situation, living alone, and social support). The system achieved positive predictive value (i.e. precision) scores ranging from 0.66 (homeless/marginally housed) to 0.98 (lives at home/not homeless), accuracy scores ranging from 0.63 (lives in facility) to 0.95 (lives alone), and sensitivity (i.e. recall) scores ranging from 0.75 (lives in facility) to 0.97 (lives alone). CONCLUSIONS: The Moonstone system is - to the best of our knowledge - the first freely available, open source natural language processing system designed to extract social risk factors from clinical text with good (lives in facility) to excellent (lives alone) performance. 
Although developed with the social risk factor identification task in mind, Moonstone provides a powerful tool to address a range of clinical natural language processing tasks, especially those tasks that require nuanced linguistic processing in conjunction with inference capabilities.
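To make "rule-based" concrete, here is a deliberately tiny pattern-plus-negation classifier for living situation. It is not Moonstone: the cue lists and labels are hypothetical, and unlike Moonstone it performs no inference beyond a single sentence.

```python
import re

# Hypothetical cue lists for illustration only; these are not Moonstone's rules.
NEGATION_CUES = re.compile(r"\b(denies|does not|doesn't|not|no longer)\b", re.IGNORECASE)
RULES = [
    ("homeless/marginally housed", re.compile(r"\b(homeless|shelter)\b", re.IGNORECASE)),
    ("lives alone", re.compile(r"\blives? alone\b", re.IGNORECASE)),
    ("lives in facility", re.compile(r"\b(nursing home|assisted living|group home)\b", re.IGNORECASE)),
]


def classify_sentence(sentence):
    """Return the living-situation labels asserted (not negated) in one sentence."""
    labels = set()
    for label, pattern in RULES:
        match = pattern.search(sentence)
        # Crude negation scope: any cue earlier in the same sentence cancels the match.
        if match and not NEGATION_CUES.search(sentence[:match.start()]):
            labels.add(label)
    return labels
```

For example, "Pt lives alone in an apartment." yields `{"lives alone"}`, while "Patient does not live alone; wife at home." yields no label. Rules this shallow break quickly on real notes (e.g., "lives alone but daughter visits daily"), which is the kind of nuance the abstract says Moonstone's inference capabilities target.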


Subject(s)
Natural Language Processing, Social Environment, Health, Humans, Risk Factors
15.
AMIA Annu Symp Proc ; 2017: 1312-1321, 2017.
Article in English | MEDLINE | ID: mdl-29854200

ABSTRACT

An important informatics tool for controlling healthcare costs is accurately predicting the likely future healthcare costs of individuals. To address this important need, we conducted a systematic literature review and identified five methods for predicting healthcare costs. To enable a direct comparison of these different approaches, we empirically evaluated the predictive performance of each reported approach, as well as other state-of-the-art supervised learning methods, using data from University of Utah Health Plans for October 2013 through October 2016. The data set consisted of approximately 90,000 individuals, 6.3 million medical claims and 1.2 million pharmacy claims. In this comparative analysis, gradient boosting had the best predictive performance overall and for low to medium cost individuals. For high cost individuals, Artificial Neural Network (ANN) and the Ridge regression model, which have not been previously reported for use in healthcare cost prediction, had the highest performance.


Subject(s)
Forecasting, Health Care Costs, Supervised Machine Learning, Costs and Cost Analysis/methods, Health Care Costs/trends, Models, Econometric, Neural Networks, Computer
16.
AMIA Annu Symp Proc ; 2015: 1252-9, 2015.
Article in English | MEDLINE | ID: mdl-26958265

ABSTRACT

Accurate temporal identification and normalization is imperative for many biomedical and clinical tasks, such as generating timelines and identifying phenotypes. A major natural language processing challenge is developing and evaluating a generalizable temporal modeling approach that performs well across corpora and institutions. Our long-term goal is to create such a model. We begin working toward this goal by focusing on temporal expression (TIMEX3) identification. We present a systematic approach to 1) generalize existing solutions for automated TIMEX3 span detection, and 2) assess similarities and differences among various instantiations of TIMEX3 models applied to separate clinical corpora. Evaluated on the 2012 i2b2 and the 2015 Clinical TempEval challenge corpora, our approach is successful: we achieve competitive results for automated classification, and we identify similarities and differences in TIMEX3 modeling that will inform the development of a simplified, general temporal model.


Subject(s)
Electronic Health Records, Natural Language Processing, Time, Humans