Results 1 - 20 of 147
1.
J Cardiothorac Vasc Anesth ; 38(2): 526-533, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37838509

ABSTRACT

OBJECTIVE: Postoperative delirium (POD) can occur in up to 50% of older patients undergoing cardiovascular surgery, resulting in hospitalization and significant morbidity and mortality. This study aimed to determine whether intraoperative neurophysiologic monitoring (IONM) modalities can be used to predict delirium in patients undergoing cardiovascular surgery. DESIGN: Adult patients undergoing cardiovascular surgery with IONM between 2019 and 2021 were reviewed retrospectively. Delirium was assessed multiple times using the Intensive Care Delirium Screening Checklist (ICDSC). Patients with an ICDSC score ≥4 were considered to have POD. Significant IONM changes were evaluated based on a visual review of electroencephalography (EEG) and somatosensory evoked potentials data and documentation of significant changes during surgery. SETTING: University of Pittsburgh Medical Center hospitals. PARTICIPANTS: Patients 18 years old and older undergoing cardiovascular surgery with IONM monitoring. MEASUREMENTS AND MAIN RESULTS: Of the 578 patients undergoing cardiovascular surgery with IONM, 126 had POD (21.8%). Significant IONM changes were noted in 134 patients, of whom 49 patients had delirium (36.6%). In contrast, 444 patients had no IONM changes during surgery, of whom 77 (17.3%) patients had POD. Upon multivariate analysis, IONM changes were associated with POD (odds ratio 2.12; 95% CI 1.31-3.44; p < 0.001). Additionally, baseline EEG abnormalities were associated with POD (p = 0.002). CONCLUSION: Significant IONM changes are associated with an increased risk of POD in patients undergoing cardiovascular surgery. These findings offer a basis for future research and analysis of EEG and somatosensory evoked potential monitoring to predict, detect, and prevent POD.


Subject(s)
Emergence Delirium , Intraoperative Neurophysiological Monitoring , Adult , Humans , Adolescent , Retrospective Studies , Evoked Potentials, Somatosensory/physiology , Intraoperative Neurophysiological Monitoring/methods , Electroencephalography , Postoperative Complications/diagnosis , Postoperative Complications/etiology , Postoperative Complications/prevention & control
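The percentages in the abstract of record 1 can be re-derived from the counts it reports. The following is a minimal Python sketch and is not part of the indexed record; the function names are illustrative. The odds ratio it computes is the crude (unadjusted) one, so it is not expected to match the multivariate odds ratio of 2.12 reported by the authors.

```python
from fractions import Fraction

# Counts as reported in the abstract of record 1.
total_patients, pod_total = 578, 126
changes_total, changes_pod = 134, 49          # patients with significant IONM changes
no_changes_total, no_changes_pod = 444, 77    # patients without IONM changes


def pct(numerator, denominator):
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def crude_odds_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Unadjusted odds ratio from a 2x2 table (illustrative helper)."""
    exposed_odds = Fraction(exposed_events, exposed_total - exposed_events)
    unexposed_odds = Fraction(unexposed_events, unexposed_total - unexposed_events)
    return float(exposed_odds / unexposed_odds)


overall_rate = pct(pod_total, total_patients)
rate_with_changes = pct(changes_pod, changes_total)
rate_without_changes = pct(no_changes_pod, no_changes_total)
crude_or = crude_odds_ratio(changes_pod, changes_total, no_changes_pod, no_changes_total)

print(overall_rate, rate_with_changes, rate_without_changes, round(crude_or, 2))
```

The two subgroups add up to the full cohort (134 + 444 = 578 patients, 49 + 77 = 126 cases of POD), so the reported figures are internally consistent.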
2.
J Biomed Inform ; 139: 104306, 2023 03.
Article in English | MEDLINE | ID: mdl-36738870

ABSTRACT

BACKGROUND: In electronic health records, patterns of missing laboratory test results could capture patients' course of disease as well as reflect clinicians' concerns or worries for possible conditions. These patterns are often understudied and overlooked. This study aims to identify informative patterns of missingness among laboratory data collected across 15 healthcare system sites in three countries for COVID-19 inpatients. METHODS: We collected and analyzed demographic, diagnosis, and laboratory data for 69,939 patients with positive COVID-19 PCR tests across three countries from 1 January 2020 through 30 September 2021. We analyzed missing laboratory measurements across sites, missingness stratification by demographic variables, temporal trends of missingness, correlations between labs based on missingness indicators over time, and clustering of groups of labs based on their missingness/ordering pattern. RESULTS: With these analyses, we identified mapping issues faced in seven out of 15 sites. We also identified nuances in data collection and variable definition for the various sites. Temporal trend analyses may support the use of laboratory test result missingness patterns in identifying severe COVID-19 patients. Lastly, using missingness patterns, we determined relationships between various labs that reflect clinical behaviors. CONCLUSION: In this work, we use computational approaches to relate missingness patterns to hospital treatment capacity and highlight the heterogeneity of looking at COVID-19 over time and at multiple sites, where there might be different phases, policies, etc. Changes in missingness could suggest a change in a patient's condition, and patterns of missingness among laboratory measurements could potentially identify clinical outcomes. This allows sites to consider missing data as informative to analyses and helps researchers identify which sites are better poised to study particular questions.


Subject(s)
COVID-19 , Electronic Health Records , Humans , Data Collection , Records , Cluster Analysis
3.
J Med Internet Res ; 24(5): e37931, 2022 05 18.
Article in English | MEDLINE | ID: mdl-35476727

ABSTRACT

BACKGROUND: Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. Electronic health record (EHR)-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. Although the need to improve classification of COVID-19 versus incidental SARS-CoV-2 is well understood, the magnitude of the problems has only been characterized in small, single-center studies. Furthermore, there have been no peer-reviewed studies evaluating methods for improving classification. OBJECTIVE: The aims of this study are to, first, quantify the frequency of incidental hospitalizations over the first 15 months of the pandemic in multiple hospital systems in the United States and, second, to apply electronic phenotyping techniques to automatically improve COVID-19 hospitalization classification. METHODS: From a retrospective EHR-based cohort in 4 US health care systems in Massachusetts, Pennsylvania, and Illinois, a random sample of 1123 SARS-CoV-2 PCR-positive patients hospitalized from March 2020 to August 2021 was manually chart-reviewed and classified as "admitted with COVID-19" (incidental) versus specifically admitted for COVID-19 ("for COVID-19"). EHR-based phenotyping was used to find feature sets to filter out incidental admissions. RESULTS: EHR-based phenotyped feature sets filtered out incidental admissions, which occurred in an average of 26% of hospitalizations (although this varied widely over time, from 0% to 75%). The top site-specific feature sets had 79%-99% specificity with 62%-75% sensitivity, while the best-performing across-site feature sets had 71%-94% specificity with 69%-81% sensitivity. 
CONCLUSIONS: A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated admissions, which is important to assure accurate public health reporting and research.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/diagnosis , COVID-19/epidemiology , Electronic Health Records , Hospitalization , Humans , Retrospective Studies
4.
Brief Bioinform ; 20(3): 842-856, 2019 05 21.
Article in English | MEDLINE | ID: mdl-29186302

ABSTRACT

Mental illness is increasingly recognized as both a significant cost to society and a significant area of opportunity for biological breakthrough. As -omics and imaging technologies enable researchers to probe molecular and physiological underpinnings of multiple diseases, opportunities arise to explore the biological basis for behavioral health and disease. From individual investigators to large international consortia, researchers have generated rich data sets in the area of mental health, including genomic, transcriptomic, metabolomic, proteomic, clinical and imaging resources. General data repositories such as the Gene Expression Omnibus (GEO) and Database of Genotypes and Phenotypes (dbGaP) and mental health (MH)-specific initiatives, such as the Psychiatric Genomics Consortium, MH Research Network and PsychENCODE represent a wealth of information yet to be gleaned. At the same time, novel approaches to integrate and analyze data sets are enabling important discoveries in the area of mental and behavioral health. This review will discuss and catalog into an organizing framework the increasingly diverse set of MH data resources available, using schizophrenia as a focus area, and will describe novel and integrative approaches to molecular biomarker discovery that make use of mental health data.


Subject(s)
Computational Biology , Mental Health , Translational Research, Biomedical , Biomarkers/metabolism , Humans
5.
BMC Med Inform Decis Mak ; 21(1): 158, 2021 05 17.
Article in English | MEDLINE | ID: mdl-34001100

ABSTRACT

BACKGROUND: Malaria is a major cause of death in children under five years old in low- and middle-income countries such as Malawi. Accurate diagnosis and management of malaria can help reduce the global burden of childhood morbidity and mortality. Trained healthcare workers in rural health centers manage malaria with limited supplies of malarial diagnostic tests and drugs for treatment. A clinical decision support system that integrates predictive models to provide an accurate prediction of malaria based on clinical features could aid healthcare workers in the judicious use of testing and treatment. We developed Bayesian network (BN) models to predict the probability of malaria from clinical features and an illustrative decision tree to model the decision to use or not use a malaria rapid diagnostic test (mRDT). METHODS: We developed two BN models to predict malaria from a dataset of outpatient encounters of children in Malawi. The first BN model was created manually with expert knowledge, and the second model was derived using an automated method. The performance of the BN models was compared to other statistical models on a range of performance metrics at multiple thresholds. We developed a decision tree that integrates predictions with the costs of mRDT and a course of recommended treatment. RESULTS: The manually created BN model achieved an area under the ROC curve (AUC) equal to 0.60 which was statistically significantly higher than the other models. At the optimal threshold for classification, the manual BN model had sensitivity and specificity of 0.74 and 0.42 respectively, and the automated BN model had sensitivity and specificity of 0.45 and 0.68 respectively. The balanced accuracy values were similar across all the models. Sensitivity analysis of the decision tree showed that for values of probability of malaria below 0.04 and above 0.40, the preferred decision that minimizes expected costs is not to perform mRDT. 
CONCLUSION: In resource-constrained settings, judicious use of mRDT is important. Predictive models in combination with decision analysis can provide personalized guidance on when to use mRDT in the management of childhood malaria. BN models can be efficiently derived from data to support clinical decision making.


Subject(s)
Malaria , Bayes Theorem , Child , Child, Preschool , Decision Trees , Diagnostic Tests, Routine , Humans , Malaria/diagnosis , Malaria/drug therapy , Malawi/epidemiology
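The abstract of record 5 reports sensitivity and specificity for both Bayesian network models and states that balanced accuracy was similar across models. As a rough check, the sketch below assumes the common definition of balanced accuracy as the mean of sensitivity and specificity; the abstract does not state which formula the authors used, and the names here are illustrative.

```python
# Sensitivity and specificity at the optimal threshold, as reported in the abstract.
models = {
    "manual_bn": {"sensitivity": 0.74, "specificity": 0.42},
    "automated_bn": {"sensitivity": 0.45, "specificity": 0.68},
}


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (assumed definition)."""
    return (sensitivity + specificity) / 2


balanced = {name: balanced_accuracy(**values) for name, values in models.items()}

for name, value in balanced.items():
    print(f"{name}: {value:.3f}")
```

Under this assumption the two models differ by about 0.015 in balanced accuracy, even though they trade sensitivity for specificity in opposite directions.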
6.
J Chem Inf Model ; 60(12): 5647-5657, 2020 12 28.
Article in English | MEDLINE | ID: mdl-33140969

ABSTRACT

Learning accurate drug representations is essential for tasks such as computational drug repositioning and prediction of drug side effects. A drug hierarchy is a valuable source that encodes knowledge of relations among drugs in a tree-like structure where drugs that act on the same organs, treat the same disease, or bind to the same biological target are grouped together. However, its utility in learning drug representations has not yet been explored, and currently described drug representations cannot place novel molecules in a drug hierarchy. Here, we develop a semi-supervised drug embedding that incorporates two sources of information: (1) underlying chemical grammar that is inferred from chemical structures of drugs and drug-like molecules (unsupervised) and (2) hierarchical relations that are encoded in an expert-crafted hierarchy of approved drugs (supervised). We use the Variational Auto-Encoder (VAE) framework to encode the chemical structures of molecules and use the drug-drug similarity information obtained from the hierarchy to induce the clustering of drugs in hyperbolic space. The hyperbolic space is amenable for encoding hierarchical relations. Both quantitative and qualitative results support that the learned drug embedding can accurately reproduce the chemical structure and recapitulate the hierarchical relations among drugs. Furthermore, our approach can infer the pharmacological properties of novel molecules by retrieving similar drugs from the embedding space. We demonstrate that our drug embedding can predict new uses and discover new side effects of existing drugs. We show that it significantly outperforms comparison methods in both tasks.


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Pharmaceutical Preparations , Algorithms , Cluster Analysis , Drug Repositioning , Humans
7.
J Med Internet Res ; 22(4): e15876, 2020 04 02.
Article in English | MEDLINE | ID: mdl-32238342

ABSTRACT

BACKGROUND: Electronic medical record (EMR) systems capture large amounts of data per patient and present that data to physicians with little prioritization. Without prioritization, physicians must mentally identify and collate relevant data, an activity that can lead to cognitive overload. To mitigate cognitive overload, a Learning EMR (LEMR) system prioritizes the display of relevant medical record data. Relevant data are those that are pertinent to a context-defined as the combination of the user, clinical task, and patient case. To determine which data are relevant in a specific context, a LEMR system uses supervised machine learning models of physician information-seeking behavior. Since obtaining information-seeking behavior data via manual annotation is slow and expensive, automatic methods for capturing such data are needed. OBJECTIVE: The goal of the research was to propose and evaluate eye tracking as a high-throughput method to automatically acquire physician information-seeking behavior useful for training models for a LEMR system. METHODS: Critical care medicine physicians reviewed intensive care unit patient cases in an EMR interface developed for the study. Participants manually identified patient data that were relevant in the context of a clinical task: preparing a patient summary to present at morning rounds. We used eye tracking to capture each physician's gaze dwell time on each data item (eg, blood glucose measurements). Manual annotations and gaze dwell times were used to define target variables for developing supervised machine learning models of physician information-seeking behavior. We compared the performance of manual selection and gaze-derived models on an independent set of patient cases. RESULTS: A total of 68 pairs of manual selection and gaze-derived machine learning models were developed from training data and evaluated on an independent evaluation data set. 
A paired Wilcoxon signed-rank test showed similar performance of manual selection and gaze-derived models on area under the receiver operating characteristic curve (P=.40). CONCLUSIONS: We used eye tracking to automatically capture physician information-seeking behavior and used it to train models for a LEMR system. The models that were trained using eye tracking performed like models that were trained using manual annotations. These results support further development of eye tracking as a high-throughput method for training clinical decision support systems that prioritize the display of relevant medical record data.


Subject(s)
Electronic Health Records/standards , Machine Learning/standards , Eye Movements , Humans , Information Seeking Behavior
8.
J Med Internet Res ; 22(8): e17478, 2020 08 12.
Article in English | MEDLINE | ID: mdl-32784184

ABSTRACT

BACKGROUND: Twitter presents a valuable and relevant social media platform to study the prevalence of information and sentiment on vaping that may be useful for public health surveillance. Machine learning classifiers that identify vaping-relevant tweets and characterize sentiments in them can underpin a Twitter-based vaping surveillance system. Compared with traditional machine learning classifiers that are reliant on annotations that are expensive to obtain, deep learning classifiers offer the advantage of requiring fewer annotated tweets by leveraging the large numbers of readily available unannotated tweets. OBJECTIVE: This study aims to derive and evaluate traditional and deep learning classifiers that can identify tweets relevant to vaping, tweets of a commercial nature, and tweets with provape sentiments. METHODS: We continuously collected tweets that matched vaping-related keywords over 2 months from August 2018 to October 2018. From this data set of tweets, a set of 4000 tweets was selected, and each tweet was manually annotated for relevance (vape relevant or not), commercial nature (commercial or not), and sentiment (provape or not). Using the annotated data, we derived traditional classifiers that included logistic regression, random forest, linear support vector machine, and multinomial naive Bayes. In addition, using the annotated data set and a larger unannotated data set of tweets, we derived deep learning classifiers that included a convolutional neural network (CNN), long short-term memory (LSTM) network, LSTM-CNN network, and bidirectional LSTM (BiLSTM) network. The unannotated tweet data were used to derive word vectors that deep learning classifiers can leverage to improve performance. 
RESULTS: LSTM-CNN performed the best with the highest area under the receiver operating characteristic curve (AUC) of 0.96 (95% CI 0.93-0.98) for relevance, all deep learning classifiers including LSTM-CNN performed better than the traditional classifiers with an AUC of 0.99 (95% CI 0.98-0.99) for distinguishing commercial from noncommercial tweets, and BiLSTM performed the best with an AUC of 0.83 (95% CI 0.78-0.89) for provape sentiment. Overall, LSTM-CNN performed the best across all 3 classification tasks. CONCLUSIONS: We derived and evaluated traditional machine learning and deep learning classifiers to identify vaping-related relevant, commercial, and provape tweets. Overall, deep learning classifiers such as LSTM-CNN had superior performance and had the added advantage of requiring no preprocessing. The performance of these classifiers supports the development of a vaping surveillance system.


Subject(s)
Deep Learning , Machine Learning/standards , Public Health Surveillance/methods , Social Media/standards , Vaping/trends , Humans , Longitudinal Studies
9.
10.
J Biomed Inform ; 100: 103327, 2019 12.
Article in English | MEDLINE | ID: mdl-31676461

ABSTRACT

BACKGROUND: Electronic medical record (EMR) systems need functionality that decreases cognitive overload by drawing the clinician's attention to the right data, at the right time. We developed a Learning EMR (LEMR) system that learns statistical models of clinician information-seeking behavior and applies those models to direct the display of data in future patients. We evaluated the performance of the system in identifying relevant patient data in intensive care unit (ICU) patient cases. METHODS: To capture information-seeking behavior, we enlisted critical care medicine physicians who reviewed a set of patient cases and selected data items relevant to the task of presenting at morning rounds. Using patient EMR data as predictors, we built machine learning models to predict their relevancy. We prospectively evaluated the predictions of a set of high performing models. RESULTS: On an independent evaluation data set, 25 models achieved precision of 0.52, 95% CI [0.49, 0.54] and recall of 0.77, 95% CI [0.75, 0.80] in identifying relevant patient data items. For data items missed by the system, the reviewers rated the effect of not seeing those data from no impact to minor impact on patient care in about 82% of the cases. CONCLUSION: Data-driven approaches for adaptively displaying data in EMR systems, like the LEMR system, show promise in using information-seeking behavior of clinicians to identify and highlight relevant patient data.


Subject(s)
Electronic Health Records , Machine Learning , Humans , Information Seeking Behavior , Physicians/psychology
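The abstract of record 10 reports precision (0.52) and recall (0.77) but no combined score. For readers who want a single figure, the sketch below computes the F1 score as the harmonic mean of the two point estimates. This value is derived here for illustration only; it is not reported by the authors and ignores the confidence intervals.

```python
# Point estimates as reported in the abstract of record 10.
precision = 0.52
recall = 0.77


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")
```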
11.
JAMA ; 321(20): 2003-2017, 2019 05 28.
Article in English | MEDLINE | ID: mdl-31104070

ABSTRACT

Importance: Sepsis is a heterogeneous syndrome. Identification of distinct clinical phenotypes may allow more precise therapy and improve care. Objective: To derive sepsis phenotypes from clinical data, determine their reproducibility and correlation with host-response biomarkers and clinical outcomes, and assess the potential causal relationship with results from randomized clinical trials (RCTs). Design, Settings, and Participants: Retrospective analysis of data sets using statistical, machine learning, and simulation tools. Phenotypes were derived among 20 189 total patients (16 552 unique patients) who met Sepsis-3 criteria within 6 hours of hospital presentation at 12 Pennsylvania hospitals (2010-2012) using consensus k-means clustering applied to 29 variables. Reproducibility and correlation with biological parameters and clinical outcomes were assessed in a second database (2013-2014; n = 43 086 total patients and n = 31 160 unique patients), in a prospective cohort study of sepsis due to pneumonia (n = 583), and in 3 sepsis RCTs (n = 4737). Exposures: All clinical and laboratory variables in the electronic health record. Main Outcomes and Measures: Derived phenotype (α, β, γ, and δ) frequency, host-response biomarkers, 28-day and 365-day mortality, and RCT simulation outputs. Results: The derivation cohort included 20 189 patients with sepsis (mean age, 64 [SD, 17] years; 10 022 [50%] male; mean maximum 24-hour Sequential Organ Failure Assessment [SOFA] score, 3.9 [SD, 2.4]). The validation cohort included 43 086 patients (mean age, 67 [SD, 17] years; 21 993 [51%] male; mean maximum 24-hour SOFA score, 3.6 [SD, 2.0]). 
Of the 4 derived phenotypes, the α phenotype was the most common (n = 6625; 33%) and included patients with the lowest administration of a vasopressor; in the β phenotype (n = 5512; 27%), patients were older and had more chronic illness and renal dysfunction; in the γ phenotype (n = 5385; 27%), patients had more inflammation and pulmonary dysfunction; and in the δ phenotype (n = 2667; 13%), patients had more liver dysfunction and septic shock. Phenotype distributions were similar in the validation cohort. There were consistent differences in biomarker patterns by phenotype. In the derivation cohort, cumulative 28-day mortality was 287 deaths of 5691 unique patients (5%) for the α phenotype; 561 of 4420 (13%) for the β phenotype; 1031 of 4318 (24%) for the γ phenotype; and 897 of 2223 (40%) for the δ phenotype. Across all cohorts and trials, 28-day and 365-day mortality were highest among the δ phenotype vs the other 3 phenotypes (P < .001). In simulation models, the proportion of RCTs reporting benefit, harm, or no effect changed considerably (eg, varying the phenotype frequencies within an RCT of early goal-directed therapy changed the results from >33% chance of benefit to >60% chance of harm). Conclusions and Relevance: In this retrospective analysis of data sets from patients with sepsis, 4 clinical phenotypes were identified that correlated with host-response patterns and clinical outcomes, and simulations suggested these phenotypes may help in understanding heterogeneity of treatment effects. Further research is needed to determine the utility of these phenotypes in clinical care and for informing trial design and interpretation.


Subject(s)
Sepsis/classification , Algorithms , Biomarkers/blood , Cluster Analysis , Datasets as Topic , Hospital Mortality , Humans , Machine Learning , Organ Dysfunction Scores , Phenotype , Reproducibility of Results , Retrospective Studies , Sepsis/mortality , Sepsis/therapy
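The phenotype frequencies and 28-day mortality percentages in the abstract of record 11 follow directly from the counts it gives. The sketch below recomputes them in Python as a reading aid; the dictionary keys are ASCII spellings of the four phenotype labels, and the helper name is illustrative rather than taken from the study.

```python
# Derivation-cohort counts as reported in the abstract of record 11.
phenotype_sizes = {"alpha": 6625, "beta": 5512, "gamma": 5385, "delta": 2667}

# 28-day mortality as (deaths, unique patients) per phenotype.
deaths_28d = {
    "alpha": (287, 5691),
    "beta": (561, 4420),
    "gamma": (1031, 4318),
    "delta": (897, 2223),
}


def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


total = sum(phenotype_sizes.values())
share = {name: percent(n, total) for name, n in phenotype_sizes.items()}
mortality = {name: percent(d, n) for name, (d, n) in deaths_28d.items()}

print(total, share, mortality)
```

The four phenotype sizes sum to the 20 189 total patients of the derivation cohort, and the rounded shares and mortality rates reproduce the percentages quoted in the abstract.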
13.
J Perinat Med ; 46(5): 509-521, 2018 Jul 26.
Article in English | MEDLINE | ID: mdl-28665803

ABSTRACT

BACKGROUND: Recent studies have shown that epigenetic differences can increase the risk of spontaneous preterm birth (PTB). However, little is known about heterogeneity underlying such epigenetic differences, which could lead to hypotheses for biological pathways in specific patient subgroups, and corresponding targeted interventions critical for precision medicine. Using bipartite network analysis of fetal DNA methylation data we demonstrate a novel method for classification of PTB. METHODS: The data consisted of DNA methylation across the genome (HumanMethylation450 BeadChip) in cord blood from 50 African-American subjects consisting of 22 cases of early spontaneous PTB (24-34 weeks of gestation) and 28 controls (>39 weeks of gestation). These data were analyzed using a combination of (1) a supervised method to select the top 10 significant methylation sites, (2) unsupervised "subject-variable" bipartite networks to visualize and quantitatively analyze how those 10 methylation sites co-occurred across all the subjects, and across only the cases with the goal of analyzing subgroups and their underlying pathways, and (3) a simple linear regression to test whether there was an association between the total methylation in the cases, and gestational age. RESULTS: The bipartite network analysis of all subjects and significant methylation sites revealed statistically significant clustering consisting of an inverse symmetrical relationship in the methylation profiles between a case-enriched subgroup and a control-enriched subgroup: the former was predominantly hypermethylated across seven methylation sites, and hypomethylated across three methylation sites, whereas the latter was predominantly hypomethylated across the above seven methylation sites and hypermethylated across the three methylation sites. 
Furthermore, the analysis of only cases revealed one subgroup that was predominantly hypomethylated across seven methylation sites, and another subgroup that was hypomethylated across all methylation sites, suggesting the presence of heterogeneity in PTB pathophysiology. Finally, the analysis found a strong inverse linear relationship between total methylation and gestational age, suggesting that methylation differences could be used as predictive markers for gestational length. CONCLUSIONS: The results demonstrate that unsupervised bipartite networks helped to identify complex but comprehensible data-driven hypotheses related to patient subgroups and inferences about their underlying pathways, and therefore were an effective complement to the supervised approaches currently used.


Subject(s)
DNA Methylation , Epigenesis, Genetic , Genetic Heterogeneity , Premature Birth/genetics , Data Interpretation, Statistical , Female , Humans , Pregnancy , Retrospective Studies
14.
J Biomed Inform ; 69: 177-187, 2017 05.
Article in English | MEDLINE | ID: mdl-28428140

ABSTRACT

The Breast Imaging Reporting and Data System (BI-RADS) was developed to reduce variation in the descriptions of findings. Manual analysis of breast radiology report data is challenging but is necessary for clinical and healthcare quality assurance activities. The objective of this study is to develop a natural language processing (NLP) system for automated extraction of BI-RADS categories from breast radiology reports. We evaluated an existing rule-based NLP algorithm, and then we developed and evaluated our own method using a supervised machine learning approach. We divided the BI-RADS category extraction task into two specific tasks: (1) annotation of all BI-RADS category values within a report, and (2) classification of the laterality of each BI-RADS category value. We used one algorithm for task 1 and evaluated three algorithms for task 2. Across all evaluations and model training, we used a total of 2159 radiology reports from 18 hospitals, from 2003 to 2015. Performance with the existing rule-based algorithm was not satisfactory. Conditional random fields showed a high performance for task 1 with an F-1 measure of 0.95. The rules from partial decision trees (PART) algorithm showed the best performance across classes for task 2 with a weighted F-1 measure of 0.91 for BI-RADS 0-6, and 0.93 for BI-RADS 3-5. Classification performance by class showed that performance improved for all classes from Naïve Bayes to Support Vector Machine (SVM), and also from SVM to PART. Our system is able to annotate and classify all BI-RADS mentions present in a single radiology report and can serve as the foundation for future studies that will leverage automated BI-RADS annotation to provide feedback to radiologists as part of a learning health system loop.


Subject(s)
Breast Neoplasms/diagnostic imaging , Data Curation , Mammography , Radiology Information Systems , Bayes Theorem , Breast , Female , Humans
15.
BMC Cancer ; 16: 184, 2016 Mar 04.
Article in English | MEDLINE | ID: mdl-26944944

ABSTRACT

BACKGROUND: Adenocarcinoma (ADC) and squamous cell carcinoma (SCC) are the most prevalent histological types among lung cancers. Distinguishing between these subtypes is critically important because they have different implications for prognosis and treatment. Normally, histopathological analyses are used to distinguish between the two, where the tissue samples are collected based on small endoscopic samples or needle aspirations. However, the lack of cell architecture in these small tissue samples hampers the process of distinguishing between the two subtypes. Molecular profiling can also be used to discriminate between the two lung cancer subtypes, on condition that the biopsy is composed of at least 50% tumor cells. However, for some cases, the tissue composition of a biopsy might be a mix of tumor and tumor-adjacent histologically normal tissue (TAHN). When this happens, a new biopsy is required, with associated cost, risks and discomfort to the patient. To avoid this problem, we hypothesize that a computational method can distinguish between lung cancer subtypes given tumor and TAHN tissue. METHODS: Using publicly available datasets for gene expression and DNA methylation, we applied four classification tasks, depending on the possible combinations of tumor and TAHN tissue. First, we used a feature selector (ReliefF/Limma) to select relevant variables, which were then used to build a simple naïve Bayes classification model. Then, we evaluated the classification performance of our models by measuring the area under the receiver operating characteristic curve (AUC). Finally, we analyzed the relevance of the selected genes using hierarchical clustering and IPA® software for gene functional analysis. RESULTS: All Bayesian models achieved high classification performance (AUC > 0.94), which was confirmed by hierarchical cluster analysis. 
From the genes selected, 25 (93%) were found to be related to cancer (19 were associated with ADC or SCC), confirming the biological relevance of our method. CONCLUSIONS: The results from this study confirm that computational methods using tumor and TAHN tissue can serve as a prognostic tool for lung cancer subtype classification. Our study complements results from other studies where TAHN tissue has been used as a prognostic tool for prostate cancer. The clinical implications of this finding could greatly benefit lung cancer patients.


Subject(s)
Genomics/methods , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Adenocarcinoma/diagnosis , Adenocarcinoma/genetics , Bayes Theorem , Carcinoma, Squamous Cell/diagnosis , Carcinoma, Squamous Cell/genetics , Cluster Analysis , Computational Biology/methods , DNA Methylation , Databases, Nucleic Acid , Datasets as Topic , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Prognosis , Reproducibility of Results
16.
J Biomed Inform ; 64: 211-221, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27720983

ABSTRACT

Medical errors remain a significant problem in healthcare. This paper investigates a data-driven, outlier-based monitoring and alerting framework that uses data from past patient cases in Electronic Medical Record (EMR) repositories to identify unusual clinical actions in the EMR of a current patient. Our conjecture is that these unusual clinical actions correspond to medical errors often enough to justify their detection and alerting. Our approach works by using EMR repositories to learn statistical models that relate patient states to patient-management actions. We evaluated this approach on the EMR data for 24,658 intensive care unit (ICU) patient cases. A total of 16,500 cases were used to train statistical models for ordering medications and laboratory tests given the patient state summarizing the patient's clinical history. The models were applied to a separate test set of 8,158 ICU patient cases and used to generate alerts. A subset of 240 alerts generated by the models was assessed by eighteen ICU clinicians. The overall true positive alert rates (TPARs) ranged from 0.44 to 0.71. The TPAR for medication order alerts specifically ranged from 0.31 to 0.61 and for laboratory order alerts from 0.44 to 0.75. These results support outlier-based alerting as a promising new approach to data-driven clinical alerting that is generated automatically based on past EMR data.
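
The general idea of outlier-based alerting described above (learn from past cases how often an action is taken in a given patient state, then flag a current action that is improbable under that model) can be sketched as follows. This is an illustration under strong assumptions, not the models used in the paper: the abstract does not specify the model family, so a simple smoothed frequency table stands in for it, and the state and action names, the threshold, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_action_model(past_cases):
    """Count how often each action was taken in each discretized patient state.

    past_cases: iterable of (state, action) pairs from historical records.
    """
    counts = defaultdict(Counter)
    for state, action in past_cases:
        counts[state][action] += 1
    return counts


def action_probability(model, state, action, smoothing=1.0, n_actions=2):
    """Laplace-smoothed estimate of P(action | state)."""
    seen = model.get(state, Counter())
    total = sum(seen.values())
    return (seen[action] + smoothing) / (total + smoothing * n_actions)


def should_alert(model, state, action, threshold=0.1):
    """Raise an alert when the observed action is unusual for this state."""
    return action_probability(model, state, action) < threshold


# Hypothetical history: in 19 of 20 similar past cases a test was ordered.
history = [("high_potassium", "order_ecg")] * 19 + [("high_potassium", "no_order")]
model = fit_action_model(history)
alert = should_alert(model, "high_potassium", "no_order")
```

With smoothing, a state never seen in the training data yields a probability of 1/`n_actions` for any action, so this sketch stays silent rather than alerting on cases it knows nothing about; whether that is the right default is a design choice.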


Subject(s)
Electronic Health Records , Intensive Care Units , Medical Errors , Models, Statistical , Critical Care , Humans , Laboratory Critical Values , Statistics as Topic
17.
Proteomics ; 15(8): 1405-18, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25684269

ABSTRACT

Despite years of preclinical development, biological interventions designed to treat complex diseases such as asthma often fail in phase III clinical trials. These failures suggest that current methods to analyze biomedical data might be missing critical aspects of biological complexity, for example by assuming that cases and controls come from homogeneous distributions. Here we discuss why and how methods from the rapidly evolving field of visual analytics can help translational teams (consisting of biologists, clinicians, and bioinformaticians) to address the challenge of modeling and inferring heterogeneity in the proteomic and phenotypic profiles of patients with complex diseases. Because a primary goal of visual analytics is to amplify the cognitive capacities of humans for detecting patterns in complex data, we begin with an overview of the cognitive foundations for the field of visual analytics. Next, we organize the primary ways in which a specific form of visual analytics called networks has been used to model and infer biological mechanisms, which helps to identify the properties of networks that are particularly useful for the discovery and analysis of proteomic heterogeneity in complex diseases. We describe one such approach called subject-protein networks, and demonstrate its application on two proteomic datasets. This demonstration provides insights to help translational teams overcome theoretical, practical, and pedagogical hurdles for the widespread use of subject-protein networks for analyzing molecular heterogeneities, with the translational goal of designing biomarker-based clinical trials, and accelerating the development of personalized approaches to medicine.


Subject(s)
Data Interpretation, Statistical , Boutonneuse Fever/metabolism , Computer Graphics , Gene Regulatory Networks , Humans , Protein Interaction Maps , Proteome/genetics , Proteome/metabolism , Proteomics/methods
18.
BMC Bioinformatics ; 16: 226, 2015 Jul 23.
Article in English | MEDLINE | ID: mdl-26202217

ABSTRACT

BACKGROUND: Most 'transcriptomic' data from microarrays are generated from small sample sizes compared to the large number of measured biomarkers, making it very difficult to build accurate and generalizable disease state classification models. Integrating information from different, but related, 'transcriptomic' data may help build better classification models. However, most proposed methods for integrative analysis of 'transcriptomic' data cannot incorporate domain knowledge, which can improve model performance. To this end, we have developed a methodology that leverages transfer rule learning and functional modules, which we call TRL-FM, to capture and abstract domain knowledge in the form of classification rules to facilitate integrative modeling of multiple gene expression data. TRL-FM is an extension of the transfer rule learner (TRL) that we developed previously. The goal of this study was to test our hypothesis that "an integrative model obtained via the TRL-FM approach outperforms traditional models based on single gene expression data sources". RESULTS: To evaluate the feasibility of the TRL-FM framework, we compared the area under the ROC curve (AUC) of models developed with TRL-FM and other traditional methods, using 21 microarray datasets generated from three studies on brain cancer, prostate cancer, and lung disease, respectively. The results show that TRL-FM statistically significantly outperforms TRL as well as traditional models based on single source data. In addition, TRL-FM performed better than other integrative models driven by meta-analysis and cross-platform data merging. CONCLUSIONS: The capability of utilizing transferred abstract knowledge derived from source data using feature mapping enables the TRL-FM framework to mimic the human process of learning and adaptation when performing related tasks. The novel TRL-FM methodology for integrative modeling for multiple 'transcriptomic' datasets is able to intelligently incorporate domain knowledge that traditional methods might disregard, to boost predictive power and generalization performance. In this study, TRL-FM's abstraction of knowledge is achieved in the form of functional modules, but the overall framework is generalizable in that different approaches of acquiring abstract knowledge can be integrated into this framework.


Subject(s)
Algorithms , Models, Genetic , Biomarkers/metabolism , Databases, Factual , Gene Expression , Humans , Neoplasms/metabolism , Neoplasms/pathology
19.
J Biomed Inform ; 58: 60-69, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26385375

ABSTRACT

Influenza is a yearly recurrent disease that has the potential to become a pandemic. An effective biosurveillance system is required for early detection of the disease. In our previous studies, we have shown that electronic Emergency Department (ED) free-text reports can be of value to improve influenza detection in real time. This paper studies seven machine learning (ML) classifiers for influenza detection, compares their diagnostic capabilities against an expert-built influenza Bayesian classifier, and evaluates different ways of handling missing clinical information from the free-text reports. We identified 31,268 ED reports from 4 hospitals between 2008 and 2011 to form two different datasets: training (468 cases, 29,004 controls) and test (176 cases and 1,620 controls). We employed Topaz, a natural language processing (NLP) tool, to extract influenza-related findings and to encode them into one of three values: Acute, Non-acute, and Missing. Results show that all ML classifiers had areas under the ROC curve (AUC) ranging from 0.88 to 0.93, and performed significantly better than the expert-built Bayesian model. Missing clinical information marked as a value of missing (not missing at random) had a consistently improved performance among 3 (out of 4) ML classifiers when it was compared with the configuration of not assigning a value of missing (missing completely at random). The case/control ratios did not affect the classification performance given the large number of training cases. Our study demonstrates that ED reports, combined with ML, NLP, and the handling of missing-value information, have great potential for the detection of infectious diseases.


Subject(s)
Emergency Service, Hospital , Influenza, Human/diagnosis , Machine Learning , Humans
20.
BMC Genomics ; 15: 282, 2014 Apr 14.
Article in English | MEDLINE | ID: mdl-24731236

ABSTRACT

BACKGROUND: Ranking and identifying biomarkers that are associated with disease from genome-wide measurements holds significant promise for understanding the genetic basis of common diseases. The large number of single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWAS), however, makes this task computationally challenging when the ranking is to be done in a multivariate fashion. This paper evaluates the performance of a multivariate graph-based method called label propagation (LP) that efficiently ranks SNPs in genome-wide data. RESULTS: The performance of LP was evaluated on a synthetic dataset and two late onset Alzheimer's disease (LOAD) genome-wide datasets, and the performance was compared to that of three control methods. The control methods included chi squared, which is a commonly used univariate method, as well as a Relief method called SWRF and a sparse logistic regression (SLR) method, which are both multivariate ranking methods. Performance was measured by evaluating the top-ranked SNPs in terms of classification performance, reproducibility between the two datasets, and prior evidence of being associated with LOAD. On the synthetic data LP performed comparably to the control methods. On GWAS data, LP performed significantly better than chi squared and SWRF in classification performance in the range from 10 to 1000 top-ranked SNPs for both datasets, and not significantly different from SLR. LP also had greater ranking reproducibility than chi squared, SWRF, and SLR. Among the 25 top-ranked SNPs that were identified by LP, there were 14 SNPs in one dataset that had evidence in the literature of being associated with LOAD, and 10 SNPs in the other, which was higher than for the other methods. CONCLUSION: LP performed considerably better in ranking SNPs in two high-dimensional genome-wide datasets when compared to three control methods. It had better performance in the evaluation measures we used, and is computationally efficient enough to be applied practically to data from genome-wide studies. These results provide support for including LP in the methods that are used to rank SNPs in genome-wide datasets.
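
The univariate chi-squared baseline named in this abstract can be illustrated with a short sketch. It does not implement label propagation, SWRF, or SLR, and it is not the authors' code. It assumes, purely for illustration, that genotypes are coded as minor-allele counts (0, 1, 2), that labels are 1 for cases and 0 for controls, and that the dataset is non-empty; the SNP identifiers are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty genotype columns
                stat += (observed - expected) ** 2 / expected
    return stat


def rank_snps(genotypes, labels):
    """Rank SNPs by chi-squared association with case/control status.

    genotypes: dict mapping SNP id -> list of minor-allele counts (0, 1, 2).
    labels: list of 1 (case) or 0 (control), one per subject.
    """
    scores = {}
    for snp, calls in genotypes.items():
        table = [[0, 0, 0], [0, 0, 0]]  # rows: control, case; columns: genotype
        for g, y in zip(calls, labels):
            table[y][g] += 1
        scores[snp] = chi_squared(table)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: snp_a separates cases from controls, snp_b does not.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_genotypes = {"snp_a": [2, 2, 2, 0, 0, 0], "snp_b": [0, 1, 2, 0, 1, 2]}
ranking = rank_snps(toy_genotypes, toy_labels)
```

Because each SNP is scored on its own, this ranking ignores interactions between SNPs, which is the limitation that motivates the multivariate methods compared in the abstract.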


Subject(s)
Alzheimer Disease/genetics , Biomarkers/metabolism , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide