1.
medRxiv ; 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38883765

ABSTRACT

Background: Atrial fibrillation (AF) is often asymptomatic and thus under-observed. Given the high risks of stroke and heart failure among patients with AF, early prediction and effective management are crucial. Importantly, obstructive sleep apnea is highly prevalent among AF patients (60-90%); therefore, electrocardiogram (ECG) analysis from polysomnography (PSG), a standard diagnostic tool for subjects with suspected sleep apnea, presents a unique opportunity for the early prediction of AF. Our goal is to identify individuals at a high risk of developing AF in the future from a single-lead ECG recorded during standard PSGs. Methods: We analyzed 18,782 single-lead ECG recordings from 13,609 subjects at Massachusetts General Hospital, identifying AF presence using ICD-9/10 codes in medical records. Our dataset comprises 15,913 recordings without a medical record for AF and 2,056 recordings from patients who were first diagnosed with AF between 1 day and 15 years after the PSG recording. The PSG data were partitioned into training, validation, and test cohorts. In the first phase, a signal quality index (SQI) was calculated in 30-second windows and those with SQI < 0.95 were removed. From each remaining window, 150 hand-crafted features were extracted from the time, frequency, and time-frequency domains and from phase-space reconstructions of the ECG. A compilation of 12 statistical features summarized these window-specific features per recording, resulting in 1,800 features. We then used transfer learning to update a deep neural network pre-trained on data from the PhysioNet Challenge 2021 so that it discriminates between recordings with and without AF, using the same Challenge data. The model was applied to the PSG ECGs in 16-second windows to generate the probability of AF for each window. From the resultant probability sequence, 13 statistical features were extracted.
Subsequently, we trained a shallow neural network to predict future AF using the extracted ECG and probability features. Results: On the test set, our model demonstrated a sensitivity of 0.67, specificity of 0.81, and precision of 0.3 for predicting AF. Further, survival analysis for AF outcomes, using the log-rank test, revealed a hazard ratio of 8.36 (p-value of 1.93 × 10⁻⁵²). Conclusions: Our proposed ECG analysis method, utilizing overnight PSG data, shows promise in AF prediction despite a modest precision, which indicates the presence of false-positive cases. This approach could potentially enable low-cost screening and proactive treatment for high-risk patients. Ongoing refinement, such as integrating additional physiological parameters, could significantly reduce false positives, enhancing its clinical utility and accuracy.
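
The first phase described in the Methods (drop 30-second windows with SQI < 0.95, then summarize 150 window-level features with 12 statistics to obtain 150 × 12 = 1,800 recording-level features) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not name the 12 statistics, so the set used here (mean, median, standard deviation, minimum, maximum, range, four percentiles, interquartile range, mean absolute deviation) is an assumption, as are all function names.

```python
import statistics

SQI_THRESHOLD = 0.95  # from the abstract: windows with SQI < 0.95 are removed


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(values):
    """Return 12 summary statistics for one feature (an assumed set, see lead-in)."""
    s = sorted(values)
    mean = statistics.fmean(s)
    q05, q25, q75, q95 = (percentile(s, q) for q in (5, 25, 75, 95))
    return [
        mean,
        statistics.median(s),
        statistics.pstdev(s),
        s[0],                 # minimum
        s[-1],                # maximum
        s[-1] - s[0],         # range
        q05, q25, q75, q95,
        q75 - q25,            # interquartile range
        statistics.fmean([abs(v - mean) for v in s]),  # mean absolute deviation
    ]


def recording_features(windows):
    """windows: list of (sqi, feature_vector) pairs, one per 30-second window.

    Returns one flat list with 12 statistics per window-level feature,
    or None if no window passes the quality threshold.
    """
    kept = [feats for sqi, feats in windows if sqi >= SQI_THRESHOLD]
    if not kept:
        return None
    summary = []
    for j in range(len(kept[0])):
        summary.extend(summarize([w[j] for w in kept]))
    return summary
```

With 150 features per window, `recording_features` returns a list of length 1,800, matching the count given in the abstract.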

2.
medRxiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38766049

ABSTRACT

Individuals with Autism Spectrum Disorder may display interfering behaviors that limit their inclusion in educational and community settings, negatively impacting their quality of life. These behaviors may also signal potential medical conditions or indicate upcoming high-risk behaviors. This study explores behavior patterns that precede high-risk, challenging behaviors or seizures the following day. We analyzed an existing dataset of behavior and seizure data from 331 children with profound ASD over nine years. We developed a deep learning-based algorithm designed to predict the likelihood of aggression, elopement, and self-injurious behavior (SIB) as three high-risk behavioral events, as well as seizure episodes as a high-risk medical event occurring the next day. The proposed model attained accuracies of 78.4%, 80.68%, 85.43%, and 69.95% for predicting the next-day occurrence of aggression, SIB, elopement, and seizure episodes, respectively. The results were proven significant for more than 95% of the population for all high-risk event predictions using permutation-based statistical tests. Our findings emphasize the potential of leveraging historical behavior data for the early detection of high-risk behavioral and medical events, paving the way for behavioral interventions and improved support in both social and educational environments.
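
The abstract reports permutation-based significance tests but does not give the test statistic or the number of permutations. One common form, shown here as a hedged sketch, shuffles the true labels to estimate how often chance alone matches the observed accuracy. The function names, the choice of accuracy as the statistic, and the default of 1,000 permutations are assumptions made for illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Estimate P(accuracy under shuffled labels >= observed accuracy)."""
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between labels and predictions
        if accuracy(shuffled, y_pred) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return (at_least_as_good + 1) / (n_permutations + 1)
```

Running such a test separately for each individual and counting how many reach a chosen threshold is one way a statement like "significant for more than 95% of the population" could be obtained; whether the study did exactly this is not stated in the abstract.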

3.
Sensors (Basel) ; 24(6), 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38543993

ABSTRACT

Regular blood pressure (BP) monitoring in clinical and ambulatory settings plays a crucial role in the prevention, diagnosis, treatment, and management of cardiovascular diseases. Recently, the widespread adoption of ambulatory BP measurement devices has been predominantly driven by the increased prevalence of hypertension and its associated risks and clinical conditions. Recent guidelines advocate for regular BP monitoring as part of regular clinical visits or even at home. This increased utilization of BP measurement technologies has raised significant concerns regarding the accuracy of reported BP values across settings. In this survey, which focuses mainly on cuff-based BP monitoring technologies, we highlight how BP measurements can demonstrate substantial biases and variances due to factors such as measurement and device errors, demographics, and body habitus. With these inherent biases, the development of a new generation of cuff-based BP devices that use artificial intelligence (AI) has significant potential. We present future avenues where AI-assisted technologies can leverage the extensive clinical literature on BP-related studies together with the large collections of BP records available in electronic health records. These resources can be combined with machine learning approaches, including deep learning and Bayesian inference, to remove BP measurement biases and provide individualized BP-related cardiovascular risk indexes.


Subject(s)
Artificial Intelligence , Hypertension , Humans , Blood Pressure/physiology , Bayes Theorem , Blood Pressure Determination , Hypertension/diagnosis
4.
medRxiv ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38343835

ABSTRACT

Poor sleep quality in individuals with Autism Spectrum Disorder (ASD) is linked to severe daytime behaviors. This study explores the relationship between a prior night's sleep structure and its predictive power for next-day behavior in ASD individuals. Motion was extracted using a low-cost near-infrared camera in a privacy-preserving way. Over two years, we recorded overnight data from 14 individuals, spanning over 2,000 nights, and tracked challenging daytime behaviors, including aggression, self-injury, and disruption. We developed an ensemble machine learning algorithm to predict next-day behavior in the morning and the afternoon. Our findings indicate that sleep quality is a more reliable predictor of morning behavior than afternoon behavior the next day. The proposed model attained an accuracy of 74% and an F1 score of 0.74 in target-sensitive tasks and 67% accuracy and 0.69 F1 score in target-insensitive tasks. For 7 of the 14 individuals, better-than-chance balanced accuracy was obtained (p-value < 0.05), with 3 showing significant trends (p-value < 0.1). These results suggest off-body, privacy-preserving sleep monitoring as a viable method for predicting next-day adverse behavior in ASD individuals, with the potential for behavioral intervention and enhanced care in social and learning settings.

5.
Genetics ; 226(4), 2024 04 03.
Article in English | MEDLINE | ID: mdl-38290049

ABSTRACT

Mutations in SETD2 are among the most prevalent drivers of renal cell carcinoma (RCC). We identified a novel single nucleotide polymorphism (SNP) in SETD2, E902Q, within a subset of RCC patients, which manifests as either an inherited or a tumor-associated somatic mutation. To determine if the SNP is biologically functional, we used CRISPR-based genome editing to generate the orthologous mutation within the Drosophila melanogaster Set2 gene. In Drosophila, the homologous amino acid substitution, E741Q, reduces H3K36me3 levels to a degree comparable to Set2 knockdown, and this loss is rescued by reintroduction of a wild-type Set2 transgene. We similarly uncovered significant defects in spindle morphogenesis, consistent with the established role of SETD2 in methylating α-Tubulin during mitosis to regulate microtubule dynamics and maintain genome stability. These data indicate the Set2 E741Q SNP affects both histone methylation and spindle integrity. Moreover, this work further suggests the SETD2 E902Q SNP may hold clinical relevance.


Subject(s)
Carcinoma, Renal Cell , Drosophila Proteins , Kidney Neoplasms , Animals , Humans , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/metabolism , Carcinoma, Renal Cell/pathology , Histones/genetics , Histones/metabolism , Drosophila/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Polymorphism, Single Nucleotide , Kidney Neoplasms/genetics , Kidney Neoplasms/metabolism , Kidney Neoplasms/pathology , Spindle Apparatus/genetics , Spindle Apparatus/metabolism , Histone-Lysine N-Methyltransferase/genetics , Histone-Lysine N-Methyltransferase/metabolism , Drosophila Proteins/genetics , Drosophila Proteins/metabolism
6.
Crit Care Med ; 51(12): 1802-1811, 2023 12 01.
Article in English | MEDLINE | ID: mdl-37855659

ABSTRACT

OBJECTIVES: To develop the International Cardiac Arrest Research (I-CARE), a harmonized multicenter clinical and electroencephalography database for acute hypoxic-ischemic brain injury research involving patients with cardiac arrest. DESIGN: Multicenter cohort, partly prospective and partly retrospective. SETTING: Seven academic or teaching hospitals from the United States and Europe. PATIENTS: Individuals 16 years old or older who were comatose after return of spontaneous circulation following a cardiac arrest and who had continuous electroencephalography monitoring were included. INTERVENTIONS: Not applicable. MEASUREMENTS AND MAIN RESULTS: Clinical and electroencephalography data were harmonized and stored in a common Waveform Database-compatible format. Automated spike frequency, background continuity, and artifact detection on electroencephalography were calculated with 10-second resolution and summarized hourly. Neurologic outcome was determined at 3-6 months using the best Cerebral Performance Category (CPC) scale. This database includes clinical data and 56,676 hours (3.9 terabytes) of continuous electroencephalography data for 1,020 patients. Most patients died (n = 603, 59%), 48 (5%) had severe neurologic disability (CPC 3 or 4), and 369 (36%) had good functional recovery (CPC 1-2). There is significant variability in mean electroencephalography recording duration depending on the neurologic outcome (range, 53-102 hr for CPC 1 and CPC 4, respectively). Epileptiform activity averaging 1 Hz or more in frequency for at least 1 hour was seen in 258 patients (25%) (19% for CPC 1-2 and 29% for CPC 3-5). Burst suppression was observed for at least 1 hour in 207 (56%) and 635 (97%) patients with CPC 1-2 and CPC 3-5, respectively. CONCLUSIONS: The I-CARE consortium electroencephalography database provides a comprehensive real-world clinical and electroencephalography dataset for neurophysiology research of comatose patients after cardiac arrest.
This dataset covers the spectrum of abnormal electroencephalography patterns after cardiac arrest, including epileptiform patterns and those in the ictal-interictal continuum.


Subject(s)
Coma , Heart Arrest , Humans , Adolescent , Coma/diagnosis , Retrospective Studies , Prospective Studies , Heart Arrest/diagnosis , Electroencephalography
7.
PLOS Digit Health ; 2(9): e0000324, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37695769

ABSTRACT

Cardiac auscultation is an accessible diagnostic screening tool that can help to identify patients with heart murmurs, who may need follow-up diagnostic screening and treatment for abnormal cardiac function. However, experts are needed to interpret the heart sounds, limiting the accessibility of cardiac auscultation in resource-constrained environments. Therefore, the George B. Moody PhysioNet Challenge 2022 invited teams to develop algorithmic approaches for detecting heart murmurs and abnormal cardiac function from phonocardiogram (PCG) recordings of heart sounds. For the Challenge, we sourced 5272 PCG recordings from 1452 primarily pediatric patients in rural Brazil, and we invited teams to implement diagnostic screening algorithms for detecting heart murmurs and abnormal cardiac function from the recordings. We required the participants to submit the complete training and inference code for their algorithms, improving the transparency, reproducibility, and utility of their work. We also devised an evaluation metric that considered the costs of screening, diagnosis, misdiagnosis, and treatment, allowing us to investigate the benefits of algorithmic diagnostic screening and facilitate the development of more clinically relevant algorithms. We received 779 algorithms from 87 teams during the Challenge, resulting in 53 working codebases for detecting heart murmurs and abnormal cardiac function from PCG recordings. These algorithms represent a diversity of approaches from both academia and industry, including methods that use more traditional machine learning techniques with engineered clinical and statistical features as well as methods that rely primarily on deep learning models to discover informative features. The use of heart sound recordings for identifying heart murmurs and abnormal cardiac function allowed us to explore the potential of algorithmic approaches for providing more accessible diagnostic screening in resource-constrained environments. 
The submission of working, open-source algorithms and the use of novel evaluation metrics supported the reproducibility, generalizability, and clinical relevance of the research from the Challenge.

8.
medRxiv ; 2023 Aug 28.
Article in English | MEDLINE | ID: mdl-37693458

ABSTRACT

Objective: To develop a harmonized multicenter clinical and electroencephalography (EEG) database for acute hypoxic-ischemic brain injury research involving patients with cardiac arrest. Design: Multicenter cohort, partly prospective and partly retrospective. Setting: Seven academic or teaching hospitals from the U.S. and Europe. Patients: Individuals aged 16 or older who were comatose after return of spontaneous circulation following a cardiac arrest and who had continuous EEG monitoring were included. Interventions: Not applicable. Measurements and Main Results: Clinical and EEG data were harmonized and stored in a common Waveform Database (WFDB)-compatible format. Automated spike frequency, background continuity, and artifact detection on EEG were calculated with 10-second resolution and summarized hourly. Neurological outcome was determined at 3-6 months using the best Cerebral Performance Category (CPC) scale. This database includes clinical data and 56,676 hours (3.9 TB) of continuous EEG data for 1,020 patients. Most patients died (N=603, 59%), 48 (5%) had severe neurological disability (CPC 3 or 4), and 369 (36%) had good functional recovery (CPC 1-2). There is significant variability in mean EEG recording duration depending on the neurological outcome (range 53-102h for CPC 1 and CPC 4, respectively). Epileptiform activity averaging 1 Hz or more in frequency for at least one hour was seen in 258 (25%) patients (19% for CPC 1-2 and 29% for CPC 3-5). Burst suppression was observed for at least one hour in 207 (56%) and 635 (97%) patients with CPC 1-2 and CPC 3-5, respectively. Conclusions: The International Cardiac Arrest Research (I-CARE) consortium database provides a comprehensive real-world clinical and EEG dataset for neurophysiology research of comatose patients after cardiac arrest. This dataset covers the spectrum of abnormal EEG patterns after cardiac arrest, including epileptiform patterns and those in the ictal-interictal continuum.

9.
Biomed Eng Online ; 22(1): 69, 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-37430279

ABSTRACT

BACKGROUND: It has been hypothesized that low access to healthy and nutritious food increases health disparities. Low-accessibility areas, called food deserts, are particularly commonplace in lower-income neighborhoods. The metrics for measuring the food environment's health, called food desert indices, are primarily based on decadal census data, limiting their frequency and geographical resolution to that of the census. We aimed to create a food desert index with finer geographic resolution than census data and better responsiveness to environmental changes. MATERIALS AND METHODS: We augmented decadal census data with real-time data from platforms such as Yelp and Google Maps and with crowd-sourced questionnaire answers from Amazon Mechanical Turk workers to create a real-time, context-aware, and geographically refined food desert index. Finally, we used this refined index in a concept application that suggests alternative routes with similar ETAs between a source and destination in the Atlanta metropolitan area as an intervention to expose a traveler to better food environments. RESULTS: We made 139,000 pull requests to Yelp, analyzing 15,000 unique food retailers in the metro Atlanta area. In addition, we performed 248,000 walking and driving route analyses on these retailers using Google Maps' API. As a result, we discovered that the metro Atlanta food environment creates a strong bias towards eating out rather than preparing a meal at home when access to vehicles is limited. Unlike the food desert index that we started with, which changed values only at neighborhood boundaries, the food desert index that we built on top of it captured the changing exposure of a subject as they walked or drove through the city. This model was also sensitive to the changes in the environment that occurred after the census data was collected. CONCLUSIONS: Research on the environmental components of health disparities is flourishing.
New machine learning models have the potential to augment various information sources and create fine-tuned models of the environment. This opens the way to better understanding the environment and its effects on health and suggesting better interventions.


Subject(s)
Censuses , Crowdsourcing , Humans , Food Deserts , Information Sources , Machine Learning
10.
IEEE J Biomed Health Inform ; 27(8): 3856-3866, 2023 08.
Article in English | MEDLINE | ID: mdl-37163396

ABSTRACT

OBJECTIVE: Murmurs are abnormal heart sounds, identified by experts through cardiac auscultation. The murmur grade, a quantitative measure of the murmur intensity, is strongly correlated with the patient's clinical condition. This work aims to estimate each patient's murmur grade (i.e., absent, soft, loud) from phonocardiograms (PCGs) recorded at multiple auscultation locations in a large population of pediatric patients from a low-resource rural area. METHODS: The Mel spectrogram representation of each PCG recording is given to an ensemble of 15 convolutional residual neural networks with channel-wise attention mechanisms to classify each PCG recording. The final murmur grade for each patient is derived based on the proposed decision rule and considering all estimated labels for available recordings. The proposed method is cross-validated on a dataset consisting of 3456 PCG recordings from 1007 patients using a stratified ten-fold cross-validation. Additionally, the method was tested on a hidden test set comprising 1538 PCG recordings from 442 patients. RESULTS: The overall cross-validation performances for patient-level murmur gradings are 86.3% and 81.6% in terms of the unweighted average of sensitivities and F1-scores, respectively. The sensitivities (and F1-scores) for absent, soft, and loud murmurs are 90.7% (93.6%), 75.8% (66.8%), and 92.3% (84.2%), respectively. On the test set, the algorithm achieves an unweighted average of sensitivities of 80.4% and an F1-score of 75.8%. CONCLUSIONS: This study provides a potential approach for algorithmic pre-screening in low-resource settings with relatively high expert screening costs. SIGNIFICANCE: The proposed method represents a significant step beyond detection of murmurs, providing characterization of intensity, which may provide an enhanced classification of clinical outcomes.
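
The unweighted average of sensitivities reported in the Results can be sketched as the plain mean of per-class sensitivities, so that each murmur grade counts equally regardless of how many patients fall in it. The Python below is a minimal illustration under that assumption; the function names and toy labels are hypothetical and are not taken from the authors' code.

```python
def per_class_sensitivity(y_true, y_pred, classes):
    """Sensitivity (recall) for each class: correctly predicted / actual members."""
    result = {}
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        result[c] = hits / total if total else float("nan")
    return result


def unweighted_average(values):
    """Plain mean, so every class contributes equally."""
    values = list(values)
    return sum(values) / len(values)


# Toy patient-level labels (hypothetical data).
classes = ["absent", "soft", "loud"]
y_true = ["absent"] * 4 + ["soft"] * 2 + ["loud"] * 2
y_pred = ["absent", "absent", "absent", "soft", "soft", "absent", "loud", "loud"]

sens = per_class_sensitivity(y_true, y_pred, classes)
uar = unweighted_average(sens.values())

# Consistency check against the per-class sensitivities quoted in the abstract.
reported = round(unweighted_average([90.7, 75.8, 92.3]), 1)
```

Applied to the quoted per-class sensitivities, (90.7 + 75.8 + 92.3) / 3 rounds to 86.3, which agrees with the overall cross-validation figure in the abstract.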


Subject(s)
Heart Murmurs , Heart Sounds , Humans , Child , Phonocardiography/methods , Heart Murmurs/diagnosis , Heart Auscultation/methods , Algorithms , Auscultation
12.
Physiol Meas ; 43(8), 2022 08 26.
Article in English | MEDLINE | ID: mdl-35815673

ABSTRACT

Objective. The standard twelve-lead electrocardiogram (ECG) is a widely used tool for monitoring cardiac function and diagnosing cardiac disorders. The development of smaller, lower-cost, and easier-to-use ECG devices may improve access to cardiac care in lower-resource environments, but the diagnostic potential of these devices is unclear. This work explores these issues through a public competition: the 2021 PhysioNet Challenge. In addition, we explore the potential for performance boosting through a meta-learning approach. Approach. We sourced 131,149 twelve-lead ECG recordings from ten international sources. We posted 88,253 annotated recordings as public training data and withheld the remaining recordings as hidden validation and test data. We challenged teams to submit containerized, open-source algorithms for diagnosing cardiac abnormalities using various ECG lead combinations, including the code for training their algorithms. We designed and scored the algorithms using an evaluation metric that captures the risks of different misdiagnoses for 30 conditions. After the Challenge, we implemented a semi-consensus voting model on all working algorithms. Main results. A total of 68 teams submitted 1,056 algorithms during the Challenge, providing a variety of automated approaches from both academia and industry. The performance differences across the different lead combinations were smaller than the performance differences across the different test databases, showing that generalizability posed a larger challenge to the algorithms than the choice of ECG leads. A voting model improved performance by 3.5%. Significance. The use of different ECG lead combinations allowed us to assess the diagnostic potential of reduced-lead ECG recordings, and the use of different data sources allowed us to assess the generalizability of the algorithms to diverse institutions and populations.
The submission of working, open-source code for both training and testing and the use of a novel evaluation metric improved the reproducibility, generalizability, and applicability of the research conducted during the Challenge.
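
The abstract does not describe how the semi-consensus voting model works. As a loosely related illustration only, the sketch below shows a simple threshold vote over multi-label outputs: a diagnosis is kept when at least a chosen fraction of algorithms predict it. The function name, the `min_fraction` parameter, and the example labels are assumptions, and this is not presented as the model used in the Challenge.

```python
def vote(predictions, min_fraction=0.5):
    """Combine multi-label predictions from several algorithms for one recording.

    predictions: list of sets of labels, one set per algorithm.
    Returns the labels predicted by at least min_fraction of the algorithms.
    """
    counts = {}
    for labels in predictions:
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
    needed = min_fraction * len(predictions)
    return {label for label, count in counts.items() if count >= needed}


# Hypothetical outputs of four algorithms for a single ECG recording.
example = [{"AF"}, {"AF", "PVC"}, {"NSR"}, {"AF"}]
majority = vote(example)            # labels backed by at least half the algorithms
permissive = vote(example, 0.25)    # labels backed by at least one in four
```

Lowering `min_fraction` trades specificity for sensitivity, which is the kind of tuning any voting scheme over many submitted algorithms would need to settle.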


Subject(s)
Electrocardiography , Signal Processing, Computer-Assisted , Algorithms , Databases, Factual , Electrocardiography/methods , Reproducibility of Results
13.
J Electrocardiol ; 74: 5-9, 2022.
Article in English | MEDLINE | ID: mdl-35878534

ABSTRACT

Despite the recent explosion of machine learning applied to medical data, very few studies have examined algorithmic bias in any meaningful manner, comparing across algorithms, databases, and assessment metrics. In this study, we compared the biases in sex, age, and race of 56 algorithms on over 130,000 electrocardiograms (ECGs) using several metrics and propose a machine learning model design to reduce bias. Participants of the 2021 PhysioNet Challenge designed and implemented working, open-source algorithms to identify clinical diagnoses from 2-lead ECG recordings. We grouped the data from the training, validation, and test datasets by sex (male vs female), age (binned by decade), and race (Asian, Black, White, and Other) whenever possible. We computed recording-wise accuracy, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F-measure, and the Challenge Score for each of the 56 algorithms. The Mann-Whitney U and the Kruskal-Wallis tests assessed the performance differences of algorithms across these demographic groups. Group trends revealed similar values for the AUROC, AUPRC, and F-measure for both male and female groups across the training, validation, and test sets. However, recording-wise accuracies were 20% higher (p < 0.01) and the Challenge Score 12% lower (p = 0.02) for female subjects on the test set. AUPRC, F-measure, and the Challenge Score increased with age, while recording-wise accuracy and AUROC decreased with age. The results were similar for the training and test sets, but only recording-wise accuracy (12% decrease per decade, p < 0.01), Challenge Score (1% increase per decade, p < 0.01), and AUROC (1% decrease per decade, p < 0.01) were statistically different on the test set. We observed similar AUROC, AUPRC, Challenge Score, and F-measure values across the different race categories.
However, recording-wise accuracies were significantly lower for Black subjects and higher for Asian subjects on the training (31% difference, p < 0.01) and test (39% difference, p < 0.01) sets. A top-performing model was then retrained using an additional constraint which simultaneously minimized differences in performance across sex, race, and age. This resulted in a modest reduction in performance, with a significant reduction in bias. This work provides a demonstration that biases manifest as a function of model architecture, population, cost function, and optimization metric, all of which should be closely examined in any model.


Subject(s)
Arrhythmias, Cardiac , Electrocardiography , Female , Humans , Male , Sex Factors , Age Factors
15.
Cell ; 185(11): 1974-1985.e12, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35512704

ABSTRACT

Comprehensive sequencing of patient tumors reveals genomic mutations across tumor types that enable tumorigenesis and progression. A subset of oncogenic driver mutations results in neomorphic activity where the mutant protein mediates functions not engaged by the parental molecule. Here, we identify prevalent variant-enabled neomorph-protein-protein interactions (neoPPI) with a quantitative high-throughput differential screening (qHT-dS) platform. The coupling of highly sensitive BRET biosensors with miniaturized coexpression in an ultra-HTS format allows large-scale monitoring of the interactions of wild-type and mutant variant counterparts with a library of cancer-associated proteins in live cells. The screening of 17,792 interactions with 2,172,864 data points revealed a landscape of gain of interactions encompassing both oncogenic and tumor suppressor mutations. For example, the recurrent BRAF V600E lesion mediates KEAP1 neoPPI, rewiring a BRAF V600E/KEAP1 signaling axis and creating collateral vulnerability to NQO1 substrates, offering a combination therapeutic strategy. Thus, cancer genomic alterations can create neo-interactions, informing variant-directed therapeutic approaches for precision medicine.


Subject(s)
Neoplasms , Proto-Oncogene Proteins B-raf , Carcinogenesis , Humans , Kelch-Like ECH-Associated Protein 1/genetics , Kelch-Like ECH-Associated Protein 1/metabolism , Mutation , NF-E2-Related Factor 2/metabolism , Neoplasms/genetics , Proto-Oncogene Proteins B-raf/genetics , Proto-Oncogene Proteins B-raf/metabolism
16.
Crit Care Explor ; 3(5): e0402, 2021 May.
Article in English | MEDLINE | ID: mdl-34079945

ABSTRACT

BACKGROUND: Acute respiratory failure occurs frequently in hospitalized patients, often begins outside the ICU, and is associated with increased length of stay, cost, and mortality. Delays in decompensation recognition are associated with worse outcomes. OBJECTIVES: The objective of this study is to predict acute respiratory failure requiring any advanced respiratory support (including noninvasive ventilation). With the advent of the coronavirus disease pandemic, concern regarding acute respiratory failure has increased. DERIVATION COHORT: All admission encounters from January 2014 to June 2017 from three hospitals in the Emory Healthcare network (82,699). VALIDATION COHORT: External validation cohort: all admission encounters from January 2014 to June 2017 from a fourth hospital in the Emory Healthcare network (40,143). Temporal validation cohort: all admission encounters from February to April 2020 from four hospitals in the Emory Healthcare network: coronavirus disease tested (2,564) and coronavirus disease positive (389). PREDICTION MODEL: All admission encounters had vital signs, laboratory, and demographic data extracted. Exclusion criteria included invasive mechanical ventilation started within the operating room or advanced respiratory support within the first 8 hours of admission. Encounters were discretized into hourly intervals from 8 hours after admission to discharge or advanced respiratory support initiation and binary labeled for advanced respiratory support. Prediction of Acute Respiratory Failure requiring advanced respiratory support in Advance of Interventions and Treatment, our eXtreme Gradient Boosting-based algorithm, was compared against Modified Early Warning Score.
RESULTS: Prediction of Acute Respiratory Failure requiring advanced respiratory support in Advance of Interventions and Treatment had significantly better discrimination than Modified Early Warning Score (area under the receiver operating characteristic curve 0.85 vs 0.57 [test], 0.84 vs 0.61 [external validation]). Prediction of Acute Respiratory Failure requiring advanced respiratory support in Advance of Interventions and Treatment maintained a positive predictive value (0.31-0.21) similar to that of Modified Early Warning Score greater than 4 (0.29-0.25) while identifying 6.62 (validation) to 9.58 (test) times more true positives. Furthermore, Prediction of Acute Respiratory Failure requiring advanced respiratory support in Advance of Interventions and Treatment performed more effectively in temporal validation (area under the receiver operating characteristic curve 0.86 [coronavirus disease tested], 0.93 [coronavirus disease positive]), while identifying 4.25-4.51× more true positives. CONCLUSIONS: Prediction of Acute Respiratory Failure requiring advanced respiratory support in Advance of Interventions and Treatment is more effective than Modified Early Warning Score in predicting respiratory failure requiring advanced respiratory support at external validation and in coronavirus disease 2019 patients. Silent prospective validation is necessary before local deployment.

17.
J Comput Biol ; 28(5): 469-484, 2021 05.
Article in English | MEDLINE | ID: mdl-33400606

ABSTRACT

A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared with other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions that we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, explaining the large subnetworks output by jActiveModules. Based on these insights, we introduce NetMix, an algorithm that uses Gaussian mixture models to obtain less biased estimates of the parameters of the ASD. We demonstrate that NetMix outperforms existing methods in identifying altered subnetworks on both simulated and real data, including the identification of differentially expressed genes from both microarray and RNA-seq experiments and the identification of cancer driver genes in somatic mutation data.


Subject(s)
Computational Biology/methods , Algorithms , Bias , Likelihood Functions , Models, Statistical
18.
Physiol Meas ; 41(12): 124003, 2021 01 01.
Article in English | MEDLINE | ID: mdl-33176294

ABSTRACT

OBJECTIVE: Vast repositories of 12-lead ECGs provide opportunities to develop new machine learning approaches for creating accurate and automatic diagnostic systems for cardiac abnormalities. However, most 12-lead ECG classification studies are trained, tested, or developed in single, small, or relatively homogeneous datasets. In addition, most algorithms focus on identifying small numbers of cardiac arrhythmias that do not represent the complexity and difficulty of ECG interpretation. This work addresses these issues by providing a standard, multi-institutional database and a novel scoring metric through a public competition: the PhysioNet/Computing in Cardiology Challenge 2020. APPROACH: A total of 66 361 12-lead ECG recordings were sourced from six hospital systems from four countries across three continents; 43 101 recordings were posted publicly with a focus on 27 diagnoses. For the first time in a public competition, we required teams to publish open-source code for both training and testing their algorithms, ensuring full scientific reproducibility. MAIN RESULTS: A total of 217 teams submitted 1395 algorithms during the Challenge, representing a diversity of approaches for identifying cardiac abnormalities from both academia and industry. As with previous Challenges, high-performing algorithms exhibited significant drops ([Formula: see text]10%) in performance on the hidden test data. SIGNIFICANCE: Data from diverse institutions allowed us to assess algorithmic generalizability. A novel evaluation metric considered different misclassification errors for different cardiac abnormalities, capturing the outcomes and risks of different diagnoses. Requiring both trained models and code for training models improved the generalizability of submissions, setting a new bar in reproducibility for public data science competitions.


Subject(s)
Cardiology , Electrocardiography , Algorithms , Arrhythmias, Cardiac/diagnosis , Databases, Factual , Electrocardiography/classification , Female , Humans , Male , Middle Aged , Reproducibility of Results
19.
Cell Rep ; 30(9): 2900-2908.e4, 2020 03 03.
Article in English | MEDLINE | ID: mdl-32130895

ABSTRACT

The immune composition of the tumor microenvironment influences response and resistance to immunotherapies. While numerous studies have identified somatic correlates of immune infiltration, germline features that associate with immune infiltrates in cancers remain incompletely characterized. We analyze seven million autosomal germline variants in the TCGA cohort and test for association with established immune-related phenotypes that describe the tumor immune microenvironment. We identify one SNP associated with the amount of infiltrating follicular helper T cells; 23 candidate genes, some of which are involved in cytokine-mediated signaling and others containing cancer-risk SNPs; and networks with genes that are part of the DNA repair and transcription elongation pathways. In addition, we find a positive association between polygenic risk for rheumatoid arthritis and amount of infiltrating CD8+ T cells. Overall, we identify multiple germline genetic features associated with tumor-immune phenotypes and develop a framework for probing inherited features that contribute to differences in immune infiltration.


Subject(s)
Germ Cells/metabolism , Lymphocytes, Tumor-Infiltrating/immunology , Neoplasms/genetics , Neoplasms/immunology , Autoimmune Diseases/immunology , DNA Repair/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Leukocytes/metabolism , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide/genetics , Risk Factors , T-Lymphocytes, Helper-Inducer/immunology , Transcription, Genetic
20.
Nat Commun ; 11(1): 729, 2020 02 05.
Article in English | MEDLINE | ID: mdl-32024854

ABSTRACT

The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well-characterized and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types compiled by the ICGC/TCGA PCAWG project that was motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes. While few non-coding genomic elements are recurrently mutated in this cohort, we identify 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4, and TCF4. We find that biological processes had variable proportions of coding and non-coding mutations, with chromatin remodeling and proliferation pathways altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, were altered by both coding and non-coding mutations. RNA splicing is primarily altered by non-coding mutations in this cohort, and samples containing non-coding mutations in well-known RNA splicing factors exhibit gene expression signatures similar to those of samples with coding mutations in these genes. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.


Subject(s)
Gene Expression Regulation, Neoplastic , Mutation , Neoplasms/genetics , RNA Splicing , Chromatin Assembly and Disassembly , Computational Biology/methods , Databases, Genetic , Genome, Human , Humans , Metabolic Networks and Pathways/genetics , Neoplasms/metabolism , Promoter Regions, Genetic