ABSTRACT
Rationale: Emphysema is a chronic obstructive pulmonary disease phenotype with important prognostic implications. Identifying blood-based biomarkers of emphysema will facilitate early diagnosis and development of targeted therapies. Objectives: To discover blood omics biomarkers for chest computed tomography-quantified emphysema and develop predictive biomarker panels. Methods: Emphysema blood biomarker discovery was performed using differential gene expression, alternative splicing, and protein association analyses in a training sample of 2,370 COPDGene participants with available blood RNA sequencing, plasma proteomics, and clinical data. Internal validation was conducted in a COPDGene testing sample (n = 1,016), and external validation was done in the ECLIPSE study (n = 526). Because low body mass index (BMI) and emphysema often co-occur, we performed a mediation analysis to quantify the effect of BMI on gene and protein associations with emphysema. Elastic net models with bootstrapping were also developed in the training sample sequentially using clinical, blood cell proportions, RNA-sequencing, and proteomic biomarkers to predict quantitative emphysema. Model accuracy was assessed by the area under the receiver operating characteristic curves for subjects stratified into tertiles of emphysema severity. Measurements and Main Results: Totals of 3,829 genes, 942 isoforms, 260 exons, and 714 proteins were significantly associated with emphysema (false discovery rate, 5%) and yielded 11 biological pathways. Seventy-four percent of these genes and 62% of these proteins showed mediation by BMI. Our prediction models demonstrated reasonable predictive performance in both COPDGene and ECLIPSE. The highest-performing model used clinical, blood cell, and protein data (area under the receiver operating characteristic curve in COPDGene testing, 0.90; 95% confidence interval, 0.85-0.90). 
Conclusions: Blood transcriptome and proteome-wide analyses revealed key biological pathways of emphysema and enhanced the prediction of emphysema.
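The abstract reports associations at a 5% false discovery rate but does not name the procedure used. As a minimal sketch, assuming the common Benjamini-Hochberg step-up procedure (an assumption, not confirmed by the source), the significance call could look like the following. The function name and the p-values are hypothetical and purely illustrative.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at the given FDR.

    Assumption: Benjamini-Hochberg step-up procedure; the source only states "FDR 5%".
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for six features (not study data).
example_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
significant = benjamini_hochberg(example_p, fdr=0.05)
```

With these made-up inputs only the two smallest p-values pass, because the threshold for rank k is k/m times the target rate rather than a flat 0.05.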
Subjects
Emphysema, Chronic Obstructive Pulmonary Disease, Pulmonary Emphysema, Humans, Transcriptome, Proteomics, Pulmonary Emphysema/genetics, Pulmonary Emphysema/complications, Biomarkers, Gene Expression Profiling
ABSTRACT
Most predictive models based on gene expression data do not leverage information related to gene splicing, even though splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects on gene expression. Using smoking status as a prediction target, we developed deep neural network predictive models using gene, exon, and isoform level quantifications from RNA sequencing data in 2,557 subjects in the COPDGene Study. We observed that models using exon and isoform quantifications clearly outperformed gene-level models when using data from 5 genes from a previously published prediction model. Whereas the test set performance of the previously published model was 0.82 in the original publication, our exon-based models including an exon-to-isoform mapping layer achieved a test set AUC (area under the receiver operating characteristic curve) of 0.88, which improved to an AUC of 0.94 using exon quantifications from a larger set of genes. Isoform variability is an important source of latent information in RNA-seq data that can be used to improve clinical prediction models.
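The abstract mentions an exon-to-isoform mapping layer without describing its construction. One plausible reading, offered only as an assumption, is a linear layer whose weights are masked so that each isoform unit receives input only from the exons annotated to that isoform. The annotation, weights, and function names below are hypothetical.

```python
# Hypothetical annotation: which exons belong to which isoforms of one gene.
ISOFORM_EXONS = {
    "isoform_A": ["exon_1", "exon_2", "exon_3"],
    "isoform_B": ["exon_1", "exon_3"],
}
EXONS = ["exon_1", "exon_2", "exon_3"]


def build_mask(isoform_exons, exons):
    """Binary matrix: mask[i][j] is 1 if exon j is part of isoform i, else 0."""
    return [[1 if e in members else 0 for e in exons]
            for members in isoform_exons.values()]


def mapping_layer(exon_values, weights, mask):
    """Masked linear layer with ReLU: each isoform unit only sees its own exons."""
    outputs = []
    for w_row, m_row in zip(weights, mask):
        total = sum(w * m * x for w, m, x in zip(w_row, m_row, exon_values))
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs


mask = build_mask(ISOFORM_EXONS, EXONS)
# Illustrative weights; the 9.9 entry is masked out because exon_2 is not in isoform_B.
weights = [[0.5, 0.5, 0.5], [1.0, 9.9, 1.0]]
isoform_activations = mapping_layer([2.0, 4.0, 6.0], weights, mask)
```

In a trainable network the mask would be applied on every forward pass so that gradient updates never create connections between an isoform and exons it does not contain.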
Subjects
Deep Learning, Statistical Models, RNA-Seq/methods, Smoking, Aged, Computational Biology, Exons/genetics, Female, Gene Expression Profiling, Humans, Male, Middle Aged, Protein Isoforms/genetics, ROC Curve, Smoking/epidemiology, Smoking/genetics
ABSTRACT
Transcranial magnetic stimulation (TMS) is often applied to the motor cortex to stimulate a collection of motor evoked potentials (MEPs) in groups of peripheral muscles. The causal interface between TMS and MEP is the selective activation of neurons in the motor cortex; moving around the TMS 'spot' over the motor cortex causes different MEP responses. A question of interest is whether a collection of MEP responses can be used to identify the stimulated locations on the cortex, which could potentially be used to then place the TMS coil to produce chosen sets of MEPs. In this work we leverage our previous report on a 3D convolutional neural network (CNN) architecture that predicted MEPs from the induced electric field, to tackle an inverse imaging task in which we start with the MEPs and estimate the stimulated regions on the motor cortex. We present and evaluate five different inverse imaging CNN architectures, both conventional and generative, in terms of several measures of reconstruction accuracy. We found that one architecture, which we propose as M2M-InvNet, consistently achieved the best performance.
Subjects
Motor Cortex, Humans, Motor Cortex/physiology, Transcranial Magnetic Stimulation/methods, Skeletal Muscle/physiology, Motor Evoked Potentials/physiology, Neurons, Electromyography/methods
ABSTRACT
Background: Spirometry measures lung function by selecting the best of multiple efforts meeting pre-specified quality control (QC), and reporting two key metrics: forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC). We hypothesize that discarded submaximal and QC-failing data meaningfully contribute to the prediction of airflow obstruction and all-cause mortality. Methods: We evaluated volume-time spirometry data from the UK Biobank. We identified "best" spirometry efforts as those passing QC with the maximum FVC. "Discarded" efforts were either submaximal or failed QC. To create a combined representation of lung function we implemented a contrastive learning approach, Spirogram-based Contrastive Learning Framework (Spiro-CLF), which utilized all recorded volume-time curves per participant and applied different transformations (e.g. flow-volume, flow-time). In a held-out 20% testing subset we applied the Spiro-CLF representation of a participant's overall lung function to 1) binary predictions of FEV1/FVC < 0.7 and FEV1 Percent Predicted (FEV1PP) < 80%, indicative of airflow obstruction, and 2) Cox regression for all-cause mortality. Findings: We included 940,705 volume-time curves from 352,684 UK Biobank participants with 2-3 spirometry efforts per individual (66.7% with 3 efforts) and at least one QC-passing spirometry effort. Of all spirometry efforts, 24.1% failed QC and 37.5% were submaximal. Spiro-CLF prediction of FEV1/FVC < 0.7 utilizing discarded spirometry efforts had an area under the receiver operating characteristic curve (AUROC) of 0.981 (0.863 for FEV1PP prediction). Incorporating discarded spirometry efforts in all-cause mortality prediction was associated with a concordance index (c-index) of 0.654, which exceeded the c-indices from FEV1 (0.590), FVC (0.559), or FEV1/FVC (0.599) from each participant's single best effort.
Interpretation: A contrastive learning model using raw spirometry curves can accurately predict lung function using submaximal and QC-failing efforts. This model also has superior prediction of all-cause mortality compared to standard lung function measurements. Funding: MHC is supported by NIH R01HL137927, R01HL135142, HL147148, and HL089856. BDH is supported by NIH K08HL136928, U01 HL089856, and an Alpha-1 Foundation Research Grant. DH is supported by NIH 2T32HL007427-41. EKS is supported by NIH R01 HL152728, R01 HL147148, U01 HL089856, R01 HL133135, P01 HL132825, and P01 HL114501. PJC is supported by NIH R01HL124233 and R01HL147326. SPB is supported by NIH R01HL151421 and UH3HL155806. TY, FH, and CYM are employees of Google LLC.
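The mortality results in this abstract are compared by concordance index. As a rough illustration of what that number measures, the sketch below computes Harrell's c-index for right-censored data. The abstract does not state which c-index variant or software was used, so this is an assumption, and the follow-up times, event flags, and risk scores are made up.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data.

    times: follow-up time per participant
    events: 1 if death was observed, 0 if the participant was censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event strictly before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death was given the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four participants (not study data).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.654 and the FEV1, FVC, and FEV1/FVC values should be read.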
ABSTRACT
Lifelong Learning (LL) refers to the ability to continually learn and solve new problems with incrementally available information over time while retaining previous knowledge. Much attention has been given lately to Supervised Lifelong Learning (SLL) with a stream of labelled data. In contrast, we focus on resolving challenges in Unsupervised Lifelong Learning (ULL) with streaming unlabelled data when the data distribution and the unknown class labels evolve over time. A Bayesian framework is a natural way to incorporate past knowledge and sequentially update beliefs with new data. We develop a fully Bayesian inference framework for ULL with a novel end-to-end Deep Bayesian Unsupervised Lifelong Learning (DBULL) algorithm, which can progressively discover new clusters from unlabelled data without forgetting the past while learning latent representations. To efficiently maintain past knowledge, we develop a novel knowledge preservation mechanism via sufficient statistics of the latent representation for raw data. To detect potential new clusters on the fly, we develop an automatic cluster discovery and redundancy removal strategy in our inference, inspired by nonparametric Bayesian techniques. We demonstrate the effectiveness of our approach using image and text corpora benchmark datasets in both LL and batch settings.
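The idea of preserving knowledge through sufficient statistics can be illustrated with a small sketch. It assumes, purely for illustration, that each cluster of latent vectors is summarized as a diagonal Gaussian by its count, sum, and sum of squares. The class and method names are hypothetical and are not taken from DBULL.

```python
class ClusterStats:
    """Sufficient statistics of a diagonal Gaussian cluster: count, sum, sum of squares."""

    def __init__(self, dim):
        self.n = 0
        self.total = [0.0] * dim
        self.total_sq = [0.0] * dim

    def update(self, batch):
        """Absorb a batch of latent vectors; the raw vectors can then be discarded."""
        for z in batch:
            self.n += 1
            for d, value in enumerate(z):
                self.total[d] += value
                self.total_sq[d] += value * value

    def merge(self, other):
        """Fold another cluster into this one (e.g. if judged redundant) by adding statistics."""
        self.n += other.n
        self.total = [a + b for a, b in zip(self.total, other.total)]
        self.total_sq = [a + b for a, b in zip(self.total_sq, other.total_sq)]

    def mean(self):
        return [s / self.n for s in self.total]

    def variance(self):
        # Population variance per dimension: E[z^2] - (E[z])^2.
        return [sq / self.n - (s / self.n) ** 2
                for s, sq in zip(self.total, self.total_sq)]


# Two batches arriving over time (made-up 2-D latent vectors).
stats = ClusterStats(dim=2)
stats.update([[1.0, 2.0], [3.0, 4.0]])
stats.update([[5.0, 6.0]])
cluster_mean = stats.mean()
cluster_var = stats.variance()
```

Because the statistics are additive, processing the data in two batches gives the same mean and variance as processing it all at once, which is what allows earlier data to be dropped without losing the cluster summary.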
Subjects
Algorithms, Continuing Education, Bayes Theorem
ABSTRACT
Background: The heterogeneous nature of chronic obstructive pulmonary disease (COPD) complicates the identification of the predictors of disease progression. We aimed to improve the prediction of disease progression in COPD by using machine learning and incorporating a rich dataset of phenotypic features. Methods: We included 4496 smokers with available data from their enrollment and 5-year follow-up visits in the COPD Genetic Epidemiology (COPDGene®) study. We constructed linear regression (LR) and supervised random forest models to predict 5-year progression in forced expiratory volume in 1 second (FEV1) from 46 baseline features. Using cross-validation, we randomly partitioned participants into training and testing samples. We also validated the results in the COPDGene 10-year follow-up visit. Results: Predicting the change in FEV1 over time is more challenging than simply predicting the future absolute FEV1 level. For random forest, R-squared was 0.15 and the area under the receiver operating characteristic (ROC) curve for predicting participants in the top quartile of observed progression was 0.71 in testing; the corresponding values in validation were 0.10 and 0.70. Random forest provided slightly better performance than LR. The accuracy was best for Global initiative for chronic Obstructive Lung Disease (GOLD) grades 1-2 participants, and it was harder to achieve accurate prediction in advanced stages of the disease. Predictive variables differed in their relative importance, both overall and for predictions within GOLD grades. Conclusion: Random forest, along with deep phenotyping, predicts FEV1 progression with reasonable accuracy. There is significant room for improvement in future models. This prediction model facilitates the identification of smokers at increased risk for rapid disease progression. Such findings may be useful in the selection of patient populations for targeted clinical trials.
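The two metrics reported above, R-squared for the continuous prediction and ROC area for identifying the top quartile of progression, can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, the handling of ties at the quartile cutoff is an assumption, and the FEV1 decline values are invented.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def top_quartile_labels(declines):
    """Label as 1 the participants whose observed decline is in the top 25%.

    Assumption: values tied with the cutoff are all labelled 1.
    """
    k = max(1, len(declines) // 4)
    cutoff = sorted(declines, reverse=True)[k - 1]
    return [1 if d >= cutoff else 0 for d in declines]


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented 5-year FEV1 declines in mL for eight participants (not study data).
observed = [400, 350, 120, 200, 90, 150, 60, 300]
predicted = [300, 150, 180, 220, 100, 160, 120, 250]

labels = top_quartile_labels(observed)
fit = r_squared(observed, predicted)
discrimination = auc(labels, predicted)
```

The two numbers answer different questions: R-squared asks how much of the variation in decline the model explains, while the ROC area asks only whether rapid progressors are ranked above the rest, which is why a model can have a low R-squared and still discriminate usefully.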