ABSTRACT
The integration of data from multiple modalities generated by single-cell omics technologies is crucial for accurately identifying cell states. One challenge in interpreting multi-omics data is mosaic integration, in which different data modalities are profiled in different subsets of cells, because it requires simultaneous batch effect removal and modality alignment. Here, we develop Multi-omics Mosaic Auto-scaling Attention Variational Inference (mmAAVI), a scalable deep generative model for single-cell mosaic integration. Leveraging auto-scaling self-attention mechanisms, mmAAVI can map arbitrary combinations of omics to a common embedding space. When well-annotated cell states are available, the model can perform semi-supervised learning to make use of these annotations. We validated the performance of mmAAVI and five other commonly used methods on four benchmark datasets, which vary in cell numbers, omics types, and missing patterns. mmAAVI consistently outperformed the other methods. We also validated mmAAVI's ability to transfer cell state knowledge, achieving balanced accuracies of 0.82 and 0.97 with less than 1% labeled cells between batches with completely different omics. The full package is available at https://github.com/luyiyun/mmAAVI.
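As a reading aid for the balanced accuracies quoted above, here is a minimal sketch of how that metric is usually defined: the unweighted mean of per-class recall. The function name and the toy cell-type labels are illustrative assumptions and are not taken from the mmAAVI package.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels: 8 cells of one type, 2 of another.
y_true = ["T cell"] * 8 + ["B cell"] * 2
y_pred = ["T cell"] * 8 + ["T cell", "B cell"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

In this toy case the plain accuracy is dominated by the majority class, while the balanced accuracy gives the rare class equal weight, which is why the metric suits label transfer across imbalanced cell populations.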
Subjects
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Software , Algorithms
ABSTRACT
BACKGROUND: Neoadjuvant chemo-immunotherapy has brought remarkable advances to the management of esophageal squamous cell carcinoma (ESCC). However, a reliable biomarker for predicting the response to this chemo-immunotherapy regimen remains elusive. While computed tomography (CT) is widely used for response evaluation, its inherent limitations in accuracy are well recognized. Therefore, in this study, we present a novel technique to predict the response of ESCC patients before they receive chemo-immunotherapy by testing volatile organic compounds (VOCs) in exhaled breath. METHODS: This study employed a prospective-specimen-collection, retrospective-blinded-evaluation design. Patients' baseline breath samples were collected and analyzed using high-pressure photon ionization time-of-flight mass spectrometry (HPPI-TOFMS). Subsequently, patients were categorized as responders or non-responders based on the evaluation of therapeutic response using pathology (for patients who underwent surgery) or CT images (for patients who did not receive surgery). RESULTS: A total of 133 patients were included in this study, with 91 responders who achieved either a complete response (CR) or a partial response (PR), and 42 non-responders who had stable disease (SD) or progressive disease (PD). Among 83 participants who underwent both CT and pathological evaluation, the paired t-test revealed significant differences between the two methods (p < 0.05). For the breath test prediction model using breath test data from all participants, the validation set showed a mean area under the curve (AUC) of 0.86 ± 0.06. For the 83 patients with pathological reports, the breath test achieved a mean AUC of 0.845 ± 0.123.
CONCLUSIONS: Since CT has inherent weaknesses in hollow organ assessment and no other ideal biomarker has been found, our study provides a noninvasive, feasible, and inexpensive tool that could precisely predict ESCC patients' response to neoadjuvant chemo-immunotherapy using a breath test based on HPPI-TOFMS.
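The AUC values reported above can be read as the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen non-responder. The sketch below computes the AUC from that pairwise definition; the function name and the toy scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (e.g. responder), 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted response probabilities for five patients.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, which is fine for cohorts of this size; a rank-based formulation gives the same value more efficiently on large datasets.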
Subjects
Esophageal Neoplasms , Esophageal Squamous Cell Carcinoma , Humans , Esophageal Squamous Cell Carcinoma/therapy , Esophageal Neoplasms/therapy , Esophageal Neoplasms/drug therapy , Retrospective Studies , Prospective Studies , Neoadjuvant Therapy , Breath Tests/methods , Biomarkers
ABSTRACT
The incorporation of real-world data (RWD) into medical product development and evaluation has grown steadily. However, there is no universally adopted method for deciding how much information to borrow from external data. This paper proposes a study design methodology called Tree-based Monte Carlo (TMC) that dynamically integrates patients from various RWD sources to calculate the treatment effect based on the similarity between the clinical trial and the RWD. First, a propensity score is developed to gauge the resemblance between the clinical trial data and each real-world dataset. Using this similarity metric, we construct a hierarchical clustering tree that delineates varying degrees of similarity between each RWD source and the clinical trial data. Finally, a Gaussian process methodology is applied across this hierarchical clustering framework to synthesize the projected treatment effects of the external group. Simulation results show that our clustering tree can successfully identify similarity. Data sources that are more similar to the clinical trial are given higher weights in the treatment effect estimation, while less similar sources receive lower weights. Compared with another Bayesian method, the meta-analytic predictive prior (MAP), the estimator of our proposed method is closer to the true value and has a smaller bias.
ABSTRACT
INTRODUCTION: Untargeted metabolomics based on liquid chromatography-mass spectrometry is inevitably affected by batch effects that are caused by non-biological systematic bias. Previously, we developed a novel method called WaveICA to remove batch effects for untargeted metabolomics data. To detect batch effect information, the method relies on a batch label. However, it cannot be used when there is only one batch of data or the batch label is unknown. OBJECTIVES: We aim to improve the WaveICA method to remove batch effects for untargeted metabolomics data without using batch information. METHODS: We improved the WaveICA method by developing WaveICA 2.0 to remove batch effects for metabolomics data, and provided an R package WaveICA_2.0 to implement this method. RESULTS: The performance of the WaveICA 2.0 method was evaluated on real metabolomics data. For metabolomics data with three batches, the performance of the WaveICA 2.0 method was similar to that of the WaveICA method in terms of gathering quality control samples (QCSs) and subject samples together in principal component analysis score plots, increasing the similarity of QCSs, increasing differential peaks, and improving classification accuracy. For metabolomics data with only one batch, the WaveICA 2.0 method had a strong ability to remove intensity drift and reveal more biological information and outperformed the QC-RLSC and QC-SVRC methods in our study using our metabolomics data. CONCLUSION: Our results demonstrated that the WaveICA 2.0 method can be used in practice to remove batch effects for untargeted metabolomics data without batch information.
Subjects
Metabolomics , Research Design , Chromatography, Liquid , Mass Spectrometry , Principal Component Analysis
ABSTRACT
The tumor immune microenvironment is heterogeneous, and its impact on treatment responses is not well understood. Analyzing the interaction between malignant cells and the tumor microenvironment in order to apply suitable immunotherapy in lung adenocarcinoma remains a challenge. We applied nonnegative matrix factorization to 513 messenger RNA expression profiles of lung adenocarcinomas (LUADs) from The Cancer Genome Atlas (TCGA) to obtain an immune-related expression pattern. Subsequently, we characterized the immune-related gene signatures and clinical and survival characteristics. We used 576 patients from Gene Expression Omnibus to confirm our findings. Of the patients in the training cohort, 51% had a high immune enrichment score and high expression of immune cell signaling, cytolytic activity, and interferon (IFN)-related signatures (all P < .05). We denoted these as the Immune Class. We further subdivided the Immune Class into two subclasses based on the tumor microenvironment. These were denoted the Active Immune Class and the Exhausted Immune Class. The former showed significant IFN, T-cell, and M1 macrophage signatures and a better prognosis (all P < .05), while the latter presented an exhausted immune response with activated stromal enrichment, M2 macrophage signatures, and immunosuppressive factors such as WNT/transforming growth factor-β (all P < .05). Furthermore, we predicted the response of our immunophenotypes to immune checkpoint inhibitors (P < .05). Our findings provide novel insight into the immune-related state of LUAD and can identify the patients who will be receptive to suitable immunotherapeutic treatments.
Subjects
Adenocarcinoma of Lung/pathology , Biomarkers, Tumor/metabolism , Gene Expression Regulation, Neoplastic , Lung Neoplasms/pathology , Transcriptome , Tumor Microenvironment/immunology , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/immunology , Adenocarcinoma of Lung/metabolism , Aged , Apoptosis , Biomarkers, Tumor/genetics , Cell Proliferation , Female , Humans , Immunophenotyping , Immunotherapy , Lung Neoplasms/genetics , Lung Neoplasms/immunology , Male , Middle Aged , Prognosis , Survival Rate , Tumor Cells, Cultured
ABSTRACT
Untargeted metabolomics based on liquid chromatography-mass spectrometry is affected by nonlinear batch effects, which cover up biological effects, result in nonreproducibility, and are difficult to calibrate. In this study, we propose a novel deep learning model, called Normalization Autoencoder (NormAE), which is based on nonlinear autoencoders (AEs) and adversarial learning. An additional classifier and ranker are trained to provide adversarial regularization during the training of the AE model. Latent representations are extracted by the encoder, and the decoder then reconstructs the data without batch effects. The NormAE method was tested on two real metabolomics data sets. After calibration by NormAE, the quality control samples (QCs) for both data sets gathered most closely in a PCA score plot (average distances decreased from 56.550 and 52.476 to 7.383 and 14.075, respectively) and obtained the highest average correlation coefficients (from 0.873 and 0.907 to 0.997 for both). Additionally, NormAE significantly improved biomarker discovery (median number of differential peaks increased from 322 and 466 to 1140 and 1622, respectively). NormAE was compared with four commonly used batch effect removal methods. The results demonstrated that NormAE produces the best calibration results.
Subjects
Deep Learning , Metabolomics , Calibration , Chromatography, Liquid , Mass Spectrometry , Quality Control
ABSTRACT
INTRODUCTION: Colorectal cancer (CRC) remains an incurable disease. Previous metabolomic studies show that metabolic signatures in plasma distinguish CRC patients from healthy controls. Chronic enteritis (CE) represents a risk factor for CRC, with a 20-fold greater incidence than in healthy individuals. However, no studies have performed metabolomic profiling to investigate CRC biomarkers in CE. OBJECTIVE: Our aims were to identify metabolomic signatures in CRC and CE and to search for blood-derived metabolite biomarkers distinguishing CRC from CE, especially early-stage biomarkers. METHODS: In this case-control study, 612 subjects were prospectively recruited between May 2015 and May 2016, including 539 CRC patients (stage I, 102 cases; stage II, 259 cases; stage III, 178 cases) and 73 CE patients. Untargeted metabolomics was performed to identify CRC-related metabolic signatures in CE. RESULTS: Five pathways were significantly enriched based on 153 differential metabolites between CRC and CE. Sixteen biomarkers were identified for distinguishing CRC from CE and for guiding CRC staging. The AUC value for CRC diagnosis in the external validation set was 0.85. Good diagnostic performance was also achieved for early-stage CRC (stage I and stage II), with an AUC value of 0.84. The biomarker panel could also stage CRC patients, with an AUC of 0.72 for distinguishing stage I from stage II CRC and an AUC of 0.74 for distinguishing stage II from stage III CRC. CONCLUSIONS: The identified metabolic biomarkers exhibit promising properties for CRC monitoring in CE patients and are superior to commonly used clinical biomarkers (CEA and CA19-9).
Subjects
Biomarkers, Tumor/metabolism , Colorectal Neoplasms/metabolism , Enteritis/metabolism , Biomarkers, Tumor/blood , Case-Control Studies , Chronic Disease , Colorectal Neoplasms/blood , Colorectal Neoplasms/diagnosis , Enteritis/blood , Enteritis/diagnosis , Female , Humans , Male , Metabolomics , Middle Aged , Neoplasm Staging , Phenotype
ABSTRACT
Platelet-derived growth factor-D (PDGF-D) can enhance invasion and metastasis in several human malignancies. Although several studies have investigated the association between clinicopathological characteristics and prognosis in epithelial ovarian cancer (EOC), the mediating effect of PDGF-D on this association has seldom been assessed. In this study, we measured PDGF-D expression in tissues from patients with EOC and collected clinicopathological characteristics and prognostic information to determine whether PDGF-D mediated the effect of the degree of differentiation on prognosis in patients with EOC. A total of 190 paraffin-embedded tissue samples from patients with EOC, collected between July 2005 and December 2010, were included. We performed a Kaplan-Meier analysis of the association between the degree of differentiation and prognosis, followed by a causal mediation analysis. The results indicated that the degree of differentiation was associated with prognosis and that PDGF-D mediated the effect of the degree of differentiation on prognosis in patients with EOC, suggesting that PDGF-D might be a potential target for ovarian cancer treatment.
ABSTRACT
OBJECTIVE: We sought to identify novel molecular subtypes of high-grade serous ovarian cancer (HGSC) by integrating gene expression and proteomics data and to find the underlying biological characteristics of ovarian cancer to improve clinical outcomes. METHODS: The iCluster method was used to analyze 131 HGSC samples common to the TCGA and Clinical Proteomic Tumor Analysis Consortium databases. Kaplan-Meier survival curves were used to estimate the overall survival of patients, and the differences in survival curves were assessed using the log-rank test. RESULTS: Two novel ovarian cancer subtypes with different overall survival (P = .00114) and different platinum status (P = .0061) were identified. Eighteen messenger RNAs and 38 proteins were selected as differential molecules between subtypes. Pathway analysis demonstrated that the arrhythmogenic right ventricular cardiomyopathy pathway played a critical role in discriminating these two subtypes, and the desmosomal cadherins DSG2, DSP, JUP, and PKP2 in this pathway were overexpressed in subtype I compared with subtype II. CONCLUSION: Our study extended the underlying prognosis-related biological characteristics of high-grade serous ovarian cancer. Enrichment of desmosomal cadherins increased the prognostic risk for HGSC among platinum-sensitive patients. These results can guide the revision of treatment options for platinum-sensitive ovarian cancer patients to improve outcomes.
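To illustrate the Kaplan-Meier survival curves mentioned in the methods above, the following minimal sketch implements the product-limit estimator. The function name and the follow-up times are hypothetical and are not taken from the study, and the log-rank comparison between subtypes is not shown.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time of each patient.
    events: 1 if the event (death) was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up in months; 0 marks a censored observation.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored patients contribute to the number at risk up to their last follow-up but never trigger a drop in the curve, which is the property that distinguishes this estimator from a naive proportion of survivors.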
Subjects
Cystadenocarcinoma, Serous/genetics , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Ovarian Neoplasms/genetics , Proteomics/methods , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Cystadenocarcinoma, Serous/classification , Cystadenocarcinoma, Serous/metabolism , Desmosomal Cadherins/genetics , Desmosomal Cadherins/metabolism , Female , Humans , Kaplan-Meier Estimate , Middle Aged , Neoplasm Grading , Ovarian Neoplasms/classification , Ovarian Neoplasms/metabolism , Ovary/drug effects , Ovary/metabolism , Ovary/pathology , Platinum/therapeutic use , Prognosis
ABSTRACT
Renal clear cell carcinoma (RCC) patients who do not achieve optimal control of progression with immune checkpoint blockade (ICB) should be further studied. Unsupervised consensus clustering was used to group 525 RCC patients based on two typical ICB pathways, CTLA-4 and programmed death 1 (PD-1)/programmed death-ligand 1 (PD-L1), as well as two newly discovered regulators, CMTM6 and CMTM4. Three immune molecular subtypes (IMMSs) with different clinical and immunological characteristics were identified (type I, II, and III), among which there were more stage I and low-grade tumors in type I RCC than in type II and III. The proportion of males was highest in type II RCC. Overall survival of type II and III was similar (5.2 and 6 years) and statistically shorter than that of type I (7.6 years) before and after adjusting for age and gender. In stratified analysis, our IMMSs were able to identify high-risk patients among middle-aged patients, males, and stage IV patients. Among the differentially expressed genes, approximately 84% were highly expressed in type II and III RCC. Genes related to ICB (CTLA-4, CD274, and PDCD1LG2) and cytotoxic lymphocytes (CD8A, GZMA, and PRF1) were all highly expressed in type II and III RCC. These results suggest that patients with type II and III cancer may be more sensitive to anti-CTLA-4 therapy, anti-PD-1/PD-L1 therapy, and a combination of immunotherapies. High expression of CMTM4 in type I RCC (69%) and a statistically significant interaction of CD274 and CMTM6 indicated that CMTM4/6 might be new therapy targets for type I patients, who are resistant to ICB.
Subjects
Carcinoma, Renal Cell/metabolism , Immunophenotyping/methods , Aged , B7-H1 Antigen/metabolism , CD8 Antigens/metabolism , CTLA-4 Antigen/metabolism , Carcinoma, Renal Cell/immunology , Female , Granzymes/metabolism , Humans , MARVEL Domain-Containing Proteins/metabolism , Male , Middle Aged , Perforin/metabolism , Programmed Cell Death 1 Ligand 2 Protein/metabolism , Tumor Microenvironment/genetics , Tumor Microenvironment/physiology
ABSTRACT
The metabolic profiling of biofluids using untargeted metabolomics is a promising way to discover metabolite biomarkers for clinical cancer diagnosis. However, metabolite biomarkers discovered in biofluids may not necessarily reflect the pathological status of tumor tissue, which makes these biomarkers difficult to reproduce. In this study, we developed a new analysis strategy that integrates univariate and multivariate correlation analysis to discover tumor tissue-derived (TTD) metabolites in plasma samples. Specifically, untargeted metabolomics was first used to profile a set of paired tissue and plasma samples from 34 colorectal cancer (CRC) patients. Next, univariate correlation analysis was used to select correlative metabolite pairs between tissue and plasma, and a random forest regression model was used to define 243 TTD metabolites in plasma samples. The TTD metabolites in CRC plasma were demonstrated to accurately reflect the pathological status of tumor tissue and have great potential for metabolite biomarker discovery. Accordingly, we conducted a clinical study using a set of 146 plasma samples from CRC patients and gender-matched polyp controls to discover metabolite biomarkers from TTD metabolites. As a result, eight metabolites were selected as potential biomarkers for CRC diagnosis with high sensitivity and specificity. For CRC patients after surgery, the survival risk score defined by metabolite biomarkers also performed well in predicting overall survival time (p = 0.022) and progression-free survival time (p = 0.002). In conclusion, we developed a new analysis strategy that effectively discovers tumor tissue-related metabolite biomarkers in plasma for cancer diagnosis and prognosis.
Subjects
Biomarkers, Tumor/blood , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/blood , Discriminant Analysis , Female , Humans , Least-Squares Analysis , Male , Metabolome , Metabolomics/methods , Metabolomics/statistics & numerical data , Middle Aged , Principal Component Analysis , Prognosis , Statistics, Nonparametric
ABSTRACT
Serous ovarian cancer (SOC) is the most common histological subtype of epithelial ovarian cancer and has the worst clinical outcome. Despite improvements in surgery and chemotherapy, most patients with SOC experience recurrence within 12-18 months of first-line treatment. Current studies are unable to robustly predict the recurrence of SOC, and more accurate predictive models are urgently required. We have, therefore, developed a novel pathway-structured model to predict the recurrence of SOC. We trained the model on a set of 333 patients and validated it in 3 diverse validation datasets of 403 patients. Genes significantly associated with recurrence within each pathway were identified using a Cox proportional hazards model based on LASSO estimation in the training dataset. Next, a pathway-structured scoring matrix was obtained by computing the prognostic score for each pathway from the fitted Cox proportional hazards model. With the pathway-structured scoring matrix as input, the pathway-based recurrence signatures were identified using the Cox proportional hazards model based on LASSO estimation, and the significant pathway-based signatures were externally validated in 3 independent datasets. Meanwhile, our pathway-structured model was compared with a commonly used gene-based model. Our results revealed that our 12 pathway-based signatures successfully predicted the recurrence of SOC with high accuracy in the training dataset and in the 3 validation datasets. Moreover, our pathway-structured model was superior to the gene-based model in the 4 datasets. The pathways selected in our study will provide new insights into the pathogenesis and clinical treatment of SOC.
Subjects
Carcinoma, Ovarian Epithelial/metabolism , Metabolic Networks and Pathways/genetics , Models, Biological , Neoplasm Recurrence, Local/metabolism , Ovarian Neoplasms/metabolism , Transcriptome , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/genetics , Data Accuracy , Databases, Genetic , Female , Humans , Kaplan-Meier Estimate , Middle Aged , Prognosis , Proportional Hazards Models , ROC Curve
ABSTRACT
As an autoimmune-mediated inflammatory demyelinating disease of the central nervous system, multiple sclerosis (MS) is often confused with cerebral small vessel disease (cSVD), a regional pathological change in brain tissue with unknown pathogenesis, because of their similar clinical presentations and imaging manifestations. Misdiagnosis can significantly increase the occurrence of adverse events, and delayed or incorrect treatment is one of the most important causes of MS progression. Therefore, the development of a practical diagnostic imaging aid could significantly reduce the risk of misdiagnosis and improve patient prognosis. We propose an interpretable deep learning (DL) model that differentiates MS from cSVD using T2-weighted fluid-attenuated inversion recovery (FLAIR) images. Transfer learning (TL) based on the ImageNet dataset was used for feature extraction. This model is the first of its kind in neuroimaging and shows great potential for improving the differential diagnosis of neurological disorders. Our model extracts the texture features of the images and achieves more robust feature learning through two attention modules. The attention maps produced by these modules provide model interpretation, which helps validate what the model has learned and reveals more information to physicians. Finally, the proposed model is trained end-to-end using focal loss to reduce the influence of class imbalance. The model was validated using clinically diagnosed MS (n=112) and cSVD (n=321) patients from the Beijing Tiantan Hospital. The performance of the proposed model was better than that of two commonly used DL approaches, with a mean balanced accuracy of 86.06% and a mean area under the receiver operating characteristic curve of 98.78%. Moreover, the generated attention heat maps showed that the proposed model could focus on the lesion signatures in the image.
The proposed model provides a practical diagnostic aid that uses routinely available imaging techniques such as magnetic resonance imaging to distinguish MS from cSVD, linking DL to human brain disease. We anticipate that this novel model will substantially improve the accuracy with which various neurological conditions are distinguished.
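The abstract states that focal loss is used to reduce the influence of class imbalance. As a minimal sketch of the general technique (not the authors' implementation), the binary focal loss for one sample is FL = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The function name and the values gamma = 2 and alpha = 0.25 are assumptions based on commonly used defaults and do not come from the abstract.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for a single binary prediction.

    p:     predicted probability of the positive class.
    y:     true label, 1 (positive) or 0 (negative).
    gamma: focusing parameter; larger values down-weight easy examples more.
    alpha: weight of the positive class (the negative class gets 1 - alpha).
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction contributes far less than a confident mistake.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to class-weighted cross-entropy, which makes the role of the focusing parameter easy to check.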
Subjects
Cerebral Small Vessel Diseases , Deep Learning , Multiple Sclerosis , Humans , Cerebral Small Vessel Diseases/diagnostic imaging , Multiple Sclerosis/diagnostic imaging , Male , Magnetic Resonance Imaging/methods , Female , Neural Networks, Computer , Image Interpretation, Computer-Assisted/methods , Middle Aged , Adult , Neuroimaging/methods
ABSTRACT
The identification of compound-protein interactions (CPIs) plays a vital role in drug discovery. However, the high cost and labor-intensive nature of in vitro and in vivo experiments make it urgent for researchers to develop novel CPI prediction methods. Although emerging deep learning methods have achieved promising performance in CPI prediction, they still face ongoing challenges: (i) providing bidirectional interpretability of the prediction results from both the chemical and the biological perspective; (ii) comprehensively evaluating model generalization performance; (iii) demonstrating the practical applicability of these models. To overcome these challenges, we propose a cross multi-head attention oriented bidirectional interpretable CPI prediction model (CmhAttCPI). First, CmhAttCPI takes molecular graphs and protein sequences as inputs, using the GCW module to learn atom features and the CNN module to learn residue features. Second, the model applies a cross multi-head attention module to compute attention weights for atoms and residues. Finally, CmhAttCPI employs a fully connected neural network to predict scores for CPIs. We evaluated the performance of CmhAttCPI on balanced and imbalanced datasets. The results consistently show that CmhAttCPI outperforms multiple state-of-the-art methods. We constructed three scenarios based on compound and protein clustering and comprehensively evaluated the model's generalization ability within these scenarios. The results demonstrate that the generalization ability of CmhAttCPI surpasses that of other models. In addition, visualizations of the attention weights reveal that CmhAttCPI provides chemical and biological interpretation for CPI prediction. Moreover, case studies confirm the practical applicability of CmhAttCPI in discovering anticancer candidates.
Subjects
Drug Discovery , Labor, Obstetric , Pregnancy , Female , Humans , Amino Acid Sequence , Cluster Analysis , Neural Networks, Computer
ABSTRACT
Cancer exerts a multitude of effects on metabolism, including the reprogramming of cellular metabolic pathways and alterations in metabolites that facilitate inappropriate proliferation of cancer cells and adaptation to the tumor microenvironment. There is a growing body of evidence suggesting that aberrant metabolites play pivotal roles in tumorigenesis and metastasis and have the potential to serve as biomarkers for personalized cancer therapy. Importantly, high-throughput metabolomics detection techniques and machine learning approaches offer tremendous potential for clinical oncology by enabling the identification of cancer-specific metabolites. Emerging research indicates that circulating metabolites have great promise as noninvasive biomarkers for cancer detection. Therefore, this review summarizes abnormal cancer-related metabolites reported in the last decade and highlights the application of metabolomics in liquid biopsy, including detection specimens, technologies, methods, and challenges. The review provides insights into cancer metabolites as a promising tool for clinical applications.
ABSTRACT
Predicting the response to drugs from transcriptome data before initiating therapy is a major challenge. However, obtaining reliable drug response labels costs time and resources. Available methods often predict poorly and fail to identify robust biomarkers because of the curse of dimensionality: high dimensionality and low sample size. This necessitates the development of models that effectively predict the response to drugs using limited labeled data while remaining interpretable. In this study, we report a novel Hierarchical Graph Random Neural Networks (HiRAND) framework to predict the drug response using transcriptome data with few labeled samples and additional unlabeled samples. HiRAND integrates the information of the gene graph and the sample graph with a graph convolutional network (GCN). The innovation of our model lies in leveraging a data augmentation strategy to address the shortage of labeled data and using consistency regularization to optimize the prediction consistency of unlabeled data across different data augmentations. The results showed that HiRAND achieved better performance than competing methods in various prediction scenarios, including both simulation data and multiple drug response datasets. Among all 62 drugs, HiRAND showed the best prediction performance for vorinostat. In addition, HiRAND was interpreted to identify the key genes most important to vorinostat response, highlighting critical roles for ribosomal protein-related genes in the response to histone deacetylase inhibition. HiRAND could be used as an efficient framework for improving drug response prediction performance using few labeled data.
ABSTRACT
Efficient and accurate distinction of the histopathological subtypes of lung cancer is critical for individualized treatment. Artificial intelligence techniques have been developed for this task, but their performance on more heterogeneous data remains debatable, hindering their clinical deployment. Here, we propose an end-to-end, well-generalized, and data-efficient weakly supervised deep learning-based method. The method, an end-to-end feature pyramid deep multi-instance learning model (E2EFP-MIL), contains an iterative sampling module, a trainable feature pyramid module, and a robust feature aggregation module. E2EFP-MIL uses end-to-end learning to extract generalized morphological features automatically and identify discriminative histomorphological patterns. The method was trained with 1007 whole slide images (WSIs) of lung cancer from TCGA, with AUCs of 0.95-0.97 in test sets. We validated E2EFP-MIL in 5 real-world external heterogeneous cohorts including nearly 1600 WSIs from both the United States and China, with AUCs of 0.94-0.97, and found that 100-200 training images are enough to achieve an AUC of >0.9. E2EFP-MIL outperforms multiple state-of-the-art MIL-based methods, with high accuracy and low hardware requirements. These excellent and robust results support the generalizability and effectiveness of E2EFP-MIL in clinical practice. Our code is available at https://github.com/raycaohmu/E2EFP-MIL.
Subjects
Artificial Intelligence , Lung Neoplasms , Humans , Lung Neoplasms/diagnostic imaging , Area Under Curve , China , Neural Networks, Computer
ABSTRACT
The discovery of cancer subtypes based on unsupervised clustering helps provide a precise diagnosis, guide treatment, and improve patients' prognoses. Compared with single-omics data, multi-omics data can improve clustering performance because they provide a comprehensive landscape for understanding biological systems and mechanisms. However, heterogeneous data from multiple sources introduce high complexity and different kinds of noise, which are detrimental to the extraction of clustering information. We propose an end-to-end deep learning-based method, called Multi-omics Clustering Variational Autoencoders (MCluster-VAEs), that can extract cluster-friendly representations from multi-omics data. First, a unified network architecture with an attention mechanism was developed for accurately modeling multi-omics data. Then, using a novel objective function built from the Variational Bayes technique, the model was trained to effectively obtain the posterior estimation of the clustering assignments. Compared with 12 other state-of-the-art multi-omics clustering methods, MCluster-VAEs achieved outstanding performance on benchmark datasets from the TCGA database. On the Pan Cancer dataset, MCluster-VAEs achieved an adjusted Rand index of approximately 0.78 for cancer category recognition, an increase of more than 18% compared with other methods. Furthermore, a survival analysis and clinical parameter enrichment tests conducted on 10 cancer datasets demonstrated that MCluster-VAEs provides comparable or even better results than many common integrative approaches. These results demonstrate that MCluster-VAEs is a powerful new tool for dissecting complex multi-omics relationships and providing new insights for cancer subtype discovery.
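The adjusted Rand index (ARI) quoted above measures the agreement between a clustering and reference labels, corrected for chance, so that random assignments score around 0 and a perfect match scores 1. The sketch below follows the standard contingency-table formula; the function name and the toy labels are illustrative assumptions and are unrelated to the MCluster-VAEs code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same samples."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label lists must have the same length")
    if n < 2:
        return 1.0
    # Pairs of samples that are together in both partitions.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that are together within each partition separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions have a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Cluster names do not matter, only which samples are grouped together.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index is invariant to relabeling, it can compare unsupervised cluster assignments against known cancer categories without first matching cluster identifiers.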
Subjects
Deep Learning , Neoplasms , Humans , Multiomics , Bayes Theorem , Cluster Analysis
ABSTRACT
BACKGROUND: As a major component of tumor tissue, the tumor microenvironment (TME) has been shown to be associated with tumor progression and immunotherapy. Ovarian cancer has the highest mortality rate among gynecologic malignancies. Clinical treatment decisions are closely tied to prognosis, underscoring the need to evaluate prognosis and choose the proper clinical treatment using TME information. METHOD: This study constructs a score (TMEscore) from TME information obtained with the CIBERSORT algorithm; TME infiltration patterns are quantified with the PCA algorithm, and patients are classified into high and low TMEscore groups. TMEscore was constructed in the TCGA cohort and validated in a GEO cohort. Univariate and multivariate Cox proportional hazards model analyses were used to demonstrate the prognostic value of TMEscore in overall and stratified analyses. RESULT: TMEscore is highly correlated with survival, and the high TMEscore group has a better prognosis. To inform treatment decisions, we examined the expression of immune checkpoints, the immunophenoscore (IPS), and the ESTIMATE score, which showed that patients with a high TMEscore have a better immune microenvironment and respond better to immune checkpoint inhibitors (ICIs). Meanwhile, the mutation landscape of the two TMEscore groups was profiled, and 13 genes were found to be differentially mutated between the groups. Among them, BRCA1 has more mutations in the high TMEscore group, and we speculate that high TMEscore patients might benefit from PARP inhibitors combined with immunotherapy. CONCLUSION: A TMEscore based on the TME, with both prognostic and clinical value, is proposed for identifying targeted treatment and immunotherapy strategies for ovarian cancer.
Subjects
Ovarian Neoplasms , Tumor Microenvironment , Carcinoma, Ovarian Epithelial/genetics , Carcinoma, Ovarian Epithelial/therapy , Female , Humans , Immune Checkpoint Inhibitors/therapeutic use , Immunotherapy , Ovarian Neoplasms/genetics , Ovarian Neoplasms/therapy , Prognosis , Tumor Microenvironment/genetics
ABSTRACT
Aims: Given the reversibility of methylation, biomarkers with discriminating ability are of great interest as targeted therapeutic sites. Materials & methods: Methylation array data of 461 lung adenocarcinoma (LUAD) patients, comprising 458 tumor and 32 LUAD paracancerous samples, were compared using partial least squares discriminant analysis and receiver operating characteristic analysis. Results: A six-DNA methylation signature (corresponding to five genes) was found to significantly discriminate normal and LUAD samples. Kyoto Encyclopedia of Genes and Genomes analysis indicated enrichment of methylation sites in the Wnt pathway in LUAD compared with controls. Conclusion: This six-DNA methylation signature demonstrated potential as a novel biomarker for diagnosis and as a therapeutic target. Further, inhibition of the Wnt signaling pathway may be an important step in LUAD progression.
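As an illustration of the receiver operating characteristic analysis mentioned above, the sketch below computes the area under the ROC curve (AUC) for a single marker from its rank interpretation: the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample. The function name `roc_auc` and the methylation values are hypothetical and are not data from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical methylation levels at one CpG site (made-up values).
tumor_values = [0.82, 0.74, 0.91, 0.40]
normal_values = [0.15, 0.22, 0.45]
site_auc = roc_auc(tumor_values, normal_values)  # 11 of 12 pairs ranked correctly
```

An AUC of 0.5 corresponds to no discrimination between the two groups and 1.0 to perfect separation. The pairwise loop is quadratic in sample count, which is acceptable for a sketch but would be replaced by a rank-based computation for large cohorts.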