1.
J Biomed Inform ; 152: 104629, 2024 04.
Article in English | MEDLINE | ID: mdl-38552994

ABSTRACT

BACKGROUND: In health research, multimodal omics data analysis is widely used to address important clinical and biological questions. Traditional statistical methods rely on strong distributional assumptions. Statistical methods such as hypothesis testing and differential expression analysis are commonly used in omics analysis. Deep learning, on the other hand, is an advanced computer science technique that is powerful in mining high-dimensional omics data for prediction tasks. Recently, integrative frameworks or methods that combine statistical models and deep learning algorithms have been developed for omics studies. METHODS AND RESULTS: The aim of these integrative frameworks is to combine the strengths of both statistical methods and deep learning algorithms to improve prediction accuracy while also providing interpretability and explainability. This review discusses the current state-of-the-art integrative frameworks, their limitations, and potential future directions in survival and time-to-event longitudinal analysis, dimension reduction and clustering, regression and classification, feature selection, and causal and transfer learning.


Subjects
Deep Learning, Genomics, Genomics/methods, Computational Biology/methods, Algorithms, Statistical Models
2.
J Transl Med ; 22(1): 226, 2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38429796

ABSTRACT

BACKGROUND: Breast cancer (BC) is a highly heterogeneous and complex disease. Personalized treatment options require the integration of multi-omic data and consideration of phenotypic variability. Radiogenomics aims to merge medical images with genomic measurements but encounters challenges due to unpaired data consisting of imaging, genomic, or clinical outcome data. In this study, we propose the use of a well-trained conditional generative adversarial network (cGAN) to address the unpaired data issue in radiogenomic analysis of BC. The generated images are then used to predict the mutation status of key driver genes and BC subtypes. METHODS: We integrated the paired MRI and multi-omic (mRNA gene expression, DNA methylation, and copy number variation) profiles of 61 BC patients from The Cancer Imaging Archive (TCIA) and The Cancer Genome Atlas (TCGA). To facilitate this integration, we employed a Bayesian Tensor Factorization approach to factorize the multi-omic data into 17 latent features. Subsequently, a cGAN model was trained on the matched side-view patient MRIs and their corresponding latent features to predict MRIs for BC patients who lack MRIs. Model performance was evaluated by calculating the distance between real and generated images using the Fréchet Inception Distance (FID) metric. BC subtype and mutation status of driver genes were obtained from the cBioPortal platform, where 3 genes were selected based on the number of mutated patients. A convolutional neural network (CNN) was constructed and trained using the generated MRIs for mutation status prediction. Receiver operating characteristic area under curve (ROC-AUC) and precision-recall area under curve (PR-AUC) were used to evaluate the performance of the CNN models for mutation status prediction. Precision, recall and F1 score were used to evaluate the performance of the CNN model in subtype classification.
RESULTS: The FID of the images from the well-trained cGAN model on the test set is 1.31. The CNN for TP53, PIK3CA, and CDH1 mutation prediction yielded ROC-AUC values of 0.9508, 0.7515, and 0.8136 and PR-AUC values of 0.9009, 0.7184, and 0.5007, respectively. Multi-class subtype prediction achieved precision, recall and F1 scores of 0.8444, 0.8435 and 0.8336, respectively. The source code and related data for the implemented algorithms can be found in the project GitHub repository at https://github.com/mattthuang/BC_RadiogenomicGAN . CONCLUSION: Our study establishes cGAN as a viable tool for generating synthetic BC MRIs for mutation status prediction and subtype classification to better characterize the heterogeneity of BC in patients. The synthetic images also have the potential to significantly augment existing MRI data and circumvent issues surrounding data sharing and patient privacy for future BC machine learning studies.
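
For readers unfamiliar with the ROC-AUC figures quoted above, the following is a minimal sketch of how the metric can be computed from binary labels and classifier scores. It uses the rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case). The function name and the toy data are illustrative assumptions and are not taken from the study or its repository.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth values (1 = mutated, hypothetically)
    scores: iterable of classifier scores, higher means "more likely positive"
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for small cohorts; a sort-based implementation would be preferable for large ones.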


Subjects
Breast Neoplasms, Humans, Female, Breast Neoplasms/diagnostic imaging, Breast Neoplasms/genetics, Radiomics, DNA Copy Number Variations, Bayes Theorem, Magnetic Resonance Imaging/methods, Mutation/genetics
3.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38449285

ABSTRACT

MOTIVATION: Drug-target interaction (DTI) prediction aims to identify interactions between drugs and protein targets. Deep learning can automatically learn discriminative features from drug and protein target representations for DTI prediction, but challenges remain and the problem is still open. Existing approaches encode drugs and targets into features using deep learning models, but they often lack explanations for the underlying interactions. Moreover, the limited number of labeled DTIs in the chemical space can hinder model generalization. RESULTS: We propose an interpretable nested graph neural network for DTI prediction (iNGNN-DTI) using pre-trained molecule and protein models. The analysis is conducted on graph data representing drugs and targets by using a specific type of nested graph neural network, in which the target graphs are created based on 3D structures using AlphaFold2. This architecture is highly expressive in capturing substructures of the graph data. We use a cross-attention module to capture interaction information between the substructures of drugs and targets. To improve feature representations, we integrate features learned by models that are pre-trained on large unlabeled small molecule and protein datasets, respectively. We evaluate our model on three benchmark datasets, and it shows a consistent improvement over all baseline models on all datasets. We also run an experiment with previously unseen drugs or targets in the test set, and our model outperforms all of the baselines. Furthermore, iNGNN-DTI can provide more insight into the interaction by visualizing the weights learned by the cross-attention module. AVAILABILITY AND IMPLEMENTATION: The source code of the algorithm is available at https://github.com/syan1992/iNGNN-DTI.
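
To make the idea of a cross-attention module more concrete, here is a minimal sketch of generic scaled dot-product cross-attention, in which each drug substructure vector (query) attends over all target substructure vectors (keys and values). This is an assumption-laden illustration: it omits the learned projection matrices that a trained model would have, the vectors are invented, and it is not the iNGNN-DTI implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention without learned projections.

    queries: list of vectors (e.g. hypothetical drug substructure embeddings)
    keys, values: lists of vectors (e.g. hypothetical target substructure embeddings)
    Returns (outputs, weights); weights[i][j] is how strongly query i attends to key j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Invented 2-dimensional embeddings, for illustration only.
drug_parts = [[1.0, 0.0], [0.0, 1.0]]
target_parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn_weights = cross_attention(drug_parts, target_parts, target_parts)
for row in attn_weights:
    print([round(w, 3) for w in row])
```

Each row of the weight matrix sums to 1, so it can be read as a distribution over target substructures. Weights of this kind are what an attention-based model can expose for visualization.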


Subjects
Algorithms, Neural Networks (Computer), Drug Interactions, Benchmarking, Drug Delivery Systems
4.
Pediatr Allergy Immunol ; 34(10): e14032, 2023 10.
Article in English | MEDLINE | ID: mdl-37877849

ABSTRACT

BACKGROUND: Identifying children at high risk of developing asthma can facilitate prevention and early management strategies. We developed a prediction model of children's asthma risk using objectively collected, population-based histories of comorbidities in children and their parents. METHODS: We conducted a retrospective population-based cohort study using administrative data from Manitoba, Canada, and included children born from 1974 to 2000 with linkages to ≥1 parent. We identified asthma and prior comorbid condition diagnoses from hospital and outpatient records. We used two machine-learning models to identify important predictors: least absolute shrinkage and selection operator (LASSO) logistic regression (LR) and random forest (RF). The predictors in the base model included children's demographics, allergic conditions, respiratory infections, and parental asthma. Subsequent models included additional multiple comorbidities for children and parents. RESULTS: The cohort included 195,666 children: 51.3% were male and 17.7% had an asthma diagnosis. The base LR model achieved low predictive performance, with a sensitivity of 0.47 (95% confidence interval 0.45-0.48) and a specificity of 0.67 (0.66-0.67) at a predicted probability threshold of 0.20. Sensitivity significantly improved when children's comorbidities were included using LASSO LR: 0.71 (0.69-0.72). Predictive performance further improved when parental comorbidities were included (sensitivity = 0.72 [0.70-0.73], specificity = 0.69 [0.69-0.70]). We observed similar results for the RF models. Children's menstrual disorders and mood and anxiety disorders, and parental lipid metabolism disorders and asthma, were among the most important variables that predicted asthma risk. CONCLUSION: Including children's and parental comorbidities in children's asthma prediction models improves their accuracy.
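
The sensitivity and specificity values above depend on the predicted probability threshold (0.20 in the abstract). The sketch below shows how the two quantities follow from a threshold applied to predicted probabilities. The function name, the toy data, and the choice to classify a probability equal to the threshold as positive are assumptions made for illustration; the abstract does not specify these details.

```python
def sensitivity_specificity(labels, probs, threshold=0.20):
    """Sensitivity and specificity of thresholded probabilities.

    labels: 0/1 ground truth (1 = asthma diagnosis, hypothetically)
    probs: predicted probabilities from any classifier
    threshold: probabilities >= threshold are called positive (an assumption)
    """
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probs):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented purely for illustration.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_probs = [0.35, 0.22, 0.10, 0.25, 0.15, 0.05, 0.18, 0.30]
sens, spec = sensitivity_specificity(toy_labels, toy_probs, threshold=0.20)
print(round(sens, 3), round(spec, 3))
```

Lowering the threshold generally raises sensitivity at the cost of specificity, which is why a threshold well below 0.5 can be reasonable when the outcome is relatively uncommon.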


Subjects
Asthma, Male, Female, Humans, Child, Cohort Studies, Retrospective Studies, Asthma/diagnosis, Asthma/epidemiology, Anxiety Disorders, Canada
5.
AMIA Jt Summits Transl Sci Proc ; 2023: 206-215, 2023.
Article in English | MEDLINE | ID: mdl-37350925

ABSTRACT

Advancements in technology have enabled diverse tools and medical devices that can improve the efficiency of diagnosis and detection of various diseases. Rheumatoid arthritis is an autoimmune disease that affects multiple joints, including those of the wrists, hands and feet. We used YOLOv5l6 to detect these joints in radiograph images. In this paper, we show that YOLOv5l6 trained on joint images of healthy patients achieves high performance when evaluated on joint images of patients with rheumatoid arthritis, even when the number of training samples is limited. In addition to training YOLOv5l6 on joint images from healthy individuals, we added several data augmentation steps to further improve the generalization of the deep learning model.

6.
Bioinform Adv ; 3(1): vbad059, 2023.
Article in English | MEDLINE | ID: mdl-37228387

ABSTRACT

Motivation: The human microbiome is complex and highly dynamic. Dynamic patterns of the microbiome can capture more information than single-time-point inference because they contain information on temporal changes. However, dynamic information on the human microbiome can be hard to capture: longitudinal data are difficult to obtain and often contain a large volume of missing data, which, together with heterogeneity, poses a challenge for data analysis. Results: We propose an efficient hybrid deep learning architecture, a convolutional neural network-long short-term memory model combined with self-knowledge distillation, to create highly accurate models that analyze longitudinal microbiome profiles to predict disease outcomes. Using our proposed models, we analyzed datasets from the Predicting Response to Standardized Pediatric Colitis Therapy (PROTECT) study and the DIABIMMUNE study. We showed a significant improvement in area under the receiver operating characteristic curve scores, achieving 0.889 and 0.798 on the PROTECT and DIABIMMUNE studies, respectively, compared with state-of-the-art temporal deep learning models. Our findings provide an effective artificial intelligence-based tool to predict disease outcomes using longitudinal microbiome profiles collected from patients. Availability and implementation: The data and source code can be accessed at https://github.com/darylfung96/UC-disease-TL.

7.
Comput Struct Biotechnol J ; 21: 2940-2949, 2023.
Article in English | MEDLINE | ID: mdl-37216014

ABSTRACT

Background: Human epidermal growth factor receptor 2-positive (HER2+) breast cancer (BC) is a heterogeneous subgroup. Estrogen receptor (ER) status is emerging as a predictive marker within HER2+ BCs, with HER2+/ER+ cases usually having better survival in the first 5 years after diagnosis but a higher recurrence risk after 5 years compared to HER2+/ER- cases. This is possibly because sustained ER signaling in HER2+ BCs helps tumors escape the HER2 blockade. Currently, HER2+/ER+ BC is understudied and lacks biomarkers. Thus, a better understanding of the underlying molecular diversity is important for finding new therapy targets for HER2+/ER+ BCs. Methods: In this study, we performed unsupervised consensus clustering together with genome-wide Cox regression analyses on the gene expression data of 123 HER2+/ER+ BCs from The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) cohort to identify distinct HER2+/ER+ subgroups. A supervised eXtreme Gradient Boosting (XGBoost) classifier was then built in TCGA using the identified subgroups and validated in two other independent datasets (Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and Gene Expression Omnibus (GEO) (accession number GSE149283)). Computational characterization analyses were also performed on the predicted subgroups in different HER2+/ER+ BC cohorts. Results: We identified two distinct HER2+/ER+ subgroups with different survival outcomes using the expression profiles of 549 survival-associated genes from the Cox regression analyses. Genome-wide differential gene expression analyses found 197 differentially expressed genes between the two identified subgroups, with 15 genes overlapping the 549 survival-associated genes. An XGBoost classifier using the expression values of the 15 genes achieved a strong cross-validated performance (area under the curve (AUC) = 0.85, sensitivity = 0.76, specificity = 0.77) in predicting the subgroup labels.
Further investigation partially confirmed the differences in survival, drug response, tumor-infiltrating lymphocytes, published gene signatures, and CRISPR-Cas9 knockout screened gene dependency scores between the two identified subgroups. Conclusion: This is the first study to stratify HER2+/ER+ tumors. Overall, the initial results from different cohorts showed that there are two distinct subgroups in HER2+/ER+ tumors, which can be distinguished by a 15-gene signature. Our findings could potentially guide the development of future precision therapies targeting HER2+/ER+ BC.

8.
J Cheminform ; 15(1): 29, 2023 Feb 26.
Article in English | MEDLINE | ID: mdl-36843022

ABSTRACT

Graph convolutional neural networks (GCNs) have been repeatedly shown to have robust capacities for modeling graph data such as small molecules. Message-passing neural networks (MPNNs), a group of GCN variants that can learn and aggregate local information of molecules through iterative message passing, have exhibited advancements in molecular modeling and property prediction. Moreover, given the merits of Transformers in multiple artificial intelligence domains, it is desirable to combine the self-attention mechanism with MPNNs for better molecular representation. We propose an atom-bond transformer-based message-passing neural network (ABT-MPNN) to improve the molecular representation embedding process for molecular property predictions. By designing corresponding attention mechanisms in the message-passing and readout phases of the MPNN, our method provides a novel architecture that integrates molecular representations at the bond, atom and molecule levels in an end-to-end way. The experimental results across nine datasets show that the proposed ABT-MPNN outperforms or is comparable to the state-of-the-art baseline models in quantitative structure-property relationship tasks. We provide case examples of Mycobacterium tuberculosis growth inhibitors and demonstrate that our model's visualization of attention at the atomic level could be an insightful way to investigate atoms or functional groups associated with desired biological properties. The new model provides an innovative way to investigate the effect of self-attention on chemical substructures and functional groups in molecular representation learning, which increases the interpretability of the traditional MPNN and can serve as a valuable way to investigate the mechanism of action of drugs.
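
As a rough illustration of the message-passing and readout phases mentioned above, the following sketch runs one round of plain sum-aggregation message passing over a small molecular graph and then sums the node vectors into a molecule-level vector. It is a deliberately simplified assumption: there are no learned weights, no bond features and no attention, so it is not the ABT-MPNN architecture. The three-atom graph and its one-hot features are invented.

```python
def message_passing_step(node_features, edges):
    """One round of sum-aggregation message passing on an undirected graph.

    node_features: list of per-atom feature vectors
    edges: list of (i, j) index pairs, one per bond
    Each atom's new vector is its own vector plus the sum of its neighbours' vectors.
    """
    n = len(node_features)
    dim = len(node_features[0])
    neighbours = {i: [] for i in range(n)}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = []
    for i in range(n):
        message = [sum(node_features[j][c] for j in neighbours[i]) for c in range(dim)]
        updated.append([node_features[i][c] + message[c] for c in range(dim)])
    return updated


def readout(node_features):
    """Sum all atom vectors into a single molecule-level vector."""
    dim = len(node_features[0])
    return [sum(f[c] for f in node_features) for c in range(dim)]


# Hypothetical three-atom chain 0-1-2 with one-hot features [is_carbon, is_oxygen].
atoms = [[1, 0], [1, 0], [0, 1]]
bonds = [(0, 1), (1, 2)]
hidden = message_passing_step(atoms, bonds)
print(hidden)
print(readout(hidden))
```

Repeating the step lets information travel further along the graph: after k rounds, each atom's vector reflects atoms up to k bonds away. Attention-based variants replace the plain sum with a weighted sum whose weights are learned.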

9.
Nat Commun ; 14(1): 688, 2023 02 08.
Article in English | MEDLINE | ID: mdl-36755019

ABSTRACT

A proper understanding of disease etiology will require longitudinal systems-scale reconstruction of the multitiered architecture of eukaryotic signaling. Here we combine state-of-the-art data acquisition platforms and bioinformatics tools to devise PAMAF, a workflow that simultaneously examines twelve omics modalities, i.e., protein abundance from whole-cells, nucleus, exosomes, secretome and membrane; N-glycosylation, phosphorylation; metabolites; mRNA, miRNA; and, in parallel, single-cell transcriptomes. We apply PAMAF in an established in vitro model of TGFβ-induced epithelial to mesenchymal transition (EMT) to quantify >61,000 molecules from 12 omics and 10 timepoints over 12 days. Bioinformatics analysis of this EMT-ExMap resource allowed us to identify: topological coupling between omics; four distinct cell states during EMT; omics-specific kinetic paths; stage-specific multi-omics characteristics; distinct regulatory classes of genes; ligand-receptor mediated intercellular crosstalk, by integrating scRNAseq and subcellular proteomics; and combinatorial drug targets (e.g., Hedgehog signaling and CAMK-II) to inhibit EMT, which we validate using a 3D mammary duct-on-a-chip platform. Overall, this study provides a resource on TGFβ signaling and EMT.


Subjects
Epithelial-Mesenchymal Transition, Hedgehog Proteins, Epithelial-Mesenchymal Transition/genetics, Hedgehog Proteins/metabolism, Epithelial Cells/metabolism, Signal Transduction, Transforming Growth Factor beta/metabolism
10.
Biomark Res ; 11(1): 9, 2023 Jan 24.
Article in English | MEDLINE | ID: mdl-36694221

ABSTRACT

BACKGROUND: Traditional handcrafted radiomic features extracted from magnetic resonance imaging (MRI) of tumors are generally considered shallow and low-order. Recent advances in deep learning show that high-order deep radiomic features extracted automatically from tumor images can capture tumor heterogeneity more efficiently. We hypothesize that MRI-based deep radiomic phenotypes have significant associations with molecular profiles of breast cancer tumors. We aim to identify deep radiomic features (DRFs) from MRI, evaluate their significance in predicting breast cancer (BC) clinical characteristics and explore their associations with multi-level genomic factors. METHODS: A denoising autoencoder was built to retrospectively extract 4,096 DRFs from 110 BC patients' MRI. Visualization and clustering were applied to these DRFs. Linear Mixed Effect models were used to test their associations with multi-level genomic features (GFs) (risk genes, gene signatures, and biological pathway activities) extracted from the same patients' mRNA expression profiles. A Least Absolute Shrinkage and Selection Operator model was used to identify the most predictive DRFs for each clinical characteristic (tumor size (T), lymph node metastasis (N), estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status). RESULTS: Thirty-six conventional radiomic features (CRFs) for 87 of the 110 BC patients, provided by a previous study, were used for comparison. More than 1,000 DRFs were associated with the risk genes, gene signatures, and biological pathway activities (adjusted P-value < 0.05). DRFs produced better performance in predicting T, N, ER, PR, and HER2 status (AUC > 0.9). These DRFs showed significant power in stratifying patients and were linked to relevant biological and clinical characteristics. In contrast, only eight risk genes were associated with CRFs.
The CRFs performed worse in predicting clinical characteristics than DRFs. CONCLUSIONS: The deep learning-based automatic MRI features perform better in predicting BC clinical characteristics and are more significantly associated with GFs than traditional semi-automatic MRI features. Our radiogenomic approach for identifying MRI-based imaging signatures may pave potential pathways for the discovery of genetic mechanisms regulating specific tumor phenotypes and may enable a more rapid innovation of novel imaging modalities, hence accelerating their translation to personalized medicine.

11.
Cancer Med ; 12(5): 6117-6128, 2023 03.
Article in English | MEDLINE | ID: mdl-36281472

ABSTRACT

INTRODUCTION: Analyzing longitudinal cancer quality-of-life (QoL) measurements and their impact on clinical outcomes may improve our understanding of patient trajectories during systemic therapy. We applied an unsupervised growth mixture modeling (GMM) approach to identify unobserved subpopulations ("patient clusters") in the CO.20 clinical trial longitudinal QoL data. Classes were then evaluated for differences in clinico-epidemiologic characteristics and overall survival (OS). METHODS AND MATERIALS: In CO.20, 750 chemotherapy-refractory metastatic colorectal cancer (CRC) patients were randomized to receive Brivanib+Cetuximab (n = 376, experimental arm) versus Cetuximab+Placebo (n = 374, standard arm) for 16 weeks. EORTC-QLQ-C30 QoL summary scores were calculated for each patient at seven time points, and GMM was applied to identify patient clusters (termed "classes"). Log-rank/Kaplan-Meier and multivariable Cox regression analyses were conducted to compare survival between classes. Cox analyses were used to explore the relationship of baseline QoL, individual slope, and the quadratic terms from the GMM output with OS. RESULTS: In univariable analysis, the linear mixed effect model (LMM) identified sex and ECOG Performance Status as strongly associated with the longitudinal QoL score (p < 0.01). Within each treatment arm, GMM clustered the patients into three distinct QoL-based classes. The three classes identified in the experimental arm (log-rank p-value = 0.00058) and in the control arm (p < 0.0001) each showed significantly different survival. The GMM's baseline, slope, and quadratic terms were each significantly associated with OS (p < 0.001). CONCLUSION: GMM can be used to analyze longitudinal QoL data in cancer studies by identifying unobserved subpopulations (patient clusters). As demonstrated by CO.20 data, these classes can have important implications, including clinical prognostication.
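
The Kaplan-Meier analysis referred to above estimates a survival curve from follow-up times that may be censored. The sketch below is a minimal product-limit estimator, included only to show the mechanics. The function name and the toy follow-up data are assumptions and have no connection to the CO.20 trial data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times: follow-up time for each patient (any consistent unit)
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve


# Toy follow-up data (hypothetical months), invented for illustration.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Comparing such curves between groups (for example, the QoL-based classes) is what a log-rank test formalizes; that test is not implemented here.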


Subjects
Antineoplastic Combined Chemotherapy Protocols, Quality of Life, Humans, Cetuximab/therapeutic use, Cluster Analysis, Antineoplastic Combined Chemotherapy Protocols/therapeutic use
12.
Pharmacogenomics J ; 23(4): 61-72, 2023 07.
Article in English | MEDLINE | ID: mdl-36424525

ABSTRACT

Our previous studies demonstrated that the FOXM1 pathway is upregulated and the PPARA pathway downregulated in breast cancer (BC), and especially in the triple negative breast cancer (TNBC) subtype. Targeting the two pathways may offer potential therapeutic strategies to treat BC, especially TNBC which has the fewest effective therapies available among all BC subtypes. In this study we identified small molecule compounds that could modulate the PPARA and FOXM1 pathways in BC using two methods. In the first method, data were initially curated from the Connectivity Map (CMAP) database, which provides the gene expression profiles of MCF7 cells treated with different compounds as well as paired controls. We then calculated the changes in the FOXM1 and PPARA pathway activities from the compound-induced gene expression profiles under each treatment to identify compounds that produced a decreased activity in the FOXM1 pathway or an increased activity in the PPARA pathway. In the second method, the CMAP database tool was used to identify compounds that could reverse the expression pattern of the two pathways in MCF7 cells. Compounds identified as repressing the FOXM1 pathway or activating the PPARA pathway by the two methods were compared. We identified 19 common compounds that could decrease the FOXM1 pathway activity scores and reverse the FOXM1 pathway expression pattern, and 13 common compounds that could increase the PPARA pathway activity scores and reverse the PPARA pathway expression pattern. It may be of interest to validate these compounds experimentally to further investigate their effects on TNBCs.


Subjects
Triple Negative Breast Neoplasms, Humans, Triple Negative Breast Neoplasms/drug therapy, Triple Negative Breast Neoplasms/genetics, Triple Negative Breast Neoplasms/metabolism, Tumor Cell Line, Forkhead Box Protein M1/genetics, Forkhead Box Protein M1/metabolism, MCF-7 Cells, Computational Biology, PPAR alpha/genetics, Neoplastic Gene Expression Regulation
13.
medRxiv ; 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-38168307

ABSTRACT

The human subcortex is involved in memory and cognition. Structural and functional changes in subcortical regions are implicated in psychiatric conditions. We performed an association study of subcortical volumes using 15,941 tandem repeats (TRs) derived from whole exome sequencing (WES) data in 16,527 unrelated European ancestry participants. We identified 17 loci, most of which were associated with accumbens volume, and nine of which had fine-mapping probability supporting their causal effect on subcortical volume independent of surrounding variation. The most significant association involved NTN1-[GCGG]N and increased accumbens volume (β = 5.93, P = 8.16 × 10⁻⁹). Three exonic TRs had large effects on thalamus volume (LAT2-[CATC]N β = -949, P = 3.84 × 10⁻⁶ and SLC39A4-[CAG]N β = -1599, P = 2.42 × 10⁻⁸) and pallidum volume (MCM2-[AGG]N β = -404.9, P = 147 × 10⁻⁷). These genetic effects were consistent with measurements of per-repeat expansion/contraction effects on organism fitness. With 3-dimensional modeling, we reinforced these effects to show that the expanded and contracted LAT2-[CATC]N repeat causes a frameshift mutation that prevents appropriate protein folding. These TRs also exhibited independent effects on several psychiatric symptoms, including LAT2-[CATC]N and the tiredness/low energy symptom of depression (β = 0.340, P = 0.003). These findings link genetic variation to tractable biology in the brain and relevant psychiatric symptoms. We also chart one pathway for TR prioritization in future complex trait genetic studies.

14.
iScience ; 25(12): 105489, 2022 Dec 22.
Article in English | MEDLINE | ID: mdl-36404915

ABSTRACT

Severe early childhood caries (S-ECC) is a multifactorial disease with strong evidence of genetic inheritance. Previous studies suggest that variants in taste genes are associated with dental caries due to the role of taste proteins in mediating taste preferences, oral innate immunity, and important host-microbial interactions. However, few taste genes have been investigated in caries studies. Therefore, the associations of genetic variants in sweet, bitter, umami, salt, sour, carbonation, and fat taste-related genes with S-ECC and plaque microbial composition (16S and ITS1 rRNA sequencing) were evaluated. The results showed that sixteen variants in seven taste genes (SCNN1D, CA6, TAS2R3, OTOP1, TAS2R5, TAS2R60, and TAS2R4) were associated with S-ECC. Twenty-one variants in twelve taste genes were correlated with relative abundances of bacteria or fungi. These results suggest that S-ECC risk and composition of the plaque microbiome can be partially influenced by genetic variants in genes related to taste sensation.

15.
PLoS Comput Biol ; 18(10): e1010613, 2022 10.
Article in English | MEDLINE | ID: mdl-36228001

ABSTRACT

Screening for novel antibacterial compounds in small molecule libraries has a low success rate. We applied machine learning (ML)-based virtual screening for antibacterial activity and evaluated its predictive power by experimental validation. We first binarized 29,537 compounds according to their growth inhibitory activity (hit rate 0.87%) against the antibiotic-resistant bacterium Burkholderia cenocepacia and described their molecular features with a directed-message passing neural network (D-MPNN). Then, we used the data to train an ML model that achieved a receiver operating characteristic (ROC) score of 0.823 on the test set. Finally, we predicted antibacterial activity in virtual libraries corresponding to 1,614 compounds from the Food and Drug Administration (FDA)-approved list and 224,205 natural products. Hit rates of 26% and 12%, respectively, were obtained when we tested the top-ranked predicted compounds for growth inhibitory activity against B. cenocepacia, which represents at least a 14-fold increase from the previous hit rate. In addition, more than 51% of the predicted antibacterial natural compounds inhibited ESKAPE pathogens showing that predictions expand beyond the organism-specific dataset to a broad range of bacteria. Overall, the developed ML approach can be used for compound prioritization before screening, increasing the typical hit rate of drug discovery.
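
The "at least a 14-fold increase" quoted above can be checked with simple arithmetic: dividing each observed hit rate (26% and 12%) by the baseline hit rate of the original screen (0.87%). The short sketch below reproduces that calculation; the function name is an illustrative assumption, and the three percentages are taken from the abstract.

```python
def fold_enrichment(observed_hit_rate, baseline_hit_rate):
    """Ratio of an observed hit rate to a baseline hit rate (same units for both)."""
    if baseline_hit_rate <= 0:
        raise ValueError("baseline hit rate must be positive")
    return observed_hit_rate / baseline_hit_rate


baseline_percent = 0.87  # hit rate of the original 29,537-compound screen
observed_percent = {
    "FDA-approved compounds": 26.0,
    "natural products": 12.0,
}
for library, rate in observed_percent.items():
    print(library, round(fold_enrichment(rate, baseline_percent), 1))
```

The lower of the two ratios, 12 / 0.87, is roughly 13.8, which rounds to the 14-fold figure stated in the abstract; the FDA-approved library gives a ratio of roughly 30.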


Subjects
Drug Discovery, Small Molecule Libraries, United States, Small Molecule Libraries/pharmacology, Machine Learning, Anti-Bacterial Agents/pharmacology
16.
Genomics ; 114(5): 110474, 2022 09.
Article in English | MEDLINE | ID: mdl-36057424

ABSTRACT

BACKGROUND: It has become increasingly important to identify molecular markers for accurately distinguishing prostate cancer (PCa) stages, specifically localized PCa (LPC) and locally advanced PCa (LAPC). However, studies that profile both epigenome-wide DNA methylation and the transcriptome in the same patients with PCa at different stages are lacking. This study aims to identify epitranscriptomic biomarkers screened in the peri-prostatic (PP) adipose tissue for predicting LPC and LAPC. METHODS: We profiled gene expression and DNA methylation of 10 PCa patients' PP adipose tissue (4 LPC and 6 LAPC). Differential analysis was used to identify differentially methylated CpG sites and differentially expressed genes. An integrative analysis of the microarray gene expression profiles and DNA methylation profiles was conducted using LASSO (least absolute shrinkage and selection operator) between each studied gene and the CpG sites in its promoter region. The epitranscriptomic signature was constructed by combining the association and differential analyses. The signature was then refined using the genetic mutation data of >1,500 primary PCa and metastasis PCa samples from 4 different studies. We determined the genes that were most significantly affected by mutations. Machine learning models were built to evaluate the classification ability of the identified signature using the gene expression profiles from three external cohorts. RESULTS: From the LASSO-based association analysis, we identified 56 genes presenting significant anti-correlation between the expression level and the methylation level of at least one CpG site in the promoter region (p-value < 5 × 10⁻⁸). From the differential analysis, we detected 16,405 downregulated genes and 9,485 genes containing at least one hypermethylated CpG site. We identified 30 genes that showed anti-correlation, down-regulation and hyper-methylation simultaneously.
Using genetic mutation data, we determined that 6 of the 30 genes showed significant differences (adjusted p-value < 0.05) in mutation frequencies between the primary PCa and metastasis PCa samples. The identified 30 genes performed well in distinguishing PCa patients with metastasis from PCa patients without metastasis (area under the receiver operating characteristic curve (AUC) = 0.81). The gene signature also performed well in distinguishing PCa patients with high risk of progression from PCa patients with low risk of progression (AUC = 0.88). CONCLUSIONS: We established an integrative framework to identify differentially expressed genes with an aberrant methylation pattern in PP adipose tissue that may represent novel candidate molecular markers for distinguishing between LPC and LAPC.


Subjects
DNA Methylation, Prostatic Neoplasms, Biomarkers/metabolism, CpG Islands, Epigenome, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Humans, Male, Genetic Promoter Regions, Prostatic Neoplasms/metabolism, Transcriptome
17.
BMC Bioinformatics ; 23(Suppl 7): 343, 2022 Aug 17.
Article in English | MEDLINE | ID: mdl-35974325

ABSTRACT

BACKGROUND: A recurring problem in image segmentation is a lack of labelled data. This problem is especially acute in the segmentation of lung computed tomography (CT) of patients with Coronavirus Disease 2019 (COVID-19). The reason for this is simple: the disease has not been prevalent long enough to generate a great number of labels. Semi-supervised learning promises a way to learn from data that is unlabelled and has seen tremendous advancements in recent years. However, due to the complexity of its label space, those advancements cannot be applied to image segmentation. That being said, it is this same complexity that makes it extremely expensive to obtain pixel-level labels, making semi-supervised learning all the more appealing. This study seeks to bridge this gap by proposing a novel model that utilizes the image segmentation abilities of deep convolutional networks and the semi-supervised learning abilities of generative models for chest CT images of patients with COVID-19. RESULTS: We propose a novel generative model called the shared variational autoencoder (SVAE). The SVAE utilizes a five-layer deep hierarchy of latent variables and deep convolutional mappings between them, resulting in a generative model that is well suited for lung CT images. Then, we add a novel component to the final layer of the SVAE which forces the model to reconstruct the input image using a segmentation that must match the ground truth segmentation whenever it is present. We name this final model StitchNet. CONCLUSION: We compare StitchNet to other image segmentation models on a high-quality dataset of CT images from COVID-19 patients. We show that our model has comparable performance to the other segmentation models. We also explore the potential limitations and advantages of our proposed algorithm and propose some potential future research directions for this challenging issue.


Subjects
COVID-19 ; Image Processing, Computer-Assisted ; Algorithms ; COVID-19/diagnostic imaging ; Humans ; Image Processing, Computer-Assisted/methods ; Supervised Machine Learning ; Tomography, X-Ray Computed
18.
Front Oncol ; 12: 879607, 2022.
Article in English | MEDLINE | ID: mdl-35814415

ABSTRACT

Proper analysis of high-dimensional human genomic data is necessary to increase human knowledge about fundamental biological questions such as disease associations and drug sensitivity. However, such data contain sensitive private information about individuals and can be used to uniquely identify an individual (i.e., a privacy violation). Therefore, raw genomic datasets cannot be publicly published or shared with researchers. The recent success of deep learning (DL) in diverse problems proved its suitability for analyzing the high volume of high-dimensional genomic data. Still, DL-based models leak information about the training samples. To overcome this challenge, we can incorporate differential privacy mechanisms into the DL analysis framework, as differential privacy can protect individuals' privacy. We proposed a differential privacy based DL framework to solve two biological problems: breast cancer status (BCS) and cancer type (CT) classification, and drug sensitivity prediction. To predict BCS and CT using genomic data, we built a differentially private (DP) deep autoencoder (dpAE) using private gene expression datasets that performs low-dimensional data representation learning. We used dpAE features to build multiple DP binary classifiers to predict BCS and CT in any individual. To predict drug sensitivity, we used the Genomics of Drug Sensitivity in Cancer (GDSC) dataset. We extracted GDSC's dpAE features to build our DP drug sensitivity prediction model for 265 drugs. Evaluation of our proposed DP framework shows that it achieves better prediction performance for BCS, CT, and drug sensitivity than the previously published DP work.
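
The abstract above does not say which differential privacy mechanism the framework uses. As an illustration only, the sketch below shows one widely used approach for training DL models privately: clip each per-sample gradient to a fixed L2 norm, sum the clipped gradients, and add Gaussian noise scaled to that norm before averaging. The function names, parameters (`max_norm`, `noise_multiplier`) and the toy gradients are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def clip_l2(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def private_mean_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, add Gaussian noise to the sum, then average.

    The noise standard deviation is noise_multiplier * max_norm, so it is
    calibrated to the largest influence any single sample can have.
    """
    clipped = [clip_l2(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(clipped) for x in noisy]


# Hypothetical per-sample gradients (illustration only).
grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)

# With the noise switched off, only the clipping step is visible.
no_noise = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=0.0, rng=rng)

# With noise switched on, the result is a randomized estimate of the same mean.
with_noise = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any one individual's record can move the update, and the added noise masks what remains; the privacy guarantee itself depends on an accounting step that this sketch omits.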

19.
Comput Struct Biotechnol J ; 20: 2484-2494, 2022.
Article in English | MEDLINE | ID: mdl-35664228

ABSTRACT

In precision medicine, it is of great value to develop computational frameworks for identifying prognostic biomarkers that can capture both the multi-genomic and phenotypic heterogeneity of breast cancer (BC). Radiogenomics is a field where medical images and genomic measurements are integrated and mined to solve challenging clinical problems. Previous radiogenomic studies suffered from data incompleteness, feature subjectivity and low interpretability. For example, the majority of radiogenomic studies lack one or two of medical imaging data, genomic data, and clinical outcome data, which results in the data incompleteness issue. The feature subjectivity issue comes from the extraction of imaging features with significant human involvement. Thus, there is an urgent need to address the above-mentioned limitations so that fully automatic and transparent radiogenomic prognostic biomarkers can be identified for BC. We proposed a novel framework for BC prognostic radiogenomic biomarker identification. This framework involves an explainable DL model for image feature extraction, a Bayesian tensor factorization (BTF) process for multi-genomic feature extraction, a leverage strategy to utilize unpaired imaging, genomic, and survival outcome data, and a mediation analysis to provide further interpretation for identified biomarkers. This work provided a new perspective for conducting a comprehensive radiogenomic study when only limited resources are available. Compared with baseline traditional radiogenomic biomarkers, the 23 biomarkers identified by the proposed framework performed better in indicating patients' survival outcome. Their interpretability is supported by different levels of built-in and follow-up analyses.

20.
Comput Methods Programs Biomed ; 221: 106903, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35636358

ABSTRACT

BACKGROUND AND OBJECTIVE: Both mass detection and segmentation in digital mammograms play a crucial role in early breast cancer detection and treatment. Furthermore, clinical experience has shown that they are the upstream tasks of pathological classification of breast lesions. Recent advancements in deep learning have made the analyses faster and more accurate. This study aims to develop a deep learning model architecture for breast cancer mass detection and segmentation using mammography. METHODS: In this work we proposed a double shot model for simultaneous mass detection and segmentation using a combination of YOLO (You Only Look Once) and LOGO (Local-Global) architectures. First, we adopted YoloV5L6, the state-of-the-art object detection model, to position and crop the breast mass in mammograms with a high resolution. Second, to balance training efficiency and segmentation performance, we modified the LOGO training strategy to train the whole images and cropped images on the global and local transformer branches separately. The two branches were then merged to form the final segmentation decision. RESULTS: The proposed YOLO-LOGO model was tested on two independent mammography datasets (CBIS-DDSM and INBreast). The proposed model performs significantly better than previous works. It achieves a true positive rate of 95.7% and a mean average precision of 65.0% for mass detection on the CBIS-DDSM dataset. Its performance for mass segmentation on the CBIS-DDSM dataset is F1-score = 74.5% and IoU = 64.0%. A similar performance trend is observed on the other independent dataset, INBreast. CONCLUSIONS: The proposed model has a higher efficiency and better performance, reduces computational requirements, and improves the versatility and accuracy of computer-aided breast cancer diagnosis. Hence it has the potential to enable more assistance for doctors in early breast cancer detection and treatment, thereby reducing mortality.
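
For readers unfamiliar with the segmentation metrics quoted in the abstract above, the following minimal sketch computes IoU (intersection over union) and the F1-score for a pair of binary masks. The abstract does not state how the metrics were aggregated across images, so this shows only the per-mask definitions. The function name `iou_and_f1` and the toy masks are hypothetical.

```python
def iou_and_f1(pred, truth):
    """Return (IoU, F1) for two equal-length binary masks given as flat sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # predicted and true
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # predicted only
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)  # true only
    union = tp + fp + fn
    if union == 0:
        # Both masks are empty: treat as perfect agreement (a convention, not a rule).
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


# Hypothetical flattened masks (1 = mass pixel, 0 = background).
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 0, 1, 1, 1, 0]
iou, f1 = iou_and_f1(pred, truth)
print(iou, round(f1, 3))
```

For a single mask pair F1 is never lower than IoU, since both are built from the same overlap counts but F1 weights the overlap twice.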


Subjects
Breast Neoplasms ; Neural Networks, Computer ; Breast/diagnostic imaging ; Breast Neoplasms/diagnostic imaging ; Diagnosis, Computer-Assisted ; Female ; Humans ; Mammography/methods
...