Search | BVS Aleitamento Materno

1.

A census of human soluble protein complexes.

Havugimana, Pierre C; Hart, G Traver; Nepusz, Tamás; Yang, Haixuan; Turinsky, Andrei L; Li, Zhihua; Wang, Peggy I; Boutz, Daniel R; Fong, Vincent; Phanse, Sadhna; Babu, Mohan; Craig, Stephanie A; Hu, Pingzhao; Wan, Cuihong; Vlasblom, James; Dar, Vaqaar-un-Nisa; Bezginov, Alexandr; Clark, Gregory W; Wu, Gabriel C; Wodak, Shoshana J; Tillier, Elisabeth R M; Paccanaro, Alberto; Marcotte, Edward M; Emili, Andrew.

Cell ; 150(5): 1068-81, 2012 Aug 31.

Article in English | MEDLINE | ID: mdl-22939629

ABSTRACT

Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions that were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multiprotein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with five or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.

Subjects

Multiprotein Complexes/analysis , Protein Interaction Maps , Proteins/chemistry , Proteomics/methods , Humans , Tandem Mass Spectrometry

2.

iNGNN-DTI: prediction of drug-target interaction with interpretable nested graph neural network and pretrained molecule models.

Sun, Yan; Li, Yan Yi; Leung, Carson K; Hu, Pingzhao.

Bioinformatics ; 40(3)2024 03 04.

Article in English | MEDLINE | ID: mdl-38449285

ABSTRACT

MOTIVATION: Drug-target interaction (DTI) prediction aims to identify interactions between drugs and protein targets. Deep learning can automatically learn discriminative features from drug and protein target representations for DTI prediction, but challenges remain and the problem is still open. Existing approaches encode drugs and targets into features using deep learning models, but they often lack explanations for underlying interactions. Moreover, limited labeled DTIs in the chemical space can hinder model generalization. RESULTS: We propose an interpretable nested graph neural network for DTI prediction (iNGNN-DTI) using pre-trained molecule and protein models. The analysis is conducted on graph data representing drugs and targets by using a specific type of nested graph neural network, in which the target graphs are created based on 3D structures using AlphaFold2. This architecture is highly expressive in capturing substructures of the graph data. We use a cross-attention module to capture interaction information between the substructures of drugs and targets. To improve feature representations, we integrate features learned by models that are pre-trained on large unlabeled small molecule and protein datasets, respectively. We evaluate our model on three benchmark datasets, and it shows a consistent improvement over all baseline models on all datasets. We also run an experiment with previously unseen drugs or targets in the test set, and our model outperforms all of the baselines. Furthermore, the iNGNN-DTI can provide more insights into the interaction by visualizing the weights learned by the cross-attention module. AVAILABILITY AND IMPLEMENTATION: The source code of the algorithm is available at https://github.com/syan1992/iNGNN-DTI.

Subjects

Algorithms , Computer Neural Networks , Drug Interactions , Benchmarking , Drug Delivery Systems

3.

ST-CellSeg: Cell segmentation for imaging-based spatial transcriptomics using multi-scale manifold learning.

Li, Youcheng; Lac, Leann; Liu, Qian; Hu, Pingzhao.

PLoS Comput Biol ; 20(6): e1012254, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38935799

ABSTRACT

Spatial transcriptomics has gained popularity over the past decade due to its ability to evaluate transcriptome data while preserving spatial information. Cell segmentation is a crucial step in spatial transcriptomic analysis, as it enables the avoidance of unpredictable tissue disentanglement steps. Although high-quality cell segmentation algorithms can aid in the extraction of valuable data, traditional methods are frequently non-spatial, do not account for spatial information efficiently, and perform poorly when confronted with the problem of spatial transcriptome cell segmentation with varying shapes. In this study, we propose ST-CellSeg, an image-based machine learning method for spatial transcriptomics that uses manifold for cell segmentation and is novel in its consideration of multi-scale information. We first construct a fully connected graph which acts as a spatial transcriptomic manifold. Using multi-scale data, we then determine the low-dimensional spatial probability distribution representation for cell segmentation. Using the adjusted Rand index (ARI), normalized mutual information (NMI), and Silhouette coefficient (SC) as model performance measures, the proposed algorithm significantly outperforms baseline models in selected datasets and is efficient in computational complexity.

Subjects

Algorithms , Computational Biology , Gene Expression Profiling , Machine Learning , Transcriptome , Computational Biology/methods , Transcriptome/genetics , Gene Expression Profiling/methods , Humans , Computer-Assisted Image Processing/methods
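
Note on the metrics in entry 3: the adjusted Rand index (ARI) named in the abstract scores how well a predicted labeling (for example, a cell segmentation) agrees with a reference labeling, corrected for chance. The following is a minimal sketch in Python of the standard ARI formula. It is not the authors' code, and the function name and toy labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)

    # Contingency counts: items sharing each (reference, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree on

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_pairs - expected) / (max_index - expected)


# Toy labels (hypothetical): same grouping with swapped label names.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Toy labels (hypothetical): every reference pair is split by the prediction.
split_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_grouping, split_grouping)
```

ARI depends only on which items are grouped together, so renaming the clusters does not change the score.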

4.

Conditional probabilistic diffusion model driven synthetic radiogenomic applications in breast cancer.

Chen, Lianghong; Huang, Zi Huai; Sun, Yan; Domaratzki, Mike; Liu, Qian; Hu, Pingzhao.

PLoS Comput Biol ; 20(10): e1012490, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39374308

ABSTRACT

This study addresses the heterogeneity of Breast Cancer (BC) by employing a Conditional Probabilistic Diffusion Model (CPDM) to synthesize Magnetic Resonance Images (MRIs) based on multi-omic data, including gene expression, copy number variation, and DNA methylation. The lack of paired medical images and genomics data in previous studies presented a challenge, which the CPDM aims to overcome. The well-trained CPDM successfully generated synthetic MRIs for 726 TCGA-BRCA patients, who lacked actual MRIs, using their multi-omic profiles. Evaluation metrics such as Fréchet Inception Distance (FID), Mean Square Error (MSE), and Structural Similarity Index Measure (SSIM) demonstrated the CPDM's effectiveness, with an FID of 2.02, an MSE of 0.02, and an SSIM of 0.59 based on the 15-fold cross-validation. The synthetic MRIs were used to predict clinical attributes, achieving an Area Under the Receiver-Operating-Characteristic curve (AUROC) of 0.82 and an Area Under the Precision-Recall Curve (AUPRC) of 0.84 for predicting ER+/HER2+ subtypes. Additionally, the MRIs served to accurately predict BC patient survival with a Concordance-index (C-index) score of 0.88, outperforming other baseline models. This research demonstrates the potential of CPDMs in generating MRIs based on BC patients' genomic profiles, offering valuable insights for radiogenomic research and advancements in precision medicine. The study provides a novel approach to understanding BC heterogeneity for early detection and personalized treatment.

Subjects

Breast Neoplasms , Magnetic Resonance Imaging , Statistical Models , Humans , Breast Neoplasms/genetics , Breast Neoplasms/diagnostic imaging , Female , Magnetic Resonance Imaging/methods , DNA Copy Number Variations/genetics , Genomics/methods , Computational Biology/methods , DNA Methylation/genetics

5.

Conditional generative adversarial network driven radiomic prediction of mutation status based on magnetic resonance imaging of breast cancer.

Huang, Zi Huai; Chen, Lianghong; Sun, Yan; Liu, Qian; Hu, Pingzhao.

J Transl Med ; 22(1): 226, 2024 03 02.

Article in English | MEDLINE | ID: mdl-38429796

ABSTRACT

BACKGROUND: Breast Cancer (BC) is a highly heterogeneous and complex disease. Personalized treatment options require the integration of multi-omic data and consideration of phenotypic variability. Radiogenomics aims to merge medical images with genomic measurements but encounters challenges due to unpaired data consisting of imaging, genomic, or clinical outcome data. In this study, we propose the utilization of a well-trained conditional generative adversarial network (cGAN) to address the unpaired data issue in radiogenomic analysis of BC. The generated images will then be used to predict the mutation status of key driver genes and BC subtypes. METHODS: We integrated the paired MRI and multi-omic (mRNA gene expression, DNA methylation, and copy number variation) profiles of 61 BC patients from The Cancer Imaging Archive (TCIA) and The Cancer Genome Atlas (TCGA). To facilitate this integration, we employed a Bayesian Tensor Factorization approach to factorize the multi-omic data into 17 latent features. Subsequently, a cGAN model was trained based on the matched side-view patient MRIs and their corresponding latent features to predict MRIs for BC patients who lack MRIs. Model performance was evaluated by calculating the distance between real and generated images using the Fréchet Inception Distance (FID) metric. BC subtype and mutation status of driver genes were obtained from the cBioPortal platform, where 3 genes were selected based on the number of mutated patients. A convolutional neural network (CNN) was constructed and trained using the generated MRIs for mutation status prediction. Receiver operating characteristic area under curve (ROC-AUC) and precision-recall area under curve (PR-AUC) were used to evaluate the performance of the CNN models for mutation status prediction. Precision, recall and F1 score were used to evaluate the performance of the CNN model in subtype classification. RESULTS: The FID of the images from the well-trained cGAN model based on the test set is 1.31. The CNN for TP53, PIK3CA, and CDH1 mutation prediction yielded ROC-AUC values of 0.9508, 0.7515, and 0.8136 and PR-AUC values of 0.9009, 0.7184, and 0.5007, respectively. Multi-class subtype prediction achieved precision, recall and F1 scores of 0.8444, 0.8435 and 0.8336, respectively. The source code implementing the algorithms and the related data can be found in the project GitHub repository at https://github.com/mattthuang/BC_RadiogenomicGAN. CONCLUSION: Our study establishes cGAN as a viable tool for generating synthetic BC MRIs for mutation status prediction and subtype classification to better characterize the heterogeneity of BC in patients. The synthetic images also have the potential to significantly augment existing MRI data and circumvent issues surrounding data sharing and patient privacy for future BC machine learning studies.

Subjects

Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/genetics , Radiomics , DNA Copy Number Variations , Bayes Theorem , Magnetic Resonance Imaging/methods , Mutation/genetics

6.

Computational frameworks integrating deep learning and statistical models in mining multimodal omics data.

Lac, Leann; Leung, Carson K; Hu, Pingzhao.

J Biomed Inform ; 152: 104629, 2024 04.

Article in English | MEDLINE | ID: mdl-38552994

ABSTRACT

BACKGROUND: In health research, multimodal omics data analysis is widely used to address important clinical and biological questions. Traditional statistical methods rely on the strong assumptions of distribution. Statistical methods such as testing and differential expression are commonly used in omics analysis. Deep learning, on the other hand, is an advanced computer science technique that is powerful in mining high-dimensional omics data for prediction tasks. Recently, integrative frameworks or methods have been developed for omics studies that combine statistical models and deep learning algorithms. METHODS AND RESULTS: The aim of these integrative frameworks is to combine the strengths of both statistical methods and deep learning algorithms to improve prediction accuracy while also providing interpretability and explainability. This review report discusses the current state-of-the-art integrative frameworks, their limitations, and potential future directions in survival and time-to-event longitudinal analysis, dimension reduction and clustering, regression and classification, feature selection, and causal and transfer learning.

Subjects

Deep Learning , Genomics , Genomics/methods , Computational Biology/methods , Algorithms , Statistical Models

7.

Multiomic Metabolic Enrichment Network Analysis Reveals Metabolite-Protein Physical Interaction Subnetworks Altered in Cancer.

Blum, Benjamin C; Lin, Weiwei; Lawton, Matthew L; Liu, Qian; Kwan, Julian; Turcinovic, Isabella; Hekman, Ryan; Hu, Pingzhao; Emili, Andrew.

Mol Cell Proteomics ; 21(1): 100189, 2022 01.

Article in English | MEDLINE | ID: mdl-34933084

ABSTRACT

Metabolism is recognized as an important driver of cancer progression and other complex diseases, but global metabolite profiling remains a challenge. Protein expression profiling is often a poor proxy since existing pathway enrichment models provide an incomplete mapping between the proteome and metabolism. To overcome these gaps, we introduce multiomic metabolic enrichment network analysis (MOMENTA), an integrative multiomic data analysis framework for more accurately deducing metabolic pathway changes from proteomics data alone in a gene set analysis context by leveraging protein interaction networks to extend annotated metabolic models. We apply MOMENTA to proteomic data from diverse cancer cell lines and human tumors to demonstrate its utility at revealing variation in metabolic pathway activity across cancer types, which we verify using independent metabolomics measurements. The novel metabolic networks we uncover in breast cancer and other tumors are linked to clinical outcomes, underscoring the pathophysiological relevance of the findings.

Subjects

Breast Neoplasms , Proteomics , Breast Neoplasms/metabolism , Female , Humans , Metabolic Networks and Pathways , Metabolomics , Protein Interaction Maps

8.

Tightly integrated multiomics-based deep tensor survival model for time-to-event prediction.

Zhang, Jasper Zhongyuan; Xu, Wei; Hu, Pingzhao.

Bioinformatics ; 38(12): 3259-3266, 2022 06 13.

Article in English | MEDLINE | ID: mdl-35445698

ABSTRACT

MOTIVATION: Multiomics cancer profiles provide essential signals for predicting cancer survival. It is challenging to reveal the complex patterns from multiple types of data and link them to survival outcomes. We aim to develop a new deep learning-based algorithm to integrate three types of high-dimensional omics data measured on the same individuals to improve cancer survival outcome prediction. RESULTS: We built a three-dimension tensor to integrate multi-omics cancer data and factorized it into two-dimension matrices of latent factors, which were fed into neural networks-based survival networks. The new algorithm and other multi-omics-based algorithms, as well as individual genomic-based survival analysis algorithms, were applied to the breast cancer data and the colon and rectal cancer data from The Cancer Genome Atlas (TCGA) program. We evaluated the goodness-of-fit using the concordance index (C-index) and Integrated Brier Score (IBS). We demonstrated that the proposed tight integration framework has better survival prediction performance than the models using individual genomic data and other conventional data integration methods. AVAILABILITY AND IMPLEMENTATION: https://github.com/jasperzyzhang/DeepTensorSurvival. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Breast Neoplasms , Genomics , Humans , Female , Genomics/methods , Algorithms , Genome , Computer Neural Networks , Breast Neoplasms/genetics
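
Note on the metrics in entry 8: the concordance index (C-index) used in the abstract measures how often a model ranks patients' risks in the same order as their observed survival. The sketch below is a simplified Harrell-style C-index in Python, not the authors' implementation. The function name, the handling of tied times (ignored here), and the toy data are assumptions made for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's follow-up time. The pair is concordant
    when the model assigned subject i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy cohort (hypothetical): follow-up times, event flags (1 = event, 0 = censored).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
well_ranked = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
badly_ranked = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
print(well_ranked, badly_ranked)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which C-index results such as those in entries 4 and 8 are reported.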

9.

Bioinformatics driven discovery of small molecule compounds that modulate the FOXM1 and PPARA pathway activities in breast cancer.

Huang, Shujun; Hu, Pingzhao; Lakowski, Ted M.

Pharmacogenomics J ; 23(4): 61-72, 2023 07.

Article in English | MEDLINE | ID: mdl-36424525

ABSTRACT

Our previous studies demonstrated that the FOXM1 pathway is upregulated and the PPARA pathway downregulated in breast cancer (BC), and especially in the triple negative breast cancer (TNBC) subtype. Targeting the two pathways may offer potential therapeutic strategies to treat BC, especially TNBC which has the fewest effective therapies available among all BC subtypes. In this study we identified small molecule compounds that could modulate the PPARA and FOXM1 pathways in BC using two methods. In the first method, data were initially curated from the Connectivity Map (CMAP) database, which provides the gene expression profiles of MCF7 cells treated with different compounds as well as paired controls. We then calculated the changes in the FOXM1 and PPARA pathway activities from the compound-induced gene expression profiles under each treatment to identify compounds that produced a decreased activity in the FOXM1 pathway or an increased activity in the PPARA pathway. In the second method, the CMAP database tool was used to identify compounds that could reverse the expression pattern of the two pathways in MCF7 cells. Compounds identified as repressing the FOXM1 pathway or activating the PPARA pathway by the two methods were compared. We identified 19 common compounds that could decrease the FOXM1 pathway activity scores and reverse the FOXM1 pathway expression pattern, and 13 common compounds that could increase the PPARA pathway activity scores and reverse the PPARA pathway expression pattern. It may be of interest to validate these compounds experimentally to further investigate their effects on TNBCs.

Subjects

Triple Negative Breast Neoplasms , Humans , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/metabolism , Tumor Cell Line , Forkhead Box Protein M1/genetics , Forkhead Box Protein M1/metabolism , MCF-7 Cells , Computational Biology , PPAR alpha/genetics , Neoplastic Gene Expression Regulation

10.

A machine learning model trained on a high-throughput antibacterial screen increases the hit rate of drug discovery.

Rahman, A S M Zisanur; Liu, Chengyou; Sturm, Hunter; Hogan, Andrew M; Davis, Rebecca; Hu, Pingzhao; Cardona, Silvia T.

PLoS Comput Biol ; 18(10): e1010613, 2022 10.

Article in English | MEDLINE | ID: mdl-36228001

ABSTRACT

Screening for novel antibacterial compounds in small molecule libraries has a low success rate. We applied machine learning (ML)-based virtual screening for antibacterial activity and evaluated its predictive power by experimental validation. We first binarized 29,537 compounds according to their growth inhibitory activity (hit rate 0.87%) against the antibiotic-resistant bacterium Burkholderia cenocepacia and described their molecular features with a directed-message passing neural network (D-MPNN). Then, we used the data to train an ML model that achieved a receiver operating characteristic (ROC) score of 0.823 on the test set. Finally, we predicted antibacterial activity in virtual libraries corresponding to 1,614 compounds from the Food and Drug Administration (FDA)-approved list and 224,205 natural products. Hit rates of 26% and 12%, respectively, were obtained when we tested the top-ranked predicted compounds for growth inhibitory activity against B. cenocepacia, which represents at least a 14-fold increase from the previous hit rate. In addition, more than 51% of the predicted antibacterial natural compounds inhibited ESKAPE pathogens showing that predictions expand beyond the organism-specific dataset to a broad range of bacteria. Overall, the developed ML approach can be used for compound prioritization before screening, increasing the typical hit rate of drug discovery.

Subjects

Drug Discovery , Small Molecule Libraries , United States , Small Molecule Libraries/pharmacology , Machine Learning , Anti-Bacterial Agents/pharmacology
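
Note on the arithmetic in entry 10: the "at least 14-fold" figure follows from dividing the new hit rates (26% and 12%) by the original screen's hit rate (0.87%). The short Python sketch below reproduces that arithmetic from the numbers in the abstract. The variable and function names are illustrative, and the count of original hits is an approximation derived from the rounded percentage.

```python
def fold_increase(new_hit_rate, baseline_hit_rate):
    """Ratio of a new hit rate to the baseline hit rate (same units)."""
    return new_hit_rate / baseline_hit_rate


# Figures taken from the abstract, expressed in percent.
baseline_hit_rate = 0.87       # original high-throughput screen
fda_hit_rate = 26.0            # top-ranked FDA-approved compounds
natural_hit_rate = 12.0        # top-ranked natural products

fda_fold = fold_increase(fda_hit_rate, baseline_hit_rate)
natural_fold = fold_increase(natural_hit_rate, baseline_hit_rate)

# Approximate number of hits in the original 29,537-compound screen.
approx_original_hits = round(29_537 * baseline_hit_rate / 100)

print(f"FDA-approved list: {fda_fold:.1f}-fold")
print(f"Natural products:  {natural_fold:.1f}-fold")
print(f"Original hits:     about {approx_original_hits}")
```

The natural-product ratio comes out near 13.8, which the abstract rounds to 14, while the ratio for the FDA-approved list is close to 30.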

11.

Developing a prediction model of children asthma risk using population-based family history health records.

Hamad, Amani F; Yan, Lin; Jafari Jozani, Mohammad; Hu, Pingzhao; Delaney, Joseph A; Lix, Lisa M.

Pediatr Allergy Immunol ; 34(10): e14032, 2023 10.

Article in English | MEDLINE | ID: mdl-37877849

ABSTRACT

BACKGROUND: Identifying children at high risk of developing asthma can facilitate prevention and early management strategies. We developed a prediction model of children's asthma risk using objectively collected population-based children's and parental histories of comorbidities. METHODS: We conducted a retrospective population-based cohort study using administrative data from Manitoba, Canada, and included children born from 1974 to 2000 with linkages to ≥1 parent. We identified asthma and prior comorbid condition diagnoses from hospital and outpatient records. We used two machine-learning models: least absolute shrinkage and selection operator (LASSO) logistic regression (LR) and random forest (RF) to identify important predictors. The predictors in the base model included children's demographics, allergic conditions, respiratory infections, and parental asthma. Subsequent models included additional multiple comorbidities for children and parents. RESULTS: The cohort included 195,666 children: 51.3% were males and 17.7% had an asthma diagnosis. The base LR model achieved a low predictive performance with sensitivity of 0.47 (95% confidence interval 0.45-0.48) and specificity of 0.67 (0.66-0.67) using a predicted probability threshold of 0.20. Sensitivity significantly improved when children's comorbidities were included using LASSO LR: 0.71 (0.69-0.72). Predictive performance further improved by including parental comorbidities (sensitivity = 0.72 [0.70-0.73], specificity = 0.69 [0.69-0.70]). We observed similar results for the RF models. Children's menstrual disorders and mood and anxiety disorders, parental lipid metabolism disorders and asthma were among the most important variables that predicted asthma risk. CONCLUSION: Including children's and parental comorbidities in children's asthma prediction models improves their accuracy.

Subjects

Asthma , Male , Female , Humans , Child , Cohort Studies , Retrospective Studies , Asthma/diagnosis , Asthma/epidemiology , Anxiety Disorders , Canada
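
Note on the metrics in entry 11: sensitivity and specificity at a predicted-probability threshold of 0.20 can be computed as in the Python sketch below. This is an illustration only. The function name, the choice to treat probabilities equal to the threshold as positive, and the toy data are assumptions rather than details from the study.

```python
def sensitivity_specificity(y_true, y_prob, threshold=0.20):
    """Sensitivity and specificity of thresholded probability predictions."""
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold  # assumption: ties count as positive
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 1 = asthma diagnosis, 0 = no diagnosis.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_prob = [0.35, 0.22, 0.10, 0.25, 0.15, 0.05, 0.19]
sensitivity, specificity = sensitivity_specificity(y_true, y_prob)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Lowering the threshold generally raises sensitivity at the cost of specificity, which is why the abstract reports both values together with the threshold used.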

12.

Epigenome-wide DNA methylation and transcriptome profiling of localized and locally advanced prostate cancer: Uncovering new molecular markers.

Liu, Qian; Reed, Madison; Zhu, Haiying; Cheng, Yan; Almeida, Joana; Fruhbeck, Gema; Ribeiro, Ricardo; Hu, Pingzhao.

Genomics ; 114(5): 110474, 2022 09.

Article in English | MEDLINE | ID: mdl-36057424

ABSTRACT

BACKGROUND: It has become increasingly important to identify molecular markers for accurately diagnosing prostate cancer (PCa) stages between localized PCa (LPC) and locally advanced PCa (LAPC). However, there is a lack of profiling both epigenome-wide DNA methylation and transcriptome for the same patients with PCa at different stages. This study aims to identify epitranscriptomic biomarkers screened in the peri-prostatic (PP) adipose tissue for predicting LPC and LAPC. METHODS: We profiled gene expression and DNA methylation of 10 PCa patients' PP adipose tissue (4 LPC and 6 LAPC). Differential analysis was used to identify differentially methylated CpG sites and expressed genes. An integrative analysis of the microarray gene expression profiles and DNA methylation profiles was conducted using LASSO (least absolute shrinkage and selection operator) between each studied gene and the CpG sites in their promoter region. This epitranscriptomic signature was constructed by combining the association and differential analyses. The signature was then refined using the genetic mutation data of >1500 primary PCa and metastasis PCa samples from 4 different studies. We determined genes that were the most significantly affected by mutations. Machine learning models were built to evaluate the classification ability of the identified signature using the gene expression profiles from three external cohorts. RESULTS: From the LASSO-based association analysis, we identified 56 genes presenting significant anti-correlation between the expression level and the methylation level of at least one CpG site in the promoter region (p-value < 5 × 10⁻⁸). From the differential analysis, we detected 16,405 downregulated genes and 9485 genes containing at least one hypermethylated CpG site. We identified 30 genes that showed anti-correlation, down-regulation and hyper-methylation simultaneously. Using genetic mutation data, we determined that 6 of the 30 genes showed significant differences (adjusted p-value < 0.05) in mutation frequencies between the primary PCa and metastasis PCa samples. The identified 30 genes performed well in distinguishing PCa patients with metastasis from PCa patients without metastasis (area under the receiver operating characteristic curve (AUC) = 0.81). The gene signature also performed well in distinguishing PCa patients with high risk of progression from PCa patients with low risk of progression (AUC = 0.88). CONCLUSIONS: We established an integrative framework to identify differentially expressed genes with an aberrant methylation pattern on PP adipose tissue that may represent novel candidate molecular markers for distinguishing between LPC and LAPC.

Subjects

DNA Methylation , Prostatic Neoplasms , Biomarkers/metabolism , CpG Islands , Epigenome , Gene Expression Profiling , Neoplastic Gene Expression Regulation , Humans , Male , Genetic Promoter Regions , Prostatic Neoplasms/metabolism , Transcriptome

13.

Deep clustering of small molecules at large-scale via variational autoencoder embedding and K-means.

Hadipour, Hamid; Liu, Chengyou; Davis, Rebecca; Cardona, Silvia T; Hu, Pingzhao.

BMC Bioinformatics ; 23(Suppl 4): 132, 2022 Apr 15.

Article in English | MEDLINE | ID: mdl-35428173

ABSTRACT

BACKGROUND: Converting molecules into computer-interpretable features with rich molecular information is a core problem of data-driven machine learning applications in chemical and drug-related tasks. Generally speaking, there are global and local features to represent a given molecule. As most algorithms have been developed based on one type of feature, a remaining bottleneck is to combine both feature sets for advanced molecule-based machine learning analysis. Here, we explored a novel analytical framework to make embeddings of the molecular features and apply them in the clustering of a large number of small molecules. RESULTS: In this novel framework, we first introduced a principal component analysis method encoding the molecule-specific atom and bond information. We then used a variational autoencoder (AE)-based method to make embeddings of the global chemical properties and the local atom and bond features. Next, using the embeddings from the encoded local and global features, we implemented and compared several unsupervised clustering algorithms to group the molecule-specific embeddings. The number of clusters was treated as a hyper-parameter and determined by the Silhouette method. Finally, we evaluated the corresponding results using three internal indices. Applying the analysis framework to a large chemical library of more than 47,000 molecules, we successfully identified 50 molecular clusters using the K-means method with 32 embeddings based on the AE method. We visualized the clustering result via t-SNE for the overall distribution of molecules and the similarity maps for the structural analysis of randomly selected cluster-specific molecules. CONCLUSIONS: This study developed a novel analytical framework that comprises a feature engineering scheme for molecule-specific atomic and bonding features and a deep learning-based embedding strategy for different molecular features. By applying the identified embeddings, we show their usefulness for clustering a large molecule dataset. Our novel analytic algorithms can be applied to any virtual library of chemical compounds with diverse molecular structures. Hence, these tools have the potential of optimizing drug discovery, as they can decrease the number of compounds to be screened in any drug screening campaign.

Subjects

Algorithms , Cluster Analysis , Principal Component Analysis
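
Note on the method in entry 13: the abstract states that the number of clusters was chosen with the Silhouette method. The Python sketch below shows the scoring step only, under stated assumptions: Euclidean distance, singleton clusters scored as 0, and hand-written candidate labelings standing in for the output of K-means at each K. It is not the authors' pipeline, and all names and data are hypothetical.

```python
from math import dist


def mean_silhouette(points, labels):
    """Average silhouette coefficient of a clustering (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy one-dimensional embeddings (hypothetical) with three visible groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
# Candidate labelings per K; in practice these would come from K-means.
candidates = {
    2: [0, 0, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
best_k = max(candidates, key=lambda k: mean_silhouette(points, candidates[k]))
print(best_k)
```

The K whose labeling yields the highest average silhouette is kept, which mirrors the idea of treating the number of clusters as a hyper-parameter.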

14.

Semi-supervised COVID-19 CT image segmentation using deep generative models.

Zammit, Judah; Fung, Daryl L X; Liu, Qian; Leung, Carson Kai-Sang; Hu, Pingzhao.

BMC Bioinformatics ; 23(Suppl 7): 343, 2022 Aug 17.

Article in English | MEDLINE | ID: mdl-35974325

ABSTRACT

BACKGROUND: A recurring problem in image segmentation is a lack of labelled data. This problem is especially acute in the segmentation of lung computed tomography (CT) of patients with Coronavirus Disease 2019 (COVID-19). The reason for this is simple: the disease has not been prevalent long enough to generate a great number of labels. Semi-supervised learning promises a way to learn from data that is unlabelled and has seen tremendous advancements in recent years. However, due to the complexity of its label space, those advancements cannot be applied to image segmentation. That being said, it is this same complexity that makes it extremely expensive to obtain pixel-level labels, making semi-supervised learning all the more appealing. This study seeks to bridge this gap by proposing a novel model that utilizes the image segmentation abilities of deep convolution networks and the semi-supervised learning abilities of generative models for chest CT images of patients with COVID-19. RESULTS: We propose a novel generative model called the shared variational autoencoder (SVAE). The SVAE utilizes a five-layer deep hierarchy of latent variables and deep convolutional mappings between them, resulting in a generative model that is well suited for lung CT images. Then, we add a novel component to the final layer of the SVAE which forces the model to reconstruct the input image using a segmentation that must match the ground truth segmentation whenever it is present. We name this final model StitchNet. CONCLUSION: We compare StitchNet to other image segmentation models on a high-quality dataset of CT images from COVID-19 patients. We show that our model has comparable performance to the other segmentation models. We also explore the potential limitations and advantages of our proposed algorithm and propose some potential future research directions for this challenging issue.

Subjects

COVID-19 , Computer-Assisted Image Processing , Algorithms , COVID-19/diagnostic imaging , Humans , Computer-Assisted Image Processing/methods , Supervised Machine Learning , X-Ray Computed Tomography

15.

A pathogenic deletion in Forkhead Box L1 (FOXL1) identifies the first otosclerosis (OTSC) gene.

Abdelfatah, Nelly; Mostafa, Ahmed A; French, Curtis R; Doucette, Lance P; Penney, Cindy; Lucas, Matthew B; Griffin, Anne; Booth, Valerie; Rowley, Christopher; Besaw, Jessica E; Tranebjærg, Lisbeth; Rendtorff, Nanna Dahl; Hodgkinson, Kathy A; Little, Leichelle A; Agrawal, Sumit; Parnes, Lorne; Batten, Tony; Moore, Susan; Hu, Pingzhao; Pater, Justin A; Houston, Jim; Galutira, Dante; Benteau, Tammy; MacDonald, Courtney; French, Danielle; O'Rielly, Darren D; Stanton, Susan G; Young, Terry-Lynn.

Hum Genet ; 141(3-4): 965-979, 2022 Apr.

Article in English | MEDLINE | ID: mdl-34633540

ABSTRACT

Otosclerosis is a bone disorder of the otic capsule and a common form of late-onset hearing impairment. It is considered a complex disease, and little is known about its pathogenesis. Over the past 20 years, ten autosomal dominant loci (OTSC1-10) have been mapped but no genes identified. Herein, we map a new OTSC locus to a 9.96 Mb region within the FOX gene cluster on 16q24.1 and identify a 15 bp coding deletion in Forkhead Box L1 co-segregating with otosclerosis in a Caucasian family. Pre-operative phenotype ranges from moderate to severe hearing loss to profound sensorineural loss requiring a cochlear implant. Mutant FOXL1 is both transcribed and translated and correctly localizes to the cell nucleus. However, the deletion of 5 residues in the C-terminus of mutant FOXL1 causes a complete loss of transcriptional activity due to loss of secondary (alpha helix) structure. FOXL1 (rs764026385) was identified in a second unrelated case on a shared background. We conclude that FOXL1 (rs764026385) is pathogenic and causes autosomal dominant otosclerosis and propose a key inhibitory role for wildtype Foxl1 in bone remodelling in the otic capsule. New insights into the molecular pathology of otosclerosis from this study provide molecular targets for non-invasive therapeutic interventions.

Subjects

Otosclerosis , Forkhead Transcription Factors/genetics , Humans , Otosclerosis/genetics

16.

Mesenchymal stromal (stem) cell therapy modulates miR-193b-5p expression to attenuate sepsis-induced acute lung injury.

Dos Santos, Claudia C; Amatullah, Hajera; Vaswani, Chirag M; Maron-Gutierrez, Tatiana; Kim, Michael; Mei, Shirley H J; Szaszi, Katalin; Monteiro, Ana Paula T; Varkouhi, Amir K; Herreroz, Raquel; Lorente, Jose Angel; Tsoporis, James N; Gupta, Sahil; Ektesabi, Amin; Kavantzas, Nikolaos; Salpeas, Vasileios; Marshall, John C; Rocco, Patricia R M; Marsden, Philip A; Weiss, Daniel J; Stewart, Duncan J; Hu, Pingzhao; Liles, W Conrad.

Eur Respir J ; 59(1)2022 01.

Article in English | MEDLINE | ID: mdl-34112731

ABSTRACT

Although mesenchymal stromal (stem) cell (MSC) administration attenuates sepsis-induced lung injury in pre-clinical models, the mechanism(s) of action and host immune system contributions to its therapeutic effects remain elusive. We show that treatment with MSCs decreased expression of host-derived microRNA (miR)-193b-5p and increased expression of its target gene, the tight junctional protein occludin (Ocln), in lungs from septic mice. Mutating the Ocln 3' untranslated region miR-193b-5p binding sequence impaired binding to Ocln mRNA. Inhibition of miR-193b-5p in human primary pulmonary microvascular endothelial cells prevents tumour necrosis factor (TNF)-induced decrease in Ocln gene and protein expression and loss of barrier function. MSC-conditioned media mitigated TNF-induced miR-193b-5p upregulation and Ocln downregulation in vitro. When administered in vivo, MSC-conditioned media recapitulated the effects of MSC administration on pulmonary miR-193b-5p and Ocln expression. MiR-193b-deficient mice were resistant to pulmonary inflammation and injury induced by lipopolysaccharide (LPS) instillation. Silencing of Ocln in miR-193b-deficient mice partially recovered the susceptibility to LPS-induced lung injury. In vivo inhibition of miR-193b-5p protected mice from endotoxin-induced lung injury. Finally, the clinical significance of these results was supported by the finding of increased miR-193b-5p expression levels in lung autopsy samples from acute respiratory distress syndrome patients who died with diffuse alveolar damage.

Subjects

Acute Lung Injury , MicroRNAs , Sepsis , Acute Lung Injury/therapy , Animals , Cell- and Tissue-Based Therapy , Endothelial Cells , Humans , Mice , MicroRNAs/genetics , Sepsis/complications , Sepsis/therapy

17.

Polygenic risk and causal inference of psychiatric comorbidity in inflammatory bowel disease among patients with European ancestry.

Li, Yao; Bernstein, Charles N; Xu, Wei; Hu, Pingzhao.

J Transl Med ; 20(1): 43, 2022 01 27.

Article in English | MEDLINE | ID: mdl-35086532

ABSTRACT

BACKGROUND: Approximately 40% of persons with inflammatory bowel disease (IBD) experience psychiatric comorbidities (PC). Previous studies demonstrated the polygenetic effect on both IBD and PC. In this study, we evaluated the contribution of genetic variants to PC among the IBD population. Additionally, we evaluated whether this effect is mediated by the expression level of the RBPMS gene, which was identified in our previous studies as a potential risk factor of PC in persons with IBD. MATERIALS AND METHODS: The polygenic risk score (PRS) was estimated among persons with IBD of European ancestry (n = 240) from the Manitoba IBD Cohort Study by using external genome-wide association studies (GWAS). The association and prediction performance were examined between the estimated PRS and PC status among persons with IBD. Finally, regression-based models were applied to explore whether the imputed expression level of the RBPMS gene is a mediator between estimated PRS and PC status in IBD. RESULTS: The estimated PRS had a significantly positive association with PC status (for the highest effect: P-value threshold = 5 × 10⁻³, odds ratio = 2.0, P-value = 1.5 × 10⁻⁵). Around 13% of the causal effect between the PRS and PC status in IBD was mediated by the expression level of the RBPMS gene. The area under the curve of the PRS-based PC prediction model is around 0.7 at the threshold of 5 × 10⁻⁴. CONCLUSION: PC status in IBD depends on genetic influences among persons with European ancestry. The PRS could potentially be applied to PC risk screening to identify persons with IBD at a high risk of PC. Around 13% of this genetic influence could be explained by the expression level of the RBPMS gene.
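As a rough illustration of the P-value-thresholded polygenic risk score mentioned in this abstract, the sketch below computes a score as a weighted sum of risk-allele dosages over the variants that pass a threshold. It is a minimal sketch under assumptions, not the authors' pipeline: the function names, the toy effect sizes, and the toy genotypes are all hypothetical, and steps such as linkage-disequilibrium clumping are omitted.

```python
from typing import List, Sequence


def select_variants(p_values: Sequence[float], threshold: float) -> List[int]:
    """Return indices of variants whose GWAS P-value is below the threshold."""
    return [i for i, p in enumerate(p_values) if p < threshold]


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical toy data: four variants from an external GWAS and one person.
gwas_p = [1e-6, 0.02, 4e-4, 0.3]
gwas_beta = [0.25, 0.10, -0.125, 0.05]
person_dosage = [2, 1, 1, 0]

# Keep only variants below the threshold, then score the person on them.
keep = select_variants(gwas_p, 5e-3)
score = polygenic_risk_score(
    [person_dosage[i] for i in keep],
    [gwas_beta[i] for i in keep],
)
print(keep, score)
```

Varying the threshold changes which variants enter the sum, which is why results in such studies are typically reported per threshold.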

Subjects

Genome-Wide Association Study , Inflammatory Bowel Diseases , Cohort Studies , Comorbidity , Genetic Predisposition to Disease , Humans , Inflammatory Bowel Diseases/complications , Inflammatory Bowel Diseases/genetics , Multifactorial Inheritance/genetics

18.

Bayesian tensor factorization-drive breast cancer subtyping by integrating multi-omics data.

Liu, Qian; Cheng, Bowen; Jin, Yongwon; Hu, Pingzhao.

J Biomed Inform ; 125: 103958, 2022 01.

Article in English | MEDLINE | ID: mdl-34839017

ABSTRACT

Breast cancer is a highly heterogeneous disease. Subtyping the disease and identifying the genomic features driving these subtypes are critical for precision oncology for breast cancer. This study focuses on developing a new computational approach for breast cancer subtyping. We proposed to use Bayesian tensor factorization (BTF) to integrate multi-omics data of breast cancer, which include expression profiles of RNA-sequencing, copy number variation, and DNA methylation measured on 762 breast cancer patients from The Cancer Genome Atlas. We applied a consensus clustering approach to identify breast cancer subtypes using the factorized latent features by BTF. Subtype-specific survival patterns of the breast cancer patients were evaluated using Kaplan-Meier (KM) estimators. The proposed approach was compared with other state-of-the-art approaches for cancer subtyping. The BTF-subtyping analysis identified 17 optimized latent components, which were used to reveal six major breast cancer subtypes. Out of all different approaches, only the proposed approach showed distinct survival patterns (p < 0.05). Statistical tests also showed that the identified clusters have statistically significant distributions. Our results showed that the proposed approach is a promising strategy to efficiently use publicly available multi-omics data to identify breast cancer subtypes.
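The Kaplan-Meier estimator named in this abstract can be sketched in a few lines. The version below is a minimal, dependency-free illustration on hypothetical follow-up data; it is not the authors' code, and the function name and toy values are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  -- follow-up durations
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical toy cohort: five patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(curve)
```

Computing one such curve per subtype and comparing them (for example with a log-rank test) is the usual way to check whether subtypes differ in survival.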

Subjects

Breast Neoplasms , Bayes Theorem , Breast Neoplasms/genetics , DNA Copy Number Variations , Female , Genomics , Humans , Precision Medicine

19.

Transitions between versions of the International Classification of Diseases and chronic disease prevalence estimates from administrative health data: a population-based study.

Sanusi, Ridwan A; Yan, Lin; Hamad, Amani F; Ayilara, Olawale F; Vasylkiv, Viktoriya; Jozani, Mohammad Jafari; Banerji, Shantanu; Delaney, Joseph; Hu, Pingzhao; Wall-Wieler, Elizabeth; Lix, Lisa M.

BMC Public Health ; 22(1): 701, 2022 04 09.

Article in English | MEDLINE | ID: mdl-35397596

ABSTRACT

BACKGROUND: Diagnosis codes in administrative health data are routinely used to monitor trends in disease prevalence and incidence. The International Classification of Diseases (ICD), which is used to record these diagnoses, has been updated multiple times to reflect advances in health and medical research. Our objective was to examine the impact of transitions between ICD versions on the prevalence of chronic health conditions estimated from administrative health data. METHODS: Study data (i.e., physician billing claims, hospital records) were from the province of Manitoba, Canada, which has a universal healthcare system. ICDA-8 (with adaptations), ICD-9-CM (clinical modification), and ICD-10-CA (Canadian adaptation; hospital records only) codes are captured in the data. Annual study cohorts included all individuals 18+ years of age for 45 years from 1974 to 2018. Negative binomial regression was used to estimate annual age- and sex-adjusted prevalence and model parameters (i.e., slopes and intercepts) for 16 chronic health conditions. Statistical control charts were used to assess the impact of changes in ICD version on model parameter estimates. Hotelling's T² statistic was used to combine the parameter estimates and provide an out-of-control signal when its value was above a pre-specified control limit. RESULTS: The annual cohort sizes ranged from 360,341 to 824,816. Hypertension and skin cancer were among the most and least diagnosed health conditions, respectively; their prevalence per 1,000 population increased from 40.5 to 223.6 and from 0.3 to 2.1, respectively, within the study period. The average annual rate of change in prevalence ranged from -1.6% (95% confidence interval [CI]: -1.8, -1.4) for acute myocardial infarction to 14.6% (95% CI: 13.9, 15.2) for hypertension. The control chart indicated out-of-control observations when transitioning from ICDA-8 to ICD-9-CM for 75% of the investigated chronic health conditions but no out-of-control observations when transitioning from ICD-9-CM to ICD-10-CA. CONCLUSIONS: The prevalence of most of the investigated chronic health conditions changed significantly in the transition from ICDA-8 to ICD-9-CM. These results point to the importance of considering changes in ICD coding as a factor that may influence the interpretation of trend estimates for chronic health conditions derived from administrative health data.
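To make the control-chart step concrete, the sketch below computes Hotelling's T² for a two-parameter vector (intercept, slope) against a reference mean and covariance, and flags a signal when the value exceeds a control limit. This is a minimal sketch under assumptions: the function name, the reference values, and the control limit are hypothetical and do not come from the study.

```python
def hotelling_t2(x, mean, cov):
    """T^2 = (x - mean)^T cov^{-1} (x - mean) for a 2-dimensional vector.

    x, mean -- pairs such as (intercept, slope)
    cov     -- 2x2 covariance matrix given as ((a, b), (c, d))
    """
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form 2x2 inverse: (1 / det) * [[d, -b], [-c, a]].
    return (d0 * (d * d0 - b * d1) + d1 * (-c * d0 + a * d1)) / det


# Hypothetical reference (in-control) parameters and one new estimate.
reference_mean = (0.0, 0.0)
reference_cov = ((4.0, 0.0), (0.0, 1.0))
new_estimate = (2.0, 1.0)
control_limit = 9.0  # hypothetical pre-specified limit

t2 = hotelling_t2(new_estimate, reference_mean, reference_cov)
out_of_control = t2 > control_limit
print(t2, out_of_control)
```

Combining the parameters into one statistic lets a single chart monitor the intercept and slope jointly, rather than running a separate chart for each.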

Subjects

Hypertension , International Classification of Diseases , Canada , Chronic Disease , Factual Databases , Humans , Middle Aged , Prevalence

20.

ChrNet: A re-trainable chromosome-based 1D convolutional neural network for predicting immune cell types.

Jia, Shuo; Hu, Pingzhao.

Genomics ; 113(4): 2023-2031, 2021 07.

Article in English | MEDLINE | ID: mdl-33932523

ABSTRACT

Cells from our immune system detect and kill pathogens to protect our body against various diseases. However, current methods for determining cell types have major limitations, such as being time-consuming and low-throughput. Immune cells that are associated with cancer tissues play a critical role in revealing tumor development. Identifying the immune composition within the tumor microenvironment in a timely manner will be helpful in improving clinical prognosis and therapeutic management for cancer. Although unsupervised clustering approaches have prevailed for processing scRNA-seq datasets, their results vary among studies with different input parameters and sizes, and the identification of the cell types of the clusters is still very challenging. Genes in the human genome can be aligned to chromosomes in specific orders. Hence, we hypothesize that incorporating this information into our learning model will potentially improve the cell type classification performance. In order to utilize gene positional information, we introduced ChrNet, a novel chromosome-specific re-trainable supervised learning method based on a one-dimensional convolutional neural network (1D-CNN). In benchmarking against several models, our model shows superior performance in immune cell type profiling, with greater than 90% accuracy. It is expected that this approach can become a reference architecture for other cell type classification methods. Our ChrNet tool is available online at: https://github.com/Krisloveless/ChrNet.
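The idea of exploiting gene order can be illustrated with a single one-dimensional convolution over expression values sorted by chromosomal position. The sketch below is only an illustration of that idea, not the ChrNet architecture: the gene records, the kernel, and the function names are hypothetical, and a real 1D-CNN would learn its kernels and stack several layers.

```python
def order_by_position(genes):
    """Sort gene records by (chromosome, start position)."""
    return sorted(genes, key=lambda g: (g["chrom"], g["start"]))


def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, as used in CNN convolution layers."""
    n, k = len(signal), len(kernel)
    if k > n:
        raise ValueError("kernel is longer than the signal")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


# Hypothetical expression values for four genes on one chromosome.
genes = [
    {"name": "geneC", "chrom": 1, "start": 300, "expr": 5.0},
    {"name": "geneA", "chrom": 1, "start": 100, "expr": 1.0},
    {"name": "geneD", "chrom": 1, "start": 400, "expr": 2.0},
    {"name": "geneB", "chrom": 1, "start": 200, "expr": 3.0},
]

ordered = order_by_position(genes)
expression = [g["expr"] for g in ordered]

# A fixed averaging kernel stands in for a learned filter.
features = conv1d(expression, [0.5, 0.5])
print([g["name"] for g in ordered], features)
```

Because each output value mixes only neighbouring genes, the ordering step determines which genes a filter sees together; with an arbitrary gene order the same filter would combine unrelated loci.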

Subjects

Computer Neural Networks , Single-Cell Analysis , Chromosomes , Cluster Analysis , Humans , Prognosis , Single-Cell Analysis/methods
