1.
Cell ; 150(5): 1068-81, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22939629

ABSTRACT

Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions that were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multiprotein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with five or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.


Subject(s)
Multiprotein Complexes/analysis , Protein Interaction Maps , Proteins/chemistry , Proteomics/methods , Humans , Tandem Mass Spectrometry
2.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38449285

ABSTRACT

MOTIVATION: Drug-target interaction (DTI) prediction aims to identify interactions between drugs and protein targets. Deep learning can automatically learn discriminative features from drug and protein target representations for DTI prediction, but challenges remain, making it an open question. Existing approaches encode drugs and targets into features using deep learning models, but they often lack explanations for underlying interactions. Moreover, limited labeled DTIs in the chemical space can hinder model generalization. RESULTS: We propose an interpretable nested graph neural network for DTI prediction (iNGNN-DTI) using pre-trained molecule and protein models. The analysis is conducted on graph data representing drugs and targets by using a specific type of nested graph neural network, in which the target graphs are created based on 3D structures using Alphafold2. This architecture is highly expressive in capturing substructures of the graph data. We use a cross-attention module to capture interaction information between the substructures of drugs and targets. To improve feature representations, we integrate features learned by models that are pre-trained on large unlabeled small molecule and protein datasets, respectively. We evaluate our model on three benchmark datasets, and it shows a consistent improvement over all baseline models on all datasets. We also run an experiment with previously unseen drugs or targets in the test set, and our model outperforms all of the baselines. Furthermore, the iNGNN-DTI can provide more insights into the interaction by visualizing the weights learned by the cross-attention module. AVAILABILITY AND IMPLEMENTATION: The source code of the algorithm is available at https://github.com/syan1992/iNGNN-DTI.


Subject(s)
Algorithms , Neural Networks, Computer , Drug Interactions , Benchmarking , Drug Delivery Systems
3.
PLoS Comput Biol ; 20(6): e1012254, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38935799

ABSTRACT

Spatial transcriptomics has gained popularity over the past decade due to its ability to evaluate transcriptome data while preserving spatial information. Cell segmentation is a crucial step in spatial transcriptomic analysis, as it enables the avoidance of unpredictable tissue disentanglement steps. Although high-quality cell segmentation algorithms can aid in the extraction of valuable data, traditional methods are frequently non-spatial, do not account for spatial information efficiently, and perform poorly when confronted with the problem of spatial transcriptome cell segmentation with varying shapes. In this study, we propose ST-CellSeg, an image-based machine learning method for spatial transcriptomics that uses manifold for cell segmentation and is novel in its consideration of multi-scale information. We first construct a fully connected graph which acts as a spatial transcriptomic manifold. Using multi-scale data, we then determine the low-dimensional spatial probability distribution representation for cell segmentation. Using the adjusted Rand index (ARI), normalized mutual information (NMI), and Silhouette coefficient (SC) as model performance measures, the proposed algorithm significantly outperforms baseline models in selected datasets and is efficient in computational complexity.

4.
J Transl Med ; 22(1): 226, 2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38429796

ABSTRACT

BACKGROUND: Breast Cancer (BC) is a highly heterogeneous and complex disease. Personalized treatment options require the integration of multi-omic data and consideration of phenotypic variability. Radiogenomics aims to merge medical images with genomic measurements but encounters challenges due to unpaired data consisting of imaging, genomic, or clinical outcome data. In this study, we propose the utilization of a well-trained conditional generative adversarial network (cGAN) to address the unpaired data issue in radiogenomic analysis of BC. The generated images will then be used to predict the mutation status of key driver genes and BC subtypes. METHODS: We integrated the paired MRI and multi-omic (mRNA gene expression, DNA methylation, and copy number variation) profiles of 61 BC patients from The Cancer Imaging Archive (TCIA) and The Cancer Genome Atlas (TCGA). To facilitate this integration, we employed a Bayesian Tensor Factorization approach to factorize the multi-omic data into 17 latent features. Subsequently, a cGAN model was trained based on the matched side-view patient MRIs and their corresponding latent features to predict MRIs for BC patients who lack MRIs. Model performance was evaluated by calculating the distance between real and generated images using the Fréchet Inception Distance (FID) metric. BC subtype and mutation status of driver genes were obtained from the cBioPortal platform, where 3 genes were selected based on the number of mutated patients. A convolutional neural network (CNN) was constructed and trained using the generated MRIs for mutation status prediction. Receiver operating characteristic area under curve (ROC-AUC) and precision-recall area under curve (PR-AUC) were used to evaluate the performance of the CNN models for mutation status prediction. Precision, recall and F1 score were used to evaluate the performance of the CNN model in subtype classification.
RESULTS: The FID of the images from the well-trained cGAN model based on the test set is 1.31. The CNN for TP53, PIK3CA, and CDH1 mutation prediction yielded ROC-AUC values of 0.9508, 0.7515, and 0.8136 and PR-AUC values of 0.9009, 0.7184, and 0.5007, respectively. Multi-class subtype prediction achieved precision, recall and F1 scores of 0.8444, 0.8435 and 0.8336, respectively. The source code and related data for the implemented algorithms can be found in the project GitHub repository at https://github.com/mattthuang/BC_RadiogenomicGAN. CONCLUSION: Our study establishes cGAN as a viable tool for generating synthetic BC MRIs for mutation status prediction and subtype classification to better characterize the heterogeneity of BC in patients. The synthetic images also have the potential to significantly augment existing MRI data and circumvent issues surrounding data sharing and patient privacy for future BC machine learning studies.
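As background for the ROC-AUC values quoted in this abstract, the sketch below shows one standard way to compute ROC-AUC: the fraction of (positive, negative) pairs that a classifier ranks correctly, with ties counted as half. It is not taken from the authors' repository; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # pair ranked correctly
            elif p == n:
                correct += 0.5   # tied pair counts as half
    return correct / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = mutated, 0 = wild type.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small illustration; library implementations typically use a sort-based equivalent.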


Subject(s)
Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/genetics , Radiomics , DNA Copy Number Variations , Bayes Theorem , Magnetic Resonance Imaging/methods , Mutation/genetics
5.
J Biomed Inform ; 152: 104629, 2024 04.
Article in English | MEDLINE | ID: mdl-38552994

ABSTRACT

BACKGROUND: In health research, multimodal omics data analysis is widely used to address important clinical and biological questions. Traditional statistical methods rely on the strong assumptions of distribution. Statistical methods such as testing and differential expression are commonly used in omics analysis. Deep learning, on the other hand, is an advanced computer science technique that is powerful in mining high-dimensional omics data for prediction tasks. Recently, integrative frameworks or methods have been developed for omics studies that combine statistical models and deep learning algorithms. METHODS AND RESULTS: The aim of these integrative frameworks is to combine the strengths of both statistical methods and deep learning algorithms to improve prediction accuracy while also providing interpretability and explainability. This review report discusses the current state-of-the-art integrative frameworks, their limitations, and potential future directions in survival and time-to-event longitudinal analysis, dimension reduction and clustering, regression and classification, feature selection, and causal and transfer learning.


Subject(s)
Deep Learning , Genomics , Genomics/methods , Computational Biology/methods , Algorithms , Models, Statistical
6.
Mol Cell Proteomics ; 21(1): 100189, 2022 01.
Article in English | MEDLINE | ID: mdl-34933084

ABSTRACT

Metabolism is recognized as an important driver of cancer progression and other complex diseases, but global metabolite profiling remains a challenge. Protein expression profiling is often a poor proxy since existing pathway enrichment models provide an incomplete mapping between the proteome and metabolism. To overcome these gaps, we introduce multiomic metabolic enrichment network analysis (MOMENTA), an integrative multiomic data analysis framework for more accurately deducing metabolic pathway changes from proteomics data alone in a gene set analysis context by leveraging protein interaction networks to extend annotated metabolic models. We apply MOMENTA to proteomic data from diverse cancer cell lines and human tumors to demonstrate its utility at revealing variation in metabolic pathway activity across cancer types, which we verify using independent metabolomics measurements. The novel metabolic networks we uncover in breast cancer and other tumors are linked to clinical outcomes, underscoring the pathophysiological relevance of the findings.


Subject(s)
Breast Neoplasms , Proteomics , Breast Neoplasms/metabolism , Female , Humans , Metabolic Networks and Pathways , Metabolomics , Protein Interaction Maps
7.
Bioinformatics ; 38(12): 3259-3266, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35445698

ABSTRACT

MOTIVATION: Multiomics cancer profiles provide essential signals for predicting cancer survival. It is challenging to reveal the complex patterns from multiple types of data and link them to survival outcomes. We aim to develop a new deep learning-based algorithm to integrate three types of high-dimensional omics data measured on the same individuals to improve cancer survival outcome prediction. RESULTS: We built a three-dimension tensor to integrate multi-omics cancer data and factorized it into two-dimension matrices of latent factors, which were fed into neural networks-based survival networks. The new algorithm and other multi-omics-based algorithms, as well as individual genomic-based survival analysis algorithms, were applied to the breast cancer data and the colon and rectal cancer data from The Cancer Genome Atlas (TCGA) program. We evaluated the goodness-of-fit using the concordance index (C-index) and Integrated Brier Score (IBS). We demonstrated that the proposed tight integration framework has better survival prediction performance than the models using individual genomic data and other conventional data integration methods. AVAILABILITY AND IMPLEMENTATION: https://github.com/jasperzyzhang/DeepTensorSurvival. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
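The concordance index named in this abstract can be illustrated with a minimal Python sketch of Harrell's C-index. This is a generic textbook formulation, not the authors' code: a pair is treated as comparable when the subject with the shorter follow-up time had an observed event, and pairs with equal times are simply skipped here as a simplifying assumption. The function name and toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs where the higher predicted risk fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable


# Hypothetical toy cohort: follow-up time, event flag (1 = event, 0 = censored), predicted risk.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
print(toy_c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of survival times by predicted risk.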


Subject(s)
Breast Neoplasms , Genomics , Humans , Female , Genomics/methods , Algorithms , Genome , Neural Networks, Computer , Breast Neoplasms/genetics
8.
Pharmacogenomics J ; 23(4): 61-72, 2023 07.
Article in English | MEDLINE | ID: mdl-36424525

ABSTRACT

Our previous studies demonstrated that the FOXM1 pathway is upregulated and the PPARA pathway downregulated in breast cancer (BC), and especially in the triple negative breast cancer (TNBC) subtype. Targeting the two pathways may offer potential therapeutic strategies to treat BC, especially TNBC which has the fewest effective therapies available among all BC subtypes. In this study we identified small molecule compounds that could modulate the PPARA and FOXM1 pathways in BC using two methods. In the first method, data were initially curated from the Connectivity Map (CMAP) database, which provides the gene expression profiles of MCF7 cells treated with different compounds as well as paired controls. We then calculated the changes in the FOXM1 and PPARA pathway activities from the compound-induced gene expression profiles under each treatment to identify compounds that produced a decreased activity in the FOXM1 pathway or an increased activity in the PPARA pathway. In the second method, the CMAP database tool was used to identify compounds that could reverse the expression pattern of the two pathways in MCF7 cells. Compounds identified as repressing the FOXM1 pathway or activating the PPARA pathway by the two methods were compared. We identified 19 common compounds that could decrease the FOXM1 pathway activity scores and reverse the FOXM1 pathway expression pattern, and 13 common compounds that could increase the PPARA pathway activity scores and reverse the PPARA pathway expression pattern. It may be of interest to validate these compounds experimentally to further investigate their effects on TNBCs.


Subject(s)
Triple Negative Breast Neoplasms , Humans , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/metabolism , Cell Line, Tumor , Forkhead Box Protein M1/genetics , Forkhead Box Protein M1/metabolism , MCF-7 Cells , Computational Biology , PPAR alpha/genetics , Gene Expression Regulation, Neoplastic
9.
PLoS Comput Biol ; 18(10): e1010613, 2022 10.
Article in English | MEDLINE | ID: mdl-36228001

ABSTRACT

Screening for novel antibacterial compounds in small molecule libraries has a low success rate. We applied machine learning (ML)-based virtual screening for antibacterial activity and evaluated its predictive power by experimental validation. We first binarized 29,537 compounds according to their growth inhibitory activity (hit rate 0.87%) against the antibiotic-resistant bacterium Burkholderia cenocepacia and described their molecular features with a directed-message passing neural network (D-MPNN). Then, we used the data to train an ML model that achieved a receiver operating characteristic (ROC) score of 0.823 on the test set. Finally, we predicted antibacterial activity in virtual libraries corresponding to 1,614 compounds from the Food and Drug Administration (FDA)-approved list and 224,205 natural products. Hit rates of 26% and 12%, respectively, were obtained when we tested the top-ranked predicted compounds for growth inhibitory activity against B. cenocepacia, which represents at least a 14-fold increase from the previous hit rate. In addition, more than 51% of the predicted antibacterial natural compounds inhibited ESKAPE pathogens showing that predictions expand beyond the organism-specific dataset to a broad range of bacteria. Overall, the developed ML approach can be used for compound prioritization before screening, increasing the typical hit rate of drug discovery.
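The fold increase stated in this abstract can be checked with simple arithmetic. The short Python sketch below divides each reported hit rate by the original screening hit rate of 0.87%; the function and variable names are hypothetical and only the three percentages come from the abstract.

```python
def fold_enrichment(new_hit_rate, baseline_hit_rate):
    """Ratio of a hit rate after prioritization to the baseline screening hit rate."""
    return new_hit_rate / baseline_hit_rate


baseline = 0.0087  # 0.87% hit rate in the original 29,537-compound screen
fda_fold = fold_enrichment(0.26, baseline)      # FDA-approved list, 26% hit rate
natural_fold = fold_enrichment(0.12, baseline)  # natural products, 12% hit rate

print(round(fda_fold, 1), round(natural_fold, 1))
```

The smaller of the two ratios is about 13.8, which rounds to the roughly 14-fold improvement quoted in the abstract.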


Subject(s)
Drug Discovery , Small Molecule Libraries , United States , Small Molecule Libraries/pharmacology , Machine Learning , Anti-Bacterial Agents/pharmacology
10.
Pediatr Allergy Immunol ; 34(10): e14032, 2023 10.
Article in English | MEDLINE | ID: mdl-37877849

ABSTRACT

BACKGROUND: Identifying children at high risk of developing asthma can facilitate prevention and early management strategies. We developed a prediction model of children's asthma risk using objectively collected population-based children's and parental histories of comorbidities. METHODS: We conducted a retrospective population-based cohort study using administrative data from Manitoba, Canada, and included children born from 1974 to 2000 with linkages to ≥1 parent. We identified asthma and prior comorbid condition diagnoses from hospital and outpatient records. We used two machine-learning models: least absolute shrinkage and selection operator (LASSO) logistic regression (LR) and random forest (RF) to identify important predictors. The predictors in the base model included children's demographics, allergic conditions, respiratory infections, and parental asthma. Subsequent models included additional multiple comorbidities for children and parents. RESULTS: The cohort included 195,666 children: 51.3% were males and 17.7% had an asthma diagnosis. The base LR model achieved a low predictive performance with sensitivity of 0.47, 95% confidence interval (0.45-0.48), and specificity of 0.67 (0.66-0.67) using a predicted probability threshold of 0.20. Sensitivity significantly improved when children's comorbidities were included using LASSO LR: 0.71 (0.69-0.72). Predictive performance further improved by including parental comorbidities (sensitivity = 0.72 [0.70-0.73], specificity = 0.69 [0.69-0.70]). We observed similar results for the RF models. Children's menstrual disorders and mood and anxiety disorders, parental lipid metabolism disorders and asthma were among the most important variables that predicted asthma risk. CONCLUSION: Including children's and parental comorbidities in children's asthma prediction models improves their accuracy.
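To make the reported sensitivity and specificity concrete, the following Python sketch shows how the two measures are obtained once predicted probabilities are dichotomized at a threshold such as the 0.20 used in this abstract. The function name and the toy labels and probabilities are hypothetical and unrelated to the Manitoba cohort.

```python
def sensitivity_specificity(y_true, y_prob, threshold=0.20):
    """Classify as positive when probability >= threshold, then compute both rates."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical toy data: 1 = asthma diagnosis, 0 = no diagnosis.
toy_truth = [1, 1, 1, 0, 0, 0, 0]
toy_probs = [0.35, 0.25, 0.10, 0.30, 0.15, 0.05, 0.12]
toy_sens, toy_spec = sensitivity_specificity(toy_truth, toy_probs)
print(toy_sens, toy_spec)
```

Lowering the threshold generally raises sensitivity at the cost of specificity, which is why a threshold near the outcome prevalence is a common choice for imbalanced outcomes.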


Subject(s)
Asthma , Male , Female , Humans , Child , Cohort Studies , Retrospective Studies , Asthma/diagnosis , Asthma/epidemiology , Anxiety Disorders , Canada
11.
Genomics ; 114(5): 110474, 2022 09.
Article in English | MEDLINE | ID: mdl-36057424

ABSTRACT

BACKGROUND: It has become increasingly important to identify molecular markers for accurately diagnosing prostate cancer (PCa) stages between localized PCa (LPC) and locally advanced PCa (LAPC). However, there is a lack of profiling both epigenome-wide DNA methylation and transcriptome for the same patients with PCa at different stages. This study aims to identify epitranscriptomic biomarkers screened in the peri-prostatic (PP) adipose tissue for predicting LPC and LAPC. METHODS: We profiled gene expression and DNA methylation of 10 PCa patients' PP adipose tissue (4 LPC and 6 LAPC). Differential analysis was used to identify differentially methylated CpG sites and expressed genes. An integrative analysis of the microarray gene expression profiles and DNA methylation profiles was conducted using LASSO (least absolute shrinkage and selection operator) between each studied gene and the CpG sites in their promoter region. This epitranscriptomic signature was constructed by combining the association and differential analyses. The signature was then refined using the genetic mutation data of >1500 primary PCa and metastasis PCa samples from 4 different studies. We determined genes that were the most significantly affected by mutations. Machine learning models were built to evaluate the classification ability of the identified signature using the gene expression profiles from three external cohorts. RESULTS: From the LASSO-based association analysis, we identified 56 genes presenting significant anti-correlation between the expression level and the methylation level of at least one CpG site in the promoter region (p-value < 5 × 10⁻⁸). From the differential analysis, we detected 16,405 downregulated genes and 9,485 genes containing at least one hypermethylated CpG site. We identified 30 genes that showed anti-correlation, down-regulation and hyper-methylation simultaneously.
Using genetic mutation data, we determined that 6 of the 30 genes showed significant differences (adjusted p-value < 0.05) in mutation frequencies between the primary PCa and metastasis PCa samples. The identified 30 genes performed well in distinguishing PCa patients with metastasis from PCa patients without metastasis (area under the receiver operating characteristic curve (AUC) = 0.81). The gene signature also performed well in distinguishing PCa patients with high risk of progression from PCa patients with low risk of progression (AUC = 0.88). CONCLUSIONS: We established an integrative framework to identify differentially expressed genes with an aberrant methylation pattern on PP adipose tissue that may represent novel candidate molecular markers for distinguishing between LPC and LAPC.


Subject(s)
DNA Methylation , Prostatic Neoplasms , Biomarkers/metabolism , CpG Islands , Epigenome , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Male , Promoter Regions, Genetic , Prostatic Neoplasms/metabolism , Transcriptome
12.
BMC Bioinformatics ; 23(Suppl 4): 132, 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35428173

ABSTRACT

BACKGROUND: Converting molecules into computer-interpretable features with rich molecular information is a core problem of data-driven machine learning applications in chemical and drug-related tasks. Generally speaking, there are global and local features to represent a given molecule. As most algorithms have been developed based on one type of feature, a remaining bottleneck is to combine both feature sets for advanced molecule-based machine learning analysis. Here, we explored a novel analytical framework to make embeddings of the molecular features and apply them in the clustering of a large number of small molecules. RESULTS: In this novel framework, we first introduced a principal component analysis method encoding the molecule-specific atom and bond information. We then used a variational autoencoder (AE)-based method to make embeddings of the global chemical properties and the local atom and bond features. Next, using the embeddings from the encoded local and global features, we implemented and compared several unsupervised clustering algorithms to group the molecule-specific embeddings. The number of clusters was treated as a hyper-parameter and determined by the Silhouette method. Finally, we evaluated the corresponding results using three internal indices. Applying the analysis framework to a large chemical library of more than 47,000 molecules, we successfully identified 50 molecular clusters using the K-means method with 32 embeddings based on the AE method. We visualized the clustering result via t-SNE for the overall distribution of molecules and the similarity maps for the structural analysis of randomly selected cluster-specific molecules. CONCLUSIONS: This study developed a novel analytical framework that comprises a feature engineering scheme for molecule-specific atomic and bonding features and a deep learning-based embedding strategy for different molecular features. 
By applying the identified embeddings, we show their usefulness for clustering a large molecule dataset. Our novel analytic algorithms can be applied to any virtual library of chemical compounds with diverse molecular structures. Hence, these tools have the potential of optimizing drug discovery, as they can decrease the number of compounds to be screened in any drug screening campaign.
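The Silhouette method mentioned in this abstract selects the number of clusters by comparing silhouette coefficients across candidate clusterings. The Python sketch below computes the mean silhouette coefficient for a given labeling using Euclidean distance. It is a generic illustration rather than the authors' pipeline; the function name, the convention of scoring singleton clusters as 0, and the toy points are assumptions.

```python
import math


def mean_silhouette(points, labels):
    """Mean of (b - a) / max(a, b); requires at least two distinct cluster labels."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy embeddings forming two well-separated pairs.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = mean_silhouette(toy_points, [0, 0, 1, 1])
bad_score = mean_silhouette(toy_points, [0, 1, 0, 1])
print(good_score, bad_score)
```

In practice the score would be computed for each candidate number of clusters, and the clustering with the highest mean silhouette would be retained.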


Subject(s)
Algorithms , Cluster Analysis , Principal Component Analysis
13.
BMC Bioinformatics ; 23(Suppl 7): 343, 2022 Aug 17.
Article in English | MEDLINE | ID: mdl-35974325

ABSTRACT

BACKGROUND: A recurring problem in image segmentation is a lack of labelled data. This problem is especially acute in the segmentation of lung computed tomography (CT) of patients with Coronavirus Disease 2019 (COVID-19). The reason for this is simple: the disease has not been prevalent long enough to generate a great number of labels. Semi-supervised learning promises a way to learn from data that is unlabelled and has seen tremendous advancements in recent years. However, due to the complexity of its label space, those advancements cannot be applied to image segmentation. That being said, it is this same complexity that makes it extremely expensive to obtain pixel-level labels, making semi-supervised learning all the more appealing. This study seeks to bridge this gap by proposing a novel model that utilizes the image segmentation abilities of deep convolution networks and the semi-supervised learning abilities of generative models for chest CT images of patients with COVID-19. RESULTS: We propose a novel generative model called the shared variational autoencoder (SVAE). The SVAE utilizes a five-layer deep hierarchy of latent variables and deep convolutional mappings between them, resulting in a generative model that is well suited for lung CT images. Then, we add a novel component to the final layer of the SVAE which forces the model to reconstruct the input image using a segmentation that must match the ground truth segmentation whenever it is present. We name this final model StitchNet. CONCLUSION: We compare StitchNet to other image segmentation models on a high-quality dataset of CT images from COVID-19 patients. We show that our model has comparable performance to the other segmentation models. We also explore the potential limitations and advantages in our proposed algorithm and propose some potential future research directions for this challenging issue.


Subject(s)
COVID-19 , Image Processing, Computer-Assisted , Algorithms , COVID-19/diagnostic imaging , Humans , Image Processing, Computer-Assisted/methods , Supervised Machine Learning , Tomography, X-Ray Computed
14.
Hum Genet ; 141(3-4): 965-979, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34633540

ABSTRACT

Otosclerosis is a bone disorder of the otic capsule and common form of late-onset hearing impairment. Considered a complex disease, little is known about its pathogenesis. Over the past 20 years, ten autosomal dominant loci (OTSC1-10) have been mapped but no genes identified. Herein, we map a new OTSC locus to a 9.96 Mb region within the FOX gene cluster on 16q24.1 and identify a 15 bp coding deletion in Forkhead Box L1 co-segregating with otosclerosis in a Caucasian family. Pre-operative phenotype ranges from moderate to severe hearing loss to profound sensorineural loss requiring a cochlear implant. Mutant FOXL1 is both transcribed and translated and correctly locates to the cell nucleus. However, the deletion of 5 residues in the C-terminus of mutant FOXL1 causes a complete loss of transcriptional activity due to loss of secondary (alpha helix) structure. FOXL1 (rs764026385) was identified in a second unrelated case on a shared background. We conclude that FOXL1 (rs764026385) is pathogenic and causes autosomal dominant otosclerosis and propose a key inhibitory role for wildtype Foxl1 in bone remodelling in the otic capsule. New insights into the molecular pathology of otosclerosis from this study provide molecular targets for non-invasive therapeutic interventions.


Subject(s)
Otosclerosis , Forkhead Transcription Factors/genetics , Humans , Otosclerosis/genetics
15.
Eur Respir J ; 59(1)2022 01.
Article in English | MEDLINE | ID: mdl-34112731

ABSTRACT

Although mesenchymal stromal (stem) cell (MSC) administration attenuates sepsis-induced lung injury in pre-clinical models, the mechanism(s) of action and host immune system contributions to its therapeutic effects remain elusive. We show that treatment with MSCs decreased expression of host-derived microRNA (miR)-193b-5p and increased expression of its target gene, the tight junctional protein occludin (Ocln), in lungs from septic mice. Mutating the Ocln 3' untranslated region miR-193b-5p binding sequence impaired binding to Ocln mRNA. Inhibition of miR-193b-5p in human primary pulmonary microvascular endothelial cells prevents tumour necrosis factor (TNF)-induced decrease in Ocln gene and protein expression and loss of barrier function. MSC-conditioned media mitigated TNF-induced miR-193b-5p upregulation and Ocln downregulation in vitro. When administered in vivo, MSC-conditioned media recapitulated the effects of MSC administration on pulmonary miR-193b-5p and Ocln expression. MiR-193b-deficient mice were resistant to pulmonary inflammation and injury induced by lipopolysaccharide (LPS) instillation. Silencing of Ocln in miR-193b-deficient mice partially recovered the susceptibility to LPS-induced lung injury. In vivo inhibition of miR-193b-5p protected mice from endotoxin-induced lung injury. Finally, the clinical significance of these results was supported by the finding of increased miR-193b-5p expression levels in lung autopsy samples from acute respiratory distress syndrome patients who died with diffuse alveolar damage.


Subject(s)
Acute Lung Injury , MicroRNAs , Sepsis , Acute Lung Injury/therapy , Animals , Cell- and Tissue-Based Therapy , Endothelial Cells , Humans , Mice , MicroRNAs/genetics , Sepsis/complications , Sepsis/therapy
16.
J Transl Med ; 20(1): 43, 2022 01 27.
Article in English | MEDLINE | ID: mdl-35086532

ABSTRACT

BACKGROUND: Approximately 40% of persons with inflammatory bowel disease (IBD) experience psychiatric comorbidities (PC). Previous studies demonstrated the polygenetic effect on both IBD and PC. In this study, we evaluated the contribution of genetic variants to PC among the IBD population. Additionally, we evaluated whether this effect is mediated by the expression level of the RBPMS gene, which was identified in our previous studies as a potential risk factor of PC in persons with IBD. MATERIALS AND METHODS: The polygenic risk score (PRS) was estimated among persons with IBD of European ancestry (n = 240) from the Manitoba IBD Cohort Study by using external genome-wide association studies (GWAS). The association and prediction performance were examined between the estimated PRS and PC status among persons with IBD. Finally, regression-based models were applied to explore whether the imputed expression level of the RBPMS gene is a mediator between estimated PRS and PC status in IBD. RESULTS: The estimated PRS had a significantly positive association with PC status (for the highest effect: P-value threshold = 5 × 10⁻³, odds ratio = 2.0, P-value = 1.5 × 10⁻⁵). Around 13% of the causal effect between the PRS and PC status in IBD was mediated by the expression level of the RBPMS gene. The area under the curve of the PRS-based PC prediction model is around 0.7 at the threshold of 5 × 10⁻⁴. CONCLUSION: PC status in IBD depends on genetic influences among persons with European ancestry. The PRS could potentially be applied to PC risk screening to identify persons with IBD at a high risk of PC. Around 13% of this genetic influence could be explained by the expression level of the RBPMS gene.
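The role of the P-value threshold in this abstract can be illustrated with a minimal Python sketch of a thresholded polygenic risk score: a weighted sum of risk-allele dosages over the variants whose GWAS P-value passes the threshold. The abstract does not state which software or scoring scheme was used, so this is an assumed generic formulation; the SNP names, effect sizes, and dosages are hypothetical.

```python
def polygenic_risk_score(dosages, gwas, p_threshold):
    """Sum of effect size x allele dosage over SNPs with GWAS P-value <= threshold."""
    return sum(
        beta * dosages[snp]
        for snp, (beta, p_value) in gwas.items()
        if p_value <= p_threshold and snp in dosages
    )


# Hypothetical external GWAS summary statistics: SNP -> (effect size, P-value).
toy_gwas = {
    "snpA": (0.20, 1e-4),
    "snpB": (-0.10, 2e-3),
    "snpC": (0.50, 4e-2),
}
# Hypothetical genotype of one person: SNP -> number of effect alleles (0, 1, or 2).
toy_dosages = {"snpA": 2, "snpB": 1, "snpC": 1}

prs_loose = polygenic_risk_score(toy_dosages, toy_gwas, p_threshold=5e-3)
prs_strict = polygenic_risk_score(toy_dosages, toy_gwas, p_threshold=5e-4)
print(prs_loose, prs_strict)
```

A stricter threshold keeps fewer variants in the score, which is why association strength and prediction performance can peak at different thresholds.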


Subject(s)
Genome-Wide Association Study , Inflammatory Bowel Diseases , Cohort Studies , Comorbidity , Genetic Predisposition to Disease , Humans , Inflammatory Bowel Diseases/complications , Inflammatory Bowel Diseases/genetics , Multifactorial Inheritance/genetics
17.
J Biomed Inform ; 125: 103958, 2022 01.
Article in English | MEDLINE | ID: mdl-34839017

ABSTRACT

Breast cancer is a highly heterogeneous disease. Subtyping the disease and identifying the genomic features driving these subtypes are critical for precision oncology for breast cancer. This study focuses on developing a new computational approach for breast cancer subtyping. We proposed to use Bayesian tensor factorization (BTF) to integrate multi-omics data of breast cancer, which include expression profiles of RNA-sequencing, copy number variation, and DNA methylation measured on 762 breast cancer patients from The Cancer Genome Atlas. We applied a consensus clustering approach to identify breast cancer subtypes using the factorized latent features by BTF. Subtype-specific survival patterns of the breast cancer patients were evaluated using Kaplan-Meier (KM) estimators. The proposed approach was compared with other state-of-the-art approaches for cancer subtyping. The BTF-subtyping analysis identified 17 optimized latent components, which were used to reveal six major breast cancer subtypes. Out of all different approaches, only the proposed approach showed distinct survival patterns (p < 0.05). Statistical tests also showed that the identified clusters have statistically significant distributions. Our results showed that the proposed approach is a promising strategy to efficiently use publicly available multi-omics data to identify breast cancer subtypes.
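The Kaplan-Meier estimator used in this abstract to compare subtype-specific survival can be sketched in a few lines of Python. The survival probability is multiplied by (1 - deaths / number at risk) at each time where an event is observed. This is a textbook formulation, not the authors' code, and the function name and toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct observed event time."""
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events (not censorings) occurring exactly at time t.
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical toy follow-up: event flag 1 = death observed, 0 = censored.
toy_km_times = [1, 2, 2, 3, 4]
toy_km_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_km_times, toy_km_events)
print(toy_curve)
```

Computing one such curve per identified subtype and comparing the curves, for example with a log-rank test, is the usual way to assess whether subtypes differ in survival.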


Subject(s)
Breast Neoplasms , Bayes Theorem , Breast Neoplasms/genetics , DNA Copy Number Variations , Female , Genomics , Humans , Precision Medicine
18.
BMC Public Health ; 22(1): 701, 2022 04 09.
Article in English | MEDLINE | ID: mdl-35397596

ABSTRACT

BACKGROUND: Diagnosis codes in administrative health data are routinely used to monitor trends in disease prevalence and incidence. The International Classification of Diseases (ICD), which is used to record these diagnoses, has been updated multiple times to reflect advances in health and medical research. Our objective was to examine the impact of transitions between ICD versions on the prevalence of chronic health conditions estimated from administrative health data. METHODS: Study data (i.e., physician billing claims, hospital records) were from the province of Manitoba, Canada, which has a universal healthcare system. ICDA-8 (with adaptations), ICD-9-CM (clinical modification), and ICD-10-CA (Canadian adaptation; hospital records only) codes are captured in the data. Annual study cohorts included all individuals 18+ years of age for 45 years from 1974 to 2018. Negative binomial regression was used to estimate annual age- and sex-adjusted prevalence and model parameters (i.e., slopes and intercepts) for 16 chronic health conditions. Statistical control charts were used to assess the impact of changes in ICD version on model parameter estimates. Hotelling's T² statistic was used to combine the parameter estimates and provide an out-of-control signal when its value was above a pre-specified control limit. RESULTS: The annual cohort sizes ranged from 360,341 to 824,816. Hypertension and skin cancer were among the most and least diagnosed health conditions, respectively; their prevalence per 1,000 population increased from 40.5 to 223.6 and from 0.3 to 2.1, respectively, within the study period. The average annual rate of change in prevalence ranged from -1.6% (95% confidence interval [CI]: -1.8, -1.4) for acute myocardial infarction to 14.6% (95% CI: 13.9, 15.2) for hypertension. The control chart indicated out-of-control observations when transitioning from ICDA-8 to ICD-9-CM for 75% of the investigated chronic health conditions but no out-of-control observations when transitioning from ICD-9-CM to ICD-10-CA. CONCLUSIONS: The prevalence of most of the investigated chronic health conditions changed significantly in the transition from ICDA-8 to ICD-9-CM. These results point to the importance of considering changes in ICD coding as a factor that may influence the interpretation of trend estimates for chronic health conditions derived from administrative health data.
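
As a sketch of how Hotelling's T² statistic can combine two parameter estimates (for example, a slope and an intercept) into one control-chart signal, the following pure-Python function evaluates T² = (x − μ)ᵀ Σ⁻¹ (x − μ) for a two-dimensional observation. The in-control mean, covariance, and control limit are hypothetical; the abstract does not report the values the authors used.

```python
def hotelling_t2(x, mean, cov):
    """Return (x - mean)^T cov^{-1} (x - mean) for a 2-dimensional x."""
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of a 2x2 matrix is (1/det) * [[d, -b], [-c, a]].
    return (d0 * (d * d0 - b * d1) + d1 * (-c * d0 + a * d1)) / det


# Hypothetical in-control mean and covariance of (slope, intercept).
mean = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))
control_limit = 9.21  # assumed limit: 0.99 quantile of chi-square with 2 df

t2_in = hotelling_t2((2.0, 1.0), mean, cov)   # 4/4 + 1/1 = 2.0
t2_out = hotelling_t2((8.0, 1.0), mean, cov)  # 64/4 + 1/1 = 17.0
signal_in = t2_in > control_limit
signal_out = t2_out > control_limit
```

Only the second observation exceeds the assumed limit and would be flagged as out of control.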


Subject(s)
Hypertension , International Classification of Diseases , Canada , Chronic Disease , Databases, Factual , Humans , Middle Aged , Prevalence
19.
Genomics ; 113(4): 2023-2031, 2021 07.
Article in English | MEDLINE | ID: mdl-33932523

ABSTRACT

Cells from our immune system detect and kill pathogens to protect our body against various diseases. However, current methods for determining cell types have major limitations, such as being time-consuming and low-throughput. Immune cells that are associated with cancer tissues play a critical role in revealing tumor development. Identifying the immune composition within the tumor microenvironment in a timely manner will be helpful in improving clinical prognosis and therapeutic management for cancer. Although unsupervised clustering approaches have been prevalent for processing scRNA-seq datasets, their results vary among studies with different input parameters and sizes, and identifying the cell types of the clusters is still very challenging. Genes in the human genome are arranged on chromosomes in a specific order. Hence, we hypothesize that incorporating this information into our learning model will potentially improve cell type classification performance. To utilize gene positional information, we introduced ChrNet, a novel chromosome-specific re-trainable supervised learning method based on a one-dimensional convolutional neural network (1D-CNN). In benchmarks against several models, our model shows superior performance in immune cell type profiling, with greater than 90% accuracy. It is expected that this approach can become a reference architecture for other cell type classification methods. Our ChrNet tool is available online at: https://github.com/Krisloveless/ChrNet.
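
The idea of feeding positionally ordered genes to a 1D convolution can be sketched as follows. This is not the ChrNet architecture, only an illustration of the two ingredients the abstract names: ordering expression values by genomic position, then sliding a kernel over the ordered vector. Gene names, coordinates, expression values, and kernel weights are hypothetical.

```python
def order_by_position(expression, positions):
    """Return expression values sorted by each gene's (chromosome, start)."""
    genes = sorted(expression, key=lambda g: positions[g])
    return [expression[g] for g in genes]


def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, the core operation of a 1D-CNN layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical genes with (chromosome, start) coordinates and expression.
positions = {"GENE_A": (1, 1000), "GENE_B": (1, 500), "GENE_C": (2, 100)}
expression = {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 2.0}

ordered = order_by_position(expression, positions)  # GENE_B, GENE_A, GENE_C
features = conv1d(ordered, [0.5, 0.5])              # averages adjacent genes
```

Because the kernel only sees neighbouring entries, the ordering step determines which genes are combined, which is why positional ordering can matter for a convolutional model.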


Subject(s)
Neural Networks, Computer , Single-Cell Analysis , Chromosomes , Cluster Analysis , Humans , Prognosis , Single-Cell Analysis/methods
20.
Genomics ; 113(3): 919-932, 2021 05.
Article in English | MEDLINE | ID: mdl-33588072

ABSTRACT

BACKGROUND: Inflammatory bowel disease (IBD) affects millions of people in North America, and patients with IBD have a high incidence of psychiatric comorbidities (PC). The genetic mechanisms underlying the link are, in general, poorly understood. MATERIALS AND METHODS: A transcriptome-wide association study (TWAS) was performed using genetically regulated gene expression profiles imputed from the genetic profiles of 240 IBD patients in the Manitoba IBD Cohort Study. The imputation was performed using the 44 non-diseased human tissue-specific reference models from the GTEx database. Linear modeling and gene set enrichment analysis were performed to identify genes and pathways that are significantly associated with IBD patients with PC compared to IBD alone in each of the 44 non-diseased human tissues. Finally, an enrichment map was generated to investigate networks of the enriched gene sets associated with IBD patients with PC. RESULTS: The genes RBPMS in skeletal muscle (adjusted p = 0.05), KCNA5 in the cerebellar hemisphere of the brain (adjusted p = 0.09), and GSR, SMIM34A, and LIPT2 in the frontal cortex of the brain (adjusted p = 0.09 for each) were the top genetically regulated genes with a suggestive association with IBD patients with PC. We identified three gene set networks, which include gene sets and pathways with a suggestive association with IBD patients with PC: one with 7 gene sets overlapping in apolipoprotein B mRNA editing subunit genes, one with 3 gene sets including pigmentation gene sets, and one with 3 gene sets including gene sets related to the regulation of peptidyl tyrosine phosphorylation. CONCLUSIONS: Our TWAS has identified genes and pathways with a suggestive association with IBD patients with PC. These findings could potentially be used to elucidate the mechanism by which PC develops in patients with IBD and to develop diagnostic tools or drug targets for IBD patients with PC.


Subject(s)
Genome-Wide Association Study , Inflammatory Bowel Diseases , Cohort Studies , Comorbidity , Humans , Inflammatory Bowel Diseases/complications , Inflammatory Bowel Diseases/epidemiology , Inflammatory Bowel Diseases/genetics , Pilot Projects , Transcriptome