Search | BVS Search Portal

1.

Transcriptomics and epigenetic data integration learning module on Google Cloud.

Ruprecht, Nathan A; Kennedy, Joshua D; Bansal, Benu; Singhal, Sonalika; Sens, Donald; Maggio, Angela; Doe, Valena; Hawkins, Dale; Campbel, Ross; O'Connell, Kyle; Gill, Jappreet Singh; Schaefer, Kalli; Singhal, Sandeep K.

Brief Bioinform ; 25(Supplement_1)2024 Jul 23.

Article in English | MEDLINE | ID: mdl-39101486

ABSTRACT

Multi-omics (genomics, transcriptomics, epigenomics, proteomics, metabolomics, etc.) research approaches are vital for understanding the hierarchical complexity of human biology and have proven to be extremely valuable in cancer research and precision medicine. Emerging scientific advances in recent years have made high-throughput genome-wide sequencing a central focus in molecular research by allowing for the collective analysis of various kinds of molecular biological data from different types of specimens in a single tissue or even at the level of a single cell. Additionally, with the help of improved computational resources and data mining, researchers are able to integrate data from different multi-omics regimes to identify new prognostic, diagnostic, or predictive biomarkers, uncover novel therapeutic targets, and develop more personalized treatment protocols for patients. For the research community to parse the scientifically and clinically meaningful information out of all the biological data being generated each day more efficiently and with fewer wasted resources, being familiar with and comfortable using advanced analytical tools, such as Google Cloud Platform, becomes imperative. This project is an interdisciplinary, cross-organizational effort to provide a guided learning module for integrating transcriptomics and epigenetics data analysis protocols into a comprehensive analysis pipeline for users to implement in their own work, utilizing the cloud computing infrastructure on Google Cloud. The learning module consists of three submodules that guide the user through tutorial examples that illustrate the analysis of RNA-sequence and Reduced-Representation Bisulfite Sequencing data. The examples are in the form of breast cancer case studies, and the data sets were procured from the public repository Gene Expression Omnibus. The first submodule is devoted to transcriptomics analysis with the RNA sequencing data, the second submodule focuses on epigenetics analysis using the DNA methylation data, and the third submodule integrates the two methods for a deeper biological understanding. The modules begin with data collection and preprocessing, with further downstream analysis performed in a Vertex AI Jupyter notebook instance with an R kernel. Analysis results are returned to Google Cloud buckets for storage and visualization, removing the computational strain from local resources. The final product is a start-to-finish tutorial for researchers with limited experience in multi-omics to integrate transcriptomics and epigenetics data analysis into a comprehensive pipeline to perform their own biological research. This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [16] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.

Subject(s)

Cloud Computing; Epigenomics; Humans; Epigenomics/methods; Epigenesis, Genetic; Transcriptome; Computational Biology/methods; Gene Expression Profiling/methods; Software; Data Mining/methods

2.

Advancing drug-response prediction using multi-modal and -omics machine learning integration (MOMLIN): a case study on breast cancer clinical data.

Rashid, Md Mamunur; Selvarajoo, Kumar.

Brief Bioinform ; 25(4)2024 May 23.

Article in English | MEDLINE | ID: mdl-38904542

ABSTRACT

The inherent heterogeneity of cancer contributes to highly variable responses to any anticancer treatments. This underscores the need to first identify precise biomarkers through complex multi-omics datasets that are now available. Although much research has focused on this aspect, identifying biomarkers associated with distinct drug responders still remains a major challenge. Here, we develop MOMLIN, a multi-modal and -omics machine learning integration framework, to enhance drug-response prediction. MOMLIN jointly utilizes sparse correlation algorithms and class-specific feature selection algorithms, which identify multi-modal and -omics-associated interpretable components. MOMLIN was applied to 147 patients' breast cancer datasets (clinical, mutation, gene expression, tumor microenvironment cells and molecular pathways) to analyze drug-response class predictions for non-responders and variable responders. Notably, MOMLIN achieves an average AUC of 0.989, which is at least 10% greater than that of current state-of-the-art methods (data integration analysis for biomarker discovery using latent components, multi-omics factor analysis, sparse canonical correlation analysis). Moreover, MOMLIN not only detects known individual biomarkers such as genes at mutation/expression level but, most importantly, also correlates multi-modal and -omics network biomarkers for each response class. For example, an interaction between ER-negative-HMCN1-COL5A1 mutations-FBXO2-CSF3R expression-CD8 emerges as a multimodal biomarker for responders, potentially affecting antimicrobial peptides and FLT3 signaling pathways. In contrast, for resistance cases, a distinct combination of lymph node-TP53 mutation-PON3-ENSG00000261116 lncRNA expression-HLA-E-T-cell exclusions emerged as multimodal biomarkers, possibly impacting the neurotransmitter release cycle pathway. MOMLIN, therefore, is expected to advance precision medicine, for example by detecting context-specific multi-omics network biomarkers and better predicting drug-response classifications.

Subject(s)

Breast Neoplasms; Machine Learning; Humans; Breast Neoplasms/genetics; Breast Neoplasms/drug therapy; Breast Neoplasms/metabolism; Female; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; Algorithms; Antineoplastic Agents/therapeutic use; Antineoplastic Agents/pharmacology; Computational Biology/methods; Genomics/methods

3.

DeepIDA-GRU: a deep learning pipeline for integrative discriminant analysis of cross-sectional and longitudinal multiview data with applications to inflammatory bowel disease classification.

Jain, Sarthak; Safo, Sandra E.

Brief Bioinform ; 25(4)2024 May 23.

Article in English | MEDLINE | ID: mdl-39007595

ABSTRACT

Biomedical research now commonly integrates diverse data types or views from the same individuals to better understand the pathobiology of complex diseases, but the challenge lies in meaningfully integrating these diverse views. Existing methods often require the same type of data from all views (cross-sectional data only or longitudinal data only) or do not consider any class outcome in the integration method, which presents limitations. To overcome these limitations, we have developed a pipeline that harnesses the power of statistical and deep learning methods to integrate cross-sectional and longitudinal data from multiple sources. In addition, it identifies key variables that contribute to the association between views and the separation between classes, providing deeper biological insights. This pipeline includes variable selection/ranking using linear and nonlinear methods, feature extraction using functional principal component analysis and Euler characteristics, and joint integration and classification using dense feed-forward networks for cross-sectional data and recurrent neural networks for longitudinal data. We applied this pipeline to cross-sectional and longitudinal multiomics data (metagenomics, transcriptomics and metabolomics) from an inflammatory bowel disease (IBD) study and identified microbial pathways, metabolites and genes that discriminate by IBD status, providing information on the etiology of IBD. We conducted simulations to compare the two feature extraction methods.

Subject(s)

Deep Learning; Inflammatory Bowel Diseases; Humans; Cross-Sectional Studies; Inflammatory Bowel Diseases/classification; Inflammatory Bowel Diseases/genetics; Longitudinal Studies; Discriminant Analysis; Metabolomics/methods; Computational Biology/methods

4.

Multi-omics integration for both single-cell and spatially resolved data based on dual-path graph attention auto-encoder.

Lv, Tongxuan; Zhang, Yong; Liu, Junlin; Kang, Qiang; Liu, Lin.

Brief Bioinform ; 25(5)2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39293805

ABSTRACT

Single-cell multi-omics integration enables joint analysis at the single-cell level of resolution to provide a more accurate understanding of complex biological systems, while spatial multi-omics integration benefits the exploration of cell spatial heterogeneity and facilitates more comprehensive downstream analyses. Existing methods are mainly designed for single-cell multi-omics data with little consideration of spatial information and still have room for performance improvement. A reliable multi-omics integration method designed for both single-cell and spatially resolved data is necessary and significant. We propose a multi-omics integration method based on a dual-path graph attention auto-encoder (SSGATE). It can construct the neighborhood graphs based on single-cell expression profiles or spatial coordinates, enabling it to process single-cell data and utilize spatial information from spatially resolved data. It can also perform self-supervised learning for integration through the graph attention auto-encoders from two paths. SSGATE is applied to the integration of transcriptomics and proteomics, including single-cell and spatially resolved data of various tissues from different sequencing technologies. SSGATE shows better performance and stronger robustness than competitive methods and facilitates downstream analysis.

Subject(s)

Single-Cell Analysis; Single-Cell Analysis/methods; Computational Biology/methods; Humans; Proteomics/methods; Algorithms; Transcriptome; Multiomics

5.

Yeast increases glycolytic flux to support higher growth rates accompanied by decreased metabolite regulation and lower protein phosphorylation.

Chen, Min; Xie, Tingting; Li, Huan; Zhuang, Yingping; Xia, Jianye; Nielsen, Jens.

Proc Natl Acad Sci U S A ; 120(25): e2302779120, 2023 06 20.

Article in English | MEDLINE | ID: mdl-37307493

ABSTRACT

The supply of Gibbs free energy and precursors is vital for cellular function, and cell metabolism has evolved to be tightly regulated to balance their supply and consumption. Precursors and Gibbs free energy are generated in the central carbon metabolism (CCM), and fluxes through these pathways are precisely regulated. However, how fluxes through CCM pathways are affected by posttranslational modification and allosteric regulation remains poorly understood. Here, we integrated multi-omics data collected under nine different chemostat conditions to explore how fluxes in the CCM are regulated in the yeast Saccharomyces cerevisiae. We deduced a pathway- and metabolism-specific CCM flux regulation mechanism using hierarchical analysis combined with mathematical modeling. We found that increased glycolytic flux associated with an increased specific growth rate was accompanied by a decrease in flux regulation by metabolite concentrations, including the concentration of allosteric effectors, and a decrease in the phosphorylation level of glycolytic enzymes.

Subject(s)

Protein Processing, Post-Translational; Saccharomyces cerevisiae; Phosphorylation; Allosteric Regulation; Carbon

6.

A denoised multi-omics integration framework for cancer subtype classification and survival prediction.

Pang, Jiali; Liang, Bilin; Ding, Ruifeng; Yan, Qiujuan; Chen, Ruiyao; Xu, Jie.

Brief Bioinform ; 24(5)2023 09 20.

Article in English | MEDLINE | ID: mdl-37594302

ABSTRACT

The availability of high-throughput sequencing data creates opportunities to comprehensively understand human diseases as well as challenges in training machine learning models on such high-dimensional data. Here, we propose a denoised multi-omics integration framework, which contains a distribution-based feature denoising algorithm, Feature Selection with Distribution (FSD), for dimension reduction and a multi-omics integration framework, Attention Multi-Omics Integration (AttentionMOI), to predict cancer prognosis and identify cancer subtypes. We demonstrated that FSD improved model performance using either single-omic or multi-omics data in 15 cancers from The Cancer Genome Atlas Program (TCGA) for survival prediction and kidney cancer subtype identification. Our integration framework AttentionMOI outperformed machine learning models and current multi-omics integration algorithms with high dimensions of features. Furthermore, FSD identified features that were associated with cancer prognosis and could be considered as biomarkers.

Subject(s)

Genomics; Neoplasms; Humans; Genomics/methods; Multiomics; Neoplasms/genetics; Algorithms

7.

Multi-omics regulatory network inference in the presence of missing data.

Henao, Juan D; Lauber, Michael; Azevedo, Manuel; Grekova, Anastasiia; Theis, Fabian; List, Markus; Ogris, Christoph; Schubert, Benjamin.

Brief Bioinform ; 24(5)2023 09 20.

Article in English | MEDLINE | ID: mdl-37670505

ABSTRACT

A key problem in systems biology is the discovery of regulatory mechanisms that drive phenotypic behaviour of complex biological systems in the form of multi-level networks. Modern multi-omics profiling techniques probe these fundamental regulatory networks but are often hampered by experimental restrictions leading to missing data or partially measured omics types for subsets of individuals due to cost restrictions. In such scenarios, in which missing data is present, classical computational approaches to infer regulatory networks are limited. In recent years, approaches have been proposed to infer sparse regression models in the presence of missing information. Nevertheless, these methods have not been adopted for regulatory network inference yet. In this study, we integrated regression-based methods that can handle missingness into KiMONo, a Knowledge guided Multi-Omics Network inference approach, and benchmarked their performance on commonly encountered missing data scenarios in single- and multi-omics studies. Overall, two-step approaches that explicitly handle missingness performed best for a wide range of random- and block-missingness scenarios on imbalanced omics-layers dimensions, while methods implicitly handling missingness performed best on balanced omics-layers dimensions. Our results show that robust multi-omics network inference in the presence of missing data with KiMONo is feasible and thus allows users to leverage available multi-omics data to its full extent.

Subject(s)

Benchmarking; Multiomics; Humans; Systems Biology

8.

Consensus clustering with missing labels (ccml): a consensus clustering tool for multi-omics integrative prediction in cohorts with unequal sample coverage.

Li, Chuan-Xing; Chen, Hongyan; Zounemat-Kermani, Nazanin; Adcock, Ian M; Sköld, C Magnus; Zhou, Meng; Wheelock, Åsa M.

Brief Bioinform ; 25(1)2023 11 22.

Article in English | MEDLINE | ID: mdl-38205966

ABSTRACT

Multi-omics data integration is a complex and challenging task in biomedical research. Consensus clustering, also known as meta-clustering or cluster ensembles, has become an increasingly popular downstream tool for phenotyping and endotyping using multiple omics and clinical data. However, current consensus clustering methods typically rely on ensembling clustering outputs with similar sample coverages (mathematical replicates), which may not reflect real-world data with varying sample coverages (biological replicates). To address this issue, we propose a new strategy termed consensus clustering with missing labels (ccml), an R protocol for two-step consensus clustering that can handle unequal missing labels (i.e. multiple predictive labels with different sample coverages). Initially, the regular consensus weights are adjusted (normalized) by sample coverage, then a regular consensus clustering is performed to predict the optimal final cluster. We applied the ccml method to predict molecularly distinct groups based on 9-omics integration in the Karolinska COSMIC cohort, which investigates chronic obstructive pulmonary disease, and 24-omics handprint integrative subgrouping of adult asthma patients of the U-BIOPRED cohort. We propose ccml as a downstream toolkit for multi-omics integration analysis algorithms such as Similarity Network Fusion and robust clustering of clinical data to overcome the limitations posed by missing data, which is inevitable in human cohorts consisting of multiple data modalities. The ccml tool is available in the R language (https://CRAN.R-project.org/package=ccml, https://github.com/pulmonomics-lab/ccml, or https://github.com/ZhoulabCPH/ccml).
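
The first step described in this abstract, adjusting consensus weights by sample coverage, can be illustrated with a minimal sketch. This is not the ccml package's API (ccml is an R package); the function name, the toy label sets, and the specific normalization (co-clustered count divided by the number of label sets that cover both samples) are assumptions made for illustration only.

```python
from itertools import combinations


def coverage_normalized_consensus(label_sets, samples):
    """Pairwise consensus weights normalized by sample coverage.

    label_sets: list of dicts mapping sample id -> cluster label; a sample
                missing from a dict was not covered by that clustering.
    Returns {(a, b): weight}, where weight is the fraction of label sets
    covering both samples in which they share a cluster, or None when no
    label set covers both.
    """
    consensus = {}
    for a, b in combinations(samples, 2):
        covered = 0   # label sets in which both samples have a label
        together = 0  # of those, label sets that put them in one cluster
        for labels in label_sets:
            if a in labels and b in labels:
                covered += 1
                if labels[a] == labels[b]:
                    together += 1
        consensus[(a, b)] = together / covered if covered else None
    return consensus


# Hypothetical toy input: three clusterings with unequal sample coverage.
label_sets = [
    {"s1": 0, "s2": 0, "s3": 1},        # s4 not profiled
    {"s1": "a", "s2": "a", "s4": "b"},  # s3 not profiled
    {"s2": 1, "s3": 1, "s4": 2},        # s1 not profiled
]
samples = ["s1", "s2", "s3", "s4"]
weights = coverage_normalized_consensus(label_sets, samples)
```

In a second step, the resulting weight matrix would be passed to an ordinary clustering routine to obtain the final groups; that step is omitted here.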

Subject(s)

Asthma; Multiomics; Adult; Humans; Consensus; Cluster Analysis; Algorithms; Asthma/genetics

9.

bulkAnalyseR: an accessible, interactive pipeline for analysing and sharing bulk multi-modal sequencing data.

Moutsopoulos, Ilias; Williams, Eleanor C; Mohorianu, Irina I.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36583521

ABSTRACT

Bulk sequencing experiments (single- and multi-omics) are essential for exploring wide-ranging biological questions. To facilitate interactive, exploratory tasks, coupled with the sharing of easily accessible information, we present bulkAnalyseR, a package integrating state-of-the-art approaches using an expression matrix as the starting point (pre-processing functions are available as part of the package). Static summary images are replaced with interactive panels illustrating quality-checking, differential expression analysis (with noise detection) and biological interpretation (enrichment analyses, identification of expression patterns, followed by inference and comparison of regulatory interactions). bulkAnalyseR can handle different modalities, facilitating robust integration and comparison of cis-, trans- and customised regulatory networks.

Subject(s)

Multiomics

10.

DROEG: a method for cancer drug response prediction based on omics and essential genes integration.

Wu, Peike; Sun, Renliang; Fahira, Aamir; Chen, Yongzhou; Jiangzhou, Huiting; Wang, Ke; Yang, Qiangzhen; Dai, Yang; Pan, Dun; Shi, Yongyong; Wang, Zhuo.

Brief Bioinform ; 24(2)2023 03 19.

Article in English | MEDLINE | ID: mdl-36715269

ABSTRACT

Predicting therapeutic responses in cancer patients is a major challenge in the field of precision medicine due to high inter- and intra-tumor heterogeneity. Most drug response models need to be improved in terms of accuracy, and there is limited research to assess therapeutic responses of particular tumor types. Here, we developed a novel method DROEG (Drug Response based on Omics and Essential Genes) for prediction of drug response in tumor cell lines by integrating genomic, transcriptomic and methylomic data along with CRISPR essential genes, and revealed that the incorporation of tumor proliferation essential genes can improve drug sensitivity prediction. Concisely, DROEG integrates literature-based and statistics-based methods to select features and uses Support Vector Regression for model construction. We demonstrate that DROEG outperforms most state-of-the-art algorithms by both qualitative (prediction accuracy for drug-sensitive/resistant) and quantitative (Pearson correlation coefficient between the predicted and actual IC50) evaluation in Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia datasets. In addition, DROEG is further applied to the pan-gastrointestinal tumor with high prevalence and mortality as a case study at both cell line and clinical levels to evaluate the model efficacy and discover potential prognostic biomarkers in Cisplatin and Epirubicin treatment. Interestingly, the CRISPR essential gene information is found to be the most important contributor to enhance the accuracy of the DROEG model. To our knowledge, this is the first study to integrate essential genes with multi-omics data to improve cancer drug response prediction and provide insights into personalized precision treatment.
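
The quantitative evaluation mentioned in this abstract, the Pearson correlation coefficient between predicted and actual IC50, can be sketched as follows. The function name and the numeric values are hypothetical and are not taken from the DROEG study; this is only the standard formula written out in plain Python.

```python
from math import sqrt


def pearson_r(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_a)


# Hypothetical IC50 values, for illustration only.
predicted_ic50 = [1.0, 2.0, 3.0, 4.0]
actual_ic50 = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(predicted_ic50, actual_ic50)
```

A value near 1 means predictions rank and scale with the measured values; it does not by itself show that the predicted IC50 values are numerically close to the measured ones.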

Subject(s)

Antineoplastic Agents; Neoplasms; Humans; Genes, Essential; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Neoplasms/drug therapy; Neoplasms/genetics; Genomics/methods; Precision Medicine/methods

11.

Yeast9: a consensus genome-scale metabolic model for S. cerevisiae curated by the community.

Zhang, Chengyu; Sánchez, Benjamín J; Li, Feiran; Eiden, Cheng Wei Quan; Scott, William T; Liebal, Ulf W; Blank, Lars M; Mengers, Hendrik G; Anton, Mihail; Rangel, Albert Tafur; Mendoza, Sebastián N; Zhang, Lixin; Nielsen, Jens; Lu, Hongzhong; Kerkhoven, Eduard J.

Mol Syst Biol ; 20(10): 1134-1150, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39134886

ABSTRACT

Genome-scale metabolic models (GEMs) can facilitate metabolism-focused multi-omics integrative analysis. Since Yeast8, the yeast-GEM of Saccharomyces cerevisiae, published in 2019, has been continuously updated by the community. This has increased the quality and scope of the model, culminating now in Yeast9. To evaluate its predictive performance, we generated 163 condition-specific GEMs constrained by single-cell transcriptomics from osmotic pressure or reference conditions. Comparative flux analysis showed that yeast adapting to high osmotic pressure benefits from upregulating fluxes through central carbon metabolism. Furthermore, combining Yeast9 with proteomics revealed metabolic rewiring underlying its preference for nitrogen sources. Lastly, we created strain-specific GEMs (ssGEMs) constrained by transcriptomics for 1229 mutant strains. Well able to predict the strains' growth rates, fluxomics from those large-scale ssGEMs outperformed transcriptomics in predicting functional categories for all studied genes in machine learning models. Based on those findings we anticipate that Yeast9 will continue to empower systems biology studies of yeast metabolism.

Subject(s)

Genome, Fungal; Models, Biological; Saccharomyces cerevisiae; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae/growth & development; Systems Biology/methods; Proteomics; Transcriptome; Metabolic Networks and Pathways/genetics; Osmotic Pressure; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae Proteins/genetics; Carbon/metabolism; Nitrogen/metabolism; Gene Expression Profiling

12.

O-Glycomic and Proteomic Signatures of Spontaneous and Butyrate-Stimulated Colorectal Cancer Cell Line Differentiation.

Madunic, K; Luijkx, Y M C A; Mayboroda, O A; Janssen, G M C; van Veelen, P A; Strijbis, K; Wennekes, T; Lageveen-Kammeijer, G S M; Wuhrer, M.

Mol Cell Proteomics ; 22(3): 100501, 2023 03.

Article in English | MEDLINE | ID: mdl-36669592

ABSTRACT

Gut microbiota of the gastrointestinal tract provide health benefits to the human host via bacterial metabolites. Bacterial butyrate has beneficial effects on intestinal homeostasis and is the preferred energy source of intestinal epithelial cells, capable of inducing differentiation. It was previously observed that changes in the expression of specific proteins as well as protein glycosylation occur with differentiation. In this study, specific mucin O-glycans were identified that mark butyrate-induced epithelial differentiation of the intestinal cell line CaCo-2 (Cancer Coli-2), by applying porous graphitized carbon nano-liquid chromatography with electrospray ionization tandem mass spectrometry. Moreover, a quantitative proteomic approach was used to decipher changes in the cell proteome. It was found that the fully differentiated butyrate-stimulated cells are characterized by a higher expression of sialylated O-glycan structures, whereas fucosylation is downregulated with differentiation. By performing an integrative approach, we generated hypotheses about the origin of the observed O-glycome changes. These insights pave the way for future endeavors to study the dynamic O-glycosylation patterns in the gut, either produced via cellular biosynthesis or through the action of bacterial glycosidases as well as the functional role of these patterns in homeostasis and dysbiosis at the gut-microbiota interface.

Subject(s)

Colorectal Neoplasms; Proteomics; Humans; Caco-2 Cells; Proteomics/methods; Glycomics/methods; Butyrates/pharmacology; Cell Differentiation; Polysaccharides/metabolism

13.

SmCCNet 2.0: a comprehensive tool for multi-omics network inference with shiny visualization.

Liu, Weixuan; Vu, Thao; Konigsberg, Iain R; Pratte, Katherine A; Zhuang, Yonghua; Kechris, Katerina J.

BMC Bioinformatics ; 25(1): 276, 2024 Aug 24.

Article in English | MEDLINE | ID: mdl-39179997

ABSTRACT

Sparse multiple canonical correlation network analysis (SmCCNet) is a machine learning technique for integrating omics data along with a variable of interest (e.g., phenotype of complex disease), and reconstructing multi-omics networks that are specific to this variable. We present the second-generation SmCCNet (SmCCNet 2.0) that adeptly integrates single or multiple omics data types along with a quantitative or binary phenotype of interest. In addition, this new package offers a streamlined setup process that can be configured manually or automatically, ensuring a flexible and user-friendly experience. AVAILABILITY: This package is available on both CRAN (https://cran.r-project.org/web/packages/SmCCNet/index.html) and GitHub (https://github.com/KechrisLab/SmCCNet) under the MIT license. The network visualization tool is available at https://smccnet.shinyapps.io/smccnetnetwork/.

Subject(s)

Machine Learning; Software; Genomics/methods; Gene Regulatory Networks; Computational Biology/methods; Humans; Multiomics

14.

A multimodal graph neural network framework for cancer molecular subtype classification.

Li, Bingjun; Nabavi, Sheida.

BMC Bioinformatics ; 25(1): 27, 2024 Jan 15.

Article in English | MEDLINE | ID: mdl-38225583

ABSTRACT

BACKGROUND: The recent development of high-throughput sequencing has created a large collection of multi-omics data, which enables researchers to better investigate cancer molecular profiles and cancer taxonomy based on molecular subtypes. Integrating multi-omics data has been proven to be effective for building more precise classification models. Most current multi-omics integrative models use either an early fusion in the form of concatenation or late fusion with a separate feature extractor for each omic, which are mainly based on deep neural networks. Due to the nature of biological systems, graphs are a better structural representation of bio-medical data. Although a few graph neural network (GNN) based multi-omics integrative methods have been proposed, they suffer from three common disadvantages. First, most of them use only one type of connection, either inter-omics or intra-omic connection; second, they only consider one kind of GNN layer, either graph convolution network (GCN) or graph attention network (GAT); and third, most of these methods have not been tested on a more complex classification task, such as cancer molecular subtypes. RESULTS: In this study, we propose a novel end-to-end multi-omics GNN framework for accurate and robust cancer subtype classification. The proposed model utilizes multi-omics data in the form of heterogeneous multi-layer graphs, which combine both inter-omics and intra-omic connections from established biological knowledge. The proposed model incorporates learned graph features and global genome features for accurate classification. We tested the proposed model on the Cancer Genome Atlas (TCGA) Pan-cancer dataset and TCGA breast invasive carcinoma (BRCA) dataset for molecular subtype and cancer subtype classification, respectively. The proposed model shows superior performance compared to four current state-of-the-art baseline models in terms of accuracy, F1 score, precision, and recall. The comparative analysis of GAT-based models and GCN-based models reveals that GAT-based models are preferred for smaller graphs with less information and GCN-based models are preferred for larger graphs with extra information.

Subject(s)

High-Throughput Nucleotide Sequencing; Neoplasms; Knowledge; Learning; Neural Networks, Computer; Neoplasms/genetics

15.

Multi-Omics integration can be used to rescue metabolic information for some of the dark region of the Pseudomonas putida proteome.

Tavis, Steven; Hettich, Robert L.

BMC Genomics ; 25(1): 267, 2024 Mar 11.

Article in English | MEDLINE | ID: mdl-38468234

ABSTRACT

In every omics experiment, genes or their products are identified for which even state of the art tools are unable to assign a function. In the biotechnology chassis organism Pseudomonas putida, these proteins of unknown function make up 14% of the proteome. This missing information can bias analyses since these proteins can carry out functions which impact the engineering of organisms. As a consequence of predicting protein function across all organisms, function prediction tools generally fail to use all of the types of data available for any specific organism, including protein and transcript expression information. Additionally, the release of Alphafold predictions for all Uniprot proteins provides a novel opportunity for leveraging structural information. We constructed a bespoke machine learning model to predict the function of recalcitrant proteins of unknown function in Pseudomonas putida based on these sources of data, which annotated 1079 terms to 213 proteins. Among the predicted functions supplied by the model, we found evidence for a significant overrepresentation of nitrogen metabolism and macromolecule processing proteins. These findings were corroborated by manual analyses of selected proteins which identified, among others, a functionally unannotated operon that likely encodes a branch of the shikimate pathway.

Subject(s)

Pseudomonas putida; Pseudomonas putida/genetics; Proteome/metabolism; Multiomics; Biotechnology; Operon

16.

Subtype-WESLR: identifying cancer subtype with weighted ensemble sparse latent representation of multi-view data.

Song, Wenjing; Wang, Weiwen; Dai, Dao-Qing.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34607358

ABSTRACT

The discovery of cancer subtypes has become a much-researched topic in oncology. Dividing cancer patients into subtypes can provide personalized treatments for heterogeneous patients. High-throughput technologies provide multiple omics data for cancer subtyping. Integration of multi-view data is used to identify cancer subtypes in many computational methods, which obtain different subtypes for the same cancer, even using the same multi-omics data. To a certain extent, these subtypes from distinct methods are related, which may have certain guiding significance for cancer subtyping. It is a challenge to effectively utilize the valuable information of distinct subtypes to produce more accurate and reliable subtypes. A weighted ensemble sparse latent representation (subtype-WESLR) is proposed to detect cancer subtypes on heterogeneous omics data. Using a weighted ensemble strategy to fuse base clustering obtained by distinct methods as prior knowledge, subtype-WESLR projects each sample feature profile from each data type to a common latent subspace while maintaining the local structure of the original sample feature space and consistency with the weighted ensemble, and optimizes the common subspace by an iterative method to identify cancer subtypes. We conduct experiments on various synthetic datasets and eight public multi-view datasets from The Cancer Genome Atlas. The results demonstrate that subtype-WESLR is better than competing methods by utilizing the integration of base clustering of existing methods for more precise subtypes.

Subject(s)

Algorithms, Neoplasms, Cluster Analysis, Humans, Neoplasms/genetics

17.

A comprehensive survey of the approaches for pathway analysis using multi-omics data integration.

Maghsoudi, Zeynab; Nguyen, Ha; Tavakkoli, Alireza; Nguyen, Tin.

Brief Bioinform ; 23(6)2022 11 19.

Article in English | MEDLINE | ID: mdl-36252928

ABSTRACT

Pathway analysis has been widely used to detect pathways and functions associated with complex disease phenotypes. The proliferation of this approach is due to the better interpretability of its results and its higher statistical power compared with gene-level statistics. A plethora of pathway analysis methods that utilize a multi-omics setup, rather than just transcriptomics or proteomics, have recently been developed to discover novel pathways and biomarkers. Since multi-omics gives multiple views into the same problem, different approaches are employed in aggregating these views into a comprehensive biological context. As a result, a variety of novel hypotheses regarding disease ideation and treatment targets can be formulated. In this article, we review 32 such pathway analysis methods developed for multi-omics and multi-cohort data. We discuss their availability and implementation, assumptions, supported omics types and databases, pathway analysis techniques and integration strategies. A comprehensive assessment of each method's practicality and a thorough discussion of the strengths and drawbacks of each technique are provided. The main objective of this survey is to provide a thorough examination of existing methods to assist potential users and researchers in selecting suitable tools for their data and analysis purposes, while highlighting outstanding challenges in the field that remain to be addressed for future development.

Subject(s)

Genomics, Proteomics, Genomics/methods, Transcriptome, Biomarkers

18.

Identification of the hub genes in polycystic ovary syndrome based on disease-associated molecule network.

Wu, Yue; Yang, Lingping; Wu, Xianglu; Wang, Lidan; Qi, Hongbo; Feng, Qian; Peng, Bin; Ding, Yubin; Tang, Jing.

FASEB J ; 37(7): e23056, 2023 07.

Article in English | MEDLINE | ID: mdl-37342921

ABSTRACT

Revealing the key genes involved in polycystic ovary syndrome (PCOS) and elucidating its pathogenic mechanism is of extreme importance for the development of targeted clinical therapy for PCOS. Investigating disease by integrating several associated and interacting molecules in biological systems makes it possible to discover new pathogenic genes. In this study, an integrative disease-associated molecule network combining protein-protein interactions and protein-metabolite interactions (PPMI network) was constructed based on systematically collected PCOS-associated genes and metabolites. This new PPMI strategy identified several potential PCOS-associated genes that have not been reported in previous publications. Moreover, the systematic analysis of five benchmark data sets indicated that DERL1 is downregulated in PCOS granulosa cells and has good classification performance between PCOS patients and healthy controls. CCR2 and DVL3 were upregulated in PCOS adipose tissues and have good classification performance. Quantitative analysis showed that the expression of the novel gene FXR2 identified in this study is significantly increased in ovarian granulosa cells of PCOS patients compared with controls. Our study uncovers substantial differences in the PCOS-specific tissue and provides a plethora of information on dysregulated genes and metabolites that are linked to PCOS. This knowledgebase could have the potential to benefit the scientific and clinical community. In sum, the identification of novel genes associated with PCOS provides valuable insights into the underlying molecular mechanisms of PCOS and could potentially lead to the development of new diagnostic and therapeutic strategies.

Subject(s)

Polycystic Ovary Syndrome, Female, Humans, Polycystic Ovary Syndrome/metabolism, Granulosa Cells/metabolism

19.

An integrated Bayesian framework for multi-omics prediction and classification.

Mallick, Himel; Porwal, Anupreet; Saha, Satabdi; Basak, Piyali; Svetnik, Vladimir; Paul, Erina.

Stat Med ; 43(5): 983-1002, 2024 Feb 28.

Article in English | MEDLINE | ID: mdl-38146838

ABSTRACT

With the growing commonality of multi-omics datasets, there is now increasing evidence that integrated omics profiles lead to more efficient discovery of clinically actionable biomarkers that enable better disease outcome prediction and patient stratification. Several methods exist to perform host phenotype prediction from cross-sectional, single-omics data modalities but decentralized frameworks that jointly analyze multiple time-dependent omics data to highlight the integrative and dynamic impact of repeatedly measured biomarkers are currently limited. In this article, we propose a novel Bayesian ensemble method to consolidate prediction by combining information across several longitudinal and cross-sectional omics data layers. Unlike existing frequentist paradigms, our approach enables uncertainty quantification in prediction as well as interval estimation for a variety of quantities of interest based on posterior summaries. We apply our method to four published multi-omics datasets and demonstrate that it recapitulates known biology in addition to providing novel insights while also outperforming existing methods in estimation, prediction, and uncertainty quantification. Our open-source software is publicly available at https://github.com/himelmallick/IntegratedLearner.

Subject(s)

Multiomics, Software, Humans, Bayes Theorem, Cross-Sectional Studies, Biomarkers

20.

Interactive molecular causal networks of hypertension using a fast machine learning algorithm MRdualPC.

Kelly, Jack; Xu, Xiaoguang; Eales, James M; Keavney, Bernard; Berzuini, Carlo; Tomaszewski, Maciej; Guo, Hui.

BMC Med Res Methodol ; 24(1): 168, 2024 Aug 02.

Article in English | MEDLINE | ID: mdl-39095705

ABSTRACT

BACKGROUND: Understanding the complex interactions between genes and their causal effects on diseases is crucial for developing targeted treatments and gaining insight into biological mechanisms. However, the analysis of molecular networks, especially in the context of high-dimensional data, presents significant challenges. METHODS: This study introduces MRdualPC, a computationally tractable algorithm based on the MRPC approach, to infer large-scale causal molecular networks. We apply MRdualPC to investigate the upstream causal transcriptomics influencing hypertension using a comprehensive dataset of kidney genome and transcriptome data. RESULTS: Our algorithm proves to be 100 times faster than MRPC on average in identifying transcriptomics drivers of hypertension. Through clustering, we identify 63 modules with causal driver genes, including 17 modules with extensive causal networks. Notably, we find that genes within one of the causal networks are associated with the electron transport chain and oxidative phosphorylation, previously linked to hypertension. Moreover, the identified causal ancestor genes show an over-representation of blood pressure-related genes. CONCLUSIONS: MRdualPC has the potential for broader applications beyond gene expression data, including multi-omics integration. While there are limitations, such as the need for clustering in large gene expression datasets, our study represents a significant advancement in building causal molecular networks, offering researchers a valuable tool for analyzing big data and investigating complex diseases.

Subject(s)

Algorithms, Gene Regulatory Networks, Hypertension, Machine Learning, Hypertension/genetics, Humans, Transcriptome/genetics, Gene Expression Profiling/methods, Computational Biology/methods, Cluster Analysis
