1.
IEEE J Transl Eng Health Med ; 12: 371-381, 2024.
Article in English | MEDLINE | ID: mdl-38633564

ABSTRACT

Brain state classification by applying deep learning techniques to neuroimaging data has become a recent topic of research. However, unlike domains where the data are low dimensional or a large number of training samples is available, neuroimaging data are high dimensional and have few training samples. To tackle these issues, we present a sparse feedforward deep neural architecture for encoding and decoding the structural connectome of the human brain. We use a sparsely connected element-wise multiplication as the first hidden layer and a fixed transform layer as the output layer. The number of trainable parameters and the training time are significantly reduced compared to feedforward networks. We demonstrate superior performance of this architecture in encoding the structural connectome implicated in Alzheimer's disease (AD) and Parkinson's disease (PD) from DTI brain scans. For decoding, we propose a recursive feature elimination (RFE) algorithm based on the DeepLIFT, layer-wise relevance propagation (LRP), and Integrated Gradients (IG) algorithms to remove irrelevant features and thereby identify key biomarkers associated with AD and PD. We show that the proposed architecture reduces the number of trainable parameters by 45.1% and 47.1% compared to a feedforward DNN, with an increase in accuracy of 2.6% and 3.1% for cognitively normal (CN) vs AD and CN vs PD classification, respectively. We also show that the proposed RFE method leads to a further increase in accuracy of 2.1% and 4% for CN vs AD and CN vs PD classification, respectively, while removing approximately 90% to 95% of features as irrelevant. Furthermore, we argue that the biomarkers (i.e., key brain regions and connections) identified are consistent with previous literature. We show that relevancy score-based methods can yield high discriminative power and are suitable for brain decoding.
We also show that the proposed approach led to a reduction in the number of trainable network parameters, an increase in classification accuracy, and a detection of brain connections and regions that were consistent with earlier studies.


Subject(s)
Alzheimer Disease; Connectome; Humans; Magnetic Resonance Imaging/methods; Connectome/methods; Neural Networks, Computer; Neuroimaging/methods; Biomarkers
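
The parameter saving that the abstract above attributes to an element-wise first hidden layer can be illustrated with a minimal sketch. It assumes that such a layer holds one weight and one bias per input feature and uses a ReLU activation; the layer sizes and function names are hypothetical and are not taken from the paper.

```python
def dense_param_count(n_inputs, n_hidden):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_hidden + n_hidden


def elementwise_param_count(n_inputs):
    """One weight and one bias per input feature (one-to-one connections)."""
    return 2 * n_inputs


def elementwise_forward(x, weights, biases):
    """Scale each input by its own weight, add its bias, then apply ReLU.

    The ReLU and the bias term are assumptions made for this sketch.
    """
    return [max(0.0, w * xi + b) for xi, w, b in zip(x, weights, biases)]


n_features = 1000  # hypothetical number of connectome edges
n_hidden = 64      # hypothetical width of a dense first layer
dense = dense_param_count(n_features, n_hidden)
sparse = elementwise_param_count(n_features)
print(f"dense first layer: {dense} parameters, element-wise: {sparse}")
```

Because each weight in such a layer is tied to exactly one input feature, the learned weights can also be read per feature, which is one reason this kind of layer pairs naturally with feature-relevance methods.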
2.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581415

ABSTRACT

Discovering hit molecules with desired biological activity in a directed manner is a promising but challenging task in computer-aided drug discovery. Inspired by recent generative AI approaches, particularly Diffusion Models (DM), we propose the Graph Latent Diffusion Model (GLDM), a latent DM that preserves both the effectiveness of autoencoders in compressing complex chemical data and the DM's capability of generating novel molecules. Specifically, we first develop an autoencoder to encode the molecular data into low-dimensional latent representations and then train the DM on the latent space to generate molecules inducing targeted biological activity defined by gene expression profiles. Manipulating the DM in the latent space rather than the input space avoids complicated operations to map molecule decomposition and reconstruction to diffusion processes, and thus improves training efficiency. Experiments show that GLDM not only achieves outstanding performance on molecular generation benchmarks, but also generates samples with optimal chemical properties and the potential to induce desired biological activity.


Subject(s)
Benchmarking; Drug Discovery; Diffusion
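
To make the idea of running diffusion in a latent space concrete, the sketch below applies the standard closed-form forward noising step of a denoising diffusion model to a latent vector. It is a generic illustration, not GLDM's implementation: the linear beta schedule, the step count, and the toy latent vector are all assumptions.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances, as in common DDPM setups (assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta), often written alpha-bar."""
    products, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        products.append(running)
    return products


def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps (t is 0-based)."""
    abar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(abar) * z + math.sqrt(1.0 - abar) * e for z, e in zip(z0, eps)]
    return zt, eps


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
latent = [0.3, -1.2, 0.8]  # toy stand-in for an autoencoder's latent code
noisy, noise = noise_latent(latent, 500, alpha_bars, random.Random(0))
```

A denoising network would then be trained to predict `noise` from `noisy` and the step index. Working on a short latent vector rather than on a molecular graph is what keeps this step a simple vector operation.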
3.
Heliyon ; 9(12): e22412, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38046150

ABSTRACT

A supervised deep learning network like the UNet has performed well in segmenting brain anomalies such as lesions and tumours. However, such methods were proposed for either single-modality or multi-modality images. We use the Hybrid UNet Transformer (HUT) to improve performance in both single-modality lesion segmentation and multi-modality brain tumour segmentation. The HUT consists of two pipelines running in parallel, one of which is UNet-based and the other Transformer-based. The Transformer-based pipeline relies on feature maps in the intermediate layers of the UNet decoder during training. The HUT network takes in the available modalities of 3D brain volumes and embeds the brain volumes into voxel patches. The transformers in the system improve global attention and long-range correlation between the voxel patches. In addition, we introduce a self-supervised training approach in the HUT framework to enhance the overall segmentation performance. We demonstrate that HUT performs better than the state-of-the-art network SPiN in single-modality segmentation on the Anatomical Tracings of Lesions After Stroke (ATLAS) dataset by 4.84% in the Dice score and by 41% in the Hausdorff Distance score. HUT also performed well on brain scans in the Brain Tumour Segmentation (BraTS20) dataset and demonstrated an improvement over the state-of-the-art network nnUnet of 0.96% in the Dice score and 4.1% in the Hausdorff Distance score.
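
The two metrics reported above, the Dice score and the Hausdorff Distance, can be sketched on small sets of voxel coordinates. This is a minimal illustration under stated assumptions: segmentations are given as sets of integer (x, y, z) voxels, distances are in voxel units, and the plain (maximum) Hausdorff distance is computed over all voxels. Evaluation toolkits often use surface voxels or a 95th-percentile variant instead.

```python
import math


def dice_score(pred, truth):
    """Dice = 2|A and B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def hausdorff_distance(a, b):
    """Largest distance from a voxel in one set to its nearest voxel in the other."""
    a, b = list(a), list(b)
    if not a or not b:
        raise ValueError("both voxel sets must be non-empty")

    def directed(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
reference = {(0, 0, 0), (0, 0, 1), (1, 1, 1)}
print(dice_score(predicted, reference))          # overlap-based, higher is better
print(hausdorff_distance(predicted, reference))  # boundary error, lower is better
```

Dice rewards overlap while the Hausdorff distance penalizes the single worst outlier, which is why the two are usually reported together.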

4.
Front Neurosci ; 17: 1298514, 2023.
Article in English | MEDLINE | ID: mdl-38105927

ABSTRACT

A hybrid UNet and Transformer (HUT) network is introduced to combine the merits of the UNet and Transformer architectures, improving brain lesion segmentation from MRI and CT scans. The HUT overcomes the limitations of conventional approaches by utilizing two parallel stages: one based on UNet and the other on Transformers. The Transformer-based stage captures global dependencies and long-range correlations. It uses intermediate feature vectors from the UNet decoder and improves segmentation accuracy by enhancing the attention and relationship modeling between voxel patches derived from the 3D brain volumes. In addition, HUT incorporates self-supervised learning on the transformer network. This allows the transformer network to learn by maintaining consistency between the classification layers of the different resolutions of patches and augmentations. As a result, both the rate of convergence of training and the overall segmentation capability improve. Experimental results on benchmark datasets, including ATLAS and ISLES2018, demonstrate HUT's advantage over the state-of-the-art methods. HUT achieves higher Dice scores and reduced Hausdorff Distance scores in single-modality and multi-modality lesion segmentation. HUT outperforms the state-of-the-art network SPiN in single-modality MRI segmentation on the Anatomical Tracings of Lesions After Stroke (ATLAS) dataset by 4.84% in the Dice score and by a large margin of 40.7% in the Hausdorff Distance score. HUT also performed well on CT perfusion brain scans in the Ischemic Stroke Lesion Segmentation (ISLES2018) dataset and demonstrated an improvement over the recent state-of-the-art network USSLNet of 3.3% in the Dice score and 12.5% in the Hausdorff Distance score. With the analysis of both single-modality and multi-modality datasets (ATLASR12 and ISLES2018), we show that HUT can perform and generalize well on different datasets. Code is available at: https://github.com/vicsohntu/HUT_CT.

5.
Sci Rep ; 13(1): 21047, 2023 11 29.
Article in English | MEDLINE | ID: mdl-38030699

ABSTRACT

Schizophrenia is a highly heterogeneous disorder and salient functional connectivity (FC) features have been observed to vary across study sites, warranting the need for methods that can differentiate between site-invariant FC biomarkers and site-specific salient FC features. We propose a technique named Semi-supervised learning with data HaRmonisation via Encoder-Decoder-classifier (SHRED) to examine these features from resting state functional magnetic resonance imaging scans gathered from four sites. Our approach involves an encoder-decoder-classifier architecture that simultaneously performs data harmonisation and semi-supervised learning (SSL) to deal with site differences and labelling inconsistencies across sites respectively. The minimisation of reconstruction loss from SSL was shown to improve model performance even within small datasets whilst data harmonisation often led to lower model generalisability, which was unaffected using the SHRED technique. We show that our proposed model produces site-invariant biomarkers, most notably the connection between transverse temporal gyrus and paracentral lobule. Site-specific salient FC features were also elucidated, especially implicating the paracentral lobule for our local dataset. Our examination of these salient FC features demonstrates how site-specific features and site-invariant biomarkers can be differentiated, which can deepen our understanding of the neurobiology of schizophrenia.


Subject(s)
Schizophrenia; Humans; Brain/pathology; Magnetic Resonance Imaging/methods; Frontal Lobe; Neural Networks, Computer; Brain Mapping/methods
6.
Comput Biol Med ; 164: 107328, 2023 09.
Article in English | MEDLINE | ID: mdl-37573721

ABSTRACT

In recent years, deep learning models have been applied to neuroimaging data for early diagnosis of Alzheimer's disease (AD). Structural magnetic resonance imaging (sMRI) and positron emission tomography (PET) images provide structural and functional information about the brain, respectively. Combining these features leads to better performance than using a single modality alone in building predictive models for AD diagnosis. However, current multi-modal approaches in deep learning, based on sMRI and PET, are mostly limited to convolutional neural networks, which do not facilitate integration of both image and phenotypic information of subjects. We propose to use graph neural networks (GNN) that are designed to deal with problems in non-Euclidean domains. In this study, we demonstrate how brain networks are created from sMRI or PET images and can be used in a population graph framework that combines phenotypic information with imaging features of the brain networks. Then, we present a multi-modal GNN framework where each modality has its own branch of GNN and a technique that combines the multi-modal data at both the level of node vectors and adjacency matrices. Finally, we perform late fusion to combine the preliminary decisions made in each branch and produce a final prediction. As multi-modality data become available, multi-source and multi-modal approaches are becoming the trend in AD diagnosis. We conducted explorative experiments based on multi-modal imaging data combined with non-imaging phenotypic information for AD diagnosis and analyzed the impact of phenotypic information on diagnostic performance. Results from experiments demonstrated that our proposed multi-modal approach improves performance for AD diagnosis. Our study also provides a technical reference and supports the need for multivariate multi-modal diagnosis methods.


Subject(s)
Alzheimer Disease; Humans; Alzheimer Disease/diagnostic imaging; Magnetic Resonance Imaging/methods; Neural Networks, Computer; Positron-Emission Tomography/methods; Neuroimaging/methods; Early Diagnosis
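
The population graph mentioned in the abstract above combines imaging features with phenotypic information. One common way to build such a graph, shown here only as a sketch and not necessarily the construction used in the paper, is to weight each edge by an imaging-feature similarity multiplied by the number of matching categorical phenotypes. The Gaussian kernel, the phenotype keys, and the toy data are assumptions.

```python
import math


def feature_similarity(x, y, sigma=1.0):
    """Gaussian kernel on the Euclidean distance between two feature vectors."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared / (2.0 * sigma ** 2))


def phenotype_agreement(p, q):
    """Number of categorical phenotypes (e.g. sex, site) on which two subjects match."""
    return sum(1 for key in p if key in q and p[key] == q[key])


def population_adjacency(features, phenotypes, sigma=1.0):
    """Symmetric weighted adjacency matrix with a zero diagonal."""
    n = len(features)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weight = feature_similarity(features[i], features[j], sigma)
            weight *= phenotype_agreement(phenotypes[i], phenotypes[j])
            adjacency[i][j] = adjacency[j][i] = weight
    return adjacency


# Toy subjects: imaging-derived feature vectors plus hypothetical phenotypes.
features = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
phenotypes = [
    {"sex": "F", "site": "A"},
    {"sex": "F", "site": "B"},
    {"sex": "M", "site": "A"},
]
adjacency = population_adjacency(features, phenotypes)
```

Each subject becomes a node, so a GNN operating on this matrix classifies subjects using both their own features and those of phenotypically similar neighbours.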
7.
IEEE J Biomed Health Inform ; 27(9): 4591-4600, 2023 09.
Article in English | MEDLINE | ID: mdl-37307177

ABSTRACT

With the development of biotechnology, a large amount of multi-omics data has been collected for precision medicine. There exist multiple sources of graph-based prior biological knowledge about omics data, such as gene-gene interaction networks. Recently, there has been an increasing interest in introducing graph neural networks (GNNs) into multi-omics learning. However, existing methods have not fully exploited these graphical priors since none have been able to integrate knowledge from multiple sources simultaneously. To solve this problem, we propose a multi-omics data analysis framework by incorporating multiple prior knowledge into a graph neural network (MPK-GNN). To the best of our knowledge, this is the first attempt to introduce multiple prior graphs into multi-omics data analysis. Specifically, the proposed method contains four parts: (1) a feature-level learning module to aggregate information from prior graphs; (2) a projection module to maximize the agreement among prior networks by optimizing a contrastive loss; (3) a sample-level module to learn a global representation from input multi-omics features; (4) a task-specific module to flexibly extend MPK-GNN for various downstream multi-omics analysis tasks. Finally, we verify the effectiveness of the proposed multi-omics learning algorithm on the cancer molecular subtype classification task. Experimental results show that MPK-GNN outperforms other state-of-the-art algorithms, including multi-view learning methods and multi-omics integrative approaches.


Subject(s)
Multiomics; Neural Networks, Computer; Humans; Algorithms; Biotechnology; Data Analysis
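
The projection module described above maximizes agreement among prior networks by optimizing a contrastive loss. The sketch below shows a generic InfoNCE-style contrastive loss between two views of the same samples. It is an illustration of the general technique, not the exact loss used by MPK-GNN; the temperature value and the toy embeddings are assumptions, and all vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean cross-entropy of picking sample i's own counterpart in the other view."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / n


# Embeddings of the same two samples obtained from two hypothetical prior graphs.
from_graph_1 = [[1.0, 0.0], [0.0, 1.0]]
from_graph_2_agreeing = [[1.0, 0.0], [0.0, 1.0]]
from_graph_2_conflicting = [[0.0, 1.0], [1.0, 0.0]]

low = info_nce(from_graph_1, from_graph_2_agreeing)
high = info_nce(from_graph_1, from_graph_2_conflicting)
```

The loss is small when the two views place each sample closest to its own counterpart, so minimizing it pulls the representations derived from different prior graphs into agreement.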
8.
BMC Bioinformatics ; 22(Suppl 10): 632, 2022 Nov 28.
Article in English | MEDLINE | ID: mdl-36443676

ABSTRACT

BACKGROUND: Cancers are genetically heterogeneous, so anticancer drugs show varying degrees of effectiveness on patients due to their differing genetic profiles. Knowledge of patients' responses to numerous cancer drugs is needed for personalized cancer treatment. Using molecular profiles of cancer cell lines available from the Cancer Cell Line Encyclopedia (CCLE) and anticancer drug responses available in the Genomics of Drug Sensitivity in Cancer (GDSC), we build computational models to predict anticancer drug responses from molecular features. RESULTS: We propose a novel deep neural network model that integrates multi-omics data available as gene expressions, copy number variations, gene mutations, reverse phase protein array expressions, and metabolomics expressions, in order to predict cellular responses to known anti-cancer drugs. We employ a novel graph embedding layer that incorporates interactome data as prior information for prediction. Moreover, we propose a novel attention layer that effectively combines different omics features, taking their interactions into account. The network outperformed feedforward neural networks and reported 0.90 for [Formula: see text] values for prediction of drug responses from cancer cell line data available in CCLE and GDSC. CONCLUSION: The outstanding results of our experiments demonstrate that the proposed method is capable of capturing the interactions of genes and proteins, and integrating multi-omics features effectively. Furthermore, both the results of ablation studies and the investigations of the attention layer imply that gene mutation has a greater influence on the prediction of drug responses than other omics data types. Therefore, we conclude that our approach can not only predict the anti-cancer drug response precisely but also provide insights into the reaction mechanisms of cancer cell lines and drugs.


Subject(s)
Deep Learning; Neoplasms; Humans; DNA Copy Number Variations; Neoplasms/drug therapy; Neoplasms/genetics; Mutation; Genomics
9.
Sci Rep ; 12(1): 15425, 2022 09 14.
Article in English | MEDLINE | ID: mdl-36104347

ABSTRACT

Multi-omics data are increasingly being gathered for investigations of complex diseases such as cancer. However, high dimensionality, small sample size, and heterogeneity of different omics types pose huge challenges to integrated analysis. In this paper, we evaluate two network-based approaches for integration of multi-omics data in an application of clinical outcome prediction of neuroblastoma. We derive Patient Similarity Networks (PSN) as the first step for individual omics data by computing distances among patients from omics features. The fusion of different omics can be investigated in two ways: the network-level fusion is achieved using the Similarity Network Fusion algorithm for fusing the PSNs derived for individual omics types; and the feature-level fusion is achieved by fusing the network features obtained from individual PSNs. We demonstrate our methods on two high-risk neuroblastoma datasets from the SEQC project and the TARGET project. We propose Deep Neural Network and Machine Learning methods with Recursive Feature Elimination as the predictors of survival status of neuroblastoma patients. Our results indicate that network-level fusion outperformed feature-level fusion for integration of different omics data, whereas feature-level fusion is more suitable for incorporating different feature types derived from the same omics type. We conclude that the network-based methods are capable of handling heterogeneity and high dimensionality well in the integration of multi-omics.


Subject(s)
Neuroblastoma; Algorithms; Humans; Machine Learning; Neural Networks, Computer; Neuroblastoma/genetics; Prognosis
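
A Patient Similarity Network of the kind described above can be sketched in a few lines: compute pairwise distances between patients' omics profiles, convert them to similarities, keep each patient's nearest neighbours as edges, and read off a simple network feature per patient. The similarity transform 1 / (1 + distance), the k-nearest-neighbour rule, and the choice of node strength as the network feature are assumptions of this sketch, not details taken from the paper.

```python
import math


def similarity_matrix(profiles):
    """Pairwise similarity 1 / (1 + Euclidean distance); the diagonal is left at 0."""
    n = len(profiles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = 1.0 / (1.0 + math.dist(profiles[i], profiles[j]))
            sim[i][j] = sim[j][i] = s
    return sim


def knn_network(sim, k):
    """Keep an edge if either endpoint ranks the other among its k most similar."""
    n = len(sim)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        for j in others[:k]:
            adjacency[i][j] = adjacency[j][i] = sim[i][j]
    return adjacency


def node_strength(adjacency):
    """Sum of edge weights per patient, a simple network-derived feature."""
    return [sum(row) for row in adjacency]


profiles = [[0.0], [1.0], [10.0]]  # toy one-feature omics profiles
network = knn_network(similarity_matrix(profiles), k=1)
strength = node_strength(network)
```

In a feature-level fusion setting, vectors such as `strength` computed from several networks would be concatenated per patient before being passed to a classifier.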
10.
Bioinformatics ; 38(Suppl_2): ii113-ii119, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124784

ABSTRACT

MOTIVATION: While it has been well established that drugs affect and help patients differently, personalized drug response predictions remain challenging. Solutions based on single omics measurements have been proposed, and networks provide means to incorporate molecular interactions into reasoning. However, how to integrate the wealth of information contained in multiple omics layers still poses a complex problem. RESULTS: We present DrDimont, Drug response prediction from Differential analysis of multi-omics networks. It allows for comparative conclusions between two conditions and translates them into differential drug response predictions. DrDimont focuses on molecular interactions. It establishes condition-specific networks from correlation within an omics layer that are then reduced and combined into heterogeneous, multi-omics molecular networks. A novel semi-local, path-based integration step ensures integrative conclusions. Differential predictions are derived from comparing the condition-specific integrated networks. DrDimont's predictions are explainable, i.e. molecular differences that are the source of high differential drug scores can be retrieved. We predict differential drug response in breast cancer using transcriptomics, proteomics, phosphosite and metabolomics measurements and contrast estrogen receptor positive and receptor negative patients. DrDimont performs better than drug prediction based on differential protein expression or PageRank when evaluating it on ground truth data from cancer cell lines. We find proteomic and phosphosite layers to carry most information for distinguishing drug response. AVAILABILITY AND IMPLEMENTATION: DrDimont is available on CRAN: https://cran.r-project.org/package=DrDimont. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Breast Neoplasms; Software; Breast Neoplasms/drug therapy; Female; Humans; Proteomics; Receptors, Estrogen; Transcriptome
11.
Front Neurosci ; 16: 866666, 2022.
Article in English | MEDLINE | ID: mdl-35677355

ABSTRACT

Both neuroimaging and genomics datasets are often gathered for the detection of neurodegenerative diseases. The huge dimensionality of neuroimaging data as well as omics data poses a tremendous challenge for methods integrating multiple modalities. There are few existing solutions that can combine both multi-modal imaging and multi-omics datasets to derive neurological insights. We propose a deep neural network architecture that combines both structural and functional connectome data with multi-omics data for disease classification. A graph convolution layer is used to model functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI) data simultaneously to learn compact representations of the connectome. A separate set of graph convolution layers is then used to model multi-omics datasets, expressed in the form of population graphs, and combine them with latent representations of the connectome. An attention mechanism is used to fuse these outputs and provide insights on which omics data contributed most to the model's classification decision. We demonstrate our methods for Parkinson's disease (PD) classification by using datasets from the Parkinson's Progression Markers Initiative (PPMI). PD has been shown to be associated with changes in the human connectome and it is also known to be influenced by genetic factors. We combine DTI and fMRI data with multi-omics data from RNA Expression, Single Nucleotide Polymorphism (SNP), DNA Methylation and non-coding RNA experiments. A Matthews Correlation Coefficient of greater than 0.8 over many combinations of multi-modal imaging data and multi-omics data was achieved with our proposed architecture. To address the paucity of paired multi-modal imaging data and the problem of imbalanced data in the PPMI dataset, we compared the use of oversampling against using CycleGAN on structural and functional connectomes to generate missing imaging modalities. Furthermore, we performed ablation studies that offer insights into the importance of each imaging and omics modality for the prediction of PD. Analysis of the generated attention matrices revealed that DNA Methylation and SNP data were the most important omics modalities out of all the omics datasets considered. Our work motivates further research into imaging genetics and the creation of more multi-modal imaging and multi-omics datasets to study PD and other complex neurodegenerative diseases.
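
The Matthews Correlation Coefficient reported above is a natural metric for imbalanced data because it uses all four cells of the confusion matrix. The sketch below computes it for binary labels; the convention of returning 0.0 when the denominator is zero and the toy labels are assumptions of this example.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) for 0/1 labels."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 1]
print(matthews_corrcoef(labels, predictions))
```

A value of 1 indicates perfect prediction, 0 is no better than chance, and -1 indicates total disagreement, so a score above 0.8 reflects strong agreement on both classes.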

12.
Hum Brain Mapp ; 43(9): 2801-2816, 2022 06 15.
Article in English | MEDLINE | ID: mdl-35224817

ABSTRACT

Functional magnetic resonance imaging (fMRI) is used to capture complex and dynamic interactions between brain regions during task performance. Task-related alterations in the brain have been classified as task specific and task general, depending on whether they are particular to a task or common across multiple tasks. Building on recent attempts at interpreting deep learning models, we propose an approach to determine both task-specific and task-general architectures of the functional brain. We demonstrate our methods with a reference-based decoder on deep learning classifiers trained on 12,500 rest and task fMRI samples from the Human Connectome Project (HCP). The decoded task-general and task-specific motor and language architectures were validated with findings from previous studies. We found that, unlike the intersubject variability that is characteristic of the functional pathology of neurological diseases, a small set of connections is sufficient to delineate the rest and task states. The nodes and connections in the task-general architecture could serve as potential disease biomarkers, as alterations in task-general brain modulations are known to be implicated in several neuropsychiatric disorders.


Subject(s)
Connectome; Brain/diagnostic imaging; Connectome/methods; Humans; Language; Magnetic Resonance Imaging/methods; Nerve Net; Rest
13.
BMC Bioinformatics ; 21(Suppl 16): 560, 2020 Dec 16.
Article in English | MEDLINE | ID: mdl-33323115

ABSTRACT

BACKGROUND: Protein-protein interaction (PPI) prediction is an important task towards the understanding of many bioinformatics functions and applications, such as predicting protein functions, gene-disease associations and disease-drug associations. However, many previous PPI prediction studies do not consider the missing and spurious interactions inherent in PPI networks. To address these two issues, we define two corresponding tasks, namely missing PPI prediction and spurious PPI prediction, and propose a method that employs graph embeddings to learn vector representations from constructed Gene Ontology Annotation (GOA) graphs and then uses the embedded vectors to achieve the two tasks. Our method leverages information from both term-term relations among GO terms and term-protein annotations between GO terms and proteins, and preserves properties of both local and global structural information of the GO annotation graph. RESULTS: We compare our method with methods based on information content (IC) and one method based on word embeddings, in experiments on three PPI datasets from the STRING database. Experimental results demonstrate that our method is more effective than the compared methods. CONCLUSION: Our experimental results demonstrate the effectiveness of using graph embeddings to learn vector representations from undirected GOA graphs for our defined missing and spurious PPI tasks.


Subject(s)
Gene Ontology; Molecular Sequence Annotation; Protein Interaction Mapping/methods; Animals; Area Under Curve; Computational Biology/methods; Humans; Mice; ROC Curve; Saccharomyces cerevisiae/genetics; Task Performance and Analysis
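
The two tasks defined above, missing and spurious PPI prediction, can be illustrated once each protein has an embedding vector. The sketch below scores protein pairs by cosine similarity, flags highly similar unconnected pairs as candidate missing interactions, and flags dissimilar connected pairs as candidate spurious ones. The scoring rule, the thresholds, and the toy embeddings are assumptions for illustration and are not the paper's procedure.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def score_pairs(embeddings):
    """Similarity for every unordered protein pair, keyed by a sorted name tuple."""
    return {
        (p, q): cosine(embeddings[p], embeddings[q])
        for p, q in combinations(sorted(embeddings), 2)
    }


def flag_interactions(embeddings, known_edges, high=0.9, low=0.1):
    """Return (candidate missing pairs, candidate spurious pairs)."""
    known = {tuple(sorted(edge)) for edge in known_edges}
    scores = score_pairs(embeddings)
    missing = sorted(pair for pair, s in scores.items() if pair not in known and s >= high)
    spurious = sorted(pair for pair, s in scores.items() if pair in known and s <= low)
    return missing, spurious


# Hypothetical protein embeddings, standing in for vectors learned from a GOA graph.
embeddings = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [0.0, 1.0]}
missing, spurious = flag_interactions(embeddings, known_edges=[("A", "C")])
```

In practice the thresholds would be replaced by a ranking evaluated with ROC or precision-recall curves.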
14.
Sci Rep ; 10(1): 7590, 2020 05 05.
Article in English | MEDLINE | ID: mdl-32371990

ABSTRACT

Specialized processing in the brain is performed by multiple groups of brain regions organized as functional modules. Although in vivo studies of brain functional modules involve multiple functional Magnetic Resonance Imaging (fMRI) scans, the methods used to derive functional modules from functional networks of the brain ignore individual differences in the functional architecture and use incomplete functional connectivity information. To correct this, we propose an Iterative Consensus Spectral Clustering (ICSC) algorithm that detects the most representative modules from individual dense weighted connectivity matrices derived from multiple scans. The ICSC algorithm derives group-level modules from modules of multiple individuals by iteratively minimizing the consensus-cost between the two. We demonstrate that the ICSC algorithm can be used to derive biologically plausible group-level (for multiple subjects) and subject-level (for multiple subject scans) brain modules, using resting-state fMRI scans of 589 subjects from the Human Connectome Project. We employed a multipronged strategy to show the validity of the modularizations obtained from the ICSC algorithm. We show a heterogeneous variability in the modular structure across subjects, where modules involved in visual and motor processing were highly stable across subjects. Conversely, we found a lower variability across scans of the same subject. The performance of our algorithm was compared with existing functional brain modularization methods, and we show that our method detects group-level modules that are more representative of the modules of multiple individuals. Finally, the experiments on synthetic images quantitatively demonstrate that the ICSC algorithm detects group-level and subject-level modules accurately under varied conditions. Therefore, besides identifying functional modules for a population of subjects, the proposed method can be used for applications in personalized neuroscience.
The ICSC implementation is available at https://github.com/SCSE-Biomedical-Computing-Group/ICSC.

15.
Neuroimage Clin ; 25: 102186, 2020.
Article in English | MEDLINE | ID: mdl-32000101

ABSTRACT

Functional modules in the human brain support its drive for specialization whereas brain hubs act as focal points for information integration. Brain hubs are brain regions that have a large number of both within-module and between-module connections. We argue that weak connections in brain functional networks lead to misclassification of brain regions as hubs. In order to resolve this, we propose a new measure called ambivert degree that considers the node's degree as well as connection weights in order to identify nodes with both high degree and high connection weights as hubs. Using resting-state functional MRI scans from the Human Connectome Project, we show that ambivert degree identifies brain hubs that are not only crucial but also invariable across subjects. We hypothesize that nodal measures based on ambivert degree can be effectively used to classify patients from healthy controls for diseases that are known to have widespread hub disruption. Using patient data for Alzheimer's Disease and Autism Spectrum Disorder, we show that the hubs in the patient and healthy groups are very different for both diseases, and that deep feedforward neural networks trained on nodal hub features lead to a significantly higher classification accuracy with significantly fewer trainable weights compared to using functional connectivity features. Thus, the ambivert degree improves identification of crucial brain hubs in healthy subjects and can be used as a diagnostic feature to detect neurological diseases characterized by hub disruption.


Subject(s)
Alzheimer Disease/diagnostic imaging; Alzheimer Disease/physiopathology; Autism Spectrum Disorder/diagnostic imaging; Autism Spectrum Disorder/physiopathology; Cerebral Cortex/diagnostic imaging; Connectome/methods; Deep Learning; Nerve Net/diagnostic imaging; Adolescent; Adult; Aged; Cerebral Cortex/physiopathology; Child; Humans; Magnetic Resonance Imaging; Nerve Net/physiopathology; Young Adult
16.
BMC Genomics ; 20(Suppl 9): 918, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874639

ABSTRACT

BACKGROUND: Semantic similarity between Gene Ontology (GO) terms is a fundamental measure for many bioinformatics applications, such as determining functional similarity between genes or proteins. Most previous research exploited information content to estimate the semantic similarity between GO terms; recently some research exploited word embeddings to learn vector representations for GO terms from a large-scale corpus. In this paper, we propose a novel method, named GO2Vec, that exploits graph embeddings to learn vector representations for GO terms from the GO graph. GO2Vec combines the information from both the GO graph and GO annotations, and its learned vectors can be applied to a variety of bioinformatics applications, such as calculating functional similarity between proteins and predicting protein-protein interactions. RESULTS: We conducted two kinds of experiments to evaluate the quality of GO2Vec: (1) functional similarity between proteins on the Collaborative Evaluation of GO-based Semantic Similarity Measures (CESSM) dataset and (2) prediction of protein-protein interactions on the Yeast and Human datasets from the STRING database. Experimental results demonstrate the effectiveness of GO2Vec over the information content-based measures and the word embedding-based measures. CONCLUSION: Our experimental results demonstrate the effectiveness of using graph embeddings to learn vector representations from undirected GO and GOA graphs. Our results also demonstrate that GO annotations provide useful information for computing the similarity between GO terms and between proteins.


Subject(s)
Gene Ontology; Protein Interaction Mapping/methods; Humans; Saccharomyces cerevisiae Proteins/metabolism
17.
BMC Genomics ; 20(Suppl 9): 901, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874644

ABSTRACT

BACKGROUND: Module detection algorithms relying on modularity maximization suffer from an inherent resolution limit that hinders detection of small topological modules, especially in molecular networks where most biological processes are believed to form small and compact communities. We propose a novel modular refinement approach that helps find functionally significant modules of molecular networks. RESULTS: The module refinement algorithm improves the quality of topological modules in protein-protein interaction networks by finding biologically and functionally significant modules. The algorithm is based on the fact that functional modules in biology do not necessarily correspond to maximum modularity. Larger modules corresponding to maximal modularity are incrementally re-modularized under specific constraints so that smaller yet topologically and biologically valid modules are recovered. We show improvement in quality and functional coverage of modules using experiments on synthetic and real protein-protein interaction networks. We also compare our results with six existing methods available for clustering biological networks. CONCLUSION: The proposed algorithm finds smaller but functionally relevant modules that are undetected by classical quality maximization approaches for modular detection. The refinement procedure helps to detect more functionally enriched modules in protein-protein interaction networks, which are also more coherent with functionally characterised gene sets.


Subject(s)
Algorithms , Protein Interaction Mapping/methods , Cluster Analysis , Humans
18.
BMC Med Genomics ; 12(Suppl 8): 178, 2019 12 20.
Article in English | MEDLINE | ID: mdl-31856829

ABSTRACT

BACKGROUND: The availability of high-throughput omics datasets from large patient cohorts has allowed the development of methods that aim at predicting patient clinical outcomes, such as survival and disease recurrence. Such methods are also important to better understand the biological mechanisms underlying disease etiology and development, as well as treatment responses. Recently, different predictive models, relying on distinct algorithms (including Support Vector Machines and Random Forests), have been investigated. In this context, deep learning strategies are of special interest due to their demonstrated superior performance over a wide range of problems and datasets. One of the main challenges of such strategies is the "small n large p" problem. Indeed, omics datasets typically consist of small numbers of samples and large numbers of features relative to typical deep learning datasets. Neural networks usually tackle this problem through feature selection or by including additional constraints during the learning process. METHODS: We propose to tackle this problem with a novel strategy that relies on a graph-based method for feature extraction, coupled with a deep neural network for clinical outcome prediction. The omics data are first represented as graphs whose nodes represent patients, and edges represent correlations between the patients' omics profiles. Topological features, such as centralities, are then extracted from these graphs for every node. Lastly, these features are used as input to train and test various classifiers. RESULTS: We apply this strategy to four neuroblastoma datasets and observe that models based on neural networks are more accurate than state-of-the-art models (DNN: 85%-87%, SVM/RF: 75%-82%). We explore how different parameters and configurations are selected in order to overcome the effects of the small data problem as well as the curse of dimensionality. CONCLUSIONS: Our results indicate that the deep neural networks capture complex features in the data that help predict patient clinical outcomes.
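The feature-extraction steps described in METHODS can be sketched as follows: build a graph whose nodes are patients, connect two patients when their omics profiles are strongly correlated, and read off a topological feature per node. The Pearson threshold, the choice of degree centrality, the function names and the toy profiles are assumptions; the abstract does not specify how edges are selected or which centralities are used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def patient_graph(profiles, threshold):
    """Adjacency sets linking patients whose profile correlation is >= threshold."""
    ids = sorted(profiles)
    adjacency = {p: set() for p in ids}
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if pearson(profiles[p], profiles[q]) >= threshold:
                adjacency[p].add(q)
                adjacency[q].add(p)
    return adjacency

def degree_centrality(adjacency):
    """Fraction of the other nodes each node is connected to."""
    n = len(adjacency)
    return {p: (len(nb) / (n - 1) if n > 1 else 0.0) for p, nb in adjacency.items()}

# Hypothetical expression profiles (4 features per patient).
TOY_PROFILES = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],
    "P3": [4.0, 3.0, 2.0, 1.0],
    "P4": [1.0, 3.0, 2.0, 4.0],
}

if __name__ == "__main__":
    graph = patient_graph(TOY_PROFILES, threshold=0.9)
    print(degree_centrality(graph))
```

The resulting per-patient features, rather than the raw high-dimensional profiles, would then be passed to a classifier, which is how this kind of approach sidesteps the "small n large p" problem.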


Subject(s)
Computational Biology/methods , Deep Learning , Neuroblastoma/diagnosis , Gene Expression Profiling , Humans , Neuroblastoma/genetics , Prognosis
19.
F1000Res ; 8: 465, 2019.
Article in English | MEDLINE | ID: mdl-31559017

ABSTRACT

Background: Biological entities such as genes, promoters, mRNA, metabolites or proteins do not act alone, but in concert in their network context. Modules, i.e., groups of nodes with similar topological properties in these networks, characterize important biological functions of the underlying biomolecular system. Edges in such molecular networks represent regulatory and physical interactions, and comparing them between conditions provides valuable information on differential molecular mechanisms. However, biological data is inherently noisy and network reduction techniques can propagate errors, particularly to the level of edges. We aim to improve the analysis of networks of biological molecules by deriving modules together with edge relevance estimations that are based on global network characteristics. Methods: We propose to fit the networks to stochastic block models (SBM), a method that has not yet been investigated for the analysis of biomolecular networks. This procedure both delivers modules of the networks and enables the derivation of edge confidence scores. We apply it to correlation-based networks of breast cancer data originating from high-throughput measurements of diverse molecular layers such as transcriptomics, proteomics, and metabolomics. The networks were reduced by thresholding for correlation significance or by requirements on scale-freeness. Results and discussion: We find that the networks are best represented by the hierarchical version of the SBM, and many of the predicted blocks have a biological meaning according to functional annotation. The edge confidence scores are overall in concordance with the biological evidence given by the measurements. As they are based on global network connectivity characteristics and potential hierarchies within the biomolecular networks are taken into account, they could be used as additional, integrated features in network-based data comparisons. Their tight relationship to edge existence probabilities can be exploited to predict missing or spurious edges in order to improve the network representation of the underlying biological system.
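To show how a block model yields edge existence probabilities, the sketch below takes a block assignment as given and estimates, for every pair of blocks, the fraction of possible node pairs that are actually connected (the maximum-likelihood edge probability of a simple Bernoulli SBM). This is a strong simplification: the abstract describes fitting a hierarchical SBM, where the blocks themselves are inferred, and the function names and toy network here are assumptions.

```python
from collections import Counter

def block_densities(node_to_block, edges):
    """Estimated edge probability for each unordered pair of blocks.

    node_to_block: dict node -> block label (labels must be sortable).
    edges: list of undirected (u, v) pairs, each listed once, no self-loops.
    Returns a dict keyed by (r, s) with r <= s.
    """
    sizes = Counter(node_to_block.values())
    edge_counts = Counter()
    for u, v in edges:
        key = tuple(sorted((node_to_block[u], node_to_block[v])))
        edge_counts[key] += 1
    blocks = sorted(sizes)
    densities = {}
    for i, r in enumerate(blocks):
        for s in blocks[i:]:
            # Number of distinct node pairs that could carry an edge.
            pairs = sizes[r] * (sizes[r] - 1) // 2 if r == s else sizes[r] * sizes[s]
            if pairs:
                densities[(r, s)] = edge_counts[(r, s)] / pairs
    return densities

def pair_score(u, v, node_to_block, densities):
    """Model probability that an edge exists between nodes u and v."""
    key = tuple(sorted((node_to_block[u], node_to_block[v])))
    return densities.get(key, 0.0)

# Toy network: a full triangle (block 0), a path (block 1), one cross edge.
TOY_BLOCKS = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
TOY_EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("c", "d")]

if __name__ == "__main__":
    dens = block_densities(TOY_BLOCKS, TOY_EDGES)
    # An absent pair with a high score is a candidate missing edge;
    # an observed edge with a low score is a candidate spurious edge.
    print(pair_score("d", "f", TOY_BLOCKS, dens))
    print(pair_score("c", "d", TOY_BLOCKS, dens))
```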


Subject(s)
Computational Biology , Proteomics , Metabolomics , Proteins
20.
BMC Syst Biol ; 13(Suppl 2): 37, 2019 04 05.
Article in English | MEDLINE | ID: mdl-30953534

ABSTRACT

BACKGROUND: Systematic fusion of multiple data sources for Gene Regulatory Networks (GRN) inference remains a key challenge in systems biology. We incorporate information from protein-protein interaction networks (PPIN) into the process of GRN inference from gene expression (GE) data. However, existing PPIN remain sparse, and transitive protein interactions can help predict missing protein interactions. We therefore propose a systematic probabilistic framework for fusing GE data and transitive protein interaction data to coherently build GRN. RESULTS: We use a Gaussian Mixture Model (GMM) to soft-cluster GE data, allowing overlapping cluster memberships. Next, a heuristic method is proposed to extend sparse PPIN by incorporating transitive linkages. We then propose a novel way to score extended protein interactions by combining topological properties of PPIN and correlations of GE. Following this, GE data and extended PPIN are fused using a Gaussian Hidden Markov Model (GHMM) in order to identify gene regulatory pathways and refine interaction scores that are then used to constrain the GRN structure. We employ a Bayesian Gaussian Mixture (BGM) model to refine the GRN derived from GE data by using the structural priors derived from GHMM. Experiments on real yeast regulatory networks demonstrate both the feasibility of the extended PPIN in predicting transitive protein interactions and its effectiveness in improving the coverage and accuracy of the proposed method of fusing PPIN and GE to build GRN. CONCLUSION: The GE and PPIN fusion model outperforms both the state-of-the-art single data source models (CLR, GENIE3, TIGRESS) as well as existing fusion models under various constraints.
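The idea of extending a sparse PPIN with transitive linkages can be illustrated with a minimal sketch: two proteins that do not interact directly but share at least one interaction partner become a candidate link. The abstract does not give the heuristic or the scoring formula, so the Jaccard overlap of neighbour sets used as the score here, the function name and the toy network are assumptions, and the gene-expression correlation term mentioned in the abstract is left out.

```python
from itertools import combinations

def transitive_candidates(edges):
    """Score non-adjacent protein pairs that share at least one neighbour.

    edges: list of undirected (u, v) interactions.
    Returns {(u, v): score} with u < v, where the score is the Jaccard
    overlap of the two neighbour sets (an assumed topological score).
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v in neighbours[u]:
            continue  # already a direct interaction
        shared = neighbours[u] & neighbours[v]
        if shared:
            scores[(u, v)] = len(shared) / len(neighbours[u] | neighbours[v])
    return scores

# Toy chain of interactions: A - B - C - D.
TOY_PPIN = [("A", "B"), ("B", "C"), ("C", "D")]

if __name__ == "__main__":
    for pair, score in sorted(transitive_candidates(TOY_PPIN).items()):
        print(pair, score)
```

In a fuller pipeline the topological score would be combined with expression correlation before the extended network is used as a structural prior.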


Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Protein Interaction Maps , Systems Biology/methods , Bayes Theorem , Normal Distribution