Results 1 - 20 of 108
1.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37598424

ABSTRACT

Molecular property prediction (MPP) is a crucial and fundamental task for AI-aided drug discovery (AIDD). Recent studies have shown great promise of applying self-supervised learning (SSL) to producing molecular representations to cope with the widely-concerned data scarcity problem in AIDD. As some specific substructures of molecules play important roles in determining molecular properties, molecular representations learned by deep learning models are expected to attach more importance to such substructures implicitly or explicitly to achieve better predictive performance. However, few SSL pre-trained models for MPP in the literature have ever focused on such substructures. To challenge this situation, this paper presents a Chemistry-Aware Fragmentation for Effective MPP (CAFE-MPP in short) under the self-supervised contrastive learning framework. First, a novel fragment-based molecular graph (FMG) is designed to represent the topological relationship between chemistry-aware substructures that constitute a molecule. Then, with well-designed hard negative pairs, a is pre-trained on fragment-level by contrastive learning to extract representations for the nodes in FMGs. Finally, a Graphormer model is leveraged to produce molecular representations for MPP based on the embeddings of fragments. Experiments on 11 benchmark datasets show that the proposed CAFE-MPP method achieves state-of-the-art performance on 7 of the 11 datasets and the second-best performance on 3 datasets, compared with six remarkable self-supervised methods. Further investigations also demonstrate that CAFE-MPP can learn to embed molecules into representations implicitly containing the information of fragments highly correlated to molecular properties, and can alleviate the over-smoothing problem of graph neural networks.


Subjects
Benchmarking , Drug Discovery , Neural Networks, Computer , Supervised Machine Learning
2.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38710497

ABSTRACT

MOTIVATION: Molecular property prediction (MPP) is a fundamental but challenging task in the computer-aided drug discovery process. A growing number of recent works employ graph-based models for MPP and have achieved considerable progress in improving prediction performance. However, current models often ignore relationships between molecules, which could also be helpful for MPP. RESULTS: To this end, in this article we propose a graph structure learning (GSL) based MPP approach, called GSL-MPP. Specifically, we first apply a graph neural network (GNN) over molecular graphs to extract molecular representations. Then, with molecular fingerprints, we construct a molecule similarity graph (MSG). Following that, we conduct GSL on the MSG, i.e. molecule-level GSL, to get the final molecular embeddings, which are the results of fusing both GNN-encoded molecular representations and the relationships among molecules, that is, combining both intra-molecule and inter-molecule information. Finally, we use these molecular embeddings to perform MPP. Extensive experiments on 10 benchmark datasets show that our method can achieve state-of-the-art performance in most cases, especially on classification tasks. Further visualization studies also demonstrate the good molecular representations of our method. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/zby961104/GSL-MPP.
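As an illustration of the similarity-graph step described in the GSL-MPP abstract above, the sketch below builds a small molecule similarity graph from fingerprints. The abstract does not say which similarity measure or edge rule the authors use; Tanimoto similarity over sets of on-bits and a k-nearest-neighbour edge rule are assumptions made here, and the function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_graph(fingerprints, k=1):
    """Return undirected edges linking each molecule to its k most similar molecules.

    The kNN rule is an assumption for illustration, not the published method.
    """
    n = len(fingerprints)
    edges = set()
    for i in range(n):
        # Rank every other molecule by similarity to molecule i, most similar first.
        ranked = sorted(
            ((tanimoto(fingerprints[i], fingerprints[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Toy fingerprints: molecules 0 and 1 share bits, as do molecules 2 and 3.
toy_fingerprints = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
toy_edges = similarity_graph(toy_fingerprints, k=1)
```

In a real pipeline the fingerprints would come from a cheminformatics toolkit, and the resulting graph would be refined by the learning step the abstract calls molecule-level GSL.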


Subjects
Neural Networks, Computer , Drug Discovery/methods , Machine Learning , Algorithms
3.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35172334

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) permits researchers to study the complex mechanisms of cell heterogeneity and diversity. Unsupervised clustering is of central importance for the analysis of scRNA-seq data, as it can be used to identify putative cell types. However, due to noise, high dimensionality and pervasive dropout events, clustering analysis of scRNA-seq data remains a computational challenge. Here, we propose a new deep structural clustering method for scRNA-seq data, named scDSC, which integrates structural information into deep clustering of single cells. The proposed scDSC consists of a Zero-Inflated Negative Binomial (ZINB) model-based autoencoder, a graph neural network (GNN) module and a mutual-supervised module. To learn the data representation from the sparse and zero-inflated scRNA-seq data, we add a ZINB model to the basic autoencoder. The GNN module is introduced to capture the structural information among cells. By joining the ZINB-based autoencoder with the GNN module, the model transfers the data representation learned by the autoencoder to the corresponding GNN layer. Furthermore, we adopt a mutual-supervised strategy to unify these two different deep neural architectures and to guide the clustering task. Extensive experimental results on six real scRNA-seq datasets demonstrate that scDSC outperforms state-of-the-art methods in terms of clustering accuracy and scalability. Our method scDSC is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/DHUDBlab/scDSC.
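To make the ZINB model mentioned in the scDSC abstract above more concrete, the sketch below evaluates the standard zero-inflated negative binomial probability mass function for a single count. The parameter names (mu for the mean, theta for the dispersion, pi for the dropout probability) are assumed notation, and this is only the textbook distribution, not the authors' loss implementation.

```python
import math


def nb_pmf(x, mu, theta):
    """Negative binomial probability of count x with mean mu and dispersion theta."""
    log_p = (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(x, mu, theta, pi):
    """Zero-inflated NB: with probability pi the count is a forced zero (dropout)."""
    extra_zero_mass = pi if x == 0 else 0.0
    return extra_zero_mass + (1.0 - pi) * nb_pmf(x, mu, theta)


# Sanity check on assumed toy parameters: the probabilities should sum to about 1.
total_mass = sum(zinb_pmf(x, mu=2.0, theta=1.5, pi=0.3) for x in range(400))
```

The extra mass at zero is what lets the model account for dropout events; in an autoencoder setting, the negative log of this probability would typically serve as the reconstruction loss.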


Subjects
Neural Networks, Computer , Single-Cell Analysis , Cluster Analysis , Gene Expression Profiling , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
4.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37079731

ABSTRACT

MOTIVATION: Predicting molecular properties is one of the fundamental problems in drug design and discovery. In recent years, self-supervised learning (SSL) has shown its promising performance in image recognition, natural language processing, and single-cell data analysis. Contrastive learning (CL) is a typical SSL method used to learn the features of data so that the trained model can more effectively distinguish the data. One important issue of CL is how to select positive samples for each training example, which will significantly impact the performance of CL. RESULTS: In this article, we propose a new method for molecular property prediction (MPP) by Contrastive Learning with Attention-guided Positive-sample Selection (CLAPS). First, we generate positive samples for each training example based on an attention-guided selection scheme. Second, we employ a Transformer encoder to extract latent feature vectors and compute the contrastive loss aiming to distinguish positive and negative sample pairs. Finally, we use the trained encoder for predicting molecular properties. Experiments on various benchmark datasets show that our approach outperforms the state-of-the-art (SOTA) methods in most cases. AVAILABILITY AND IMPLEMENTATION: The code is publicly available at https://github.com/wangjx22/CLAPS.
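The contrastive loss mentioned in the CLAPS abstract above can be illustrated with a minimal sketch. The abstract does not state the exact loss; the InfoNCE-style form below, with cosine similarity and a temperature parameter, is an assumption chosen because it is a common formulation, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the anchor is closer to its positive than to the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# A well-chosen positive gives a low loss; a poorly chosen one gives a high loss.
low_loss = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
high_loss = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

The gap between the two values shows why positive-sample selection matters: a positive that does not resemble the anchor pushes the loss up regardless of how good the encoder is.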


Subjects
Benchmarking , Research Design , Drug Design , Natural Language Processing , Single-Cell Analysis
5.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37531266

ABSTRACT

MOTIVATION: Protein complexes are groups of polypeptide chains linked by non-covalent protein-protein interactions, which play important roles in biological systems and perform numerous functions, including DNA transcription, mRNA translation, and signal transduction. In the past decade, a number of computational methods have been developed to identify protein complexes from protein interaction networks by mining dense subnetworks or subgraphs. RESULTS: In this article, different from the existing works, we propose a novel approach for this task based on generative adversarial networks, which is called PCGAN, meaning identifying Protein Complexes by GAN. With the help of some real complexes as training samples, our method can learn a model to generate new complexes from a protein interaction network. To effectively support model training and testing, we construct two more comprehensive and reliable protein interaction networks and a larger gold standard complex set by merging existing ones of the same organism (including human and yeast). Extensive comparison studies indicate that our method is superior to existing protein complex identification methods in terms of various performance metrics. Furthermore, functional enrichment analysis shows that the identified complexes are of high biological significance, which indicates that these generated protein complexes are very possibly real complexes. AVAILABILITY AND IMPLEMENTATION: https://github.com/yul-pan/PCGAN.


Subjects
Protein Interaction Maps , Saccharomyces cerevisiae , Humans , Saccharomyces cerevisiae/metabolism , Signal Transduction , Protein Biosynthesis
6.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37505457

ABSTRACT

MOTIVATION: Contrastive learning has been widely used as pretext tasks for self-supervised pre-trained molecular representation learning models in AI-aided drug design and discovery. However, existing methods that generate molecular views by noise-adding operations for contrastive learning may face the semantic inconsistency problem, which leads to false positive pairs and consequently poor prediction performance. RESULTS: To address this problem, in this article, we first propose a semantic-invariant view generation method by properly breaking molecular graphs into fragment pairs. Then, we develop a Fragment-based Semantic-Invariant Contrastive Learning (FraSICL) model based on this view generation method for molecular property prediction. The FraSICL model consists of two branches to generate representations of views for contrastive learning, meanwhile a multi-view fusion and an auxiliary similarity loss are introduced to make better use of the information contained in different fragment-pair views. Extensive experiments on various benchmark datasets show that with the least number of pre-training samples, FraSICL can achieve state-of-the-art performance, compared with major existing counterpart models. AVAILABILITY AND IMPLEMENTATION: The code is publicly available at https://github.com/ZiqiaoZhang/FraSICL.


Subjects
Benchmarking , Semantics , Models, Molecular
7.
J Chem Inf Model ; 64(7): 2921-2930, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38145387

ABSTRACT

Self-supervised pretrained models are gaining popularity in AI-aided drug discovery, leading to more and more pretrained models with the promise that they can extract better feature representations for molecules. Yet, the quality of learned representations has not been fully explored. In this work, inspired by the two phenomena of Activity Cliffs (ACs) and Scaffold Hopping (SH) in traditional Quantitative Structure-Activity Relationship analysis, we propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of the representations extracted by the pretrained model and visualize the relationship between the representations and properties. The concepts of ACs and SH are generalized from the structure-activity context to the representation-property context, and the underlying principles of RePRA are analyzed theoretically. Two scores are designed to measure the generalized ACs and SH detected by RePRA, and therefore, the quality of representations can be evaluated. In experiments, representations of molecules from 10 target tasks generated by 7 pretrained models are analyzed. The results indicate that the state-of-the-art pretrained models can overcome some shortcomings of canonical Extended-Connectivity FingerPrints, while the correlation between the basis of the representation space and specific molecular substructures is not explicit. Thus, some representations could be even worse than the canonical fingerprints. Our method enables researchers to evaluate the quality of molecular representations generated by their proposed self-supervised pretrained models. Our findings can also guide the community to develop better pretraining techniques to regularize the occurrence of ACs and SH.


Subjects
Anti-HIV Agents , Drug Discovery , Hydrolases , Learning , Quantitative Structure-Activity Relationship
8.
BMC Genomics ; 23(Suppl 6): 864, 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-37946133

ABSTRACT

BACKGROUND: The rapid development of single-cell RNA sequencing (scRNA-seq) technology has led to huge amounts of scRNA-seq data, which greatly advance research in many biomedical fields involving tissue heterogeneity, disease pathogenesis, drug resistance, etc. One major task in scRNA-seq data analysis is to cluster cells in terms of their expression characteristics. Up to now, a number of methods have been proposed to infer cell clusters, yet there is still much room to improve their performance. RESULTS: In this paper, we develop a new two-step clustering approach to effectively cluster scRNA-seq data, called TSC (Two-Step Clustering). Specifically, we divide all cells into two types: core cells (those possibly lying around the centers of clusters) and non-core cells (those located in the boundary areas of clusters). We first cluster the core cells by hierarchical clustering (the first step) and then assign the non-core cells to the corresponding nearest clusters (the second step). Extensive experiments on 12 real scRNA-seq datasets show that TSC outperforms the state-of-the-art methods. CONCLUSION: TSC is an effective clustering method due to its two-step clustering strategy, and it is a useful tool for scRNA-seq data analysis.
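The two-step idea in the TSC abstract above can be sketched on toy 2-D points. The abstract does not say how core cells are detected or which linkage is used; the neighbour-count rule (radius, min_neighbors), single-linkage merging, and nearest-core-point assignment below are all assumptions for illustration, and the function name is hypothetical.

```python
import math


def two_step_cluster(points, n_clusters, radius, min_neighbors):
    """Cluster core points hierarchically, then attach non-core points to the nearest cluster."""
    n = len(points)
    # Assumed core rule: a point is "core" if enough other points lie within `radius`.
    core = [
        i for i in range(n)
        if sum(1 for j in range(n)
               if j != i and math.dist(points[i], points[j]) <= radius) >= min_neighbors
    ]
    # Step 1: single-linkage agglomerative clustering of the core points only.
    clusters = [[i] for i in core]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = {i: c for c, members in enumerate(clusters) for i in members}
    # Step 2: each non-core point takes the label of its nearest core point.
    for i in range(n):
        if i not in labels:
            nearest = min(core, key=lambda j: math.dist(points[i], points[j]))
            labels[i] = labels[nearest]
    return [labels[i] for i in range(n)]


# Two dense groups plus one boundary point at (5, 4).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 4)]
toy_labels = two_step_cluster(toy_points, n_clusters=2, radius=1.5, min_neighbors=2)
```

Clustering only the dense core first keeps boundary points from bridging separate groups during the hierarchical merge; the sketch assumes at least one core point exists.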


Subjects
Gene Expression Profiling , Single-Cell Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Data Analysis , Algorithms
9.
Bioinformatics ; 38(14): 3582-3589, 2022 07 11.
Article in English | MEDLINE | ID: mdl-35652721

ABSTRACT

MOTIVATION: Accurately predicting drug-target interaction (DTI) is a crucial step to drug discovery. Recently, deep learning techniques have been widely used for DTI prediction and achieved significant performance improvement. One challenge in building deep learning models for DTI prediction is how to appropriately represent drugs and targets. Target distance map and molecular graph are low dimensional and informative representations, which however have not been jointly used in DTI prediction. Another challenge is how to effectively model the mutual impact between drugs and targets. Though attention mechanism has been used to capture the one-way impact of targets on drugs or vice versa, the mutual impact between drugs and targets has not yet been explored, which is very important in predicting their interactions. RESULTS: Therefore, in this article we propose MINN-DTI, a new model for DTI prediction. MINN-DTI combines an interacting-transformer module (called Interformer) with an improved Communicative Message Passing Neural Network (CMPNN) (called Inter-CMPNN) to better capture the two-way impact between drugs and targets, which are represented by molecular graph and distance map, respectively. The proposed method obtains better performance than the state-of-the-art methods on three benchmark datasets: DUD-E, human and BindingDB. MINN-DTI also provides good interpretability by assigning larger weights to the amino acids and atoms that contribute more to the interactions between drugs and targets. AVAILABILITY AND IMPLEMENTATION: The data and code of this study are available at https://github.com/admislf/MINN-DTI.


Subjects
Neural Networks, Computer , Proteins , Humans , Proteins/chemistry , Computer Simulation , Drug Development/methods , Drug Discovery/methods
10.
BMC Bioinformatics ; 23(Suppl 8): 339, 2022 Aug 16.
Article in English | MEDLINE | ID: mdl-35974329

ABSTRACT

BACKGROUND: Essential proteins are indispensable to the development and survival of cells. The identification of essential proteins is not only helpful for understanding the minimal requirements for cell survival, but also has practical significance in disease diagnosis, drug design and medical treatment. With the rapid accumulation of protein-protein interaction (PPI) data, computationally identifying essential proteins from protein-protein interaction networks (PINs) has become increasingly popular. Up to now, a number of approaches for essential protein identification based on PINs have been developed. RESULTS: In this paper, we propose a new and effective approach called iMEPP to identify essential proteins from PINs by fusing multiple types of biological data and applying the influence maximization mechanism to the PINs. Concretely, we first integrate PPI data, gene expression data and Gene Ontology to construct weighted PINs, to alleviate the impact of high false-positives in the raw PPI data. Then, we define the influence scores of nodes in PINs with both orthological data and PIN topological information. Finally, we develop an influence discount algorithm to identify essential proteins based on the influence maximization mechanism. CONCLUSIONS: We applied our method to identifying essential proteins from the Saccharomyces cerevisiae PIN. Experiments show that our iMEPP method outperforms the existing methods, which validates its effectiveness and advantage.


Subjects
Protein Interaction Maps , Saccharomyces cerevisiae Proteins , Algorithms , Computational Biology/methods , Gene Ontology , Protein Interaction Mapping/methods , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
11.
Bioinformatics ; 37(18): 2981-2987, 2021 Sep 29.
Article in English | MEDLINE | ID: mdl-33769437

ABSTRACT

MOTIVATION: Molecular property prediction has been a hot topic in recent years. Existing graph-based models ignore the hierarchical structures of molecules. According to the knowledge of chemistry and pharmacy, the functional groups of molecules are closely related to their physicochemical properties and binding affinities. So, it should be helpful to represent molecular graphs by fragments that contain functional groups for molecular property prediction. RESULTS: In this article, to boost the performance of molecular property prediction, we first propose a definition of molecular graph fragments that may be or contain functional groups, which are relevant to molecular properties, and then develop a fragment-oriented multi-scale graph attention network for molecular property prediction, which is called FraGAT. Experiments on several widely used benchmarks are conducted to evaluate FraGAT. Experimental results show that FraGAT achieves state-of-the-art predictive performance in most cases. Furthermore, our case studies show that when the fragments used to represent the molecular graphs contain functional groups, the model can make better predictions. This conforms to our expectation and demonstrates the interpretability of the proposed model. AVAILABILITY AND IMPLEMENTATION: The code and data underlying this work are available in GitHub, at https://github.com/ZiqiaoZhang/FraGAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
BMC Bioinformatics ; 22(Suppl 6): 130, 2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34078287

ABSTRACT

BACKGROUND: The rapid development of single-cell RNA sequencing (scRNA-seq) enables the exploration of cell heterogeneity, which is usually done by scRNA-seq data clustering. The essence of scRNA-seq data clustering is to group cells by measuring the similarities among genes/transcripts of cells. The selection of features for cell similarity evaluation is of great importance, as it significantly impacts clustering effectiveness and efficiency. RESULTS: In this paper, we propose a novel method called CaFew to select genes based on cluster-aware feature weighting. By optimizing the clustering objective function, CaFew obtains a feature weight matrix, which is further used for feature selection. Genes that have large weights in at least one cluster, or whose weights vary greatly across clusters, are selected. Experiments on 8 real scRNA-seq datasets show that CaFew can clearly improve the clustering performance of existing scRNA-seq data clustering methods. In particular, the combination of CaFew with SC3 achieves state-of-the-art performance. Furthermore, CaFew also benefits the visualization of scRNA-seq data. CONCLUSION: CaFew is an effective scRNA-seq data clustering method due to its gene selection mechanism based on cluster-aware feature weighting, and it is a useful tool for scRNA-seq data analysis.
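The gene selection rule stated in the CaFew abstract above (keep a gene if it has a large weight in at least one cluster, or if its weights vary greatly across clusters) can be sketched as follows. How the weight matrix is learned is not shown here, and the thresholds, the use of population variance as the measure of variation, and the function name are assumptions for illustration.

```python
from statistics import pvariance


def select_genes(weights, max_thresh, var_thresh):
    """Select genes from a mapping gene -> list of per-cluster weights.

    A gene is kept if its largest weight reaches max_thresh, or if the variance
    of its weights across clusters reaches var_thresh (both thresholds assumed).
    """
    selected = []
    for gene, row in weights.items():
        if max(row) >= max_thresh or pvariance(row) >= var_thresh:
            selected.append(gene)
    return selected


# Toy weight matrix with three genes and three clusters.
toy_weights = {
    "g1": [0.9, 0.1, 0.1],  # large weight in one cluster
    "g2": [0.2, 0.2, 0.2],  # small and flat: not informative
    "g3": [0.0, 0.4, 0.0],  # moderate weights that vary across clusters
}
toy_selected = select_genes(toy_weights, max_thresh=0.8, var_thresh=0.03)
```

The selected genes would then be passed to an existing clustering method, which is how the abstract describes CaFew being combined with tools such as SC3.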


Subjects
RNA, Small Cytoplasmic , Single-Cell Analysis , Algorithms , Cluster Analysis , Gene Expression Profiling , Sequence Analysis, RNA
13.
J Proteome Res ; 20(1): 1079-1086, 2021 01 01.
Article in English | MEDLINE | ID: mdl-33338382

ABSTRACT

Batch effects are unwanted data variations that may obscure biological signals, leading to bias or errors in subsequent data analyses. Effective evaluation and elimination of batch effects are necessary for omics data analysis. In order to facilitate the evaluation and correction of batch effects, here we present BatchServer, an open-source, user-friendly, interactive graphical web platform based on R/Shiny for batch effects analysis. In BatchServer, we introduce autoComBat, a modified version of ComBat, which is the most widely adopted tool for batch effect correction. BatchServer uses PVCA (Principal Variance Component Analysis) and UMAP (Uniform Manifold Approximation and Projection) for evaluation and visualization of batch effects. We demonstrate its applications in multiple proteomic and transcriptomic data sets. BatchServer is provided at https://lifeinfor.shinyapps.io/batchserver/ as a web server. The source code is freely available at https://github.com/guomics-lab/batch_server.


Subjects
Computational Biology , Software
14.
Methods ; 179: 55-64, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32446957

ABSTRACT

At the early stages of drug discovery, molecule toxicity prediction is crucial to excluding drug candidates that are likely to fail in clinical trials. In this paper, we present a novel molecular representation method and develop a corresponding deep learning-based framework called TOP (the abbreviation of TOxicity Prediction). TOP integrates specifically designed data preprocessing methods, an RNN based on bidirectional gated recurrent unit (BiGRU), and fully connected neural networks for end-to-end molecular representation learning and chemical toxicity prediction. TOP can automatically learn a mixed molecular representation from not only SMILES contextual information that describes the molecule structure, but also physicochemical properties. Therefore, TOP can overcome the drawbacks of existing methods that use only one of them, and thus greatly improves toxicity prediction accuracy. We conducted extensive experiments over 14 classic toxicity prediction tasks on three different benchmark datasets, including balanced and imbalanced ones. The results show that, with the help of the novel molecular representation method, TOP significantly outperforms not only three baseline machine learning methods, but also five state-of-the-art methods.


Subjects
Cheminformatics/methods , Deep Learning , Drug Discovery/methods , Pharmacology, Clinical/methods , Toxicity Tests/methods , Datasets as Topic , Drug Discovery/statistics & numerical data , Forecasting/methods , Humans , Pharmacology, Clinical/statistics & numerical data , Toxicity Tests/statistics & numerical data
15.
BMC Bioinformatics ; 21(Suppl 13): 384, 2020 Sep 17.
Article in English | MEDLINE | ID: mdl-32938375

ABSTRACT

BACKGROUND: Protein-DNA interaction governs a large number of cellular processes, and it can be altered by a small fraction of interface residues, i.e., the so-called hot spots, which account for most of the interface binding free energy. Accurate prediction of hot spots is critical to understand the principle of protein-DNA interactions. There are already some computational methods that can accurately and efficiently predict a large number of hot residues. However, the insufficiency of experimentally validated hot-spot residues in protein-DNA complexes and the low diversity of the employed features limit the performance of existing methods. RESULTS: Here, we report a new computational method for effectively predicting hot spots in protein-DNA binding interfaces. This method, called PreHots (the abbreviation of Predicting Hotspots), adopts an ensemble stacking classifier that integrates different machine learning classifiers to generate a robust model with 19 features selected by a sequential backward feature selection algorithm. To this end, we constructed two new and reliable datasets (one benchmark for model training and one independent dataset for validation), which totally consist of 123 hot spots and 137 non-hot spots from 89 protein-DNA complexes. The data were manually collected from the literature and existing databases with a strict process of redundancy removal. Our method achieves a sensitivity of 0.813 and an AUC score of 0.868 in 10-fold cross-validation on the benchmark dataset, and a sensitivity of 0.818 and an AUC score of 0.820 on the independent test dataset. The results show that our approach outperforms the existing ones. CONCLUSIONS: PreHots, which is based on stack ensemble of boosting algorithms, can reliably predict hot spots at the protein-DNA binding interface on a large scale. Compared with the existing methods, PreHots can achieve better prediction performance. 
Both the webserver of PreHots and the datasets are freely available at: http://dmb.tongji.edu.cn/tools/PreHots/ .
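The two metrics reported in the PreHots abstract above, sensitivity and AUC, can be computed as in the sketch below. These are the standard definitions (sensitivity as the fraction of true hot spots recovered, AUC as the probability that a positive is scored above a negative); the toy labels and scores are invented for illustration and are not the paper's data.

```python
def sensitivity(y_true, y_pred):
    """Fraction of actual positives (label 1) that are predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """Rank-based AUC: share of positive/negative pairs ordered correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: two hot spots (1) and two non-hot spots (0).
toy_true = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]  # 0.5 cut-off is an assumption
toy_sensitivity = sensitivity(toy_true, toy_pred)
toy_auc = auc(toy_true, toy_scores)
```

Sensitivity depends on the chosen cut-off, while AUC summarizes ranking quality over all cut-offs, which is why abstracts such as this one report both.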


Subjects
Algorithms , DNA-Binding Proteins/genetics , Databases, Protein/standards , Humans , Models, Molecular
16.
J Chem Inf Model ; 60(4): 2367-2376, 2020 04 27.
Article in English | MEDLINE | ID: mdl-32118415

ABSTRACT

Drug research and development is a time-consuming and high-cost task, pressing an urgent demand to identify novel indications of approved drugs, referred to as drug repositioning, which provides an economical and efficient way for drug discovery. With increasing volumes of large-scale chemical, genomic, and pharmacological data sets generated by the high-throughput technique, it is crucial to develop systematic and rational computational approaches to identify new indications of approved drugs. In this paper, we introduce HNet-DNN, which utilizes a deep neural network (DNN), to predict new drug-disease associations based on the features extracted from the drug-disease heterogeneous network. Instead of the straightforward concatenation of chemical and phenotypic features as the input of DNN, we used these raw features of drugs and diseases to construct a drug-drug similarity network and a disease-disease similarity network, and then built a drug-disease heterogeneous network by integrating known drug-disease associations. Subsequently, we extracted topological features for drug-disease associations from the heterogeneous network and used them to train a DNN model. Our intensive performance evaluations demonstrated that HNet-DNN effectively exploits the features of the heterogeneous network to boost the predictive performance of drug-disease associations. Compared with a couple of typical classifiers and competitive approaches, our method not only achieved state-of-the-art performance but also effectively alleviated the overfitting problem. Moreover, we ran HNet-DNN to predict new drug-disease associations and carried out case studies to verify the effectiveness of our method.


Subjects
Computational Biology , Neural Networks, Computer , Pharmaceutical Preparations , Drug Discovery , Drug Repositioning
17.
BMC Bioinformatics ; 20(Suppl 15): 598, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874597

ABSTRACT

BACKGROUND: Super-enhancers (SEs) are clusters of transcriptionally active enhancers, which dictate the expression of genes defining cell identity and play an important role in the development and progression of tumors and other diseases. Many key cancer oncogenes are driven by super-enhancers, and the mutations associated with common diseases such as Alzheimer's disease are significantly enriched in super-enhancers. Super-enhancers have shown great potential for the identification of key oncogenes and the discovery of disease-associated mutational sites. RESULTS: In this paper, we propose a new computational method called DEEPSEN for predicting super-enhancers based on a convolutional neural network. The proposed method integrates 36 kinds of features. Compared with existing approaches, our method performs better and can be used for genome-wide prediction of super-enhancers. In addition, we screen important features for predicting super-enhancers. CONCLUSION: The convolutional neural network is effective in boosting the performance of super-enhancer prediction.


Subjects
Neural Networks, Computer , Humans , Neoplasms/genetics , Oncogenes
18.
BMC Genomics ; 20(Suppl 2): 221, 2019 Apr 04.
Article in English | MEDLINE | ID: mdl-30967107

ABSTRACT

BACKGROUND: The epigenome is highly dynamic during the early stages of embryonic development. Epigenetic modifications provide the necessary regulation for lineage specification and enable the maintenance of cellular identity. Given the rapid accumulation of genome-wide epigenomic modification maps across the cellular differentiation process, there is an urgent need to characterize epigenetic dynamics and reveal their impacts on differential gene regulation. METHODS: We proposed DiffEM, a computational method for differential analysis of epigenetic modifications, and identified highly dynamic modification sites along the cellular differentiation process. We applied this approach to investigating 6 epigenetic marks across 20 kinds of human early developmental stages and tissues, including hESCs, 4 hESC-derived lineages and 15 human primary tissues. RESULTS: We identified highly dynamic modification sites where different cell types exhibit distinctive modification patterns, and found that these highly dynamic sites are enriched in the genes related to cellular development and differentiation. Further, to evaluate the effectiveness of our method, we correlated the dynamics scores of epigenetic modifications with the variance of gene expression, and compared the results of our method with those of the existing algorithms. The comparison results demonstrate the power of our method in evaluating the epigenetic dynamics and identifying highly dynamic regions along the cell differentiation process.


Subjects
Cell Lineage , Embryonic Stem Cells/cytology , Embryonic Stem Cells/metabolism , Epigenomics , Gene Expression Regulation, Developmental , Genome, Human , Cell Differentiation , Histones/genetics , Histones/metabolism , Humans , Organ Specificity
19.
BMC Bioinformatics ; 19(Suppl 19): 523, 2018 Dec 31.
Article in English | MEDLINE | ID: mdl-30598074

ABSTRACT

BACKGROUND: The default mode network (DMN) in resting state has been increasingly used in disease diagnosis since it was found in 2001. Prior work has mainly focused on extracting a single DMN with various techniques. However, by using seeding-based analysis with more than one desirable seed, we can obtain multiple DMNs, which are likely to have complementary information, and thus are more promising for disease diagnosis. In the study, we used 18 early mild cognitive impairment (EMCI) participants and 18 late mild cognitive impairment (LMCI) participants of Alzheimer's disease (AD). First, we used seeding-based analysis with four seeds to extract four DMNs for each subject. Then, we conducted fusion analysis for all different combinations of the four DMNs. Finally, we carried out nonlinear support vector machine classification based on the mixing coefficients from the fusion analysis. RESULTS: We found that (1) the four DMNs corresponding to the four different seeds indeed capture different functional regions of each subject; (2) Maps of the four DMNs in the most different joint source from fusion analysis are centered at the regions of the corresponding seeds; (3) Classification results reveal the effectiveness of using multiple seeds to extract DMNs. When using a single seed, the regions of posterior cingulate cortex (PCC) extractions of EMCI and LMCI show the largest difference. For multiple-seed cases, the regions of PCC extraction and right lateral parietal cortex (RLP) extraction provide complementary information for each other in fusion, which improves the classification accuracy. Furthermore, the regions of left lateral parietal cortex (LLP) extraction and RLP extraction also have complementary effect in fusion. In summary, AD diagnosis can be improved by exploiting complementary information of DMNs extracted with multiple seeds. 
CONCLUSIONS: In this study, we applied fusion analysis to the DMNs extracted by using different seeds for exploiting the complementary information hidden among the separately extracted DMNs, and the results supported our expectation that using the complementary information can improve classification accuracy.


Subjects
Alzheimer Disease/classification , Alzheimer Disease/diagnosis , Brain Mapping/methods , Cognitive Dysfunction/diagnosis , Magnetic Resonance Imaging/methods , Nerve Net/physiopathology , Aged , Female , Humans , Male
20.
Breast Cancer Res Treat ; 169(3): 625-632, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29429018

ABSTRACT

BACKGROUND: Breast cancer is one of the most frequently diagnosed cancers among women worldwide, characterized by diverse biological heterogeneity. It is well known that complex and combined gene regulation of multi-omics is involved in the occurrence and development of breast cancer. RESULTS: In this paper, we present the Multi-Omics Breast Cancer Database (MOBCdb), a simple and easily accessible repository that integrates genomic, transcriptomic, epigenomic, clinical, and drug response data of different subtypes of breast cancer. MOBCdb allows users to retrieve simple nucleotide variation (SNV), gene expression, microRNA expression, DNA methylation, and specific drug response data through various search modes. The genome-wide browser/navigation facility in MOBCdb provides an interface for visualizing multi-omics data of multiple samples simultaneously. Furthermore, the survival module provides survival analysis for all or some of the samples by using data of three omics. The approved public drugs with genetic variations on breast cancer are also included in MOBCdb. CONCLUSION: In summary, MOBCdb provides users a unique web interface to the integrated multi-omics data of different subtypes of breast cancer, which enables the users to identify potential novel biomarkers for precision medicine.


Subjects
Breast Neoplasms/genetics , Computational Biology/methods , Databases, Factual , Genomics , Precision Medicine , Breast Neoplasms/metabolism , Drug Discovery , Epigenomics/methods , Female , Gene Expression Profiling/methods , Genomics/methods , Humans , Precision Medicine/methods