Search | VHL Search Portal

1.

FREE: Enhanced Feature Representation for Isotopic Envelope Evaluation in Top-Down Mass Spectra Deconvolution.

Zhong, Jiancheng; Song, Xingran; Wang, Shaokai.

Anal Chem ; 96(31): 12602-12615, 2024 08 06.

Article in English | MEDLINE | ID: mdl-39037184

ABSTRACT

The aim of deconvolution of top-down mass spectra is to recognize monoisotopic peaks from the experimental envelopes in raw mass spectra. Accurate assessment of the similarity between theoretical and experimental envelopes is therefore a critical step in mass spectra data deconvolution. Existing evaluation methods primarily rely on intensity differences and m/z similarity, potentially lacking a comprehensive assessment. To overcome this constraint and enable a more comprehensive and refined assessment of the similarity between theoretical and experimental envelopes, more effective features for assessing this correspondence need to be systematically explored and identified. We present enhanced feature representation for isotopic envelope evaluation (FREE), which derives diverse feature representations that encapsulate fundamental physical attributes of envelopes, including peak intensity and envelope shape. We trained FREE and evaluated its performance on both the ovarian tumor (OT) (human OT cells) data set and the zebrafish (ZF) (brain in mature female ZF) data set. Compared with the state-of-the-art method, FREE demonstrates higher performance in multiple evaluation metrics across both the OT and ZF data sets, particularly in precision, and it accurately predicts a greater number of positive envelopes among the top-ranked envelopes based on their scores. Moreover, within a cross-species data set of ZF, FREE identified a higher number of proteoform-spectrum matches (PrSMs), increasing the count from 50,795 to 52,927 compared to EnvCNN. The combination of FREE with TopFD also identified 117,883 fragment ions, surpassing the 97,554 fragment ions identified by EnvCNN in conjunction with TopFD. To further validate the performance of FREE, we tested 10 cross-species top-down proteomes containing 36 sub-data sets from ProteomeXchange. The results reveal that, after deconvolution with TopFD + FREE, TopPIC identifies more PrSMs across these 10 data sets in both the first and second rounds of experiments. These findings underscore the robustness and generalization capabilities of the FREE approach in diverse proteomes.
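As a rough illustration of what "similarity between theoretical and experimental envelopes" can mean in its simplest form, the sketch below scores two envelopes by the cosine similarity of their peak intensities. This is a generic baseline chosen for illustration, not the FREE feature set or any scoring function from the paper; the function name and the example intensities are assumptions.

```python
import math


def cosine_similarity(theoretical, experimental):
    """Cosine similarity between two equal-length peak-intensity vectors.

    Hypothetical helper for illustration only: each list holds the
    intensities of matched isotopic peaks, in the same order.
    """
    if len(theoretical) != len(experimental):
        raise ValueError("envelopes must have the same number of peaks")
    dot = sum(t * e for t, e in zip(theoretical, experimental))
    norm_t = math.sqrt(sum(t * t for t in theoretical))
    norm_e = math.sqrt(sum(e * e for e in experimental))
    if norm_t == 0 or norm_e == 0:
        return 0.0  # an empty envelope matches nothing
    return dot / (norm_t * norm_e)


# Made-up intensities: the experimental envelope is the theoretical one
# scaled by 10, so the shapes agree and the score is close to 1.
theo = [0.2, 0.5, 0.3]
exp_scaled = [2.0, 5.0, 3.0]
score = cosine_similarity(theo, exp_scaled)
```

A score like this is insensitive to overall intensity scale and ignores m/z error, which is one reason a richer feature representation may be needed.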

Subjects

Mass Spectrometry; Zebrafish; Animals; Humans; Mass Spectrometry/methods; Female; Ovarian Neoplasms/pathology; Isotopes/analysis

2.

Group-IIIA element doped BaSnS₂ as a high efficiency absorber for intermediate band solar cell from a first-principles insight.

Xue, Yang; Lin, Changqing; Zhong, Jiancheng; Huang, Dan; Persson, Clas.

Phys Chem Chem Phys ; 26(10): 8380-8389, 2024 Mar 06.

Article in English | MEDLINE | ID: mdl-38404232

ABSTRACT

The quest for high-performance solar cell absorbers has garnered significant attention in the field of photovoltaic research in recent years. To overcome the Shockley-Queisser (SQ) limit of ∼31% for a single-junction solar cell and realize higher power conversion efficiency, the concept of an intermediate band solar cell (IBSC) has been proposed. This involves the incorporation of an intermediate band (IB) to assist the three band-edge absorptions within the single absorber layer. BaSnS₂ has an appropriate forbidden gap width to host an IB. In this work, doping of BaSnS₂ was studied based on hybrid functional calculations. The results demonstrated that isolated and half-filled IBs were generated with suitable energy states in the band gap region after group-IIIA element (i.e., Al, Ga, and In) doping at the Sn site. Theoretical efficiencies under one sun illumination of 39.0%, 44.3%, and 39.7% were obtained for 25% doping concentration of Al, Ga, and In, respectively, all larger than the single-junction SQ limit. Furthermore, the dopants have lower formation energies when substituting the Sn site compared to occupying the Ba and S sites, which helps realize a proper IB with three band-edge absorptions. Therefore, group-IIIA element doped BaSnS₂ is proposed as a high-efficiency absorber for IBSC.

3.

Proteoform characterization based on top-down mass spectrometry.

Zhong, Jiancheng; Sun, Yusui; Xie, Minzhu; Peng, Wei; Zhang, Chushu; Wu, Fang-Xiang; Wang, Jianxin.

Brief Bioinform ; 22(2): 1729-1750, 2021 03 22.

Article in English | MEDLINE | ID: mdl-32118252

ABSTRACT

Proteins are dominant executors of living processes. Compared to genetic variations, changes in the molecular structure and state of a protein (i.e. proteoforms) are more directly related to pathological changes in diseases. Characterizing proteoforms involves identifying and locating primary structure alterations (PSAs) in proteoforms, which is of practical importance for the advancement of the medical profession. With the development of mass spectrometry (MS) technology, the characterization of proteoforms based on top-down MS technology has become possible. This type of method is relatively new and faces many challenges. Since the proteoform identification is the most important process in characterizing proteoforms, we comprehensively review the existing proteoform identification methods in this study. Before identifying proteoforms, the spectra need to be preprocessed, and protein sequence databases can be filtered to speed up the identification. Therefore, we also summarize some popular deconvolution algorithms, various filtering algorithms for improving the proteoform identification performance and various scoring methods for localizing proteoforms. Moreover, commonly used methods were evaluated and compared in this review. We believe our review could help researchers better understand the current state of the development in this field and design new efficient algorithms for the proteoform characterization.

Subjects

Mass Spectrometry/methods; Proteins/chemistry; Algorithms; Amino Acid Sequence; Databases, Protein

4.

Diagnosis of Autism Spectrum Disorder (ASD) Using Recursive Feature Elimination-Graph Neural Network (RFE-GNN) and Phenotypic Feature Extractor (PFE).

Yang, Jiahong; Hu, Miaojun; Hu, Yao; Zhang, Zixi; Zhong, Jiancheng.

Sensors (Basel) ; 23(24)2023 Dec 06.

Article in English | MEDLINE | ID: mdl-38139493

ABSTRACT

Autism spectrum disorder (ASD) is a multifaceted neurodevelopmental condition that significantly impacts children's social, behavioral, and communicative capacities. Despite extensive research, the precise etiological origins of ASD remain elusive, with observable connections to brain activity. In this study, we propose a novel framework for ASD detection that extracts the characteristics of functional magnetic resonance imaging (fMRI) data and phenotypic data separately. Specifically, we employ recursive feature elimination (RFE) for feature selection of fMRI data and subsequently apply graph neural networks (GNN) to extract informative features from the chosen data. Moreover, we devise a phenotypic feature extractor (PFE) to extract phenotypic features effectively. We then fuse the features and validate them on the ABIDE dataset, achieving 78.7% and 80.6% accuracy, respectively, thereby showcasing competitive performance compared to state-of-the-art methods. The proposed framework provides a promising direction for the development of effective diagnostic tools for ASD.

Subjects

Autism Spectrum Disorder; Child; Humans; Autism Spectrum Disorder/diagnostic imaging; Communication; Neural Networks, Computer; Brain/diagnostic imaging; Magnetic Resonance Imaging; Brain Mapping

5.

A novel essential protein identification method based on PPI networks and gene expression data.

Zhong, Jiancheng; Tang, Chao; Peng, Wei; Xie, Minzhu; Sun, Yusui; Tang, Qiang; Xiao, Qiu; Yang, Jiahong.

BMC Bioinformatics ; 22(1): 248, 2021 May 13.

Article in English | MEDLINE | ID: mdl-33985429

ABSTRACT

BACKGROUND: Some proposed methods for identifying essential proteins achieve better results by using biological information. Gene expression data is generally used to identify essential proteins. However, gene expression data is prone to fluctuations, which may affect the accuracy of essential protein identification. Therefore, we propose an essential protein identification method based on gene expression and the PPI network data to calculate the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network. Our experiments show that the method can improve the accuracy in predicting essential proteins. RESULTS: In this paper, we propose a new measure named JDC, which is based on the PPI network data and gene expression data. The JDC method offers a dynamic threshold method to binarize gene expression data. After that, it combines the degree centrality and Jaccard similarity index to calculate the JDC score for each protein in the PPI network. We benchmark the JDC method on four organisms and evaluate our method by using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that the performance of JDC is better than DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We compare JDC with both NF-PIN and TS-PIN methods, which predict essential proteins through active PPI networks constructed from dynamic gene expression. CONCLUSIONS: We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods with the same input. The main ideas behind JDC are as follows: (1) Essential proteins are generally found in densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of the protein depends on the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network.
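The three ingredients named in the abstract (binarized expression, Jaccard similarity, and a degree-style sum over network neighbors) can be sketched as follows. This is a toy illustration only: the abstract does not give the JDC formula or its dynamic threshold, so the per-gene mean threshold and the way the terms are combined in `toy_score` are assumptions, and all names are hypothetical.

```python
def binarize(profile, threshold=None):
    """Mark each time point as active (1) or inactive (0).

    Assumption: the per-gene mean stands in for the paper's dynamic
    threshold, which the abstract does not specify.
    """
    if threshold is None:
        threshold = sum(profile) / len(profile)
    return [1 if x > threshold else 0 for x in profile]


def jaccard(a, b):
    """Jaccard index of two binary vectors of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def toy_score(protein, neighbors, expression):
    """Sum of Jaccard similarities between a protein and its PPI neighbors.

    Illustrative combination only, not the published JDC score.
    """
    states = {p: binarize(v) for p, v in expression.items()}
    return sum(jaccard(states[protein], states[n]) for n in neighbors[protein])


# Made-up three-protein network and expression profiles.
expression = {"A": [1, 5, 9, 2], "B": [2, 6, 8, 1], "C": [9, 1, 1, 9]}
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
scores = {p: toy_score(p, neighbors, expression) for p in expression}
```

In this toy data, A and B are active at the same time points while C is active at the opposite ones, so A's score comes entirely from its edge to B.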

Subjects

Protein Interaction Maps; Saccharomyces cerevisiae Proteins; Algorithms; Computational Biology; Protein Interaction Mapping; ROC Curve; Saccharomyces cerevisiae Proteins/metabolism; Transcriptome

6.

iCDA-CMG: identifying circRNA-disease associations by federating multi-similarity fusion and collective matrix completion.

Xiao, Qiu; Zhong, Jiancheng; Tang, Xiwei; Luo, Jiawei.

Mol Genet Genomics ; 296(1): 223-233, 2021 Jan.

Article in English | MEDLINE | ID: mdl-33159254

ABSTRACT

Circular RNAs (circRNAs) are a special class of non-coding RNAs with covalently closed-loop structures. Studies prove that circRNAs perform critical roles in various biological processes, and the aberrant expression of circRNAs is closely related to tumorigenesis. Therefore, identifying potential circRNA-disease associations is beneficial to understand the pathogenesis of complex diseases at the circRNA level and helps biomedical researchers and practitioners to discover diagnostic biomarkers accurately. However, it is tremendously laborious and time-consuming to discover disease-related circRNAs with conventional biological experiments. In this study, we develop an integrative framework, called iCDA-CMG, to predict potential associations between circRNAs and diseases. By incorporating multi-source prior knowledge, including known circRNA-disease associations, disease similarities and circRNA similarities, we adopt a collective matrix completion-based graph learning model to prioritize the most promising disease-related circRNAs for guiding laborious clinical trials. The results show that iCDA-CMG outperforms other state-of-the-art models in terms of cross-validation and independent prediction. Moreover, the case studies for several representative cancers suggest the effectiveness of iCDA-CMG in screening circRNA candidates for human diseases, which will contribute to elucidating the pathogenesis mechanisms and unveiling new opportunities for disease diagnosis and targeted therapy.

Subjects

Algorithms; Models, Statistical; Neoplasms/genetics; RNA, Circular/genetics; RNA, Neoplasm/genetics; Computational Biology/methods; Datasets as Topic; Humans; Models, Genetic; Neoplasms/classification; Neoplasms/diagnosis; Neoplasms/pathology; RNA, Circular/metabolism; RNA, Neoplasm/metabolism; Research Design

7.

An in-silico method with graph-based multi-label learning for large-scale prediction of circRNA-disease associations.

Xiao, Qiu; Yu, Haiming; Zhong, Jiancheng; Liang, Cheng; Li, Guanghui; Ding, Pingjian; Luo, Jiawei.

Genomics ; 112(5): 3407-3415, 2020 09.

Article in English | MEDLINE | ID: mdl-32561349

ABSTRACT

Circular RNAs (circRNAs) have been shown to be implicated in various pathological processes and to play vital roles in tumors. Increasing evidence has shown that circRNAs can serve as an important class of regulators, which have great potential to become a new type of biomarker for tumor diagnosis and treatment. However, their biological functions remain largely unknown, and it is costly and tremendously laborious to investigate the molecular mechanisms of circRNAs in human diseases based on conventional wet-lab experiments. The emergence and rapid growth of genomics data sources has provided new opportunities for us to decipher the underlying relationships between circRNAs and diseases by computational models. Therefore, it is appealing to develop powerful computational models to discover potential disease-associated circRNAs. Here, we develop an in-silico method with graph-based multi-label learning for large-scale prediction of potential circRNA-disease associations and discovery of the most promising disease circRNAs. To fully exploit the different characteristics of the circRNA space and disease space and to maintain the local geometric structures of the data, graph regularization and mixed-norm constraint terms are also incorporated into the model to aid prediction. Results and case studies show that the proposed method outperforms other models and could effectively infer potential associations with high accuracy.

Subjects

Computer Simulation; Disease/genetics; RNA, Circular; Algorithms; Animals; Computational Biology/methods; Humans; Mice; Rats

8.

A novel method of predicting microRNA-disease associations based on microRNA, disease, gene and environment factor networks.

Peng, Wei; Lan, Wei; Zhong, Jiancheng; Wang, Jianxin; Pan, Yi.

Methods ; 124: 69-77, 2017 07 15.

Article in English | MEDLINE | ID: mdl-28576328

ABSTRACT

MicroRNAs have been reported to have a close relationship with diseases due to their deregulation of the expression of target mRNAs. Detecting disease-related microRNAs is helpful for disease therapies. With the development of high-throughput experimental techniques, a large number of microRNAs have been sequenced. However, it is still a big challenge to identify which microRNAs are related to diseases. Recently, researchers have become interested in combining multiple types of biological information to identify the associations between microRNAs and diseases. In this work, we propose a novel method to predict microRNA-disease associations based on four biological properties: microRNA, disease, gene and environment factor. Compared with previous methods, our method makes predictions not only by using the prior knowledge of associations among microRNAs, diseases, environment factors and genes, but also by using the internal relationships among these biological properties. We constructed four biological networks based on the similarity of microRNAs, diseases, environment factors and genes, respectively. Random walking was then implemented on the four networks unequally. During the walk, the associations can be inferred from the neighbors in the same network. Meanwhile, the association information can be transferred from one network to another. The experimental results showed that our method achieved better prediction performance than other existing state-of-the-art methods.
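The random-walk idea in this abstract can be illustrated with a minimal random walk with restart on a single, unweighted network. This sketch does not reproduce the paper's unequal walk across four coupled networks; the function name, the restart probability of 0.3, and the toy graph are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that jumps back to
    the seed nodes with probability `restart` at every step.

    `adjacency` maps each node to a non-empty list of neighbors.
    Hypothetical single-network helper, not the paper's four-network walk.
    """
    nodes = sorted(adjacency)
    for node in nodes:
        if not adjacency[node]:
            raise ValueError(f"node {node!r} has no neighbors")
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # Restart mass goes back to the seeds.
        nxt = {n: restart * start[n] for n in nodes}
        # Remaining mass is split evenly among each node's neighbors.
        for u in nodes:
            share = (1.0 - restart) * prob[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Toy path graph a - b - c with the walk seeded at a.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
ranking = random_walk_with_restart(graph, seeds={"a"})
```

Nodes closer to the seed end up with higher probability, which is the sense in which associations "can be inferred from the neighbors" in such methods.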

Subjects

Algorithms; Cardiovascular Diseases/genetics; Gene Regulatory Networks; MicroRNAs/genetics; Neoplasms/genetics; RNA, Messenger/genetics; Schizophrenia/genetics; Area Under Curve; Cardiovascular Diseases/metabolism; Cardiovascular Diseases/pathology; Databases, Genetic; Gene Expression Regulation; Gene-Environment Interaction; Humans; MicroRNAs/metabolism; Neoplasms/metabolism; Neoplasms/pathology; RNA, Messenger/metabolism; Risk Factors; Schizophrenia/metabolism; Schizophrenia/pathology; Signal Transduction

9.

Sprites: detection of deletions from sequencing data by re-aligning split reads.

Zhang, Zhen; Wang, Jianxin; Luo, Junwei; Ding, Xiaojun; Zhong, Jiancheng; Wang, Jun; Wu, Fang-Xiang; Pan, Yi.

Bioinformatics ; 32(12): 1788-96, 2016 06 15.

Article in English | MEDLINE | ID: mdl-26833342

ABSTRACT

MOTIVATION: Advances in next-generation sequencing technologies and the availability of short-read data enable the detection of structural variations (SVs). Deletions, an important type of SV, have been suggested to be associated with genetic diseases. There are three types of deletions: blunt deletions, deletions with microhomologies and deletions with microinsertions. The last two types are very common in the human genome, but they are difficult to detect. Furthermore, finding deletions from sequencing data remains challenging. It is highly appealing to develop sensitive and accurate methods to detect deletions from sequencing data, especially deletions with microhomologies and deletions with microinsertions. RESULTS: We present a novel method called Sprites (SPlit Read re-alIgnment To dEtect Structural variants) which finds deletions from sequencing data. It aligns a whole soft-clipped read, rather than only its clipped part, to the target sequence, a segment of the reference which is determined by spanning reads, in order to find the longest prefix or suffix of the read that has a match in the target sequence. This alignment aims to solve the problem of deletions with microhomologies and deletions with microinsertions. Using both simulated and real data, we show that Sprites performs better at detecting deletions than other current methods in terms of F-score. AVAILABILITY AND IMPLEMENTATION: Sprites is open source software and freely available at https://github.com/zhangzhen/sprites. CONTACT: jxwang@mail.csu.edu.cn. Supplementary data: Supplementary data are available at Bioinformatics online.
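The core search described in the abstract, finding the longest prefix or suffix of a read that matches somewhere in a target segment, can be sketched with exact string matching. This is a simplified illustration and not the Sprites implementation: the real tool performs an alignment, whereas this sketch assumes exact matches only, and the function names and sequences are made up.

```python
def longest_matching_prefix(read, target):
    """Return (length, position) of the longest prefix of `read` that
    occurs exactly in `target`, or (0, -1) if no base matches.

    Simplifying assumption: exact matching only, no mismatches or gaps.
    """
    for length in range(len(read), 0, -1):
        pos = target.find(read[:length])
        if pos != -1:
            return length, pos
    return 0, -1


def longest_matching_suffix(read, target):
    """Same search for the longest suffix of `read`."""
    for length in range(len(read), 0, -1):
        pos = target.find(read[-length:])
        if pos != -1:
            return length, pos
    return 0, -1


# Made-up target segment and two reads clipped at opposite ends.
target = "ACGTACGTTTGACCA"
prefix_hit = longest_matching_prefix("GTTTGAGGG", target)
suffix_hit = longest_matching_suffix("GGGTACG", target)
```

The quadratic scan is fine for short reads against a short target segment; a production tool would more likely use an index or a dynamic-programming alignment.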

Subjects

High-Throughput Nucleotide Sequencing; Genome, Human; Humans; Sequence Deletion; Software

10.

Prediction of essential proteins based on gene expression programming.

Zhong, Jiancheng; Wang, Jianxin; Peng, Wei; Zhang, Zhen; Pan, Yi.

BMC Genomics ; 14 Suppl 4: S7, 2013.

Article in English | MEDLINE | ID: mdl-24267033

ABSTRACT

BACKGROUND: Essential proteins are indispensable for cell survival. Identifying essential proteins is very important for improving our understanding of how a cell works. There are various types of features related to the essentiality of proteins. Many methods have been proposed to combine some of them to predict essential proteins. However, it is still a big challenge to design an effective method that predicts them by integrating different features and explains how the selected features determine the essentiality of a protein. Gene expression programming (GEP) is a learning algorithm that learns relationships between variables in sets of data and then builds models to explain these relationships. RESULTS: In this work, we propose a GEP-based method to predict essential proteins by combining some biological features and topological features. We carry out experiments on S. cerevisiae data. The experimental results show that our method achieves better prediction performance than methods using individual features. Moreover, our method outperforms some machine learning methods and performs as well as a method obtained by combining the outputs of eight machine learning methods. CONCLUSIONS: The accuracy of predicting essential proteins can be improved by using the GEP method to combine some topological features and biological features.

Subjects

Artificial Intelligence; Genes, Essential; Proteins/metabolism; Software; Algorithms; Cell Survival/genetics; Computational Biology/methods; Gene Expression; Models, Genetic; Saccharomyces cerevisiae; Saccharomyces cerevisiae Proteins/metabolism

11.

DAHNGC: A Graph Convolution Model for Drug-Disease Association Prediction by Using Heterogeneous Network.

Zhong, Jiancheng; Cui, Pan; Zhu, Yihong; Xiao, Qiu; Qu, Zuohang.

J Comput Biol ; 30(9): 1019-1033, 2023 09.

Article in English | MEDLINE | ID: mdl-37702623

ABSTRACT

In the field of drug development and repositioning, the prediction of drug-disease associations is a critical task. A recently proposed method for predicting drug-disease associations based on graph convolution relies heavily on the features of adjacent nodes within the homogeneous network for characterizing information. However, this method lacks node attribute information from heterogeneous networks, which could hardly provide valuable insights for predicting drug-disease associations. In this study, a novel drug-disease association prediction model called DAHNGC is proposed, which is based on a graph convolutional neural network. This model includes two feature extraction methods that are specifically designed to extract the attribute characteristics of drugs and diseases from both homogeneous and heterogeneous networks. First, the DropEdge technique is added to the graph convolutional neural network to alleviate the oversmoothing problem and obtain the characteristics of the same nodes of drugs or diseases in the homogeneous network. Then, an automatic feature extraction method in the heterogeneous network is designed to obtain the features of drugs or diseases at different nodes. Finally, the obtained features are put into the fully connected network for nonlinear transformation, and the potential drug-disease pairs are obtained by bilinear decoding. Experimental results demonstrate that the DAHNGC model exhibits good predictive performance for drug-disease associations.
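The DropEdge step mentioned in this abstract, randomly removing a fraction of edges before a graph convolution pass, can be sketched on a plain edge list. This is a generic illustration of the technique and not the DAHNGC code; the function name, the drop rate, and the toy edges are assumptions.

```python
import random


def drop_edge(edges, drop_rate, rng):
    """Return a random subset of `edges` with a fraction `drop_rate` removed.

    Hypothetical helper: in training, a fresh subset would typically be
    drawn every epoch and the reduced graph fed to the convolution layers.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    n_drop = int(drop_rate * len(edges))
    return rng.sample(edges, len(edges) - n_drop)


# Toy graph with 10 edges; half of them are dropped.
edges = [(i, i + 1) for i in range(10)]
rng = random.Random(0)  # fixed seed so the example is repeatable
kept = drop_edge(edges, drop_rate=0.5, rng=rng)
```

Because each epoch sees a slightly different, sparser graph, node representations are less likely to collapse toward one another across layers, which is the oversmoothing problem the abstract refers to.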

Subjects

Drug Development; Neural Networks, Computer

12.

DNRLCNN: A CNN Framework for Identifying MiRNA-Disease Associations Using Latent Feature Matrix Extraction with Positive Samples.

Zhong, Jiancheng; Zhou, Wubin; Kang, Jiedong; Fang, Zhuo; Xie, Minzhu; Xiao, Qiu; Peng, Wei.

Interdiscip Sci ; 14(2): 607-622, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35428965

ABSTRACT

Emerging evidence indicates that miRNAs have strong relationships with many human diseases. Investigating these associations will contribute to elucidating the activities of miRNAs and pathogenesis mechanisms, and will provide new opportunities for disease diagnosis and drug discovery. Therefore, it is important to identify potential associations between miRNAs and diseases. The existing databases of miRNA-disease associations (MDAs) only provide the known MDAs, which can be regarded as positive samples. However, the unknown MDAs cannot be regarded as reliable negative samples. To deal with this uncertainty, we proposed a convolutional neural network (CNN) framework, named DNRLCNN, that predicts MDAs based on a latent feature matrix extracted from positive samples only. First, by including only the positive samples in the calculation, we captured the latent feature matrix for complex interactions between miRNAs and diseases in low-dimensional space. Then, we constructed a feature vector for each miRNA and disease pair based on the feature representation. Finally, we applied a modified CNN to the feature vector to predict MDAs. As a result, our model achieves better performance than other state-of-the-art CNN-based methods in fivefold cross-validation on both the miRNA-disease association prediction task (average AUC of 0.9030) and the miRNA-phenotype association prediction task (average AUC of 0.9442). In addition, we carried out case studies on two human diseases: all of the top-50 predicted miRNAs for lung neoplasms are confirmed by the HMDD v3.2 and dbDEMC 2.0 databases, and 98% of the top-50 predicted miRNAs for heart failure are confirmed. The experimental results show that our model is capable of inferring potential disease-related miRNAs.

Subjects

MicroRNAs; Algorithms; Computational Biology/methods; Genetic Predisposition to Disease; Humans; MicroRNAs/genetics; Neural Networks, Computer

13.

A Novel Multi-Ensemble Method for Identifying Essential Proteins.

Dai, Wei; Chen, Bingxi; Peng, Wei; Li, Xia; Zhong, Jiancheng; Wang, Jianxin.

J Comput Biol ; 28(7): 637-649, 2021 07.

Article in English | MEDLINE | ID: mdl-33439753

ABSTRACT

Essential proteins possess critical functions for cell survival. Identifying essential proteins improves our understanding of how a cell works and also plays a vital role in the research fields of disease treatment and drug development. Recently, some machine-learning methods and ensemble learning methods have been proposed to identify essential proteins by introducing effective protein features. However, the ensemble learning method only used to focus on the choice of base classifiers. In this article, we propose a novel ensemble learning framework called multi-ensemble to integrate different base classifiers. The multi-ensemble method adopts the idea of multi-view learning and selects multiple base classifiers and trains those classifiers by continually adding the samples that are predicted correctly by the other base classifiers. We applied multi-ensemble to Yeast data and Escherichia coli data. The results show that our approach achieved better performance than both individual classifiers and the other ensemble learning methods.

Subjects

Computational Biology/methods; Escherichia coli/metabolism; Proteins/analysis; Yeasts/metabolism; Algorithms; Escherichia coli Proteins/metabolism; Fungal Proteins/metabolism; Genes, Essential; Machine Learning

14.

Network Embedding the Protein-Protein Interaction Network for Human Essential Genes Identification.

Dai, Wei; Chang, Qi; Peng, Wei; Zhong, Jiancheng; Li, Yongjiang.

Genes (Basel) ; 11(2)2020 01 31.

Article in English | MEDLINE | ID: mdl-32023848

ABSTRACT

Essential genes are a group of genes that are indispensable for cell survival and cell fertility. Studying human essential genes not only helps scientists reveal the underlying biological mechanisms of a human cell but also guides disease treatment. Recently, the publication of human essential gene data has made it possible for researchers to train a machine-learning classifier by using some features of the known human essential genes and to use the classifier to predict new human essential genes. Previous studies have found that the essentiality of genes closely relates to their properties in the protein-protein interaction (PPI) network. In this work, we propose a novel supervised method to predict human essential genes by network embedding the PPI network. Our approach implements a biased random walk on the network to get the network context of each node. Then, the node pairs are input into an artificial neural network to learn representation vectors that maximally preserve the network structure and the properties of the nodes in the network. Finally, the features are put into an SVM classifier to predict human essential genes. The prediction results on two human PPI networks show that our method achieves better performance than those that use either genes' sequence information or genes' centrality properties in the network as input features. Moreover, it also outperforms methods that represent the PPI network by other previous approaches.

Subjects

Computational Biology/methods; Genes, Essential; Protein Interaction Maps; Databases, Genetic; Humans; Supervised Machine Learning

15.

XGBFEMF: An XGBoost-Based Framework for Essential Protein Prediction.

Zhong, Jiancheng; Sun, Yusui; Peng, Wei; Xie, Minzhu; Yang, Jiahong; Tang, Xiwei.

IEEE Trans Nanobioscience ; 17(3): 243-250, 2018 07.

Article in English | MEDLINE | ID: mdl-29993553

ABSTRACT

Essential proteins, as a vital part of maintaining the life of cells, play an important role in the study of biology and drug design. With the generation of large amounts of biological data related to essential proteins, an increasing number of computational methods have been proposed. Unlike methods that adopt a single machine learning method or an ensemble machine learning method, this paper proposes a prediction framework named XGBFEMF for identifying essential proteins. It includes a SUB-EXPAND-SHRINK method for constructing composite features from the original features and obtaining a better subset of features for essential protein prediction, as well as a model fusion method for obtaining a more effective prediction model. We carry out experiments on Yeast data to assess the performance of XGBFEMF with ROC analysis, accuracy analysis, and top analysis. We also set up experiments on E. coli data to validate its performance. The test results show that the XGBFEMF framework can effectively improve many essential indicators. In addition, we analyze each step in the XGBFEMF framework; our results show that each step of the SUB-EXPAND-SHRINK method, as well as the multi-model fusion step, can improve prediction performance.

Subjects

Computational Biology/methods; Protein Interaction Mapping/methods; Proteins; Algorithms; Databases, Protein; Proteins/classification; Proteins/physiology; Software

16.

Protein Inference from the Integration of Tandem MS Data and Interactome Networks.

Zhong, Jiancheng; Wang, Jianxing; Ding, Xiaojun; Zhang, Zhen; Li, Min; Wu, Fang-Xiang; Pan, Yi.

IEEE/ACM Trans Comput Biol Bioinform ; 14(6): 1399-1409, 2017.

Article in English | MEDLINE | ID: mdl-28113634

ABSTRACT

Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. In recent studies, besides tandem MS data and peptide identification information, some other information is exploited to infer proteins. Unlike methods that first use only tandem MS data to infer proteins and then use network information to refine them, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. As two interacting proteins should co-exist, it is reasonable to assume that if one of the interacting proteins is confidently inferred in a sample, its interacting partners should also have a high probability of being present in the same sample. Therefore, we can use the neighborhood information of a protein in an interactome network to adjust the probability that the shared peptide belongs to the protein. In TMSIN, a multi-weighted graph is constructed by combining the bipartite graph with interactome network information, where the bipartite graph is built from the peptide identification information. Based on multi-weighted graphs, TMSIN adopts an iterative workflow to infer proteins. At each iterative step, the probability that a shared peptide belongs to a specific protein is calculated using Bayes' law, based on the neighbor protein support scores of each protein to which the shared peptide maps. We carried out experiments on yeast data and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that the AUC scores yielded by TMSIN are 0.742 and 0.874 on the yeast dataset and human dataset, respectively, and that TMSIN yields the maximum number of true positives when the q-value is less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.

Subjects

Computational Biology/methods; Protein Interaction Mapping/methods; Proteins/chemistry; Tandem Mass Spectrometry/methods; Algorithms; Area Under Curve; Databases, Protein; Humans; Peptides/analysis; Peptides/chemistry; Proteins/analysis; Yeasts/genetics; Yeasts/metabolism

17.

ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi.

IEEE/ACM Trans Comput Biol Bioinform ; 12(4): 815-22, 2015.

Article in English | MEDLINE | ID: mdl-26357321

ABSTRACT

Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial for uncovering the structure of biological networks. In this paper, ClusterViz, an app for Cytoscape 3 for cluster analysis and visualization, has been developed. To reduce complexity and enable extensibility, we designed the architecture of ClusterViz on the Open Services Gateway Initiative (OSGi) framework. Following this architecture, the implementation of ClusterViz is partitioned into three modules: the ClusterViz interface, the clustering algorithms, and visualization and export. ClusterViz facilitates comparison of the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering module is built on an abstract algorithm interface, more clustering algorithms can be added in the future. To illustrate the usability of ClusterViz, we provide three examples with detailed steps drawn from important scientific articles, which show that our tool has helped several research teams in their work on the mechanisms of biological networks.

Subjects

Cluster Analysis; Computational Biology/methods; Software; Algorithms; User-Computer Interface

18.

Predicting Essential Proteins Based on Weighted Degree Centrality.

Tang, Xiwei; Wang, Jianxin; Zhong, Jiancheng; Pan, Yi.

IEEE/ACM Trans Comput Biol Bioinform ; 11(2): 407-18, 2014.

Article in English | MEDLINE | ID: mdl-26355787

ABSTRACT

Essential proteins are vital for an organism's viability under a variety of conditions. Many experimental and computational methods have been developed to identify essential proteins. Computational prediction of essential proteins based on the global protein-protein interaction (PPI) network is severely restricted by the insufficiency of PPI data, but gene expression profiles help to make up for this deficiency. In this work, the Pearson correlation coefficient (PCC) is used to bridge the gap between PPI and gene expression data. Based on the PCC and the edge clustering coefficient (ECC), a new centrality measure, the weighted degree centrality (WDC), is developed to achieve reliable prediction of essential proteins. WDC is applied to identify essential proteins in the yeast and E. coli PPI networks in order to estimate its performance. For comparison, other prediction methods are also applied to identify essential proteins, and several evaluation methods are used to analyze the results of the various prediction approaches. The prediction results and comparative analyses are shown in the paper. Furthermore, the parameter λ of WDC is analyzed in detail and an optimal λ value is found. Based on the optimal λ value, the difference between WDC and another prediction method, PeC, is discussed. The analyses show that WDC outperforms other methods including DC, BC, CC, SC, EC, IC, NC, and PeC. They also indicate that integrating different data sources is an effective way to predict essential proteins.
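A minimal sketch of how a centrality of this kind could be computed is given below. The abstract does not state the formulas, so two assumptions are made here and should be checked against the paper: the edge weight is taken as the convex combination λ·ECC + (1 − λ)·PCC, and ECC(u, v) is taken as the number of common neighbors of u and v divided by min(deg(u) − 1, deg(v) − 1). All function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def edge_clustering_coefficient(adj, u, v):
    """Assumed ECC: triangles through edge (u, v) over the maximum possible."""
    common = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / denom if denom > 0 else 0.0


def weighted_degree_centrality(adj, expr, lam=0.5):
    """Sum, over each protein's edges, of lam * ECC + (1 - lam) * PCC.

    adj  : dict mapping protein -> set of interacting proteins (undirected)
    expr : dict mapping protein -> list of expression values
    lam  : the trade-off parameter (the abstract's lambda)"""
    scores = {}
    for u in adj:
        total = 0.0
        for v in adj[u]:
            ecc = edge_clustering_coefficient(adj, u, v)
            pcc = pearson(expr[u], expr[v])
            total += lam * ecc + (1 - lam) * pcc
        scores[u] = total
    return scores


if __name__ == "__main__":
    # Toy network: triangle A-B-C with a pendant protein D attached to A.
    adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
    expr = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1], "D": [1, 2, 3]}
    ranked = sorted(weighted_degree_centrality(adj, expr).items(),
                    key=lambda item: item[1], reverse=True)
    print(ranked)
```

Proteins would then be ranked by score and the top-ranked ones taken as candidate essential proteins; setting `lam` to 1 uses topology only, and setting it to 0 uses co-expression only.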

Subjects

Computational Biology/methods; Protein Interaction Maps/genetics; Proteins/chemistry; Proteins/metabolism; Transcriptome/genetics; Cluster Analysis; Proteins/genetics; ROC Curve

19.

Predicting protein functions by using unbalanced bi-random walk algorithm on protein-protein interaction network and functional interrelationship network.

Peng, Wei; Wang, Jianxin; Chen, Lu; Zhong, Jiancheng; Zhang, Zhen; Pan, Yi.

Curr Protein Pept Sci ; 15(6): 529-39, 2014.

Article in English | MEDLINE | ID: mdl-25059324

ABSTRACT

Accurate annotation of protein functions is still a big challenge for understanding life in the post-genomic era. Recently, some methods have been developed to address the problem by incorporating the functional similarity of GO terms into the protein-protein interaction (PPI) network. These methods are based on two observations: a protein tends to share some common functions with the proteins that interact with it in the PPI network, and two similar GO terms in the functional interrelationship network usually co-annotate some common proteins. However, these methods annotate protein functions by considering neighbors of proteins and neighbors of GO terms at the same level, and few attempts have been made to investigate the difference between the two. Given the topological and structural differences between the PPI network and the functional interrelationship network, we first investigate at which level neighbors of proteins tend to have functional associations and at which level neighbors of GO terms usually co-annotate some common proteins. Then, an unbalanced bi-random walk (UBiRW) algorithm, which iteratively walks a different number of steps in the two networks, is adopted to find protein-GO term associations from known associations. Experiments are carried out on S. cerevisiae data. The results show that our method achieves better prediction performance than both methods that use only PPI network data and methods that consider neighbors of proteins and of GO terms at the same level.
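The idea of walking a different number of steps in each network can be sketched as follows. This is an illustrative bi-random walk in a commonly used form, not the authors' exact update rule: the restart weight `alpha`, the averaging of the two walks, and the assumption that `P` (protein network) and `G` (GO term network) are already normalized transition matrices are all choices made here. Plain nested lists are used so the sketch has no dependencies.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def blend(walked, seed, alpha):
    """alpha * walked + (1 - alpha) * seed, element-wise (restart step)."""
    return [[alpha * w + (1 - alpha) * s for w, s in zip(w_row, s_row)]
            for w_row, s_row in zip(walked, seed)]


def unbalanced_bi_random_walk(P, G, A, left_steps, right_steps, alpha=0.5):
    """Propagate known protein-GO associations A over two networks.

    P : proteins x proteins transition matrix (assumed normalized)
    G : terms x terms transition matrix (assumed normalized)
    A : proteins x terms matrix of known associations
    left_steps / right_steps : number of steps walked in P and in G;
    letting them differ is what makes the walk 'unbalanced'."""
    R = [row[:] for row in A]
    for step in range(1, max(left_steps, right_steps) + 1):
        updates = []
        if step <= left_steps:   # walk on the protein network
            updates.append(blend(matmul(P, R), A, alpha))
        if step <= right_steps:  # walk on the GO term network
            updates.append(blend(matmul(R, G), A, alpha))
        # Average whichever walks were taken at this step.
        R = [[sum(u[i][j] for u in updates) / len(updates)
              for j in range(len(A[0]))]
             for i in range(len(A))]
    return R


if __name__ == "__main__":
    P = [[0.0, 1.0], [1.0, 0.0]]   # two interacting proteins
    G = [[1.0]]                    # a single GO term
    A = [[1.0], [0.0]]             # only protein 0 is annotated
    print(unbalanced_bi_random_walk(P, G, A, left_steps=1, right_steps=0))
```

In the toy run, one step on the protein network transfers part of the annotation score from the annotated protein to its unannotated interaction partner, which is the behavior the abstract relies on.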

Subjects

Algorithms; Protein Interaction Mapping/methods; Protein Interaction Maps; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/metabolism
