Search | BVS - MINISTRY OF HEALTH

1.

Protocol for automated N-glycan sequencing using mass spectrometry and computer-assisted intelligent fragmentation.

Huang, Chuncui; Wang, Hui; Zhou, Jinyu; Huang, Yikang; Ren, Yihui; Zhao, Keli; Wang, Yaojun; Hou, Meijie; Zhang, Jingwei; Liu, Yaming; Ma, Xinyue; Yan, Jingyu; Bu, Dongbo; Chai, Wengang; Sun, Shiwei; Li, Yan.

STAR Protoc ; 5(2): 102976, 2024 Jun 21.

Article in English | MEDLINE | ID: mdl-38635398

ABSTRACT

Biological functions of glycans are intimately linked to fine details of their branches and linkages, which make structural identification extremely challenging. Here, we present a protocol for automated N-glycan sequencing using multistage mass spectrometry (MSn). We describe steps for the release/purification and derivatization of glycans and procedures for MSn scanning. We then detail "glycan intelligent precursor selection" to computationally guide MSn experiments. The protocol can be used for both discrete individual glycans and isomeric glycan mixtures. For complete details on the use and execution of this protocol, please refer to Sun et al.[1], Huang et al.[2], and Huang et al.[3].

Subjects

Mass Spectrometry , Polysaccharides , Polysaccharides/analysis , Polysaccharides/chemistry , Mass Spectrometry/methods , Sequence Analysis/methods

2.

Accurate prediction of RNA secondary structure including pseudoknots through solving minimum-cost flow with learned potentials.

Gong, Tiansu; Ju, Fusong; Bu, Dongbo.

Commun Biol ; 7(1): 297, 2024 Mar 09.

Article in English | MEDLINE | ID: mdl-38461362

ABSTRACT

Pseudoknots are key structural motifs of RNA, and pseudoknotted RNAs play important roles in a variety of biological processes. Here, we present KnotFold, an accurate approach to predicting RNA secondary structure including pseudoknots. The key elements of KnotFold are a learned potential function and a minimum-cost flow algorithm that finds the secondary structure with the lowest potential. KnotFold learns the potential from RNAs with known structures using an attention-based neural network, thus avoiding the inaccuracy of hand-crafted energy functions. The specially designed minimum-cost flow algorithm used by KnotFold considers all possible combinations of base pairs and selects the optimal combination among them. The algorithm breaks the restriction to nested base pairs required by the widely used dynamic programming algorithms, thus enabling the identification of pseudoknots. Using 1,009 pseudoknotted RNAs as representatives, we demonstrate that KnotFold predicts RNA secondary structures including pseudoknots with higher accuracy than state-of-the-art approaches. We anticipate that KnotFold, with its superior accuracy, will greatly facilitate the understanding of RNA structures and functions.
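The nested-pair restriction mentioned above has a concrete meaning. Two base pairs (i, j) and (k, l) with i < k are nested or disjoint unless they cross, i.e., i < k < j < l, and crossing pairs are exactly what form a pseudoknot. The minimal Python sketch below checks a set of base pairs for crossings. It is not KnotFold's code, and the function names are hypothetical.

```python
from itertools import combinations

def find_crossing_pairs(pairs):
    """Return every pair of base pairs that cross (i < k < j < l).

    `pairs` is a list of (i, j) tuples of 0-based positions; each tuple is
    normalized so that i < j. A non-empty result means the structure is not
    nested, i.e., it contains a pseudoknot.
    """
    normalized = sorted((min(i, j), max(i, j)) for i, j in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        # After sorting, i <= k; the pairs cross when k lies strictly inside
        # (i, j) while l lies strictly beyond j.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings

def is_nested(pairs):
    """True when no two base pairs cross (a pseudoknot-free structure)."""
    return not find_crossing_pairs(pairs)

hairpin = [(0, 10), (1, 9), (2, 8)]           # one nested stem
h_type = [(0, 8), (1, 7), (4, 12), (5, 11)]   # two interleaved stems
print(is_nested(hairpin), is_nested(h_type))
```

A dynamic program over nested intervals can never output the `h_type` structure. This is why an algorithm that scores arbitrary combinations of pairs is needed.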

Subjects

Algorithms , RNA , RNA/genetics , Nucleic Acid Conformation , Base Pairing , Neural Networks, Computer

3.

EMNGly: predicting N-linked glycosylation sites using the language models for feature extraction.

Hou, Xiaoyang; Wang, Yu; Bu, Dongbo; Wang, Yaojun; Sun, Shiwei.

Bioinformatics ; 39(11)2023 11 01.

Article in English | MEDLINE | ID: mdl-37930896

ABSTRACT

MOTIVATION: N-linked glycosylation is a frequently occurring post-translational protein modification that serves critical functions in protein folding, stability, trafficking, and recognition. It is involved in multiple biological processes, and alterations to it can result in various diseases. Therefore, identifying N-linked glycosylation sites is imperative for understanding the mechanisms and systems underlying glycosylation. Because of the inherent experimental complexities, machine learning and deep learning have become indispensable tools for predicting these sites. RESULTS: In this context, we propose a new approach called EMNGly. EMNGly uses a pretrained protein language model (Evolutionary Scale Modeling) and a pretrained protein structure model (Inverse Folding Model) for feature extraction, and a support vector machine for classification. Ten-fold cross-validation and independent tests show that this approach outperforms existing techniques. On a benchmark independent test set, it achieves a Matthews correlation coefficient, sensitivity, specificity, and accuracy of 0.8282, 0.9343, 0.8934, and 0.9143, respectively.
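The four reported metrics follow from the confusion-matrix counts of a binary site classifier. The short Python sketch below uses the standard definitions. The counts in the usage line are made-up illustration values, not EMNGly's data.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # recall on true glycosylation sites
    specificity = tn / (tn + fp)                  # recall on non-sites
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"MCC": mcc, "sensitivity": sensitivity,
            "specificity": specificity, "accuracy": accuracy}

print(binary_metrics(tp=90, tn=80, fp=20, fn=10))  # hypothetical counts
```

MCC is often preferred for glycosylation-site benchmarks because it stays informative when positive and negative sites are imbalanced, unlike plain accuracy.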

Subjects

Protein Processing, Post-Translational , Proteins , Glycosylation , Proteins/chemistry , Machine Learning , Support Vector Machine , Computational Biology/methods

4.

SASA-Net: A Spatial-Aware Self-Attention Mechanism for Building Protein 3D Structure Directly From Inter-Residue Distances.

Gong, Tiansu; Ju, Fusong; Sun, Shiwei; Bu, Dongbo.

IEEE/ACM Trans Comput Biol Bioinform ; 20(6): 3482-3488, 2023.

Article in English | MEDLINE | ID: mdl-37022274

ABSTRACT

Protein functions are tightly related to the fine details of their 3D structures. To understand protein structures, computational prediction approaches are much needed. Recently, protein structure prediction has made considerable progress, mainly due to the increased accuracy of inter-residue distance estimation and the application of deep learning techniques. Most distance-based ab initio prediction approaches adopt a two-step scheme: constructing a potential function based on the estimated inter-residue distances, and then building a 3D structure that minimizes the potential function. These approaches have proven very promising; however, they still suffer from several limitations, especially the inaccuracies incurred by the handcrafted potential function. Here, we present SASA-Net, a deep learning-based approach that directly learns protein 3D structure from the estimated inter-residue distances. Unlike existing approaches that simply represent protein structures as atom coordinates, SASA-Net represents protein structures using the pose of each residue, i.e., the residue's own coordinate system, in which all of its backbone atoms are fixed. The key element of SASA-Net is a spatial-aware self-attention mechanism, which adjusts a residue's pose according to all other residues' features and the estimated distances between residues. By iteratively applying this mechanism, SASA-Net continuously refines the structure and finally obtains a highly accurate structure. Using the CATH35 proteins as representatives, we demonstrate that SASA-Net can accurately and efficiently build structures from the estimated inter-residue distances. The high accuracy and efficiency of SASA-Net enable an end-to-end neural network model for protein structure prediction that combines SASA-Net with a neural network for inter-residue distance prediction.
Source code of SASA-Net is available at https://github.com/gongtiansu/SASA-Net/.
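A residue pose, as described above, is a local coordinate system attached to the residue. The Python sketch below shows one way to build such a frame. It assumes the widely used Gram-Schmidt construction from the N, CA, and C backbone coordinates, which may differ from SASA-Net's exact convention, and the function names are hypothetical. Because the frame moves rigidly with the residue, the local coordinates of its backbone atoms are unchanged by any rotation or translation of the whole structure.

```python
import math

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _normalize(v):
    n = math.sqrt(_dot(v, v))
    return [x / n for x in v]

def residue_frame(n, ca, c):
    """Assumed pose convention: origin at CA, orthonormal axes e1, e2, e3.

    e1 points from CA to C; e2 is the component of (N - CA) orthogonal to e1
    (one Gram-Schmidt step); e3 = e1 x e2 completes a right-handed system.
    """
    e1 = _normalize(_sub(c, ca))
    v2 = _sub(n, ca)
    e2 = _normalize(_sub(v2, [_dot(v2, e1) * x for x in e1]))
    e3 = _cross(e1, e2)
    return [e1, e2, e3], ca

def to_local(point, frame):
    """Express a global point in the residue's local coordinates."""
    rot, origin = frame
    d = _sub(point, origin)
    return [_dot(axis, d) for axis in rot]
```

For example, with N = (-0.5, 1.4, 0), CA at the origin, and C = (1.5, 0, 0), `to_local` returns N and C unchanged. It returns the same values after the three atoms are rotated 90 degrees about the z-axis.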

Subjects

Algorithms , Computational Biology , Computational Biology/methods , Proteins/chemistry , Neural Networks, Computer , Software

5.

Protein Structure Prediction: Challenges, Advances, and the Shift of Research Paradigms.

Huang, Bin; Kong, Lupeng; Wang, Chao; Ju, Fusong; Zhang, Qi; Zhu, Jianwei; Gong, Tiansu; Zhang, Haicang; Yu, Chungong; Zheng, Wei-Mou; Bu, Dongbo.

Genomics Proteomics Bioinformatics ; 21(5): 913-925, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37001856

ABSTRACT

Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem. Biochemists and physicists attempt to reveal the principles governing protein folding. Mathematicians, especially statisticians, usually start by assuming a probability distribution of protein structures given a target sequence and then find the most likely structure. Computer scientists formulate protein structure prediction as an optimization problem, i.e., finding the structural conformation with the lowest energy or minimizing the difference between the predicted and native structures. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely, data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we survey efforts in protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories that interpret neural networks, as well as knowledge of protein folding, are still much needed.

Subjects

Algorithms , Proteins , Protein Conformation , Proteins/chemistry , Neural Networks, Computer , Protein Folding , Computational Biology/methods

6.

Accurate and efficient protein sequence design through learning concise local environment of residues.

Huang, Bin; Fan, Tingwen; Wang, Kaiyue; Zhang, Haicang; Yu, Chungong; Nie, Shuyu; Qi, Yangshuo; Zheng, Wei-Mou; Han, Jian; Fan, Zheng; Sun, Shiwei; Ye, Sheng; Yang, Huaiyi; Bu, Dongbo.

Bioinformatics ; 39(3)2023 03 01.

Article in English | MEDLINE | ID: mdl-36916746

ABSTRACT

MOTIVATION: Computational protein sequence design has been widely applied in rational protein engineering, and increasing design accuracy and efficiency is highly desirable. RESULTS: Here, we present ProDESIGN-LE, an accurate and efficient approach to protein sequence design. ProDESIGN-LE adopts a concise but informative representation of each residue's local environment and trains a transformer to learn the correlation between the local environments of residues and their amino acid types. For a target backbone structure, ProDESIGN-LE uses the transformer to assign an appropriate residue type to each position based on its local environment within the structure. The result is a designed sequence in which all residues fit well with their local environments. We applied ProDESIGN-LE to design sequences for 68 naturally occurring and 129 hallucinated proteins within 20 s per protein on average. The predicted structures of the designed proteins closely resemble the target structures, with a state-of-the-art average TM-score exceeding 0.80. We further experimentally validated ProDESIGN-LE by designing five sequences for an enzyme, chloramphenicol O-acetyltransferase type III (CAT III), and recombinantly expressing the proteins in Escherichia coli. Of these proteins, three exhibited excellent solubility, and one yielded monomeric species with circular dichroism spectra consistent with the natural CAT III protein. AVAILABILITY AND IMPLEMENTATION: The source code of ProDESIGN-LE is available at https://github.com/bigict/ProDESIGN-LE.
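The TM-score used above to compare predicted and target structures averages a distance-damped term over the target's residues: TM = (1/L) Σ 1/(1 + (d_i/d0)^2), with d0 = 1.24 (L - 15)^(1/3) - 1.8. The Python sketch below evaluates this formula for one fixed superposition, given the distances between aligned residue pairs. The full TM-score additionally maximizes over superpositions, which is omitted here. The 0.5 floor on d0 for short chains is a convention assumed in this sketch.

```python
def tm_d0(length):
    """Length-dependent distance scale d0 in angstroms.

    Assumed convention: d0 is floored at 0.5 for short chains (L <= 21),
    where the cube-root formula becomes too small or undefined.
    """
    if length <= 21:
        return 0.5
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)

def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition.

    `distances` holds the distance (angstroms) between each aligned residue
    pair after superposition; unaligned target residues contribute nothing,
    but the sum is still normalized by the full target length.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Under this definition, a score above 0.8 means that most residues lie well within d0 of their target positions. Normalizing by the target length penalizes partial coverage: aligning only half of a 100-residue target perfectly yields at most 0.5.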

Subjects

Proteins , Software , Amino Acid Sequence , Proteins/chemistry

7.

GIPS-Mix for Accurate Identification of Isomeric Components in Glycan Mixtures Using Intelligent Group-Opting Strategy.

Huang, Chuncui; Hou, Meijie; Yan, Jingyu; Wang, Hui; Wang, Yu; Cao, Cuiyan; Wang, Yaojun; Gao, Huanyu; Ma, Xinyue; Zheng, Yi; Bu, Dongbo; Chai, Wengang; Li, Yan; Sun, Shiwei.

Anal Chem ; 95(2): 811-819, 2023 01 17.

Article in English | MEDLINE | ID: mdl-36547394

ABSTRACT

Accurate identification of glycan structures is highly desirable, as these structures are intimately linked to glycan functions. However, glycan samples generally exist as mixtures of multiple isomeric structures, making the assignment of individual glycan components very challenging, even with the aid of multistage mass spectrometry (MSn). Here, we present an approach, GIPS-mix, for the assignment of isomeric glycans within a mixture using an intelligent group-opting strategy. Our approach enumerates all possible combinations (groupings) of candidate glycans. It then opts for the best-matched glycan group(s) based on the similarity between the simulated spectra of each glycan group and the acquired experimental spectra of the mixture. When no single group can be selected, the tie is broken by additional MSn scanning using intelligently selected precursors. With 11 standard mixtures and 6 human milk oligosaccharide fractions, we demonstrate the application of GIPS-mix to the assignment of individual glycans in mixtures with high accuracy and efficiency.
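The group-opting step above works in three stages. It enumerates groupings of candidate glycans, scores each group against the mixture spectrum, and keeps all tied best groups for follow-up MSn scans. The Python sketch below follows that pattern under several assumptions that are not from the paper: spectra are dicts of pre-binned m/z to intensity, a group's simulated spectrum is the peak-wise sum of its members, and cosine similarity is the score. All names and toy spectra are hypothetical.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two sparse spectra {mz_bin: intensity}."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def merge(spectra):
    """Assumed mixture model: peak-wise sum of the member spectra."""
    out = {}
    for s in spectra:
        for k, v in s.items():
            out[k] = out.get(k, 0.0) + v
    return out

def opt_groups(candidates, experimental, max_size=3, tol=1e-9):
    """Enumerate candidate groups; return (best score, all tied best groups).

    `candidates` maps a glycan name to its simulated spectrum. More than one
    returned group signals a tie that would need extra MSn scanning.
    """
    best, tied = -1.0, []
    names = sorted(candidates)
    for size in range(1, min(max_size, len(names)) + 1):
        for group in combinations(names, size):
            score = cosine(merge(candidates[g] for g in group), experimental)
            if score > best + tol:
                best, tied = score, [group]
            elif abs(score - best) <= tol:
                tied.append(group)
    return best, tied

toy = {"A": {100: 1.0, 200: 0.5}, "B": {150: 1.0}, "C": {300: 1.0}}
print(opt_groups(toy, {100: 1.0, 200: 0.5, 150: 1.0}))
```

If two isomers produce identical simulated spectra, every grouping of them ties. In that case, this step alone cannot decide, and an intelligently chosen precursor must separate them.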

Subjects

Oligosaccharides , Polysaccharides , Humans , Polysaccharides/chemistry , Oligosaccharides/analysis , Isomerism , Milk, Human/chemistry

8.

Microbiome Resilience and Health Implications for People in Half-Year Travel.

Cheng, Mingyue; Liu, Hong; Han, Maozhen; Li, Shuai Cheng; Bu, Dongbo; Sun, Shiwei; Hu, Zhiqiang; Yang, Pengshuo; Wang, Rui; Liu, Yawen; Chen, Feng; Peng, Jianjun; Peng, Hong; Song, Hongxing; Xia, Yang; Chu, Liqun; Zhou, Quan; Guan, Feng; Wu, Jing; Tan, Guangming; Ning, Kang.

Front Immunol ; 13: 848994, 2022.

Article in English | MEDLINE | ID: mdl-35281043

ABSTRACT

Travel entails changes in geography and diet, both of which are known determinants of the human gut microbiome. Additionally, an altered gut microbiome modulates immunity, with health implications for humans. To explore the effects of mid-term travel on the gut microbiome, we generated 16S rRNA gene and metagenomic sequencing data from longitudinal samples collected over six months. We monitored the dynamic trajectories of gut microbiome variation in a Chinese volunteer team (VT) throughout their journey to Trinidad and Tobago (TAT). We found evidence of gut microbiome resilience. VT's gut microbial compositions gradually shifted toward the local TAT enterotypes during their six-month stay in TAT, and then reverted to their original enterotypes within one month of VT's return to Beijing. Moreover, we identified driver species in this bidirectional plasticity that could play a role in immunity modulation. One example is Bacteroides dorei, which attenuated atherosclerotic lesion formation and effectively suppressed the proinflammatory immune response. Another driver species, Prevotella copri (P. copri), could play a crucial role in the pathogenesis of rheumatoid arthritis, a chronic autoimmune disease. Carbohydrate-active enzymes are often implicated in immune and host-pathogen interactions. Among them, glycoside hydrolases decreased whereas glycosyltransferases and carbohydrate esterases increased during the travel, and these functions were restored after VT returned to Beijing. Furthermore, we found that these microbial changes and their restoration were mediated by the team members' dietary changes. These findings indicate that half-year travel leads to changes in enterotype and functional patterns, with effects on human health. Microbial intervention through dietary guidance during half-year travel could help modulate immunity and maintain health.

Subjects

Gastrointestinal Microbiome , Carbohydrates , Feces , Gastrointestinal Microbiome/genetics , Humans , Metagenomics , RNA, Ribosomal, 16S/genetics

9.

ProALIGN: Directly Learning Alignments for Protein Structure Prediction via Exploiting Context-Specific Alignment Motifs.

Kong, Lupeng; Ju, Fusong; Zheng, Wei-Mou; Zhu, Jianwei; Sun, Shiwei; Xu, Jinbo; Bu, Dongbo.

J Comput Biol ; 29(2): 92-105, 2022 02.

Article in English | MEDLINE | ID: mdl-35073170

ABSTRACT

Template-based modeling (TBM), including homology modeling and protein threading, is one of the most reliable techniques for protein structure prediction. It predicts protein structure by building an alignment between the query sequence and templates with solved structures. However, building the optimal sequence-template alignment remains very challenging, especially when only distantly related templates are available. Here we report ProALIGN, a novel deep learning approach that predicts considerably more accurate sequence-template alignments. Just as protein sequences consist of sequence motifs, protein alignments are composed of frequently occurring alignment motifs with characteristic patterns. Alignment motifs are context-specific, as their characteristic patterns are tightly related to the sequence contexts of the aligned regions. Inspired by this observation, we represent a protein alignment as a binary matrix (in which 1 denotes an aligned residue pair). We then use a deep convolutional neural network to predict the optimal alignment from the query protein and its template. The trained neural network implicitly but effectively encodes an alignment scoring function, which reduces the inaccuracies of the handcrafted scoring functions widely used by current threading approaches. For a query protein and a template, we apply the neural network to directly infer the likelihoods of all possible residue pairs as a whole, which effectively accounts for correlations among multiple residues. We then construct the alignment with maximum likelihood, and finally build a structure model according to the alignment. We tested our method on three independent data sets with a total of 6688 protein alignment targets and on 80 CASP13 TBM targets. It achieved much better alignments and 3D structure models than existing methods, including HHpred, CNFpred, CEthreader, and DeepThreader.
These results clearly demonstrate the effectiveness of exploiting the context-specific alignment motifs by deep learning for protein threading.
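The binary-matrix representation and the maximum-likelihood step described above can be made concrete. The Python sketch below assumes that the predicted entries are independent Bernoulli probabilities. Under that assumption, the log-likelihood of a binary matrix M is Σ log(1 - p) plus Σ over aligned (i, j) of logit(p_ij). Maximizing it over monotone alignments therefore reduces to a Needleman-Wunsch-style dynamic program over logits. The independence assumption, the gap-free scoring, and the toy probability matrix are illustrative, not ProALIGN's exact procedure.

```python
import math

def alignment_to_matrix(pairs, n_query, n_template):
    """Binary alignment matrix: entry [i][j] is 1 iff query i aligns to template j."""
    matrix = [[0] * n_template for _ in range(n_query)]
    for i, j in pairs:
        matrix[i][j] = 1
    return matrix

def max_likelihood_alignment(prob, eps=1e-12):
    """Monotone (non-crossing) alignment maximizing the summed logits.

    The constant sum(log(1 - p)) term is dropped; gaps carry no penalty
    (an assumption of this sketch).
    """
    n, m = len(prob), len(prob[0])
    w = [[math.log(max(p, eps)) - math.log(max(1.0 - p, eps)) for p in row]
         for row in prob]
    s = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j],                        # query residue unaligned
                          s[i][j - 1],                        # template residue unaligned
                          s[i - 1][j - 1] + w[i - 1][j - 1])  # align the pair
    # Trace back from the bottom-right corner, collecting aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if w[i - 1][j - 1] > 0 and s[i][j] == s[i - 1][j - 1] + w[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

toy_prob = [[0.9, 0.1, 0.1],
            [0.1, 0.1, 0.9],
            [0.1, 0.8, 0.1]]
best = max_likelihood_alignment(toy_prob)
print(best, alignment_to_matrix(best, 3, 3))
```

In the toy input, pairs (1, 2) and (2, 1) both look likely but cross each other. The dynamic program keeps the higher-probability pair, so the output stays a valid alignment.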

Subjects

Deep Learning , Proteins/chemistry , Sequence Alignment/statistics & numerical data , Algorithms , Amino Acid Motifs , Amino Acid Sequence , Computational Biology , Models, Molecular , Neural Networks, Computer , Protein Conformation , Proteins/genetics , Sequence Analysis, Protein/statistics & numerical data , Software

10.

Seq-SetNet: directly exploiting multiple sequence alignment for protein secondary structure prediction.

Ju, Fusong; Zhu, Jianwei; Zhang, Qi; Wei, Guozheng; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo.

Bioinformatics ; 38(4): 990-996, 2022 01 27.

Article in English | MEDLINE | ID: mdl-34849579

ABSTRACT

MOTIVATION: Accurate prediction of protein structure relies heavily on exploiting multiple sequence alignment (MSA) for residue mutations and correlations, as this information specifies protein tertiary structure. Widely used prediction approaches usually transform the MSA into intermediate models, such as a position-specific scoring matrix or a profile hidden Markov model. These intermediate models, however, cannot fully represent the residue mutations and correlations carried by the MSA; hence, an effective way to directly exploit MSAs is highly desirable. RESULTS: Here, we report a novel sequence set network (called Seq-SetNet) to directly and effectively exploit MSAs for protein structure prediction. Seq-SetNet uses an 'encoding and aggregation' strategy that consists of two key elements. The first is an encoding module that takes a component homologue in the MSA as input and encodes residue mutations and correlations into context-specific features for each residue. The second is an aggregation module that aggregates the features extracted from all component homologues, which are further transformed into structural properties for residues of the query protein. As Seq-SetNet encodes each homologue protein individually, it can account for insertions and deletions as well as long-distance correlations among residues, thus representing more information than the intermediate models. Moreover, the encoding module automatically learns effective features and thus avoids manual feature engineering. Using symmetric aggregation functions, Seq-SetNet processes the homologue proteins as a sequence set, making its predictions invariant to the order of these proteins. On popular benchmark sets, we demonstrated the successful application of Seq-SetNet to predicting the secondary structure and torsion angles of residues with improved accuracy and efficiency. AVAILABILITY AND IMPLEMENTATION: The code and datasets are available through https://github.com/fusong-ju/Seq-SetNet.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
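The order invariance described above comes from combining per-homologue features with a symmetric function such as mean or max pooling. The Python sketch below illustrates the idea with a hypothetical hand-written encoder standing in for Seq-SetNet's learned encoding module. Each homologue is encoded independently into per-residue features, and these features are then pooled across homologues, so the result does not depend on the order of the MSA rows.

```python
from itertools import permutations

HYDROPHOBIC = set("AVILMFWC")

def toy_encode(homologue, query):
    """Hypothetical stand-in encoder: per-residue features for ONE homologue.

    Features: [matches query residue, is a gap, is hydrophobic].
    """
    return [[float(h == q), float(h == "-"), float(h in HYDROPHOBIC)]
            for h, q in zip(homologue, query)]

def aggregate(msa, query):
    """Symmetric mean + max pooling over homologues, residue by residue."""
    encoded = [toy_encode(h, query) for h in msa]
    n_feat = len(encoded[0][0])
    out = []
    for r in range(len(query)):
        column = [enc[r] for enc in encoded]
        mean = [sum(f[k] for f in column) / len(column) for k in range(n_feat)]
        peak = [max(f[k] for f in column) for k in range(n_feat)]
        out.append(mean + peak)
    return out

query = "MKVL"
msa = ["MKVL", "MR-L", "AKIL"]
reference = aggregate(msa, query)
print(all(aggregate(list(p), query) == reference for p in permutations(msa)))
```

Any order-dependent combiner, such as concatenating the homologues' features, would break this property. This is why the aggregation module uses symmetric functions.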

Subjects

Proteins , Software , Sequence Alignment , Proteins/genetics , Proteins/chemistry , Protein Structure, Secondary , Position-Specific Scoring Matrices , Algorithms

11.

The effect of N-glycosylation of SARS-CoV-2 spike protein on the virus interaction with the host cell ACE2 receptor.

Huang, Chuncui; Tan, Zeshun; Zhao, Keli; Zou, Wenjun; Wang, Hui; Gao, Huanyu; Sun, Shiwei; Bu, Dongbo; Chai, Wengang; Li, Yan.

iScience ; 24(11): 103272, 2021 Nov 19.

Article in English | MEDLINE | ID: mdl-34661088

ABSTRACT

The densely glycosylated spike (S) protein, highly exposed on the surface of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), mediates host cell entry by binding to the receptor angiotensin-converting enzyme 2 (ACE2). However, the role of glycosylation is not fully understood. In this study, we investigated the effect of different N-glycosylation of the S1 protein on its binding to ACE2. Using a real-time surface plasmon resonance assay, we demonstrated the negative effect of N-glycosylation: de-N-glycosylated S1 proteins showed considerably increased binding affinities. The S1 proteins were produced in three different expression systems: baculovirus-insect cells, Chinese hamster ovary cells, and two variants of human embryonic kidney 293 cells. Molecular dynamics simulations of the S1 protein-ACE2 receptor complex revealed steric hindrance and Coulombic repulsion effects of different types of N-glycans on the S1 protein interaction with ACE2. The results should contribute to future pathological studies of SARS-CoV-2 and therapeutic development for COVID-19, particularly using recombinant S1 proteins as models.

12.

Filling gaps of genome scaffolds via probabilistic searching optical maps against assembly graph.

Huang, Bin; Wei, Guozheng; Wang, Bing; Ju, Fusong; Zhong, Yi; Shi, Zhuozheng; Sun, Shiwei; Bu, Dongbo.

BMC Bioinformatics ; 22(1): 533, 2021 Oct 30.

Article in English | MEDLINE | ID: mdl-34717539

ABSTRACT

BACKGROUND: Optical maps record the locations of specific enzyme recognition sites within long genome fragments. This long-distance information enables aligning genome assembly contigs onto optical maps and ordering contigs into scaffolds. The generated scaffolds, however, often contain a large number of gaps. One feasible way to fill these gaps is to search the genome assembly graph for the best-matching contig paths that connect the boundary contigs of each gap. Searching and evaluation can be combined in two ways. "Searching followed by evaluation" is infeasible for long gaps. "Searching by evaluation" relies heavily on heuristics and thus usually yields unreliable contig paths. RESULTS: We report an accurate and efficient approach to filling gaps in genome scaffolds with the aid of optical maps. Using simulated data from 12 species and real data from 3 species, we demonstrate the successful application of our approach to gap filling, with improved accuracy and completeness of genome scaffolds. CONCLUSION: Our approach applies a sequential Bayesian updating technique to measure the similarity between optical maps and candidate contig paths. Using this similarity to guide path searching, our approach achieves higher accuracy than the existing "searching by evaluation" strategy that relies on heuristics. Furthermore, unlike the "searching followed by evaluation" strategy, which enumerates all possible paths, our approach prunes unlikely sub-paths and extends only highly probable ones, thus significantly increasing search efficiency.
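The search pattern described above has two parts: score a path incrementally as each contig is appended, and extend only the most probable partial paths. The Python sketch below is a beam search that adds one log-likelihood term per appended contig. With a uniform prior over paths, this accumulates the log posterior step by step. The toy graph, the placeholder `log_lik` function (which does not model optical-map data), and the beam-width pruning rule are all hypothetical, not the paper's likelihood model.

```python
import heapq

def search_paths(graph, start, end, log_lik, beam_width=2, max_len=6):
    """Beam search for the most probable contig path from `start` to `end`.

    `graph` maps a contig to its successor contigs in the assembly graph.
    `log_lik(path, nxt)` is the log-likelihood update when `nxt` is appended
    to `path`; scores are updated one contig at a time, and only the
    `beam_width` best partial paths survive each round (pruning).
    """
    beam = [(0.0, [start])]   # (log posterior up to a constant, path)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, path in beam:
            for nxt in graph.get(path[-1], []):
                if nxt in path:           # ignore cycles in this sketch
                    continue
                extended = (score + log_lik(path, nxt), path + [nxt])
                if nxt == end:
                    finished.append(extended)
                else:
                    candidates.append(extended)
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=lambda t: t[0])
    return max(finished, key=lambda t: t[0]) if finished else None

# Hypothetical gap between boundary contigs "L" and "R".
toy_graph = {"L": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["R"]}
toy_scores = {"a": -0.1, "b": -2.0, "c": -0.2, "R": -0.1}
print(search_paths(toy_graph, "L", "R", lambda path, nxt: toy_scores[nxt]))
```

Pruning trades completeness for speed. A path discarded early can never be recovered, so in this sketch the beam width controls the balance between exhaustive enumeration and purely greedy extension.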

Subjects

Algorithms , Genome , Bayes Theorem , Contig Mapping , Restriction Mapping , Sequence Analysis, DNA

13.

HepParser: An Intelligent Software Program for Deciphering Low-Molecular-Weight Heparin Based on Mass Spectrometry.

Wang, Hui; Wang, Yu; Hou, Meijie; Zhang, Chunming; Wang, Yaojun; Guo, Zhendong; Bu, Dongbo; Li, Yan; Huang, Chuncui; Sun, Shiwei.

Front Chem ; 9: 723149, 2021.

Article in English | MEDLINE | ID: mdl-34568278

ABSTRACT

Low-molecular-weight heparins (LMWHs) are considered the most successful carbohydrate-based drugs because of their wide clinical use as anticoagulants. The efficacy of anticoagulants made from LMWHs depends mainly on the components and structures of the LMWHs. Therefore, deciphering the components and identifying the structures of LMWHs are critical to developing high-efficiency anticoagulants. However, most LMWHs are mixtures of linear polysaccharides composed of several highly similar disaccharide repeating units, making it extremely challenging to separate and decipher each component. Here, we present a new algorithm named hepParser that automatically and precisely deciphers the main components of LMWHs from liquid chromatography/mass spectrometry (LC/MS) data. When tested on a general LMWH, hepParser profiled the oligosaccharides with different degrees of polymerization (dp's) with high accuracy within 1 minute. Compared with GlycReSoft on heparan sulfate samples, hepParser automatically produced more comprehensive and reasonable results.

14.

FALCON2: a web server for high-quality prediction of protein tertiary structures.

Kong, Lupeng; Ju, Fusong; Zhang, Haicang; Sun, Shiwei; Bu, Dongbo.

BMC Bioinformatics ; 22(1): 439, 2021 Sep 15.

Article in English | MEDLINE | ID: mdl-34525939

ABSTRACT

BACKGROUND: Accurate prediction of protein tertiary structures is highly desirable, as knowledge of protein structures provides invaluable insights into protein functions. We have designed two approaches to protein structure prediction: a template-based modeling approach (called ProALIGN) and an ab initio prediction approach (called ProFOLD). Briefly, ProALIGN aligns a target protein with templates by exploiting the patterns of context-specific alignment motifs and then builds the final structure with reference to the homologous templates. In contrast, ProFOLD uses an end-to-end neural network to estimate the inter-residue distances of target proteins and builds structures that satisfy these distance constraints. The two approaches emphasize different characteristics of target proteins: ProALIGN exploits structural information from homologous templates, whereas ProFOLD exploits the co-evolutionary information carried by homologous protein sequences. Recent progress has shown that combining template-based modeling and ab initio approaches is promising. RESULTS: In this study, we present FALCON2, a web server that integrates ProALIGN and ProFOLD to provide a high-quality protein structure prediction service. For a target protein, FALCON2 executes ProALIGN and ProFOLD simultaneously to predict possible structures and selects the most likely one as the final prediction. We evaluated FALCON2 on widely used benchmarks, including 104 CASP13 (the 13th Critical Assessment of protein Structure Prediction) targets and 91 CASP14 targets. In-depth examination suggests that ProALIGN is superior to ProFOLD when high-quality templates are available, and ProFOLD performs better otherwise. By integrating these two complementary approaches, the FALCON2 server outperforms each individual approach and achieves state-of-the-art performance compared with existing approaches.
CONCLUSIONS: By integrating template-based modeling and ab initio approaches, FALCON2 provides an easy-to-use, high-quality protein structure prediction service for the community, and we expect it to contribute to a deeper understanding of protein functions.

Subjects

Neural Networks, Computer , Proteins , Amino Acid Sequence , Computers , Protein Conformation , Software

15.

FINER: enhancing the prediction of tissue-specific functions of isoforms by refining isoform interaction networks.

Chen, Hao; Shaw, Dipan; Bu, Dongbo; Jiang, Tao.

NAR Genom Bioinform ; 3(2): lqab057, 2021 Jun.

Article in English | MEDLINE | ID: mdl-34169280

ABSTRACT

Annotating the functions of gene products is a mainstay in biology. A variety of databases have been established to record functional knowledge at the gene level. However, functional annotations at the isoform resolution are in great demand in many biological applications. Although critical information in biological processes such as protein-protein interactions (PPIs) is often used to study gene functions, it does not directly help differentiate the functions of isoforms, as the 'proteins' in the existing PPIs generally refer to 'genes'. On the other hand, the prediction of isoform functions and prediction of isoform-isoform interactions, though inherently intertwined, have so far been treated as independent computational problems in the literature. Here, we present FINER, a unified framework to jointly predict isoform functions and refine PPIs from the gene level to the isoform level, enabling both tasks to benefit from each other. Extensive computational experiments on human tissue-specific data demonstrate that FINER is able to gain at least 5.16% in AUC and 15.1% in AUPRC for functional prediction across multiple tissues by refining noisy PPIs, resulting in significant improvement over the state-of-the-art methods. Some in-depth analyses reveal consistency between FINER's predictions and the tissue specificity as well as subcellular localization of isoforms.

16.

CopulaNet: Learning residue co-evolution directly from multiple sequence alignment for protein structure prediction.

Ju, Fusong; Zhu, Jianwei; Shao, Bin; Kong, Lupeng; Liu, Tie-Yan; Zheng, Wei-Mou; Bu, Dongbo.

Nat Commun ; 12(1): 2535, 2021 05 05.

Article in English | MEDLINE | ID: mdl-33953201

ABSTRACT

Residue co-evolution has become the primary principle for estimating the inter-residue distances of a protein, which are crucially important for predicting protein structure. Most existing approaches adopt an indirect strategy, i.e., inferring residue co-evolution from hand-crafted features, such as a covariance matrix, calculated from the multiple sequence alignment (MSA) of the target protein. This indirect strategy, however, cannot fully exploit the information carried by the MSA. Here, we report an end-to-end deep neural network, CopulaNet, to estimate residue co-evolution directly from the MSA. The key elements of CopulaNet include (i) an encoder to model context-specific mutation for each residue; and (ii) an aggregator to model residue co-evolution and thereafter estimate inter-residue distances. Using CASP13 (the 13th Critical Assessment of Protein Structure Prediction) target proteins as representatives, we demonstrate that CopulaNet can predict protein structure with improved accuracy and efficiency. This study represents a step toward improved end-to-end prediction of inter-residue distances and protein tertiary structures.

Subjects

Machine Learning , Proteins/chemistry , Sequence Alignment , Caspases/chemistry , Computational Biology , Humans , Models, Molecular , Mutation , Neural Networks, Computer , Protein Structure, Tertiary , Proteins/genetics

17.

ISSEC: inferring contacts among protein secondary structure elements using deep object detection.

Zhang, Qi; Zhu, Jianwei; Ju, Fusong; Kong, Lupeng; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo.

BMC Bioinformatics ; 21(1): 503, 2020 Nov 05.

Article in English | MEDLINE | ID: mdl-33153432

ABSTRACT

BACKGROUND: The formation of contacts among protein secondary structure elements (SSEs) is an important step in protein folding, as it determines the topology of the protein tertiary structure; hence, inferring inter-SSE contacts is crucial to protein structure prediction. One existing strategy infers inter-SSE contacts directly from the predicted probabilities of inter-residue contacts without any preprocessing, and thus suffers from the excessive noise in the predicted inter-residue contacts. Another strategy first defines SSEs based on protein secondary structure prediction, and then judges whether each candidate SSE pair forms a contact. However, it is difficult to accurately determine SSE boundaries because of errors in secondary structure prediction, and incorrectly deduced SSEs inevitably hinder the subsequent prediction of contacts among them. RESULTS: We report an accurate approach to inferring inter-SSE contacts (hence named ISSEC) using the deep object detection technique. The design of ISSEC is based on the observation that, in the inter-residue contact map, contacting SSEs usually form rectangular regions with characteristic patterns. ISSEC therefore infers inter-SSE contacts by detecting such rectangular regions. Unlike the existing approach that directly uses the predicted probabilities of inter-residue contacts, ISSEC applies deep convolution to extract high-level features from the inter-residue contacts. More importantly, ISSEC does not rely on predefined SSEs. Instead, ISSEC enumerates multiple candidate rectangular regions in the predicted inter-residue contact map and, for each region, calculates a confidence score measuring whether it has the characteristic patterns. ISSEC employs a greedy strategy to select non-overlapping regions with high confidence scores, and finally infers inter-SSE contacts from these regions.
CONCLUSIONS: Comprehensive experimental results suggested that ISSEC outperformed the state-of-the-art approaches in predicting inter-SSE contacts. We further demonstrated the successful application of ISSEC to improving the prediction of both inter-residue contacts and tertiary structures.

Subjects

Algorithms , Proteins/chemistry , Databases, Protein , Membrane Proteins/chemistry , Protein Conformation, beta-Strand , Protein Structure, Secondary
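The final ISSEC step, greedy selection of non-overlapping high-confidence regions, can be illustrated with a minimal Python sketch. It is not the ISSEC implementation: the `Region` class, the inclusive row/column bounds, the overlap rule (row and column ranges both intersect), and the `min_score` cutoff of 0.5 are illustrative assumptions, and the hard-coded scores stand in for what the deep object detector would output.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """Hypothetical candidate rectangle in a predicted inter-residue contact map.

    Rows [row_start, row_end] and columns [col_start, col_end] are inclusive
    residue indices; score is an assumed detector confidence in [0, 1].
    """
    row_start: int
    row_end: int
    col_start: int
    col_end: int
    score: float


def overlaps(a: Region, b: Region) -> bool:
    """Assumed rule: two rectangles overlap if both their row and column ranges intersect."""
    rows = a.row_start <= b.row_end and b.row_start <= a.row_end
    cols = a.col_start <= b.col_end and b.col_start <= a.col_end
    return rows and cols


def select_regions(candidates: List[Region], min_score: float = 0.5) -> List[Region]:
    """Greedily keep the highest-scoring regions that do not overlap any region already kept."""
    selected: List[Region] = []
    for region in sorted(candidates, key=lambda r: r.score, reverse=True):
        if region.score < min_score:
            break  # candidates are sorted, so all remaining ones are below the cutoff too
        if not any(overlaps(region, kept) for kept in selected):
            selected.append(region)
    return selected


# Illustrative candidates only; real scores would come from the detector.
candidates = [
    Region(10, 20, 40, 50, 0.92),
    Region(15, 25, 45, 55, 0.80),   # overlaps the first region, so it is dropped
    Region(60, 70, 90, 100, 0.75),
    Region(30, 35, 80, 85, 0.30),   # below the confidence cutoff
]

for region in select_regions(candidates):
    print(region)
```

Sorting by score first means that whenever two candidates conflict, the higher-scoring one is kept. This resembles non-maximum suppression in object detection, except that here any overlap at all counts as a conflict.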

18.

Multistage mass spectrometry with intelligent precursor selection for N-glycan branching pattern analysis.

Huang, Chuncui; Wang, Hui; Bu, Dongbo; Zhou, Jinyu; Dong, Junchuan; Zhang, Jingwei; Gao, Huanyu; Wang, Yaojun; Chai, Wengang; Sun, Shiwei; Li, Yan.

Carbohydr Polym ; 237: 116122, 2020 Jun 01.

Article in English | MEDLINE | ID: mdl-32241449

ABSTRACT

Biological functions of N-glycans are frequently related to their unique branching patterns. Multistage mass spectrometry (MSn) has become the primary method for glycan structural analysis. However, selecting the best fragment as the precursor for the next round of product-ion scanning is important but difficult. We previously proposed the concept of glycan intelligent precursor selection (GIPS) and designed an approach to guide MSn experiments, but applying it to N-glycans is not straightforward because some N-glycans have highly similar branching patterns. In the present work, we introduced new elements to GIPS to improve its performance in N-glycan branching pattern analysis. These include a hypothesis and significance test based on the Bayes factor, and DPbiased, a new precursor selection strategy. The improved GIPS was successfully applied to the identification of individual N-glycans and incorporated into MALDI-MS N-glycan profiling for the assignment of N-glycans obtained from glycoproteins and complex human serum.

Subjects

Glycoproteins/chemistry , Polysaccharides , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Glycoproteins/blood , Humans , Molecular Structure , Polysaccharides/chemistry , Polysaccharides/classification
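The GIPS abstract does not specify how its Bayes factor is computed. The sketch below shows only the generic idea of a Bayes factor, the ratio of the likelihoods of the observed data under two competing hypotheses, applied to two candidate structures. The Bernoulli ion model, the `p_hit` and `p_noise` probabilities, the nominal m/z values, and the decision threshold of 10 are all hypothetical choices made for illustration.

```python
import math
from typing import Set


def log_likelihood(observed: Set[int], predicted: Set[int], diagnostic: Set[int],
                   p_hit: float = 0.8, p_noise: float = 0.05) -> float:
    """Log-likelihood of the observed diagnostic ions under one candidate structure.

    Assumed model (not GIPS's): an ion predicted by the candidate is observed with
    probability p_hit; an ion it does not predict appears as noise with probability p_noise.
    Observed ions outside the diagnostic set are ignored.
    """
    total = 0.0
    for ion in diagnostic:
        p = p_hit if ion in predicted else p_noise
        total += math.log(p) if ion in observed else math.log(1.0 - p)
    return total


def bayes_factor(observed: Set[int], predicted_a: Set[int], predicted_b: Set[int]) -> float:
    """Bayes factor P(data | A) / P(data | B); values above 1 favour candidate A."""
    diagnostic = predicted_a | predicted_b  # ions predicted by both contribute equally and cancel
    log_bf = (log_likelihood(observed, predicted_a, diagnostic)
              - log_likelihood(observed, predicted_b, diagnostic))
    return math.exp(log_bf)


# Hypothetical nominal fragment m/z values for two isomeric candidates.
candidate_a = {444, 648, 852}
candidate_b = {444, 690, 852}
observed = {444, 648, 852}

bf = bayes_factor(observed, candidate_a, candidate_b)
print(f"BF(A:B) = {bf:.1f}")  # BF(A:B) = 76.0
if bf > 10:  # illustrative threshold for "strong" evidence
    print("strong evidence for candidate A")
```

Working in log space avoids numerical underflow when many ions are scored. With the values above, the two shared ions contribute identical factors to both likelihoods, so the ratio reduces to (0.8 × 0.95) / (0.05 × 0.2) = 76.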

19.

Identification of glycan branching patterns using multistage mass spectrometry with spectra tree analysis.

Wang, Hui; Zhang, Jingwei; Dong, Junchuan; Hou, Meijie; Pan, Weiyi; Bu, Dongbo; Zhou, Jinyu; Zhang, Qi; Wang, Yaojun; Zhao, Keli; Li, Yan; Huang, Chuncui; Sun, Shiwei.

J Proteomics ; 217: 103649, 2020 Apr 15.

Article in English | MEDLINE | ID: mdl-31978548

ABSTRACT

Glycans are crucial to a wide range of biological processes, and their biological activities are closely related to the branching patterns of their structures. Unlike the simple linear chains of proteins, glycans have complicated branching patterns, which makes their identification extremely challenging. Tandem mass spectrometry (MS2) cannot provide sufficient structural information to deduce glycan branching patterns, even with the assistance of various bioinformatic tools and algorithms. A promising technology for identifying glycan branching patterns is multistage mass spectrometry (MSn). The precursor-product relationships among the MSn spectra of a glycan form a tree, which makes deducing glycan structures from MSn spectra a great challenge. In the present study, we report an approach called glyBranch (glycan Branching pattern identification based on spectra tree) that fully exploits the information contained in the MSn spectra tree for glycan identification. Using 14 glycan standards, including 2 pairs of sequence isomers, and 16 complex N-glycans isolated from RNase B and IgG, we demonstrated the successful application of glyBranch to branching pattern analysis. The source code of glyBranch is available at https://github.com/bigict/glyBranch/. We have also developed a web server, which is freely accessible at http://glycan.ict.ac.cn/glyBranch/.

SIGNIFICANCE: Glycans are crucial in various biological processes, and their functions are closely related to the details of their structures; thus, the identification of glycan branching patterns is of great significance to biological studies. Multistage mass spectrometry (MSn) can provide detailed structural information by generating multiple levels of fragments through consecutive fragmentation; however, interpreting numerous MSn spectra is extremely challenging. In this study, we present an approach called glyBranch (glycan Branching pattern identification based on spectra tree) to exploit the information contained in the MSn spectra tree for glycan identification. This approach will greatly facilitate the automated identification of glycan structures and related biological studies.

Subjects

Polysaccharides , Tandem Mass Spectrometry , Algorithms , Software
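The spectra tree that glyBranch works on can be pictured as a simple recursive data structure: each MSn spectrum is a node, and each child is the spectrum obtained by isolating and fragmenting one of its peaks. The minimal Python sketch below builds such a tree and lists its root-to-leaf fragmentation paths. The `SpectrumNode` class, the `fragment` method, the `fragmentation_paths` helper, and the m/z values are hypothetical and do not reflect glyBranch's actual code or scoring.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SpectrumNode:
    """Hypothetical MSn scan: isolated precursor m/z, stage n, observed peaks, and child scans."""
    precursor_mz: float
    stage: int
    peaks: List[float] = field(default_factory=list)
    children: List["SpectrumNode"] = field(default_factory=list)

    def fragment(self, peak_mz: float, peaks: List[float]) -> "SpectrumNode":
        """Record the MS(n+1) scan obtained by isolating one observed peak of this scan."""
        if peak_mz not in self.peaks:
            raise ValueError(f"m/z {peak_mz} was not observed in this MS{self.stage} scan")
        child = SpectrumNode(peak_mz, self.stage + 1, peaks)
        self.children.append(child)
        return child


def fragmentation_paths(node: SpectrumNode,
                        prefix: Tuple[float, ...] = ()) -> List[Tuple[float, ...]]:
    """Enumerate the precursor m/z chains from the root scan down to every leaf scan."""
    path = prefix + (node.precursor_mz,)
    if not node.children:
        return [path]
    paths: List[Tuple[float, ...]] = []
    for child in node.children:
        paths.extend(fragmentation_paths(child, path))
    return paths


# Illustrative m/z values only; they do not correspond to a real glycan.
ms2 = SpectrumNode(precursor_mz=1000.0, stage=2, peaks=[800.0, 600.0])
ms3_a = ms2.fragment(800.0, peaks=[400.0])
ms2.fragment(600.0, peaks=[300.0])
ms3_a.fragment(400.0, peaks=[200.0])

for path in fragmentation_paths(ms2):
    print(" -> ".join(f"{mz:g}" for mz in path))
# 1000 -> 800 -> 400
# 1000 -> 600
```

Keeping the parent-child links explicit is what lets an analysis relate a fragment seen at MS4 back to the MS3 and MS2 precursors it came from; a flat list of spectra would lose that information.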

20.

Correction to: Predicting protein inter-residue contacts using composite likelihood maximization and deep learning.

Zhang, Haicang; Zhang, Qi; Ju, Fusong; Zhu, Jianwei; Gao, Yujuan; Xie, Ziwei; Deng, Minghua; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo.

BMC Bioinformatics ; 20(1): 616, 2019 Nov 29.

Article in English | MEDLINE | ID: mdl-31783729

ABSTRACT

Following publication of the original article [1], the author reported that the original article contains several errors.
