Results 1 - 15 of 15
1.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36920090

ABSTRACT

AlphaFold2 achieved a breakthrough in protein structure prediction through the end-to-end deep learning method, which can predict nearly all single-domain proteins at experimental resolution. However, the prediction accuracy of full-chain proteins is generally lower than that of single-domain proteins because of the incorrect interactions between domains. In this work, we develop an inter-domain distance prediction method, named DeepIDDP. In DeepIDDP, we design a neural network with attention mechanisms, where two new inter-domain features are used to enhance the ability to capture the interactions between domains. Furthermore, we propose a data enhancement strategy termed DPMSA, which is employed to deal with the absence of co-evolutionary information on targets. We integrate DeepIDDP into our previously developed domain assembly method SADA, termed SADA-DeepIDDP. Tested on a given multi-domain benchmark dataset, the accuracy of SADA-DeepIDDP inter-domain distance prediction is 11.3% and 21.6% higher than trRosettaX and trRosetta, respectively. The accuracy of the domain assembly model is 2.5% higher than that of SADA. Meanwhile, we reassemble 68 human multi-domain protein models with TM-score ≤ 0.80 from the AlphaFold protein structure database, where the average TM-score is improved by 11.8% after the reassembly by our method. The online server is at http://zhanglab-bioinf.com/DeepIDDP/.


Subjects
Algorithms , Deep Learning , Humans , Neural Networks, Computer , Proteins/chemistry , Databases, Protein , Computational Biology
2.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34849573

ABSTRACT

Meta contact, which combines different contact maps into one to improve contact prediction accuracy and reduce the noise of any single contact map, is a widely used method. However, protein structure prediction using meta contact cannot fully exploit the information carried by the original contact maps. In this work, a multi contact-based folding method under the evolutionary algorithm framework, MultiCFold, is proposed. In MultiCFold, the full information of the different contact maps is used directly by populations to guide protein structure folding. In addition, noncontact is considered an effective supplement to contact information and can further assist protein folding. MultiCFold is tested on a set of 120 nonredundant proteins, and the average TM-score and average RMSD reach 0.617 and 5.815 Å, respectively. Compared with the meta contact-based method, MetaCFold, the average TM-score and average RMSD improve by 6.62% and 8.82%, respectively. In particular, the introduction of noncontact information increases the average TM-score by 6.30%. Furthermore, MultiCFold is compared with four state-of-the-art methods of CASP13 on the 24 FM targets, and the results show that MultiCFold is significantly better than the other methods after the full-atom relax procedure.
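
For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how TM-score and RMSD can be computed for a model and a native structure that are already superimposed. It is not code from MultiCFold: the function names, the coordinate layout (one Cα position per residue) and the 0.5 Å floor on d0 for short chains are illustrative assumptions. A full TM-score calculation also searches for the superposition that maximizes the score, which this sketch omits.

```python
import math


def tm_score_fixed(model, native):
    """TM-score of two equal-length, already superimposed C-alpha traces.

    Assumption: no superposition search is done here, so the value is a
    lower bound on the TM-score reported by tools that optimize it.
    """
    length = len(native)
    if length == 0 or len(model) != length:
        raise ValueError("model and native must be non-empty and equal in length")
    # Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    # The 0.5 floor for short chains is an assumption of this sketch.
    if length > 15:
        d0 = max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    else:
        d0 = 0.5
    total = 0.0
    for a, b in zip(model, native):
        d = math.dist(a, b)  # Euclidean distance between paired residues
        total += 1.0 / (1.0 + (d / d0) ** 2)
    return total / length


def rmsd_fixed(model, native):
    """Root mean square deviation of two already superimposed traces."""
    if not native or len(model) != len(native):
        raise ValueError("model and native must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, native))
    return math.sqrt(squared / len(native))


# Hypothetical 20-residue straight chain and a copy shifted by 3 angstroms.
native = [(i * 3.8, 0.0, 0.0) for i in range(20)]
shifted = [(x, y + 3.0, z) for x, y, z in native]
print(tm_score_fixed(native, native), rmsd_fixed(native, native))
print(tm_score_fixed(shifted, native), rmsd_fixed(shifted, native))
```

Because each residue contributes a bounded term between 0 and 1, TM-score is less sensitive than RMSD to a few badly placed residues, which is one reason abstracts in this list report both.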


Subjects
Protein Folding , Proteins , Algorithms , Computational Biology/methods , Models, Molecular , Protein Conformation , Proteins/chemistry
3.
J Chem Inf Model ; 64(1): 76-95, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38109487

ABSTRACT

Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict three-dimensional structures of numerous unknown proteins with accuracy levels comparable to those of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function as well as accelerating drug discovery and other applications in the field of biology and medicine. Despite the remarkable achievements of artificial intelligence in the field, there are still some challenges and limitations. In this Review, we discuss the recent progress and some of the challenges in protein structure prediction. These challenges include predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be conducted.


Subjects
Artificial Intelligence , Drug Discovery , Protein Folding , Research Design
4.
Nucleic Acids Res ; 50(W1): W235-W245, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35536281

ABSTRACT

Most proteins in nature contain multiple folding units (or domains). The revolutionary success of AlphaFold2 in single-domain structure prediction showed potential to extend deep-learning techniques for multi-domain structure modeling. This work presents a significantly improved method, DEMO2, which integrates analogous template structural alignments with deep-learning techniques for high-accuracy domain structure assembly. Starting from individual domain models, inter-domain spatial restraints are first predicted with deep residual convolutional networks, where full-length structure models are assembled using L-BFGS simulations under the guidance of a hybrid energy function combining deep-learning restraints and analogous multi-domain template alignments searched from the PDB. The output of DEMO2 contains deep-learning inter-domain restraints, top-ranked multi-domain structure templates, and up to five full-length structure models. DEMO2 was tested on a large-scale benchmark and the blind CASP14 experiment, where DEMO2 was shown to significantly outperform its predecessor and the state-of-the-art protein structure prediction methods. By integrating with new deep-learning techniques, DEMO2 should help fill the rapidly increasing gap between the improved ability of tertiary structure determination and the high demand for the high-quality multi-domain protein structures. The DEMO2 server is available at https://zhanggroup.org/DEMO/.


Subjects
Algorithms , Deep Learning , Proteins/chemistry , Protein Conformation , Computational Biology/methods , Software
5.
Bioinformatics ; 38(19): 4513-4521, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35962986

ABSTRACT

MOTIVATION: With the breakthrough of AlphaFold2, the protein structure prediction problem has made remarkable progress through deep learning end-to-end techniques, in which correct folds could be built for nearly all single-domain proteins. However, full-chain modelling appears to have lower average accuracy than modelling of the constituent domains and places higher demands on computing hardware, indicating that the performance of full-chain modelling still needs to be improved. In this study, we investigate whether the predicted accuracy of the full-chain model can be further improved by domain assembly assisted by deep learning. RESULTS: In this article, we developed a structural analogue-based protein structure domain assembly method assisted by deep learning, named SADA. In SADA, a multi-domain protein structure database was constructed for the full-chain analogue detection using individual domain models. Starting from the initial model constructed from the analogue, the domain assembly simulation was performed to generate the full-chain model through a two-stage differential evolution algorithm guided by the energy function with an inter-residue distance potential predicted by deep learning. SADA was compared with the state-of-the-art domain assembly methods on 356 benchmark proteins, and the average TM-score of SADA models is 8.1% and 27.0% higher than that of DEMO and AIDA, respectively. We also assembled 293 human multi-domain proteins, where the average TM-score of the full-chain model after the assembly by SADA is 1.1% higher than that of the model by AlphaFold2. To conclude, we find that domains often interact in a similar way in their quaternary orientations if they have similar tertiary structures. Furthermore, homologous templates and structural analogues are complementary for multi-domain protein full-chain modelling. AVAILABILITY AND IMPLEMENTATION: http://zhanglab-bioinf.com/SADA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Humans , Software , Proteins/chemistry , Databases, Protein , Protein Domains
6.
Bioinformatics ; 37(23): 4357-4365, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34245242

ABSTRACT

MOTIVATION: Massive local minima on the protein energy landscape often cause traditional conformational sampling algorithms to be easily trapped in local basin regions, because they find it difficult to overcome high-energy barriers. Also, the lowest energy conformation may not correspond to the native structure due to the inaccuracy of energy models. This study investigates whether these two problems can be alleviated by a sequential niche technique without loss of accuracy. RESULTS: A sequential niche multimodal conformational sampling algorithm for protein structure prediction (SNfold) is proposed in this study. In SNfold, a derating function is designed based on the knowledge learned from the previous sampling and used to construct a series of sampling-guided energy functions. These functions then help the sampling algorithm overcome high-energy barriers and avoid the re-sampling of the explored regions. In inaccurate protein energy models, the high-energy conformation that may correspond to the native structure can be sampled with successively updated sampling-guided energy functions. The proposed SNfold is tested on 300 benchmark proteins, 24 CASP13 and 19 CASP14 FM targets. Results show that SNfold correctly folds (TM-score ≥ 0.5) 231 out of 300 proteins. In particular, compared with Rosetta restrained by distance (Rosetta-dist), SNfold achieves higher average TM-score and improves the sampling efficiency by more than 100 times. On several CASP FM targets, SNfold also shows good performance compared with four state-of-the-art servers in CASP. As a plug-in conformational sampling algorithm, SNfold can be extended to other protein structure prediction methods. AVAILABILITY AND IMPLEMENTATION: The source code and executable versions are freely available at https://github.com/iobio-zjut/SNfold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Proteins , Protein Conformation , Proteins/chemistry , Software , Benchmarking
7.
IEEE Trans Evol Comput ; 24(3): 536-550, 2020 Jun.
Article in English | MEDLINE | ID: mdl-33603321

ABSTRACT

Various mutation strategies show distinct advantages in differential evolution (DE). The cooperation of multiple strategies in the evolutionary process may be effective. This paper presents an underestimation-assisted global and local cooperative DE to simultaneously enhance the effectiveness and efficiency. In the proposed algorithm, two phases, namely, the global exploration and the local exploitation, are performed in each generation. In the global phase, a set of trial vectors is produced for each target individual by employing multiple strategies with strong exploration capability. Afterward, an adaptive underestimation model with a self-adapted slope control parameter is proposed to evaluate these trial vectors, the best of which is selected as the candidate. In the local phase, the better-based strategies guided by individuals that are better than the target individual are designed. For each individual accepted in the global phase, multiple trial vectors are generated by using these strategies and filtered by the underestimation value. The cooperation between the global and local phases includes two aspects. First, both of them concentrate on generating better individuals for the next generation. Second, the global phase aims to locate promising regions quickly while the local phase serves as a local search for enhancing convergence. Moreover, a simple mechanism is designed to determine the parameter of DE adaptively in the searching process. Finally, the proposed approach is applied to predict the protein 3D structure. Experimental studies on classical benchmark functions, CEC test sets, and protein structure prediction problem show that the proposed approach is superior to the competitors.
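
As background for the abstract above, the following is a minimal sketch of the classical DE/rand/1/bin scheme that work on differential evolution typically builds on. It is not the underestimation-assisted global and local cooperative algorithm the paper proposes: there is no underestimation model, no two-phase cooperation and no adaptive parameter control. The function names, the default values of `F` and `CR`, and the sphere test function are illustrative assumptions.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over box bounds with classical DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the search box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.sample(others, 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][j]  # binomial crossover keeps the target gene
                trial.append(v)
            trial_fitness = f(trial)
            # Greedy selection: the trial replaces the target if not worse.
            if trial_fitness <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
print(best_x, best_f)
```

In this baseline every trial vector costs one objective evaluation. The abstract describes screening several trial vectors per individual with an underestimation model before choosing a candidate.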

8.
Interdiscip Sci ; 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38190097

ABSTRACT

The breakthrough of AlphaFold2 and the publication of AlphaFold DB represent a significant advance in the field of predicting static protein structures. However, AlphaFold2 models tend to represent a single static structure, and multiple-conformation prediction remains a challenge. In this work, we proposed a method named MultiSFold, which uses a distance-based multi-objective evolutionary algorithm to predict multiple conformations. To begin, multiple energy landscapes are constructed using different competing constraints generated by deep learning. Subsequently, an iterative modal exploration and exploitation strategy is designed to sample conformations, incorporating multi-objective optimization, geometric optimization and structural similarity clustering. Finally, the final population is generated using a loop-specific sampling strategy to adjust the spatial orientations. MultiSFold was evaluated against state-of-the-art methods using a benchmark set containing 80 protein targets, each characterized by two representative conformational states. Based on the proposed metric, MultiSFold achieves a remarkable success ratio of 56.25% in predicting multiple conformations, while AlphaFold2 only achieves 10.00%, which may indicate that conformational sampling combined with knowledge gained through deep learning has the potential to generate conformations spanning the range between different conformational states. In addition, MultiSFold was tested on 244 human proteins with low structural accuracy in AlphaFold DB to test whether it could further improve the accuracy of static structures. The experimental results demonstrate the performance of MultiSFold, with a TM-score better than that of AlphaFold2 by 2.97% and RoseTTAFold by 7.72%. The online server is at http://zhanglab-bioinf.com/MultiSFold .

9.
Article in English | MEDLINE | ID: mdl-35594218

ABSTRACT

Domain boundary prediction is one of the most important problems in the study of protein structure and function, especially for large proteins. At present, most domain boundary prediction methods have low accuracy and limitations in dealing with multi-domain proteins. In this study, we develop a sequence-based protein domain boundary prediction method, named DomBpred. In DomBpred, the input sequence is first classified as either a single-domain protein or a multi-domain protein through a designed sequence metric based on a constructed single-domain sequence library. For a multi-domain protein, a domain-residue clustering algorithm inspired by the Ising model is proposed to cluster spatially close residues according to inter-residue distance. The unclassified residues and the residues at the edge of each cluster are then tuned by the secondary structure to form potential cut points. Finally, a domain boundary scoring function is proposed to recursively evaluate the potential cut points and generate the domain boundary. DomBpred is tested on a large-scale test set of FUpred comprising 2549 proteins. Experimental results show that DomBpred performs better than the state-of-the-art methods in classifying whether protein sequences are composed of single or multiple domains, with a Matthews correlation coefficient of 0.882. Moreover, on 849 multi-domain proteins, the domain boundary distance and normalised domain overlap scores of DomBpred are 0.523 and 0.824, which are 5.0% and 4.2% higher than those of the best comparison method, respectively. Comparison with other methods on the given test set shows that DomBpred outperforms most state-of-the-art sequence-based methods and even achieves better results than the top-level template-based method. The executable program is freely available at https://github.com/iobio-zjut/DomBpred and the online server at http://zhanglab-bioinf.com/DomBpred/.
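
The Matthews correlation coefficient quoted for the single-domain versus multi-domain classification is a standard statistic. The sketch below shows how it is obtained from a 2x2 confusion matrix. The function name and the example counts are hypothetical and are not taken from the DomBpred evaluation. Returning 0.0 when a marginal total is zero is a common convention adopted here as an assumption.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases are reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: "positive" means the sequence is called multi-domain.
print(matthews_corrcoef(tp=50, tn=50, fp=0, fn=0))    # perfect agreement
print(matthews_corrcoef(tp=25, tn=25, fp=25, fn=25))  # no better than chance
print(matthews_corrcoef(tp=0, tn=0, fp=50, fn=50))    # total disagreement
```

Unlike plain accuracy, the coefficient stays informative when the two classes differ greatly in size. This matters for domain classification, where the proportions of single-domain and multi-domain proteins in a test set need not be equal.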


Subjects
Algorithms , Proteins , Protein Domains , Proteins/genetics , Proteins/chemistry , Amino Acid Sequence , Cluster Analysis
10.
Article in English | MEDLINE | ID: mdl-32750861

ABSTRACT

De novo protein structure prediction is a challenging problem that requires both an accurate energy function and an efficient conformation sampling method. In this study, a de novo structure prediction method, named CoDiFold, is proposed. In CoDiFold, contacts and distance profiles are organically combined into the Rosetta low-resolution energy function to improve the accuracy of energy function. As a result, the correlation between energy and root mean square deviation (RMSD) is improved. In addition, a population-based multi-mutation strategy is designed to balance the exploration and exploitation of conformation space sampling. The average RMSD of the models generated by the proposed protocol is decreased by 49.24 and 45.21 percent in the test set with 43 proteins compared with those of Rosetta and QUARK de novo protocols, respectively. The results also demonstrate that the structures predicted by proposed CoDiFold are comparable to the state-of-the-art methods for the 10 FM targets of CASP13. The source code and executable versions are freely available at http://github.com/iobio-zjut/CoDiFold.


Subjects
Proteins , Software , Algorithms , Models, Molecular , Protein Conformation , Proteins/genetics
11.
Virol J ; 8: 465, 2011 Oct 07.
Article in English | MEDLINE | ID: mdl-21978576

ABSTRACT

This report describes a one-step real-time polymerase chain reaction assay based on SYBR Green I for detection of a broad range of duck circovirus (DuCV). Primers targeting the replicase gene of DuCV were designed after aligning all complete DuCV genome sequences with those of other members of the genus Circovirus downloaded from GenBank (such as goose circovirus and pigeon circovirus). The detection assay was linear in the range of 1.31 × 10² to 1.31 × 10⁷ copies/µL. The reaction efficiency of the assay, estimated from the slope (-3.349) of the linear equation, whose Y-intercept was 37.01, was 0.99, and the correlation coefficient (R²) was 0.993. A series of experiments was carried out to assess the reproducibility, sensitivity, and specificity of the assay; the intra-assay and inter-assay CVs for the CT values obtained with the standard plasmids were low. The intra-assay CVs were equal to or less than 1.89% and the inter-assay CVs were equal to or less than 1.26%. No cross-reaction occurred with nucleic acids extracted from RA (Riemerella anatipestifer), E. coli (Escherichia coli), duck cholera (Pasteurella multocida), avian influenza virus, avian paramyxovirus, Muscovy duck parvovirus, duck reovirus, or duck hepatitis A virus used as control templates. Nucleic acids extracted from samples of healthy ducks were used as negative controls. The assay was specific and reproducible. The established real-time PCR was used to test 45 samples that were DuCV-negative by conventional PCR under the developed optimal conditions, 15 each of embryonated eggs, non-embryonated budgerigar eggs, and newly hatched ducks (a mixture of lung, liver, and spleen), which were analyzed for the presence of DuCV DNA to confirm whether DuCV can be transmitted vertically. No positive result was obtained by the real-time PCR method.
The SYBR Green I-based quantitative PCR can therefore be practically used as an alternative diagnostic tool and a screening method for ducks infected with duck circovirus.
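
The efficiency figure in this abstract follows from the standard-curve slope through the usual relation E = 10^(-1/slope) - 1. The sketch below reproduces that arithmetic and shows how a CT value maps back to a copy number on the same curve. The function names are hypothetical, and treating the reported line as CT = slope × log10(copies/µL) + intercept is an assumption about how the curve was fitted.

```python
def amplification_efficiency(slope):
    """Efficiency from a standard-curve slope (CT versus log10 copies)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the assumed line CT = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


SLOPE = -3.349      # slope reported in the abstract
INTERCEPT = 37.01   # Y-intercept reported in the abstract

print(round(amplification_efficiency(SLOPE), 2))   # 0.99, as reported
# A hypothetical CT of 30.312 sits two log units along the curve.
print(copies_from_ct(30.312, SLOPE, INTERCEPT))
```

A slope of about -3.32 corresponds to perfect doubling per cycle (efficiency 1.0), so the reported -3.349 is close to ideal amplification.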


Subjects
Circoviridae Infections/diagnosis , Circovirus/genetics , Ducks/virology , Poultry Diseases/diagnosis , Real-Time Polymerase Chain Reaction/methods , Animals , Base Sequence , Benzothiazoles , China , Circoviridae Infections/epidemiology , Circoviridae Infections/veterinary , Circoviridae Infections/virology , DNA Primers/chemistry , DNA Primers/genetics , Diamines , Escherichia coli , Molecular Sequence Data , Organic Chemicals/analysis , Organic Chemicals/chemistry , Plasmids , Poultry Diseases/epidemiology , Poultry Diseases/virology , Quinolines , Reference Standards , Reproducibility of Results , Sensitivity and Specificity , Transformation, Bacterial
12.
Avian Dis ; 55(2): 311-8, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21793450

ABSTRACT

To investigate the genetic diversity and genotype of duck circovirus (DuCV), nine full-length DuCV genomes were determined from clinical samples. Multiple sequence alignment and phylogenetic analyses were performed on the nine viral genome sequences as well as on 27 genome sequences retrieved from the GenBank database. Pairwise analysis showed that the determined genome sequences have a genome organization identical to the 27 sequences and share 83.3%-99.8% identity among themselves and 82.6%-99.9% with the other 27 sequences. Phylogenetic analysis revealed that all 36 viral genome sequences are divided into two lineages, DuCV1 and DuCV2, in which the nucleotide diversity between genome sequences in these two lineages ranged from 13.2%-17.4%; these may be regarded as two types of viruses. Viruses under DuCV1 and DuCV2 are further clustered into different sublineages. When analyzed using the method for genotype definition proposed by Grau-Roma et al, these different sublineages can be defined as genotypes DuCV1a, DuCV1b, DuCV2a, DuCV2b, and DuCV2c. In addition, the viral sequences obtained from mainland China are different in genomic size and share a diversity of no less than 13.2%, including the sequences that came from all genotypes. This suggests that the DuCVs prevalent in domestic duck flocks in China are ecologically divergent.
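
The identity ranges reported in this abstract come from pairwise comparison of aligned genome sequences. The sketch below shows one simple way to compute percent identity for two rows of an alignment. The abstract does not say which software or gap convention was used, so the function name, the choice to skip columns containing a gap, and the toy sequences are all assumptions.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Assumption: columns where either sequence has a gap are skipped.
    Other conventions count gaps as mismatches and give lower values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment rows, not real DuCV sequence.
print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5
print(percent_identity("AC-TACGT", "ACGTACGT"))  # 100.0
```

Nucleotide diversity between two sequences, as used for the lineage comparison in the abstract, is then simply 100 minus the percent identity under the same convention.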


Subjects
Circoviridae Infections/veterinary , Circovirus/classification , Circovirus/genetics , Ducks , Genetic Variation , Poultry Diseases/virology , Amino Acid Sequence , Amino Acid Substitution , Animals , Circoviridae Infections/virology , Genome, Viral , Genotype , Phylogeny , Viral Proteins/genetics , Viral Proteins/metabolism
13.
Bing Du Xue Bao ; 28(2): 158-64, 2012 Mar.
Article in Chinese | MEDLINE | ID: mdl-22519178

ABSTRACT

To characterize the phylogenetic evolution, the molecular characteristics of the HA protein cleavage site motif, and the variation at the receptor binding sites of the hemagglutinin gene of duck-origin H9N2 subtype avian influenza viruses (AIVs), sequence alignment and phylogenetic analysis were performed with the Neighbor-Joining method in MEGA 4.1. The results revealed that the duck-origin H9N2 AIVs originated from the CK/BJ/1/94-like and North-Ame-like lineages. All duck-origin H9N2 AIVs from mainland China belonged to the CK/BJ/1/94-like lineage and formed multiple genotypes through complicated reassortment, whereas duck-origin H9N2 AIVs isolated elsewhere in Asia, America, and Europe, such as Korea, Japan, Alberta, Austria, Switzerland, and Iran, belonged to the North-Ame-like phylogenetic lineage. The amino acids at positions 183, 190, and 226 of the receptor binding sites of the North-Ame-like isolates were highly conserved as H, E, and Q, respectively. In contrast, the duck-origin H9N2 AIV isolates from mainland China had N at position 183; A, T, or V at 190; and L or Q at 226, the same as the chicken-origin H9N2 AIVs from mainland China. The finding that most newly isolated chicken-origin H9N2 AIVs in Fujian Province in Southern China had L at position 226 emphasizes the higher risk of cross-infection between chicken-origin and duck-origin H9N2 AIVs in China.


Subjects
Hemagglutinin Glycoproteins, Influenza Virus/chemistry , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Influenza A Virus, H9N2 Subtype/genetics , Influenza in Birds/virology , Poultry Diseases/virology , Animals , China , Ducks , Influenza A Virus, H9N2 Subtype/chemistry , Influenza A Virus, H9N2 Subtype/classification , Influenza A Virus, H9N2 Subtype/isolation & purification , Influenza A virus/chemistry , Influenza A virus/classification , Influenza A virus/genetics , Molecular Sequence Data , Phylogeny , Sequence Alignment
14.
Virol Sin ; 26(5): 289-96, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21979568

ABSTRACT

Duck circovirus (DuCV), a potential immunosuppressive virus, was investigated in Southern China from March 2006 to December 2009 by using a polymerase chain reaction (PCR) based method. In this study, a total of 138 sick or dead duck samples from 18 different farms were examined, with an average DuCV infection rate of ∼35%. It was found that ducks between the ages of 40 and 60 days were more susceptible to DuCV. There was no evidence showing that DuCV was capable of vertical transmission. Farms with positive PCR results exhibited no regularly apparent clinical abnormalities such as feathering disorders, growth retardation or lower-than-average weight. The complete genomes of 9 strains from Fujian Province and 1 from Zhejiang Province were sequenced and analyzed. The 10 DuCV genomes, compared with other genomes downloaded from GenBank, ranged in size from 1988 to 1996 base pairs, with sequence identities ranging from 83.2% to 99.8%. Phylogenetic analysis based on genome sequences demonstrated that DuCVs can be divided into two distinct genotypes, Group I (the Euro-USA lineage) and Group II (the Taiwan lineage), with approximately 10.0% genetic difference between the two types. Molecular epidemiological data suggest there is no obvious difference among DuCV strains isolated from different geographic locations or different species, including Duck, Muscovy duck, Mule duck, Cherry duck, Mulard duck and Pekin duck.


Subjects
Circoviridae Infections/veterinary , Circovirus/classification , Circovirus/genetics , DNA, Viral/genetics , Genome, Viral , Poultry Diseases/virology , Animals , China , Circoviridae Infections/virology , Circovirus/isolation & purification , Cluster Analysis , DNA, Viral/chemistry , Ducks , Genotype , Molecular Sequence Data , Phylogeny , Sequence Analysis, DNA , Sequence Homology
15.
Bing Du Xue Bao ; 24(2): 138-43, 2008 Jun.
Article in Chinese | MEDLINE | ID: mdl-18533346

ABSTRACT

To reveal the molecular biological characteristics of the circovirus genome in infected ducks, two nucleotide fragments were amplified by overlapping PCRs using DNA extracted from various tissues of ducks. After the fragments had been assembled, the nucleotide composition, the genome organization and the phylogenetic position of the sequence were analyzed. The results showed that the obtained sequence is a circular DNA with a total length of 1995 nt. It contains 6 open reading frames (ORFs) and shares a high identity of 97.4% with the MuDCV circovirus sequence deposited in GenBank (AY228555). These results indicate that the amplified product is a duck circovirus sequence.


Subjects
Circovirus/genetics , Ducks/virology , Animals , Base Sequence , Circovirus/classification , Cloning, Molecular , Genetic Variation , Molecular Sequence Data , Phylogeny , Polymerase Chain Reaction , Sequence Analysis, DNA