1.
Nat Commun ; 12(1): 5011, 2021 08 18.
Article in English | MEDLINE | ID: mdl-34408149

ABSTRACT

Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. Here, we developed a method, C-QUARK, that integrates multiple deep-learning and coevolution-based contact-maps to guide the replica-exchange Monte Carlo fragment assembly simulations. The method was tested on 247 non-redundant proteins, where C-QUARK could fold 75% of the cases with TM-scores (template-modeling scores) ≥0.5, which was 2.6 times more than that achieved by QUARK. For the 59 cases that had either low contact accuracy or few homologous sequences, C-QUARK correctly folded 6 times more proteins than other contact-based folding methods. C-QUARK was also tested on 64 free-modeling targets from the 13th CASP (critical assessment of protein structure prediction) experiment and had an average GDT_TS (global distance test) score that was 5% higher than the best CASP predictors. These data demonstrate, in a robust manner, the progress in modeling non-homologous protein structures using low-accuracy and sparse contact-map predictions.


Subjects
Computational Biology/methods , Proteins/chemistry , Protein Databases , Molecular Models , Monte Carlo Method , Protein Conformation , Protein Folding , Proteins/genetics , Software
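The TM-score cutoff of 0.5 quoted in the abstract above can be made concrete with a small calculation. The following is a minimal sketch, assuming the standard TM-score definition with the length-dependent scale d0 = 1.24(L - 15)^(1/3) - 1.8 and assuming the per-residue distances have already been measured after superposition. The search for the optimal superposition performed by the real TM-score program is omitted, the 0.5 Å floor for short chains is an assumption of this sketch, and the function names are illustrative only.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) used by the TM-score."""
    # Guard short chains: (L - 15) would be non-positive and the cube root
    # undefined for real numbers. The 0.5 A floor is an assumption of this sketch.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_from_distances(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: length of the native (target) protein; unaligned residues
    contribute nothing to the sum but still count in the normalization.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # A 100-residue target in which every aligned pair deviates by exactly d0.
    d0 = tm_d0(100)
    print(round(tm_score_from_distances([d0] * 100, 100), 3))
```

Because each residue pair contributes 1/(1 + (d/d0)^2), a model whose residues all deviate by d0 scores exactly 0.5, which is the conventional boundary for a correct fold.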
2.
Int J Mol Sci ; 22(13)2021 Jun 30.
Article in English | MEDLINE | ID: mdl-34209110

ABSTRACT

Positively charged groups that mimic arginine or lysine in a natural substrate of trypsin are necessary for drugs to inhibit the trypsin-like serine protease TMPRSS2 that is involved in the viral entry and spread of coronaviruses, including SARS-CoV-2. Based on this assumption, we identified a set of 13 approved or clinically investigational drugs with positively charged guanidinobenzoyl and/or amidinobenzoyl groups, including the experimentally verified TMPRSS2 inhibitors Camostat and Nafamostat. Molecular docking using the C-I-TASSER-predicted TMPRSS2 catalytic domain model suggested that the guanidinobenzoyl or amidinobenzoyl group in all the drugs could form putative salt bridge interactions with the side-chain carboxyl group of Asp435 located in the S1 pocket of TMPRSS2. Molecular dynamics simulations further revealed the high stability of the putative salt bridge interactions over long-time (100 ns) simulations. The molecular mechanics/generalized Born surface area-binding free energy assessment and per-residue energy decomposition analysis also supported the strong binding interactions between TMPRSS2 and the proposed drugs. These results suggest that the proposed compounds, in addition to Camostat and Nafamostat, could be effective TMPRSS2 inhibitors for COVID-19 treatment by occupying the S1 pocket with the hallmark positively charged groups.


Subjects
Antiviral Agents/chemistry , Serine Endopeptidases/metabolism , Serine Proteinase Inhibitors/chemistry , Antiviral Agents/metabolism , Antiviral Agents/therapeutic use , Benzamidines/chemistry , Benzamidines/metabolism , Binding Sites , COVID-19/drug therapy , COVID-19/pathology , COVID-19/virology , Catalytic Domain , Esters/chemistry , Esters/metabolism , Guanidines/chemistry , Guanidines/metabolism , Humans , Molecular Docking Simulation , Molecular Dynamics Simulation , Serine Endopeptidases/chemistry , Serine Proteinase Inhibitors/metabolism , Serine Proteinase Inhibitors/therapeutic use , Thermodynamics
3.
Proteins ; 2021 Jul 30.
Article in English | MEDLINE | ID: mdl-34331351

ABSTRACT

In this article, we report 3D structure prediction results by two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built based on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints by co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially for the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges still exist in the current pipelines. These include difficulties in modeling multi-domain proteins due to low accuracy in inter-domain distance prediction and modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish inter-chain and intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may be helpful to address these issues.

4.
J Biol Chem ; 297(1): 100870, 2021 07.
Article in English | MEDLINE | ID: mdl-34119522

ABSTRACT

Since Anfinsen demonstrated in 1973 that the information encoded in a protein's amino acid sequence determines its structure, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to utilize computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have been historically categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM has been the most reliable approach to predicting protein structures, and in the absence of reliable templates, the modeling accuracy sharply declines. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using the PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures, as well as the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.


Subjects
Protein Conformation , Sequence Alignment/methods , Protein Sequence Analysis/methods , Machine Learning , Molecular Dynamics Simulation
5.
Curr Opin Struct Biol ; 68: 194-207, 2021 06.
Article in English | MEDLINE | ID: mdl-33639355

ABSTRACT

Protein structure prediction and design can be regarded as two inverse processes governed by the same folding principle. Although progress remained stagnant over the past two decades, the recent application of deep neural networks to spatial constraint prediction and end-to-end model training has significantly improved the accuracy of protein structure prediction, largely solving the problem at the fold level for single-domain proteins. The field of protein design has also witnessed dramatic improvement, where noticeable examples have shown that information stored in neural-network models can be used to advance functional protein design. Thus, incorporation of deep learning techniques into different steps of protein folding and design approaches represents an exciting future direction and should continue to have a transformative impact on both fields.

6.
J Phys Chem B ; 125(2): 528-538, 2021 01 21.
Article in English | MEDLINE | ID: mdl-33397114

ABSTRACT

The rigid-body fitting of predicted structural models into cryo-electron microscopy (cryo-EM) density maps is a necessary procedure for density map-guided protein structure determination and prediction. We proposed a novel multiobjective optimization protocol, MOFIT, which performs a rigid-body density-map fitting based on particle swarm optimization (PSO). MOFIT was tested on a large set of 292 nonhomologous single-domain proteins. Starting from structural models predicted by I-TASSER, MOFIT achieved an average coordinate root-mean-square deviation of 2.46 Å, which was 1.57, 2.79, and 3.95 Å lower than three leading single-objective function-based methods, where the differences were statistically significant with p-values of 1.65 × 10⁻⁶, 6.36 × 10⁻⁸, and 6.44 × 10⁻¹¹ calculated using two-tailed Student's t tests. Detailed analyses showed that the major advantages of MOFIT lie in the multiobjective protocol and the extensive PSO search simulations guided by the composite objective functions, which integrate complementary correlation coefficients from the global structure, local fragments, and individual residues with the cryo-EM density maps.


Subjects
Proteins , Cryoelectron Microscopy , Humans , Molecular Models , Protein Conformation
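The particle swarm optimization (PSO) search named in the abstract above follows a simple update rule that is easy to illustrate. The following is a minimal single-objective sketch on a toy function; it does not reproduce MOFIT's multiobjective scoring or its rigid-body parameterization, and the function name, the parameter values (inertia 0.729, acceleration 1.494) and the toy objective are assumptions chosen for illustration.

```python
import random


def particle_swarm_minimize(objective, bounds, n_particles=30, n_iters=200,
                            inertia=0.729, c_personal=1.494, c_global=1.494,
                            seed=0):
    """Minimize `objective` over a box given as a list of (low, high) pairs."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=personal_val.__getitem__)
    global_best = personal_best[best_idx][:]
    global_val = personal_val[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own best
                # position, and pull toward the best position seen by the swarm.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Clamp the new position to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            value = objective(positions[i])
            if value < personal_val[i]:
                personal_val[i] = value
                personal_best[i] = positions[i][:]
                if value < global_val:
                    global_val = value
                    global_best = positions[i][:]
    return global_best, global_val


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)  # toy objective with minimum at 0
    best, value = particle_swarm_minimize(sphere, [(-5.0, 5.0)] * 3)
    print(best, value)
```

In a density-fitting setting the position vector would hold rigid-body parameters (for example three translations and three rotations) and the objective would be a negated map-model correlation; that mapping is described here only as an illustration, not as MOFIT's implementation.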
7.
Soc Psychiatry Psychiatr Epidemiol ; 56(4): 605-617, 2021 Apr.
Article in English | MEDLINE | ID: mdl-32915245

ABSTRACT

PURPOSE: There are well-established associations between parental/peer relationships and adolescent substance use, but few longitudinal studies have examined whether adolescents change their substance use in response to changes in their parents' behavior or peer networks. We employ a within-person change approach to address two key questions: Are changes in parenting and peer factors associated with changes in adolescent marijuana and alcohol use? Are there sensitive periods when changes in parenting and peer factors are more strongly associated with changes in adolescent marijuana and alcohol use? METHODS: We analyzed longitudinal data collected annually on 503 boys, ages 13-19, recruited from Pittsburgh public schools. Questionnaires regarding parental supervision, negative parenting practices, parental stress, physical punishment, peer delinquency, and peer drug use were administered to adolescents and their caretakers. Alcohol and marijuana use were assessed by a substance use scale adapted from the National Youth Survey. RESULTS: Reductions in parental supervision and increases in peer drug use and peer delinquency were associated with increases in marijuana frequency, alcohol frequency, and alcohol quantity. Increases in parental stress were associated with increases in marijuana and alcohol frequency. The magnitudes of these relationships were strongest at ages 14-15 and systematically decreased across adolescence. These associations were not due to unmeasured stable confounders or measured time-varying confounders. CONCLUSIONS: Reducing or mitigating changes in parenting and peer risk factors in early adolescence may be particularly important for preventing substance use problems as adolescents transition into young adulthood.


Subjects
Adolescent Behavior , Marijuana Smoking , Marijuana Use , Substance-Related Disorders , Adolescent , Adult , Alcohol Drinking/epidemiology , Humans , Male , Marijuana Smoking/epidemiology , Marijuana Use/epidemiology , Parenting , Peer Group , Young Adult
8.
J Proteome Res ; 19(12): 4844-4856, 2020 12 04.
Article in English | MEDLINE | ID: mdl-33175551

ABSTRACT

Despite considerable research progress on SARS-CoV-2, the direct zoonotic origin (intermediate host) of the virus remains ambiguous. The most definitive approach to identify the intermediate host would be the detection of SARS-CoV-2-like coronaviruses in wild animals. However, due to the high number of animal species, it is not feasible to screen all the species in the laboratory. Given that binding to ACE2 proteins is the first step for the coronaviruses to invade host cells, we propose a computational pipeline to identify potential intermediate hosts of SARS-CoV-2 by modeling the binding affinity between the Spike receptor-binding domain (RBD) and host ACE2. Using this pipeline, we systematically examined 285 ACE2 variants from mammals, birds, fish, reptiles, and amphibians, and found that the binding energies calculated for the modeled Spike-RBD/ACE2 complex structures correlated closely with the effectiveness of animal infection as determined by multiple experimental data sets. Built on the optimized binding affinity cutoff, we suggest a set of 96 mammals, including 48 experimentally investigated ones, which are permissive to SARS-CoV-2, with candidates from primates, rodents, and carnivores at the highest risk of infection. Overall, this work not only suggests a limited range of potential intermediate SARS-CoV-2 hosts for further experimental investigation, but also, more importantly, it proposes a new structure-based approach to general zoonotic origin and susceptibility analyses that are critical for human infectious disease control and wildlife protection.


Subjects
Angiotensin-Converting Enzyme 2/genetics , COVID-19/genetics , SARS-CoV-2/genetics , Coronavirus Spike Glycoprotein/genetics , Animals , Binding Sites/genetics , COVID-19/pathology , COVID-19/virology , Host-Pathogen Interactions/genetics , Humans , Mammals/genetics , Mammals/virology , Pandemics , Protein Binding/genetics , Protein Domains/genetics , SARS-CoV-2/pathogenicity , Viral Zoonoses/genetics , Viral Zoonoses/virology
9.
bioRxiv ; 2020 Sep 11.
Article in English | MEDLINE | ID: mdl-32935105

ABSTRACT

Despite considerable research progress on SARS-CoV-2, the direct zoonotic origin (intermediate host) of the virus remains ambiguous. The most definitive approach to identify the intermediate host would be the detection of SARS-CoV-2-like coronaviruses in wild animals. However, due to the high number of animal species, it is not feasible to screen all the species in the laboratory. Given that the recognition of the binding ACE2 proteins is the first step for the coronaviruses to invade host cells, we proposed a computational pipeline to identify potential intermediate hosts of SARS-CoV-2 by modeling the binding affinity between the Spike receptor-binding domain (RBD) and host ACE2. Using this pipeline, we systematically examined 285 ACE2 variants from mammals, birds, fish, reptiles, and amphibians, and found that the binding energies calculated on the modeled Spike-RBD/ACE2 complex structures correlate closely with the effectiveness of animal infections as determined by multiple experimental datasets. Built on the optimized binding affinity cutoff, we suggested a set of 96 mammals, including 48 experimentally investigated ones, which are permissive to SARS-CoV-2, with candidates from primates, rodents, and carnivores at the highest risk of infection. Overall, this work not only suggested a limited range of potential intermediate SARS-CoV-2 hosts for further experimental investigation; but more importantly, it proposed a new structure-based approach to general zoonotic origin and susceptibility analyses that are critical for human infectious disease control and wildlife protection.

10.
bioRxiv ; 2020 Aug 14.
Article in English | MEDLINE | ID: mdl-32817949

ABSTRACT

The current COVID-19 pandemic caused by SARS-CoV-2 has resulted in millions of confirmed cases and thousands of deaths globally. Extensive efforts and progress have been made to develop effective and safe vaccines against COVID-19. A primary target of these vaccines is the SARS-CoV-2 spike (S) protein, and many studies utilized structural vaccinology techniques to either stabilize the protein or fix the receptor-binding domain at certain states. In this study, we extended an evolutionary protein design algorithm, EvoDesign, to create thousands of stable S protein variants without perturbing the surface conformation and B cell epitopes of the S protein. We then evaluated the mutated S protein candidates based on predicted MHC-II T cell promiscuous epitopes as well as the epitopes' similarity to human peptides. The presented strategy aims to improve the S protein's immunogenicity and antigenicity by inducing stronger CD4 T cell response while maintaining the protein's native structure and function. The top EvoDesign S protein candidate (Design-10705) recovered 31 out of 32 MHC-II T cell promiscuous epitopes in the native S protein, in which two epitopes were present in all seven human coronaviruses. This newly designed S protein also introduced nine new MHC-II T cell promiscuous epitopes and showed high structural similarity to its native conformation. The proposed structural vaccinology method provides an avenue to rationally design the antigen's structure with increased immunogenicity, which could be applied to the rational design of new COVID-19 vaccine candidates.

11.
J Mol Biol ; 432(19): 5365-5377, 2020 09 04.
Article in English | MEDLINE | ID: mdl-32771523

ABSTRACT

The rapid progress of cryo-electron microscopy (cryo-EM) in structural biology has raised an urgent need for robust methods to create and refine atomic-level structural models using low-resolution EM density maps. We propose a new protocol to create initial models using I-TASSER protein structure prediction, followed by EM density map-based rigid-body structure fitting, flexible fragment adjustment and atomic-level structure refinement simulations. The protocol was tested on a large set of 285 non-homologous proteins and generated structural models with correct folds for 260 proteins, where 28% had RMSDs below 2 Å. Compared to other state-of-the-art methods, the major advantage of the proposed pipeline lies in the uniform structure prediction and refinement protocol, as well as the extensive structural re-assembly simulations, which allow for low-to-medium resolution EM density map-guided structure modeling starting from amino acid sequences. Interestingly, the quality of both the image fitting and subsequent structure refinement was found to be strongly correlated with the correctness of the initial I-TASSER models; this is mainly due to the different correlation patterns observed between force field and structural quality for the models with template modeling score (or TM-score, a metric quantifying the similarity of models to the native) above and below a threshold of 0.5. Overall, the results demonstrate a new avenue that is ready to use for large-scale cryo-EM-based structure modeling and atomic-level density map-guided structure refinement.


Subjects
Cryoelectron Microscopy/methods , Proteins/ultrastructure , Algorithms , Animals , Humans , Molecular Models , Monte Carlo Method , Protein Conformation , Proteins/chemistry
12.
Aging (Albany NY) ; 12(12): 11263-11276, 2020 06 16.
Article in English | MEDLINE | ID: mdl-32544884

ABSTRACT

The outbreak of COVID-19 has now become a global pandemic that has severely impacted lives and economic stability. There is, however, no effective antiviral drug that can be used to treat COVID-19 to date. Built on the fact that SARS-CoV-2 initiates its entry into human cells by the receptor binding domain (RBD) of its spike protein binding to the angiotensin-converting enzyme 2 (hACE2), we extended a recently developed approach, EvoDesign, to design multiple peptide sequences that can competitively bind to the SARS-CoV-2 RBD to inhibit the virus from entering human cells. The protocol starts with the construction of a hybrid peptidic scaffold by linking two fragments grafted from the interface of the hACE2 protein (a.a. 22-44 and 351-357) with a linker glycine, which is followed by the redesign and refinement simulations of the peptide sequence to optimize its binding affinity to the interface of the SARS-CoV-2 RBD. The binding experiment analyses showed that the designed peptides exhibited a significantly stronger binding potency to hACE2 than the wild-type hACE2 receptor (with -53.35 vs. -46.46 EvoEF2 energy unit scores for the top designed and wild-type peptides, respectively). This study demonstrates a new avenue to utilize computationally designed peptide motifs to treat the COVID-19 disease by blocking the critical spike-RBD and hACE2 interactions.


Subjects
Coronavirus Infections/drug therapy , Peptides/chemical synthesis , Peptides/pharmacology , Peptidyl-Dipeptidase A/physiology , Viral Pneumonia/drug therapy , Coronavirus Spike Glycoprotein/physiology , Amino Acid Sequence , Angiotensin-Converting Enzyme 2 , Antiviral Agents , Binding Sites , COVID-19 , Drug Design , Molecular Evolution , Humans , Molecular Models , Pandemics , Protein Binding , Protein Conformation , Virus Internalization/drug effects
13.
Arch Dis Child ; 105(12): 1208-1214, 2020 12.
Article in English | MEDLINE | ID: mdl-32404437

ABSTRACT

BACKGROUND: WHO recommends simplified antibiotics for young infants with sepsis in countries where hospitalisation is not feasible. Amoxicillin provides safe, Gram-positive coverage. This study was done to determine pharmacokinetics, drug disposition and interpopulation variability of oral amoxicillin in this demographic. METHODS: Young infants with signs of sepsis enrolled in an oral amoxicillin/intramuscular gentamicin treatment arm of a sepsis trial in Karachi, Pakistan, were studied. Limited pharmacokinetic (PK) sampling was performed at 0, 2-3 and 6-8 hours following an index dose of oral amoxicillin. Plasma concentrations were determined by high-performance liquid chromatography/mass spectrometry. Values of ≥2 mg/L were considered as the effect threshold, given the regional minimal inhibitory concentration (MIC) of resistant Streptococcus pneumoniae. RESULTS: Amoxicillin concentrations were determined in 129 samples from 60 young infants. Six of 44 infants had positive blood cultures with predominant Gram-positive organisms. Forty-four infants contributing blood at ≥2 of 3 specified timepoints were included in the analysis. Mean amoxicillin levels at 2-3 hours (11.6±9.5 mg/L, n=44) and 6-8 hours (16.4±9.3 mg/L, n=20) following the index dose exceeded the MIC for amoxicillin (2.0 mg/L) against resistant S. pneumoniae strains. Of 20 infants with three serum levels, 7 showed a classic dose-exposure profile and 13 showed increasing concentrations with time, implying delayed absorption or excretion. CONCLUSION: Amoxicillin concentrations in sera of young infants following oral administration at 75-100 mg/kg/day daily divided doses exceed the susceptibility breakpoint for >50% of a 12-hour dosing interval. Oral amoxicillin may hold potential as a safe replacement of parenteral ampicillin in newborn sepsis regimens, including aminoglycosides, where hospitalisation is not feasible. TRIAL REGISTRATION NUMBER: NCT01027429.


Subjects
Amoxicillin/blood , Amoxicillin/pharmacokinetics , Anti-Bacterial Agents/blood , Anti-Bacterial Agents/pharmacokinetics , Sepsis/drug therapy , Oral Administration , Amoxicillin/administration & dosage , Anti-Bacterial Agents/administration & dosage , Combination Drug Therapy , Female , Gentamicins/administration & dosage , Humans , Infant , Newborn Infant , Intramuscular Injections , Male , Microbial Sensitivity Tests , Streptococcus pneumoniae/drug effects , Time Factors
14.
Bioinformatics ; 36(12): 3749-3757, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32227201

ABSTRACT

MOTIVATION: Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence. RESULTS: We developed a new algorithm, FUpred, which predicts protein domain boundaries utilizing contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts, while minimizing the number of inter-domain contacts from the contact maps. FUpred was tested on a large-scale dataset consisting of 2549 proteins and generated correct single- and multi-domain classifications with a Matthews correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins. AVAILABILITY AND IMPLEMENTATION: https://zhanglab.ccmb.med.umich.edu/FUpred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Algorithms , Computational Biology , Computer Neural Networks , Protein Domains , Software
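The core idea stated in the abstract above, placing a boundary where intra-domain contacts are many and inter-domain contacts are few, can be sketched for the simplest case of one continuous cut. The scoring function below is a hypothetical simplification written for this illustration; it is not FUpred's published score, and it does not handle discontinuous domains or more than two domains.

```python
def best_two_domain_cut(contact_map, min_domain_len=3):
    """Return (cut, score) for the best single boundary in a binary contact map.

    Residues [0, cut) form the first domain and [cut, n) the second.
    Lower scores are better: few inter-domain contacts relative to the
    number of contacts inside each domain.
    """
    n = len(contact_map)
    best_cut, best_score = None, float("inf")
    for cut in range(min_domain_len, n - min_domain_len + 1):
        intra_left = intra_right = inter = 0
        for i in range(n):
            for j in range(i + 1, n):  # count each residue pair once
                if not contact_map[i][j]:
                    continue
                if j < cut:
                    intra_left += 1      # both residues in the first domain
                elif i >= cut:
                    intra_right += 1     # both residues in the second domain
                else:
                    inter += 1           # contact crosses the boundary
        if intra_left == 0 or intra_right == 0:
            continue  # a domain with no internal contacts is not meaningful
        # Hypothetical score: inter-domain contacts weighted by the inverse
        # number of intra-domain contacts on each side.
        score = inter * (1.0 / intra_left + 1.0 / intra_right)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


if __name__ == "__main__":
    # Toy map: two fully connected 10-residue blocks joined by one contact.
    n = 20
    cmap = [[0] * n for _ in range(n)]
    for block in (range(0, 10), range(10, 20)):
        for i in block:
            for j in block:
                if i != j:
                    cmap[i][j] = 1
    cmap[9][10] = cmap[10][9] = 1
    print(best_two_domain_cut(cmap))
```

The triple loop is cubic in sequence length and is kept only for readability; a practical version would update the three counts incrementally as the cut moves.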
15.
Bioinformatics ; 36(12): 3758-3765, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32259206

ABSTRACT

MOTIVATION: Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory. RESULTS: We present FASPR for fast and accurate PSCP by using an optimized scoring function in combination with a deterministic searching algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. For the assessment on native backbones, FASPR achieved a good performance by correctly predicting 69.1% of all the side-chain dihedral angles using a stringent tolerance criterion of 20°, compared favorably with SCWRL4, CISRR, RASP and SCATD which successfully predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR achieved the highest speed for packing the 379 test protein structures in only 34.3 s, which was significantly faster than the control methods. For the assessment on non-native backbones, FASPR showed an equivalent or better performance on I-TASSER predicted backbones and the backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of the dead-end elimination and tree decomposition with a well optimized scoring function, which makes FASPR of practical use for both protein structure modeling and protein design studies. AVAILABILITY AND IMPLEMENTATION: The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins , Software , Algorithms
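The accuracy figures in the abstract above count a side-chain dihedral angle as correct when it falls within 20° of the native value. A minimal sketch of that check follows, assuming angles in degrees and an inclusive tolerance; it ignores the special handling that symmetric side chains may need, and the function names are illustrative rather than taken from FASPR.

```python
def angle_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def fraction_correct_dihedrals(predicted, native, tolerance=20.0):
    """Fraction of predicted dihedral angles within `tolerance` of the native ones."""
    if len(predicted) != len(native):
        raise ValueError("predicted and native must have the same length")
    if not predicted:
        raise ValueError("at least one angle is required")
    hits = sum(1 for p, q in zip(predicted, native)
               if angle_difference(p, q) <= tolerance)
    return hits / len(predicted)


if __name__ == "__main__":
    # 179 and -175 degrees are only 6 degrees apart once periodicity is handled.
    print(angle_difference(179.0, -175.0))
    print(fraction_correct_dihedrals([60.0, 179.0, -60.0, 10.0],
                                     [65.0, -175.0, 60.0, 40.0]))
```

Handling the wrap-around at ±180° matters because a naive subtraction would treat 179° and -175° as 354° apart and wrongly mark the prediction as incorrect.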
16.
Bioinformatics ; 36(4): 1135-1142, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31588495

ABSTRACT

MOTIVATION: The accuracy and success rate of de novo protein design remain limited, mainly due to the parameter over-fitting of current energy functions and their inability to discriminate incorrect designs from correct designs. RESULTS: We developed an extended energy function, EvoEF2, for efficient de novo protein sequence design, based on a previously proposed physical energy function, EvoEF. Remarkably, EvoEF2 recovered 32.5%, 47.9% and 22.3% of all, core and surface residues for 148 test monomers, and was generally applicable to protein-protein interaction design, as it recapitulated 30.9%, 42.4%, 31.3% and 21.4% of all, core, interface and surface residues for 88 test dimers, significantly outperforming EvoEF on the native sequence recapitulation. We further used I-TASSER to evaluate the foldability of the 148 designed monomer sequences, where all of them were predicted to fold into structures with high fold- and atomic-level similarity to their corresponding native structures, as demonstrated by the fact that 87.8% of the predicted structures shared a root-mean-square-deviation less than 2 Å to their native counterparts. The study also demonstrated that the usefulness of physical energy functions is highly correlated with the parameter optimization processes, and EvoEF2, with parameters optimized using sequence recapitulation, is more suitable for computational protein sequence design than EvoEF, which was optimized on thermodynamic mutation data. AVAILABILITY AND IMPLEMENTATION: The source code of EvoEF2 and the benchmark datasets are freely available at https://zhanglab.ccmb.med.umich.edu/EvoEF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins , Software , Algorithms , Amino Acid Sequence , Computational Biology
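The recovery percentages in the abstract above are native sequence recapitulation rates: the share of positions at which the designed residue matches the native one, reported over all residues or over a subset such as core or surface positions. A minimal sketch follows, assuming the designed and native sequences have equal length and share numbering; the function name and the optional `positions` argument are illustrative assumptions.

```python
def sequence_recapitulation(designed, native, positions=None):
    """Fraction of positions where the designed residue equals the native one.

    positions: optional iterable of 0-based indices (for example core or
    surface residues); when omitted, every position is compared.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    indices = list(range(len(native))) if positions is None else list(positions)
    if not indices:
        raise ValueError("no positions to compare")
    matches = sum(1 for i in indices if designed[i] == native[i])
    return matches / len(indices)


if __name__ == "__main__":
    native = "ACDEFGHIKL"
    designed = "ACDEFGHIKV"  # differs from the native at the last position only
    print(sequence_recapitulation(designed, native))
    print(sequence_recapitulation(designed, native, positions=[8, 9]))
```

Passing a subset of indices gives the per-region rates; how residues are assigned to core, interface or surface is outside this sketch.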
17.
J Chem Inf Model ; 60(1): 410-420, 2020 01 27.
Article in English | MEDLINE | ID: mdl-31851497

ABSTRACT

Protein rotamers refer to the conformational isomers taken by the side-chains of amino acids to accommodate specific structural folding environments. Since accurate modeling of atomic interactions is difficult, rotamer information collected from experimentally solved protein structures is often used to guide side-chain packing in protein folding and sequence design studies. Many rotamer libraries have been built in the literature but there is little quantitative guidance on which libraries should be chosen for different structural modeling studies. Here, we performed a comparative study of six widely used rotamer libraries and systematically examined their suitability for protein folding and sequence design in four aspects: (1) side-chain match accuracy, (2) side-chain conformation prediction, (3) de novo protein sequence design, and (4) computational time cost. We demonstrated that, compared to the backbone-dependent rotamer libraries (BBDRLs), the backbone-independent rotamer libraries (BBIRLs) generated conformations that more closely matched the native conformations due to the larger number of rotamers in the local rotamer search spaces. However, more practically, using an optimized physical energy function incorporated into a simulated annealing Monte Carlo searching scheme, we showed that utilization of the BBDRLs could result in higher accuracies in side-chain prediction and higher sequence recapitulation rates in protein design experiments. Detailed data analyses showed that the major advantage of BBDRLs lies in the energy term derived from the rotamer probabilities that are associated with the individual backbone torsion angle subspaces. This term is important for distinguishing between amino acid identities as well as the rotamer conformations of an amino acid. Meanwhile, the backbone torsion angle subspace-specific rotamer search drastically speeds up the searching time, despite the significantly larger number of total rotamers in the BBDRLs. 
These results should provide important guidance for the development and selection of rotamer libraries for practical protein design and structure prediction studies.
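
As a rough illustration of the probability-derived energy term described above, the following sketch scores a rotamer as the negative log of its probability within a backbone torsion angle bin. The 10-degree bin width, the probability floor, the function names, and the toy library values are all assumptions made for illustration; they are not taken from the study.

```python
import math

BIN_WIDTH = 10  # degrees; assumed bin size, not taken from the study


def backbone_bin(phi, psi, width=BIN_WIDTH):
    """Map (phi, psi) in degrees to a discrete backbone torsion subspace."""
    return (math.floor(phi / width), math.floor(psi / width))


def rotamer_energy(library, phi, psi, rotamer, floor=1e-6):
    """Return -ln P(rotamer | backbone bin).

    Rotamers missing from the bin get a small floor probability so the
    energy stays finite (an assumption of this sketch).
    """
    probs = library.get(backbone_bin(phi, psi), {})
    p = max(probs.get(rotamer, 0.0), floor)
    return -math.log(p)


# Toy backbone-dependent library with made-up probabilities.
toy_library = {
    backbone_bin(-60, -45): {"g-": 0.7, "t": 0.2, "g+": 0.1},
}

# Both queries fall in the same bin as (-60, -45).
e_common = rotamer_energy(toy_library, -57.0, -47.0, "g-")
e_rare = rotamer_energy(toy_library, -57.0, -47.0, "g+")
```

In this toy case the frequent rotamer receives the lower energy, which is the behavior a Monte Carlo packing search would exploit when ranking candidate side-chain conformations.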


Subjects
Proteins/chemistry; Amino Acids/chemistry; Models, Molecular; Protein Conformation
18.
Bioinformatics ; 36(8): 2429-2437, 2020 04 15.
Article in English | MEDLINE | ID: mdl-31830252

ABSTRACT

MOTIVATION: Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein-protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses. RESULTS: We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square-error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities. AVAILABILITY AND IMPLEMENTATION: Web-server/standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
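
The two evaluation metrics quoted above, the Pearson correlation coefficient and the root-mean-square error between estimated and experimental ΔΔGbind, can be computed as in the sketch below. The function names and the sample values are hypothetical and only illustrate the arithmetic; they are not SSIPe code or data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Made-up ddG values in kcal/mol, for illustration only.
estimated = [0.5, 1.2, -0.3, 2.0]
experimental = [0.8, 1.0, 0.1, 2.9]

r = pearson_r(estimated, experimental)
err = rmse(estimated, experimental)
```

Reporting both values is useful because the correlation captures how well the ranking of mutations is reproduced, while the RMSE captures the absolute error in energy units.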


Subjects
Proteins; Software; Humans; Mutation; Protein Binding; Protein Processing, Post-Translational; Proteins/genetics
19.
Article in English | MEDLINE | ID: mdl-33398234

ABSTRACT

The development of effective and safe vaccines is the ultimate way to efficiently stop the ongoing COVID-19 pandemic, which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Built on the fact that SARS-CoV-2 utilizes the association of its Spike (S) protein with the human Angiotensin-converting enzyme 2 (ACE2) receptor to invade host cells, we computationally redesigned the S protein sequence to improve its immunogenicity and antigenicity. Toward this purpose, we extended an evolutionary protein design algorithm, EvoDesign, to create thousands of stable S protein variants that perturb the core protein sequence but keep the surface conformation and B cell epitopes. The T cell epitope content and similarity scores of the perturbed sequences were calculated and evaluated. Out of 22,914 designs with favorable stability energy, 301 candidates contained at least two pre-existing immunity-related epitopes and had promising immunogenic potential. The benchmark tests showed that, although the epitope restraints were not included in the scoring function of EvoDesign, the top S protein design successfully recovered 31 out of the 32 major histocompatibility complex (MHC)-II T cell promiscuous epitopes in the native S protein, where two epitopes were present in all seven human coronaviruses. Moreover, the newly designed S protein introduced nine new MHC-II T cell promiscuous epitopes that do not exist in the wild-type SARS-CoV-2. These results demonstrated a new and effective avenue to enhance a target protein's immunogenicity using rational protein design, which could be applied to new vaccine design against COVID-19 and other human viruses.

20.
PLoS Comput Biol ; 15(10): e1007411, 2019 10.
Article in English | MEDLINE | ID: mdl-31622328

ABSTRACT

Accurate prediction of atomic-level protein structure is important for annotating the biological functions of protein molecules and for designing new compounds to regulate the functions. Template-based modeling (TBM), which aims to construct structural models by copying and refining the structural frameworks of other known proteins, remains the most accurate method for protein structure prediction. Due to the difficulty in recognizing distant-homology templates, however, the accuracy of TBM decreases rapidly when the evolutionary relationship between the query and template vanishes. In this study, we propose a new method, CEthreader, which first predicts residue-residue contacts by coupling evolutionary precision matrices with deep residual convolutional neural networks. The predicted contact maps are then integrated with sequence profile alignments to recognize structural templates from the PDB. The method was tested on two independent benchmark sets consisting collectively of 1,153 non-homologous protein targets, where CEthreader detected 176% or 36% more correct templates with a TM-score >0.5 than the best state-of-the-art profile- or contact-based threading methods, respectively, for the Hard targets that lacked homologous templates. Moreover, CEthreader was able to identify 114% or 20% more correct templates with the same Fold as the query, after excluding structures from the same SCOPe Superfamily, than the best profile- or contact-based threading methods. Detailed analyses show that the major advantage of CEthreader lies in the efficient coupling of contact maps with profile alignments, which helps recognize the global fold of protein structures when the homologous relationship between the query and template is weak. These results demonstrate an efficient new strategy to combine ab initio contact map prediction with profile alignments to significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
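
For readers unfamiliar with the TM-score threshold used above, the sketch below evaluates the standard TM-score sum for one fixed superposition, given the distances between aligned residue pairs. The full TM-score additionally maximizes this value over superpositions, which is omitted here. The function names and the 0.5 Å lower bound on the distance scale (used to keep short chains well-defined) are assumptions of this sketch.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8, in Angstroms.

    The 0.5 lower bound is an assumption of this sketch so that very short
    chains do not produce a negative or undefined scale.
    """
    if target_length <= 15:
        return 0.5
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, target_length):
    """TM-score sum for one fixed superposition.

    distances: distances (Angstroms) between aligned residue pairs.
    target_length: number of residues in the target (query) protein.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A 100-residue target with half of its residues aligned perfectly.
half_aligned = tm_score([0.0] * 50, 100)
```

Because the sum is normalized by the target length rather than the number of aligned pairs, a template that covers only part of the query is penalized even when the aligned region superimposes well.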


Subjects
Nerve Net/physiology; Sequence Analysis, Protein/methods; Structural Homology, Protein; Algorithms; Amino Acid Sequence; Computational Biology/methods; Databases, Protein; Models, Biological; Protein Conformation; Proteins/chemistry; Sequence Alignment; Software