Results 1 - 20 of 178
1.
iScience ; 25(5): 104269, 2022 May 20.
Article in English | MEDLINE | ID: mdl-35542046

ABSTRACT

Large offspring syndrome (LOS) and Beckwith-Wiedemann syndrome are similar epigenetic congenital overgrowth conditions in ruminants and humans, respectively. We have reported global loss-of-imprinting, methylome epimutations, and gene misregulation in LOS. However, less than 4% of gene misregulation can be explained by short-range (<20 kb) alterations in DNA methylation. Therefore, we hypothesized that methylome epimutations in LOS affect chromosome architecture, which results in misregulation of genes located at distances >20 kb in cis and in trans (other chromosomes). Our analyses focused on two imprinted domains that frequently reveal misregulation in these syndromes, namely KvDMR1 and IGF2R. Using bovine fetal fibroblasts, we identified CTCF binding at the IGF2R imprinting control region but not at KvDMR1, and allele-specific chromosome architecture of these domains in controls. In LOS, analyses identified erroneous long-range contacts and a clustering tendency in the direction of expression of misregulated genes. In conclusion, altered chromosome architecture is associated with LOS.

2.
Commun Biol ; 5(1): 460, 2022 May 13.
Article in English | MEDLINE | ID: mdl-35562408

ABSTRACT

Different intensities of high temperatures affect the growth of photosynthetic cells in nature. To elucidate the underlying mechanisms, we cultivated the unicellular green alga Chlamydomonas reinhardtii under highly controlled photobioreactor conditions and revealed systems-wide shared and unique responses to 24-hour moderate (35°C) and acute (40°C) high temperatures and subsequent recovery at 25°C. We identified previously overlooked unique elements in response to moderate high temperature. Heat at 35°C transiently arrested the cell cycle followed by partial synchronization, up-regulated transcripts/proteins involved in gluconeogenesis/glyoxylate-cycle for carbon uptake and promoted growth. But 40°C disrupted cell division and growth. Both high temperatures induced photoprotection, while 40°C distorted thylakoid/pyrenoid ultrastructure, affected the carbon concentrating mechanism, and decreased photosynthetic efficiency. We demonstrated increased transcript/protein correlation during both heat treatments and hypothesize reduced post-transcriptional regulation during heat may help efficiently coordinate thermotolerance mechanisms. During recovery after both heat treatments, especially 40°C, transcripts/proteins related to DNA synthesis increased while those involved in photosynthetic light reactions decreased. We propose down-regulating photosynthetic light reactions during DNA replication benefits cell cycle resumption by reducing ROS production. Our results provide potential targets to increase thermotolerance in algae and crops.

3.
BMC Bioinformatics ; 23(Suppl 3): 141, 2022 Apr 19.
Article in English | MEDLINE | ID: mdl-35439931

ABSTRACT

BACKGROUND: Estimation of the accuracy (quality) of protein structural models is important for both prediction and use of protein structural models. Deep learning methods have been used to integrate protein structure features to predict the quality of protein models. Inter-residue distances are key information for predicting protein tertiary structures and therefore have good potential to predict the quality of protein structural models. However, few methods have been developed to fully take advantage of predicted inter-residue distance maps to estimate the accuracy of a single protein structural model. RESULTS: We developed DISTEMA, an attentive 2D convolutional neural network (CNN) with channel-wise attention that takes only a raw difference map between the inter-residue distance map calculated from a single protein model and the distance map predicted from the protein sequence as input to predict the quality of the model. The network comprises multiple convolutional layers, batch normalization layers, dense layers, and Squeeze-and-Excitation blocks with attention to automatically extract features relevant to protein model quality from the raw input without using any expert-curated features. We evaluated DISTEMA's capability of selecting the best models for CASP13 targets in terms of ranking loss of GDT-TS score. The ranking loss of DISTEMA is 0.079, lower than several state-of-the-art single-model quality assessment methods. CONCLUSION: This work demonstrates that using raw inter-residue distance information with deep learning can predict the quality of protein structural models reasonably well. DISTEMA is freely available at https://github.com/jianlin-cheng/DISTEMA.
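
The abstract reports a ranking loss but does not define it. A minimal sketch, assuming the conventional definition (the best true GDT-TS in the model pool minus the true GDT-TS of the model that the predictor ranks first); the function name and the sample scores are hypothetical and are not taken from DISTEMA:

```python
def ranking_loss(true_scores, predicted_scores):
    """Best true score in the pool minus the true score of the top-ranked model.

    Assumed definition; scores are taken to be GDT-TS on a 0-1 scale.
    """
    if not true_scores or len(true_scores) != len(predicted_scores):
        raise ValueError("need two non-empty score lists of equal length")
    # Index of the model the quality predictor considers best.
    top = max(range(len(predicted_scores)), key=predicted_scores.__getitem__)
    return max(true_scores) - true_scores[top]


# Hypothetical pool of four models for one target.
true_gdt = [0.62, 0.71, 0.55, 0.68]
predicted = [0.60, 0.58, 0.40, 0.66]
loss = ranking_loss(true_gdt, predicted)
```

A loss of zero means the predictor picked the best model in the pool; averaging the per-target loss over all targets gives a single summary figure of the kind quoted above.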


Subjects
Deep Learning , Amino Acid Sequence , Neural Networks, Computer , Proteins/chemistry
5.
Bioinformatics ; 2022 Feb 04.
Article in English | MEDLINE | ID: mdl-35134816

ABSTRACT

MOTIVATION: Deep learning has revolutionized protein tertiary structure prediction recently. Cutting-edge deep learning methods such as AlphaFold can predict high-accuracy tertiary structures for most individual protein chains. However, the accuracy of predicting quaternary structures of protein complexes consisting of multiple chains is still relatively low due to the lack of advanced deep learning methods in the field. Because interchain residue-residue contacts can be used as distance restraints to guide quaternary structure modeling, here we develop a deep dilated convolutional residual network method (DRCon) to predict interchain residue-residue contacts in homodimers from residue-residue co-evolutionary signals derived from multiple sequence alignments of monomers, intrachain residue-residue contacts of monomers extracted from true/predicted tertiary structures or predicted by deep learning, and other sequence and structural features. RESULTS: Tested on three homodimer test datasets (Homo_std dataset, DeepHomo dataset, and CASP-CAPRI dataset), the precision of DRCon for top L/5 interchain contact predictions (L: length of monomer in a homodimer) is 43.46%, 47.10%, and 33.50%, respectively, at a 6 Å contact threshold, which is substantially better than DeepHomo and DNCON2_inter and similar to Glinter. Moreover, our experiments demonstrate that using predicted tertiary structure or intrachain contacts of monomers in the unbound state as input, DRCon still performs well, even though its accuracy is lower than when true tertiary structures in the bound state are used as input. Finally, our case study shows that good interchain contact predictions can be used to build high-accuracy quaternary structure models of homodimers. AVAILABILITY: The source code of DRCon is available at https://github.com/jianlin-cheng/DRCon.
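
To make the "top L/5 precision" metric concrete, here is a small sketch. It assumes predictions arrive as ((i, j), probability) pairs, where i indexes a residue in one chain and j a residue in the other, and that the true contacts are given as a set of such pairs; these data structures and names are illustrative assumptions, not DRCon's actual interface:

```python
def top_k_precision(scored_pairs, true_contacts, monomer_length, divisor=5):
    """Precision of the top L/divisor predicted contacts, ranked by probability."""
    k = max(1, monomer_length // divisor)
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if pair in true_contacts)
    return hits / len(ranked)


# Hypothetical homodimer with monomer length L = 10, so the top L/5 = 2 pairs are scored.
predictions = [((1, 8), 0.9), ((2, 7), 0.8), ((3, 3), 0.4)]
native = {(1, 8), (3, 3)}
precision = top_k_precision(predictions, native, monomer_length=10)  # 0.5
```

Only the two highest-scoring pairs are evaluated here, so the correct but low-ranked pair (3, 3) does not count toward the result.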

6.
Plant J ; 110(1): 193-211, 2022 04.
Article in English | MEDLINE | ID: mdl-34997647

ABSTRACT

The non-essential supernumerary maize (Zea mays) B chromosome (B) has recently been shown to contain active genes and to be capable of impacting gene expression of the A chromosomes. However, the effect of the B chromosome on gene expression is still unclear. In addition, it is unknown whether the accumulation of the B chromosome has a cumulative effect on gene expression. To examine these questions, the global expression of genes, microRNAs (miRNAs), and transposable elements (TEs) of leaf tissue of maize W22 plants with 0-7 copies of the B chromosome was studied. All experimental genotypes with B chromosomes displayed a trend of upregulated gene expression for a subset of A-located genes compared to the control. Over 3000 A-located genes are significantly differentially expressed in all experimental genotypes with the B chromosome relative to the control. Modulations of these genes are largely determined by the presence rather than the copy number of the B chromosome. By contrast, the expression of most B-located genes is positively correlated with B copy number, showing a proportional gene dosage effect. The B chromosome also causes increased expression of A-located miRNAs. Differentially expressed miRNAs potentially regulate their targets in a cascade of effects. Furthermore, the varied copy number of the B chromosome leads to the differential expression of A-located and B-located TEs. The findings provide novel insights into the function and properties of the B chromosome.


Subjects
Chromosomes, Plant , Zea mays , Aneuploidy , Chromosomes, Plant/genetics , DNA Transposable Elements/genetics , Gene Expression , Gene Expression Regulation, Plant/genetics , Zea mays/genetics
7.
Proteins ; 90(3): 720-731, 2022 03.
Article in English | MEDLINE | ID: mdl-34716620

ABSTRACT

Predicting the quaternary structure of protein complexes is an important problem. Inter-chain residue-residue contact prediction can provide useful information to guide the ab initio reconstruction of quaternary structures. However, few methods have been developed to build quaternary structures from predicted inter-chain contacts. Here, we develop the first method based on gradient descent optimization (GD) to build quaternary structures of protein dimers utilizing inter-chain contacts as distance restraints. We evaluate GD on several datasets of homodimers and heterodimers using true/predicted contacts and monomer structures as input. GD consistently performs better than both simulated annealing and Markov Chain Monte Carlo simulation. Starting from an arbitrary quaternary structure randomly initialized from the tertiary structures of protein chains and using true inter-chain contacts as input, GD can reconstruct high-quality structural models for homodimers and heterodimers with average TM-score ranging from 0.92 to 0.99 and average interface root mean square distance from 0.72 Å to 1.64 Å. On a dataset of 115 homodimers, using predicted inter-chain contacts as restraints, the average TM-score of the structural models built by GD is 0.76. For 46% of the homodimers, high-quality structural models with TM-score ≥ 0.9 are reconstructed from predicted contacts. There is a strong correlation between the quality of the reconstructed models and the precision and recall of predicted contacts. Only a moderate precision or recall of inter-chain contact prediction is needed to build good structural models for most homodimers. Moreover, GD improves the quality of quaternary structures predicted by AlphaFold2 on a Critical Assessment of Techniques for Protein Structure Prediction-Critical Assessments of Predictions of Interactions dataset.
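
The idea of using contacts as distance restraints in a gradient descent can be illustrated with a deliberately reduced toy. The sketch below optimizes only a rigid translation of the second chain (no rotation), uses a squared-error restraint, and assumes a target distance of 6 Å. All of these are simplifying assumptions for illustration; the abstract does not describe the published method's energy function or parameterization, and this is not that method.

```python
import math


def restraint_loss(chain_a, chain_b, shift, contacts, target=6.0):
    """Sum of squared deviations of contact distances from the target distance."""
    total = 0.0
    for i, j in contacts:
        moved = [b + s for b, s in zip(chain_b[j], shift)]
        total += (math.dist(chain_a[i], moved) - target) ** 2
    return total


def fit_translation(chain_a, chain_b, contacts, start, target=6.0, rate=0.05, steps=200):
    """Plain gradient descent on the translation applied to chain B."""
    shift = list(start)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for i, j in contacts:
            diff = [b + s - a for a, b, s in zip(chain_a[i], chain_b[j], shift)]
            dist = math.sqrt(sum(d * d for d in diff)) or 1e-9  # avoid dividing by zero
            scale = 2.0 * (dist - target) / dist  # derivative of (dist - target)^2
            for axis in range(3):
                grad[axis] += scale * diff[axis]
        shift = [s - rate * g for s, g in zip(shift, grad)]
    return shift


# Hypothetical two-residue monomer; chain B starts 20 Å away along x.
monomer = [(0.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
contacts = [(0, 0), (1, 1)]
shift = fit_translation(monomer, monomer, contacts, start=(20.0, 0.0, 0.0))
final_loss = restraint_loss(monomer, monomer, shift, contacts)
```

In this toy the descent pulls chain B toward chain A until both restrained pairs sit at the target distance. A realistic rigid-body search would also optimize rotation and would need terms that penalize steric clashes.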


Subjects
Proteins/chemistry , Computational Biology , Databases, Protein , Molecular Docking Simulation , Monte Carlo Method , Protein Binding , Protein Multimerization , Protein Structure, Quaternary
8.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34849575

ABSTRACT

New drug production, from target identification to marketing approval, takes over 12 years and can cost around $2.6 billion. Furthermore, the COVID-19 pandemic has unveiled the urgent need for more powerful computational methods for drug discovery. Here, we review the computational approaches to predicting protein-ligand interactions in the context of drug discovery, focusing on methods using artificial intelligence (AI). We begin with a brief introduction to proteins (targets), ligands (e.g. drugs) and their interactions for nonexperts. Next, we review databases that are commonly used in the domain of protein-ligand interactions. Finally, we survey and analyze the machine learning (ML) approaches implemented to predict protein-ligand binding sites, ligand-binding affinity and binding pose (conformation) including both classical ML algorithms and recent deep learning methods. After exploring the correlation between these three aspects of protein-ligand interaction, it has been proposed that they should be studied in unison. We anticipate that our review will aid exploration and development of more accurate ML-based prediction strategies for studying protein-ligand interactions.


Subjects
Antiviral Agents , COVID-19 , Deep Learning , Drug Discovery , Protein Interaction Maps , SARS-CoV-2/metabolism , Antiviral Agents/chemistry , Antiviral Agents/pharmacokinetics , COVID-19/drug therapy , COVID-19/metabolism , Humans , Ligands
9.
Proteins ; 90(1): 58-72, 2022 01.
Article in English | MEDLINE | ID: mdl-34291486

ABSTRACT

Substantial progress in protein structure prediction has been made by utilizing deep learning and residue-residue distance prediction since CASP13. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning-based protein inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter-domain structure prediction. The results demonstrate that template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, template-free modeling performs better than template-based modeling not only on hard targets but also on targets that have homologous templates. The performance of template-free modeling largely depends on the accuracy of distance prediction, which is closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.


Subjects
Deep Learning , Models, Molecular , Proteins/chemistry , Algorithms , Humans , Protein Structure, Tertiary/genetics , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein , Software
10.
Plant Signal Behav ; 17(1): 2010389, 2022 12 31.
Article in English | MEDLINE | ID: mdl-34951328

ABSTRACT

Anthocyanins are natural colorants synthesized in a branch of the flavonoid pathway. Dihydroflavonol-4-reductase (DFR), a key regulatory enzyme of anthocyanin biosynthesis in plants, channels dihydroflavonols into anthocyanin biosynthesis. Hosta ventricosa is an ornamental plant with elegant flowers and richly colored leaves. How HvDFR contributes to anthocyanin biosynthesis is still unknown. In this study, a DFR homolog was identified from H. ventricosa, and sequence analysis showed that HvDFR possesses the conserved NADPH-binding and catalytic domains. A phylogenetic analysis showed that HvDFR is close to the clade formed by MaDFR and HoDFR in Asparagaceae. Gene expression analysis revealed that HvDFR is constitutively expressed in all tissues, is highly expressed in flowers, and is positively correlated with anthocyanin content. In addition, subcellular localization showed that HvDFR is located in the nucleus and cell membrane. Overexpression of HvDFR in transgenic tobacco lines enhanced anthocyanin accumulation, along with upregulation of key genes such as F3H, F3'H, ANS, and UFGT. Our results indicate functional activity of HvDFR and provide insight into the regulation of anthocyanin content in H. ventricosa.


Subjects
Anthocyanins , Hosta , Anthocyanins/metabolism , Flowers/metabolism , Gene Expression Regulation, Plant/genetics , Hosta/metabolism , Phylogeny , Plant Proteins/metabolism
11.
Front Mol Biosci ; 8: 716973, 2021.
Article in English | MEDLINE | ID: mdl-34497831

ABSTRACT

Proteins interact to form complexes. Predicting the quaternary structure of protein complexes is useful for protein function analysis, protein engineering, and drug design. However, few user-friendly tools leveraging the latest deep learning technology for inter-chain contact prediction and the distance-based modelling to predict protein quaternary structures are available. To address this gap, we develop DeepComplex, a web server for predicting structures of dimeric protein complexes. It uses deep learning to predict inter-chain contacts in a homodimer or heterodimer. The predicted contacts are then used to construct a quaternary structure of the dimer by the distance-based modelling, which can be interactively viewed and analysed. The web server is freely accessible and requires no registration. It can be easily used by providing a job name and an email address along with the tertiary structure for one chain of a homodimer or two chains of a heterodimer. The output webpage provides the multiple sequence alignment, predicted inter-chain residue-residue contact map, and predicted quaternary structure of the dimer. DeepComplex web server is freely available at http://tulip.rnet.missouri.edu/deepcomplex/web_index.html.

12.
Int J Mol Sci ; 22(18)2021 Sep 10.
Article in English | MEDLINE | ID: mdl-34575948

ABSTRACT

Chromatin conformation plays an important role in a variety of genomic processes, including genome replication, gene expression, and gene methylation. Hi-C data is frequently used to analyze structural features of chromatin, such as AB compartments, topologically associated domains, and 3D structural models. Recently, the genomics community has displayed growing interest in chromatin dynamics. Here, we present 4DMax, a novel method, which uses time-series Hi-C data to predict dynamic chromosome conformation. Using both synthetic data and real time-series Hi-C data from processes, such as induced pluripotent stem cell reprogramming and cardiomyocyte differentiation, we construct smooth four-dimensional models of individual chromosomes. These predicted 4D models effectively interpolate chromatin position across time, permitting prediction of unknown Hi-C contact maps at intermittent time points. Furthermore, 4DMax correctly recovers higher order features of chromatin, such as AB compartments and topologically associated domains, even at time points where Hi-C data is not made available to the algorithm. Contact map predictions made using 4DMax outperform naïve numerical interpolation in 87.7% of predictions on the induced pluripotent stem cell dataset. A/B compartment profiles derived from 4DMax interpolation showed higher similarity to ground truth than at least one profile generated from a neighboring time point in 100% of induced pluripotent stem cell experiments. Use of 4DMax may alleviate the cost of expensive Hi-C experiments by interpolating intermediary time points while also providing valuable visualization of dynamic chromatin changes.
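
The abstract compares 4DMax against "naïve numerical interpolation" without specifying it. As a point of reference, a minimal sketch of one plausible baseline is element-wise linear interpolation between the contact maps of two neighboring time points. The choice of linear interpolation, the function name, and the sample matrices are assumptions for illustration only:

```python
def interpolate_contact_map(map_before, map_after, t_before, t_after, t_query):
    """Element-wise linear interpolation of two equally sized contact matrices."""
    if not t_before < t_after or not t_before <= t_query <= t_after:
        raise ValueError("t_query must lie between two ordered time points")
    weight = (t_query - t_before) / (t_after - t_before)
    return [
        [(1.0 - weight) * x + weight * y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(map_before, map_after)
    ]


# Hypothetical 2x2 contact counts measured at day 0 and day 2; estimate day 1.
day0 = [[10.0, 2.0], [2.0, 8.0]]
day2 = [[6.0, 4.0], [4.0, 12.0]]
day1 = interpolate_contact_map(day0, day2, 0, 2, 1)  # [[8.0, 3.0], [3.0, 10.0]]
```

A baseline like this treats every matrix entry independently, whereas the abstract describes 4DMax as interpolating chromatin position in a 3D model across time and deriving contact maps from that model.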


Subjects
Chromatin/ultrastructure , Chromosomes/ultrastructure , Computational Biology , Algorithms , Chromatin/genetics , Chromosomes/genetics , Genome/genetics , Humans , Molecular Conformation
13.
Chromosome Res ; 29(3-4): 313-325, 2021 12.
Article in English | MEDLINE | ID: mdl-34406545

ABSTRACT

The B chromosome of maize undergoes nondisjunction at the second pollen mitosis as part of its accumulation mechanism. Previous work identified 9-Bic-1 (9-B inactivated centromere-1), which comprises an epigenetically silenced B chromosome centromere that was translocated to the short arm of chromosome 9(9S). This chromosome is stable in isolation, but when normal B chromosomes are added to the genotype, it will attempt to undergo nondisjunction during the second pollen mitosis and usually fractures the chromosome in 9S. These broken chromosomes allow a test of whether the inactive centromere is reactivated or whether a de novo centromere is formed elsewhere on the chromosome to allow recovery of fragments. Breakpoint determination on the B chromosome and chromosome 9 showed that mini chromosome B1104 has the same breakpoint as 9-Bic-1 in the B centromere region and includes a portion of 9S. CENH3 binding was found on the B centromere region and on 9S, suggesting both centromere reactivation and de novo centromere formation. Another mini chromosome, B496, showed evidence of rearrangement, but it also only showed evidence for a de novo centromere. Other mini chromosome fragments recovered were directly derived from the B chromosome with breakpoints concentrated near the centromeric knob region, which suggests that the B chromosome is broken at a low frequency due to the failure of the sister chromatids to separate at the second pollen mitosis. Our results indicate that both reactivation and de novo centromere formation could occur on fragments derived from the progenitor possessing an inactive centromere.


Subjects
Chromosomes, Plant , Zea mays , Centromere/genetics , Chromosome Aberrations , Chromosomes, Plant/genetics , Mitosis , Zea mays/genetics
14.
Proteins ; 89(12): 1800-1823, 2021 12.
Article in English | MEDLINE | ID: mdl-34453465

ABSTRACT

We present the results for CAPRI Round 50, the fourth joint CASP-CAPRI protein assembly prediction challenge. The Round comprised a total of twelve targets, including six dimers, three trimers, and three higher-order oligomers. Four of these were easy targets, for which good structural templates were available either for the full assembly or for the main interfaces (of the higher-order oligomers). Eight were difficult targets for which only distantly related templates were found for the individual subunits. Twenty-five CAPRI groups including eight automatic servers submitted ~1250 models per target. Twenty groups including six servers participating in the CAPRI scoring challenge submitted ~190 models per target. The accuracy of the predicted models was evaluated using the classical CAPRI criteria. The prediction performance was measured by a weighted scoring scheme that takes into account the number of models of acceptable quality or higher submitted by each group as part of their five top-ranking models. Compared to the previous CASP-CAPRI challenge, top performing groups submitted such models for a larger fraction (70-75%) of the targets in this Round, but fewer of these models were of high accuracy. Scorer groups achieved stronger performance, with more groups submitting correct models for 70-80% of the targets or achieving high accuracy predictions. Servers performed less well in general, except for the MDOCKPP and LZERD servers, which performed on par with human groups. In addition to these results, major advances in methodology are discussed, providing an informative overview of where the prediction of protein assemblies currently stands.


Subjects
Computational Biology/methods , Models, Molecular , Proteins , Software , Binding Sites , Molecular Docking Simulation , Protein Interaction Domains and Motifs , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein
15.
Proc Natl Acad Sci U S A ; 118(23)2021 06 08.
Article in English | MEDLINE | ID: mdl-34088847

ABSTRACT

B chromosomes are enigmatic elements in thousands of plant and animal genomes that persist in populations despite being nonessential. They circumvent the laws of Mendelian inheritance but the molecular mechanisms underlying this behavior remain unknown. Here we present the sequence, annotation, and analysis of the maize B chromosome providing insight into its drive mechanism. The sequence assembly reveals detailed locations of the elements involved with the cis and trans functions of its drive mechanism, consisting of nondisjunction at the second pollen mitosis and preferential fertilization of the egg by the B-containing sperm. We identified 758 protein-coding genes in 125.9 Mb of B chromosome sequence, of which at least 88 are expressed. Our results demonstrate that transposable elements in the B chromosome are shared with the standard A chromosome set but multiple lines of evidence fail to detect a syntenic genic region in the A chromosomes, suggesting a distant origin. The current gene content is a result of continuous transfer from the A chromosomal complement over an extended evolutionary time with subsequent degradation but with selection for maintenance of this nonvital chromosome.


Subjects
Chromosomes, Plant/genetics , Evolution, Molecular , Pollen/genetics , Pregnancy Proteins/genetics , Zea mays/genetics , Meiosis/genetics , Mitosis/genetics
17.
Sci Rep ; 11(1): 12295, 2021 06 10.
Article in English | MEDLINE | ID: mdl-34112907

ABSTRACT

Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. In recognizing that multiple sequence alignments of a monomer that forms homomultimers contain the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using multiple sequence alignment (MSA) and other co-evolutionary features of a single monomer followed by discrimination of interchain and intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions of 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.


Subjects
Protein Conformation , Proteins/genetics , Sequence Alignment/methods , Software , Algorithms , Amino Acid Sequence/genetics , Computational Biology , Deep Learning , Proteins/ultrastructure , Sequence Analysis, Protein
18.
Sci Rep ; 11(1): 13155, 2021 06 23.
Article in English | MEDLINE | ID: mdl-34162922

ABSTRACT

Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system-MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, which are much faster and more accurate than MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% regular domains and 55% hard domains) if only one prediction is made for a domain. The success rate is increased to 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method even on TBM targets that TBM methods used to dominate and therefore provides a uniform structure modeling approach to any protein. 
Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.


Subjects
Deep Learning , Protein Structure, Tertiary , Software , Amino Acid Sequence , Models, Molecular , Sequence Alignment , Structure-Activity Relationship
19.
Sci Rep ; 11(1): 10943, 2021 05 25.
Article in English | MEDLINE | ID: mdl-34035363

ABSTRACT

Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To further leverage the improved inter-residue distance predictions to enhance EMA, during the 2020 CASP14 experiment, we integrated several new inter-residue distance features with the existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. According to the evaluation of performance in selecting the best model from the models of CASP14 targets, our three multi-model EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieve average losses of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods are ranked first, second, and third out of all 68 CASP14 predictors. MULTICOM-DEEP, the single-model EMA predictor, is ranked within the top 10 among all the single-model EMA methods according to GDT-TS score loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully explore its potential.


Subjects
Caspases/metabolism , Computational Biology , Deep Learning , Models, Molecular , Protein Conformation
20.
Bioinformatics ; 2021 May 07.
Article in English | MEDLINE | ID: mdl-33961009

ABSTRACT

MOTIVATION: Accurate prediction of residue-residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment. RESULTS: Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions (i.e. classifying distances between two residues into two categories: in contact (< 8 Angstrom) and not in contact otherwise) and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. 
However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps. AVAILABILITY: The software package, source code, and data of DeepDist2 are freely available at https://github.com/multicom-toolbox/deepdist and https://zenodo.org/record/4712084#.YIIM13VKhQM.
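
The binary contact classification mentioned above (in contact if the distance is below 8 Å) can be sketched as follows. The 8 Å threshold comes from the abstract; the sequence-separation cutoff of 24 residues for "long-range" is the usual CASP convention and is an assumption here, as are the function name and the toy distance matrix:

```python
def long_range_contacts(distances, threshold=8.0, min_separation=24):
    """Return residue pairs (i, j), i < j, that are in contact and far apart in sequence."""
    n = len(distances)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if distances[i][j] < threshold
    }


# Hypothetical 30-residue distance matrix: everything 20 Å apart except a few pairs.
size = 30
matrix = [[20.0] * size for _ in range(size)]
matrix[0][25] = 7.5   # long-range contact
matrix[0][5] = 5.0    # close in space but short-range in sequence
matrix[2][29] = 8.0   # exactly at the threshold, so not counted as a contact
contacts = long_range_contacts(matrix)  # {(0, 25)}
```

Ranking such pairs by predicted probability and keeping the top L/5 gives the precision figure used in the benchmark described above.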
