ABSTRACT
BACKGROUND: Protein engineering aims to improve the functional properties of existing proteins to meet practical needs. Current deep learning-based models capture the evolutionary, functional, and biochemical features contained in amino acid sequences. However, existing generative models still struggle to capture the relationships between amino acid sites in longer sequences. At the same time, the protein sequences of a homologous family are distributed in the latent space with a specific positional relationship. We want to use this relationship to search for new variants directly in the vicinity of better-performing variants. RESULTS: To improve both the model's representation learning ability for longer sequences and the similarity between the generated sequences and the original sequences, we propose a temporal variational autoencoder (T-VAE) model. T-VAE consists of an encoder and a decoder. The encoder expands the receptive field of neurons in the network by dilated causal convolution, thereby improving its ability to encode longer sequences. The decoder decodes the sampled data into variants closely resembling the original sequence. CONCLUSION: Compared to other models, the Pearson correlation coefficient between the protein fitness values predicted by T-VAE and the true values was higher, and the mean absolute deviation was lower. In addition, when comparing the encodings of protein sequences of different lengths, the T-VAE model showed better representation learning ability for longer sequences. These results indicate that our model is better suited to representation learning for longer sequences. To verify the model's generative performance, we also calculated the sequence identity between the generated data and the input data. The sequence identity obtained by T-VAE improved by 12.9% compared to the baseline model.
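The three evaluation quantities named in the conclusion (Pearson correlation coefficient, mean absolute deviation, and sequence identity) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names are hypothetical, "mean absolute deviation" is assumed here to mean the mean absolute difference between predicted and measured fitness, and sequence identity is assumed to be computed over two equal-length, already aligned sequences.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length number lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def mean_absolute_deviation(predicted, measured):
    """Assumed definition: mean of |predicted - measured| over all variants."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty lists of the same length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


def sequence_identity(seq_a, seq_b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)
```

For example, `sequence_identity("MKTAY", "MKSAY")` compares five positions, four of which match. Sequences of different lengths would need to be aligned first, which this sketch does not attempt.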
Subject(s)
Amino Acids , Biological Evolution , Humans , Mutant Proteins , Amino Acid Sequence , Learning
ABSTRACT
Members of the pentatricopeptide repeat (PPR) protein family are sequence-specific RNA-binding proteins that play crucial roles in organelle RNA metabolism. Each PPR protein consists of a tandem array of PPR motifs, each of which aligns to one nucleotide of the RNA target. The di-residues in the PPR motif, which are referred to as the PPR codes, determine nucleotide specificity. Numerous PPR codes are distributed among the vast number of PPR motifs, but the correlation between PPR codes and RNA bases is poorly understood, which hinders target RNA prediction and functional investigation of PPR proteins. To address this issue, we developed a modular assembly method for high-throughput construction of designer PPRs, and by using this method, 62 designer PPR proteins containing various PPR codes were assembled. Then, the correlation between these PPR codes and RNA bases was systematically explored and delineated. Based on this correlation, the web server PPRCODE (http://yinlab.hzau.edu.cn/pprcode) was developed. Our study will not only serve as a platform for facilitating target RNA prediction and functional investigation of the large number of PPR family proteins but also provide an alternative strategy for the assembly of custom PPRs that can potentially be used for plant organelle RNA manipulation.
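The one-motif-to-one-nucleotide model described above lends itself to a simple lookup. The Python sketch below is illustrative only: the table `ASSUMED_PPR_CODE` and the function `predict_target` are hypothetical names, the four pairings listed are canonical codes commonly reported in the PPR literature rather than values taken from this abstract, and none of it reflects how the PPRCODE server is actually implemented.

```python
# Assumed, illustrative subset of di-residue PPR codes and the RNA base each
# is commonly reported to prefer. Not taken from the abstract and not the
# PPRCODE server's table.
ASSUMED_PPR_CODE = {
    "ND": "U",
    "NS": "C",
    "TN": "A",
    "TD": "G",
}


def predict_target(codes, table=ASSUMED_PPR_CODE):
    """Map an ordered list of per-motif di-residue codes to an RNA string.

    Each motif contributes exactly one nucleotide; codes missing from the
    table are reported as 'N' (unknown) instead of being guessed.
    """
    return "".join(table.get(code.upper(), "N") for code in codes)
```

Calling `predict_target(["TD", "ND", "NS", "TN", "XX"])` walks the motif array in order and emits one base per motif, with the unrecognized code `XX` left as `N`. A realistic predictor would need the full, experimentally delineated code table and would have to handle codes that tolerate more than one base.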
Subject(s)
Arabidopsis Proteins/genetics , Nucleotide Motifs/genetics , RNA-Binding Proteins/genetics , RNA/genetics , Amino Acid Sequence/genetics , Arabidopsis/genetics , Models, Molecular , Organelles/genetics
ABSTRACT
Since the outbreak of novel coronavirus pneumonia in Wuhan, China, in late 2019, cases have gradually been reported in other parts of China and abroad. Children are susceptible to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) because of their immature immune function. As the outbreak has progressed, more cases of novel coronavirus infection and pneumonia in children have been reported. SARS-CoV-2 infection is less severe in children than in adults, and children show a lower incidence and susceptibility; as a result, fewer children are tested and the actual number of infections is underestimated. Strengthening diagnosis is therefore particularly important for children: an early and clear diagnosis can guide treatment strategies and reduce the harm the disease causes. Based on the Novel Coronavirus Infection Pneumonia Diagnosis and Treatment Standards (trial version 7) issued by the National Health Commission and on the latest diagnosis and treatment strategies for novel coronavirus infection pneumonia in children, this review summarizes current strategies for the diagnosis and treatment of SARS-CoV-2 infection in children.
Subject(s)
Antiviral Agents/therapeutic use , Betacoronavirus/genetics , Clinical Laboratory Techniques/methods , Coronavirus Infections/diagnosis , Coronavirus Infections/therapy , Pneumonia, Viral/diagnosis , Pneumonia, Viral/therapy , RNA, Viral/blood , Adenosine Monophosphate/analogs & derivatives , Adenosine Monophosphate/therapeutic use , Alanine/analogs & derivatives , Alanine/therapeutic use , Asymptomatic Diseases , Betacoronavirus/pathogenicity , Biomarkers/blood , COVID-19 , COVID-19 Testing , Child , Coronavirus Infections/transmission , Coronavirus Infections/virology , Cough/diagnosis , Drug Combinations , Early Diagnosis , Fever/diagnosis , Humans , Hydroxychloroquine/therapeutic use , Interferon-alpha/therapeutic use , Lopinavir/therapeutic use , Pandemics , Pneumonia, Viral/transmission , Pneumonia, Viral/virology , Practice Guidelines as Topic , RNA, Viral/genetics , Ribavirin/therapeutic use , Ritonavir/therapeutic use , SARS-CoV-2 , Severity of Illness Index , Tomography, X-Ray Computed
ABSTRACT
Cytochrome P450s (P450s) are the most versatile catalysts used by plants to produce structurally and functionally diverse metabolites. Given the high degree of gene redundancy and the difficulty of functionally characterizing plant P450s, protein engineering is used as a complementary strategy to study the mechanisms of P450-mediated reactions or to alter their functions. We previously proposed an approach to engineering plant P450s that combines high-accuracy homology models generated by Rosetta with data-driven design using evolutionary information about these enzymes. With this strategy, we repurposed a multi-functional P450 (CYP87D20) into a monooxygenase by redesigning its active site. Since most plant P450s are membrane-anchored proteins adapted to the micro-environments of plant cells, expressing them in heterologous hosts usually results in problems with expression or activity. Here, we applied computational design to tackle these issues by simultaneously optimizing the protein surface and the active site. After screening 17 variants, we observed that effective substitutions of surface residues improved both the expression and the activity of CYP87D20. In addition, the identified substitutions were additive, and combining them created a highly efficient C11 hydroxylase of cucurbitadienol that participates in mogrol biosynthesis. This study shows the importance of considering the interplay between surface and active site residues in P450 engineering. Our integrated strategy also opens an avenue to creating more tailoring enzymes with desired functions for the metabolic engineering of high-value compounds such as mogrol, the precursor of the natural sweetener mogrosides. Supplementary Information: The online version contains supplementary material available at 10.1007/s42994-021-00056-z.