Search | Secretaria de Estado da Saúde

Protein language models-assisted optimization of a uracil-N-glycosylase variant enables programmable T-to-G and T-to-C base editing.

He, Yan; Zhou, Xibin; Chang, Chong; Chen, Ge; Liu, Weikuan; Li, Geng; Fan, Xiaoqi; Sun, Mingsun; Miao, Chensi; Huang, Qianyue; Ma, Yunqing; Yuan, Fajie; Chang, Xing.

Mol Cell ; 84(7): 1257-1270.e6, 2024 Apr 04.

Article in English | MEDLINE | ID: mdl-38377993

ABSTRACT

Current base editors (BEs) use DNA deaminases, including cytidine deaminase in cytidine BE (CBE) or adenine deaminase in adenine BE (ABE), to facilitate transition nucleotide substitutions. Combining CBE or ABE with glycosylase enzymes can induce limited transversion mutations. Nonetheless, a critical demand remains for BEs capable of generating alternative mutation types, such as T>G corrections. In this study, we leveraged pre-trained protein language models to optimize a uracil-N-glycosylase (UNG) variant with altered specificity for thymines (eTDG). Notably, after two rounds of testing fewer than 50 top-ranking variants, more than 50% exhibited over 1.5-fold enhancement in enzymatic activities. When eTDG was fused with nCas9, it induced programmable T-to-S (G/C) substitutions and corrected the db/db diabetic mutation in mice (up to 55%). Our findings not only establish orthogonal strategies for developing novel BEs but also demonstrate the capacities of protein language models for optimizing enzymes without extensive task-specific training data.

Subjects

Alkanesulfonic Acids; Gene Editing; Uracil-DNA Glycosidase; Animals; Mice; Mutation; Uracil-DNA Glycosidase/genetics; Uracil-DNA Glycosidase/metabolism

Highly accurate classification and discovery of microbial protein-coding gene functions using FunGeneTyper: an extensible deep learning framework.

Zhang, Guoqing; Wang, Hui; Zhang, Zhiguo; Zhang, Lu; Guo, Guibing; Yang, Jian; Yuan, Fajie; Ju, Feng.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-39007592

ABSTRACT

High-throughput DNA sequencing technologies decode tremendous amounts of microbial protein-coding gene sequences. However, accurately assigning protein functions to novel gene sequences remains a challenge. To this end, we developed FunGeneTyper, an extensible framework with two new deep learning models (i.e., FunTrans and FunRep), structured databases, and supporting resources for achieving highly accurate (Accuracy > 0.99, F1-score > 0.97) and fine-grained classification of antibiotic resistance genes (ARGs) and virulence factor genes. Using an experimentally confirmed dataset of ARGs comprising remote homologous sequences as the test set, our framework achieves by far the best performance in the discovery of new ARGs from human gut (F1-score: 0.6948), wastewater (0.6072), and soil (0.5445) microbiomes, beating the state-of-the-art bioinformatics tools and sequence alignment-based (F1-score: 0.0556-0.5065) and domain-based (F1-score: 0.2630-0.5224) annotation approaches. Furthermore, our framework is implemented as a lightweight, privacy-preserving, and plug-and-play neural network module, facilitating its versatility and accessibility to developers and users worldwide. We anticipate widespread utilization of FunGeneTyper (https://github.com/emblab-westlake/FunGeneTyper) for precise classification of protein-coding gene functions and the discovery of numerous valuable enzymes. This advancement will have a significant impact on various fields, including microbiome research, biotechnology, metagenomics, and bioinformatics.

Subjects

Deep Learning; Humans; Computational Biology/methods; Microbiota/genetics; Bacterial Proteins/genetics; Drug Resistance, Microbial/genetics; Software; High-Throughput Nucleotide Sequencing/methods; Virulence Factors/genetics

Protocol to use protein language models predicting and following experimental validation of function-enhancing variants of thymine-N-glycosylase.

He, Yan; Zhou, Xibin; Yuan, Fajie; Chang, Xing.

STAR Protoc ; 5(3): 103188, 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-39002134

ABSTRACT

Protein language models (PLMs) are machine learning tools trained to predict masked amino acids within protein sequences, offering opportunities to enhance protein function without prior knowledge of their specific roles. Here, we present a protocol for optimizing thymine-DNA-glycosylase (TDG) using PLMs. We describe steps for "zero-shot" enzyme optimization, construction of plasmids, double plasmid transfection, and high-throughput sequencing and data analysis. This protocol holds promise for streamlining the engineering of gene editing tools, delivering improved activity while minimizing the experimental workload. For complete details on the use and execution of this protocol, please refer to He et al. (1).
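
As a rough illustration of what "zero-shot" ranking with a masked-prediction model can look like, the sketch below scores single substitutions by the log-odds of the mutant residue against the wild-type residue at each masked position. This is a commonly used heuristic, not necessarily the scoring used in the protocol. The sequence, the probability table, and the function name `score_substitutions` are all hypothetical; in practice the probabilities would come from a trained PLM rather than a hand-written dictionary.

```python
import math

# Hypothetical wild-type sequence (three residues, for illustration only).
WILD_TYPE = "MKT"

# Made-up probabilities standing in for a PLM's predictions when each
# position is masked in turn. Real values would come from a trained model.
MASKED_PROBS = {
    0: {"M": 0.90, "L": 0.05, "V": 0.05},
    1: {"K": 0.20, "R": 0.60, "E": 0.20},
    2: {"T": 0.50, "S": 0.40, "A": 0.10},
}


def score_substitutions(wild_type, masked_probs):
    """Return (mutation, log-odds score) pairs sorted from best to worst.

    The score is log p(mutant) - log p(wild type) at the masked position,
    so a positive score means the model prefers the mutant residue.
    """
    scored = []
    for pos, probs in masked_probs.items():
        wt = wild_type[pos]
        for aa, p in probs.items():
            if aa == wt:
                continue  # skip the wild-type residue itself
            score = math.log(p) - math.log(probs[wt])
            # Mutation label uses 1-based positions, e.g. "K2R".
            scored.append((f"{wt}{pos + 1}{aa}", score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = score_substitutions(WILD_TYPE, MASKED_PROBS)
for mutation, score in ranked:
    print(f"{mutation}\t{score:+.3f}")
```

Under this heuristic, the top of the ranked list would be the candidates taken forward to experimental testing; no task-specific activity data is needed to produce the ranking.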
