Results 1 - 20 of 483,925
1.
BMC Microbiol ; 24(1): 125, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622505

ABSTRACT

γ-Polyglutamic acid (γ-PGA), a high-molecular-weight polymer, is synthesized by microorganisms and secreted into the extracellular space. Owing to its excellent performance, γ-PGA has been widely used in the food, biomedical, and environmental fields. In this study, we screened natto samples and isolated two γ-PGA-producing strains of Bacillus subtilis, N3378-2at and N3378-3At. We then identified the γ-PGA synthetase gene cluster (PgsB, PgsC, PgsA, YwtC and PgdS), glutamate racemase RacE, phage-derived γ-PGA hydrolases (PghB and PghC) and exo-γ-glutamyl peptidase (GGT) in the genomes of these strains. Based on these γ-PGA-related protein sequences from the isolated strains and from 181 B. subtilis strains obtained from GenBank, we carried out genotyping analysis and classified them into types 1-5. Since we found that B. amyloliquefaciens LL3 can produce γ-PGA, we obtained B. velezensis and B. amyloliquefaciens strains from GenBank and classified them into types 6 and 7 based on LL3. Finally, we constructed evolutionary trees for these protein sequences. This study analyzed the distribution of γ-PGA-related protein sequences in the genomes of B. subtilis, B. velezensis and B. amyloliquefaciens strains and the evolutionary diversity of these sequences, providing novel information for the development and utilization of γ-PGA-producing strains.


Subjects
Bacillus subtilis , Glutamic Acid , Bacillus subtilis/genetics , Bacillus subtilis/metabolism , Glutamic Acid/metabolism , Amino Acid Sequence , Hydrolases/metabolism , Polyglutamic Acid/genetics , Genomics
2.
Plant Mol Biol ; 114(3): 43, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38630371

ABSTRACT

GATA transcription factors (TFs) have been extensively studied for their regulatory roles in various biological processes in many plant species. The function and molecular mechanism of GATA TFs in regulating tolerance to abiotic stress have not yet been studied in the common bean. This study analyzed the functional identity of the GATA gene family in the P. vulgaris genome under different abiotic and phytohormonal stresses. The GATA gene family was systematically investigated in the P. vulgaris genome, and 31 PvGATA TFs were identified. The study found that 18 of the 31 PvGATA genes had undergone duplication events, emphasizing the role of gene duplication in GATA gene expansion. All the PvGATA genes were classified into four subfamilies (I, II, III, and IV), with 8, 3, 6, and 13 members, respectively. All PvGATA protein sequences contained a single GATA domain, but subfamily II members had additional domains such as CCT and tify. A total of 799 promoter cis-regulatory elements (CREs) were predicted in the PvGATAs. Additionally, we used qRT-PCR to investigate the expression profiles of five PvGATA genes in common bean roots under abiotic stress conditions. The results suggest that PvGATA01/10/25/28 may play crucial roles in regulating plant resistance to salt and drought stress and may be involved in phytohormone-mediated stress signaling pathways. PvGATA28 was selected for overexpression and introduced into N. benthamiana using Agrobacterium-mediated transformation. Transgenic lines were subjected to abiotic stress, and the results showed significantly greater tolerance of transgenic lines to stress conditions compared with their wild-type counterparts. The seed germination assay suggested extended dormancy in transgenic lines compared with wild-type lines. This study provides a comprehensive analysis of the PvGATA gene family, which can serve as a foundation for future research on the function of GATA TFs in abiotic stress tolerance in common bean plants.


Subjects
Phaseolus , Phaseolus/genetics , GATA Transcription Factors/genetics , Agrobacterium , Amino Acid Sequence , Droughts , Plant Growth Regulators
3.
J Agric Food Chem ; 72(15): 8491-8505, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38587859

ABSTRACT

Aging and stress have contributed to the development of memory disorders. Phe-Pro-Phe (FPF) was identified with high stability by mass spectrometry from simulated gastrointestinal digestion and everted gut sac products of the Antarctic krill peptide Ser-Ser-Asp-Ala-Phe-Phe-Pro-Phe-Arg (SSDAFFPFR) which was found to have a positive impact on memory enhancement. This study investigated the digestive stability, absorption, and memory-enhancing effects of FPF using nuclear magnetic resonance spectroscopy, simulated gastrointestinal digestion, in vivo fluorescence distribution analysis, mouse behavioral experiments, acetylcholine function, Nissl staining, immunofluorescence, and immunohistochemistry. FPF crossed the blood-brain barrier into the brain after digestion, significantly reduced shock time, working memory errors, and reference memory errors, and increased the recognition index. Additionally, FPF elevated ACh content; Nissl body counts; and CREB, SYN, and PSD-95 expression levels, while reducing AChE activity (P < 0.05). This implies that FPF prevents scopolamine-induced memory impairment and provides a basis for future research on memory disorders.


Subjects
Euphausiacea , Animals , Mice , Amino Acid Sequence , Peptides/chemistry , Acetylcholine , Memory Disorders
4.
Sci Rep ; 14(1): 8695, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622194

ABSTRACT

AMPylation is a biologically significant yet understudied post-translational modification in which an adenosine monophosphate (AMP) group is added, primarily to tyrosine and threonine residues. While recent work has illuminated the prevalence and functional impacts of AMPylation, experimental identification of AMPylation sites remains challenging. Computational prediction techniques provide a faster alternative. The predictive performance of machine learning models depends heavily on the features used to represent the raw amino acid sequences. In this work, we introduce a novel feature extraction pipeline to encode the key properties relevant to AMPylation site prediction. We use a recently published dataset of curated AMPylation sites to develop our feature generation framework. We demonstrate the utility of the extracted features by training various machine learning classifiers on various numerical representations of the raw sequences extracted with our framework. Tenfold cross-validation is used to evaluate each model's ability to distinguish between AMPylated and non-AMPylated sites. The top-performing feature set achieved an MCC of 0.58, an accuracy of 0.8, an AUC-ROC of 0.85 and an F1 score of 0.73. Further, we elucidate the behaviour of the model on the feature set consisting of monogram and bigram counts for various representations using SHapley Additive exPlanations.
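
The monogram and bigram counts mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `ngram_counts` and the example window `MKTAYTAK` are hypothetical, and the abstract does not say how windows around candidate sites are chosen or which sequence representations are used.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count overlapping n-grams (n=1: monograms, n=2: bigrams) in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Hypothetical sequence window around a candidate site (illustration only).
window = "MKTAYTAK"
monograms = ngram_counts(window, 1)
bigrams = ngram_counts(window, 2)
```

Counts such as these can be laid out as a fixed-length numeric vector (one entry per possible monogram or bigram) before being passed to a classifier.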


Subjects
Protein Processing, Post-Translational , Tyrosine , Tyrosine/metabolism , Amino Acid Sequence , Adenosine Monophosphate/metabolism , Threonine/metabolism
5.
Nat Commun ; 15(1): 3047, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589369

ABSTRACT

Clustering biological sequences into similar groups is an increasingly important task as the number of available sequences continues to grow exponentially. Search-based approaches to clustering scale super-linearly with the number of input sequences, making it impractical to cluster very large sets of sequences. Approaches to clustering sequences in linear time currently lack the accuracy of super-linear approaches. Here, I set out to develop and characterize a strategy for clustering with linear time complexity that retains the accuracy of less scalable approaches. The resulting algorithm, named Clusterize, sorts sequences by relatedness to linearize the clustering problem. Clusterize produces clusters with accuracy rivaling popular programs (CD-HIT, MMseqs2, and UCLUST) but exhibits linear asymptotic scalability. Clusterize generates higher accuracy and oftentimes much larger clusters than Linclust, a fast linear time clustering algorithm. I demonstrate the utility of Clusterize by accurately solving different clustering problems involving millions of nucleotide or protein sequences.


Subjects
Algorithms , Amino Acid Sequence , Cluster Analysis
6.
Microb Biotechnol ; 17(4): e14404, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38588312

ABSTRACT

Acid phosphatases are enzymes that play a crucial role in the hydrolysis of various organophosphorous molecules. A putative acid phosphatase called FS6 was identified using genetic profiles and sequences from different environments. FS6 showed high sequence similarity to type C acid phosphatases and retained more than 30% of consensus residues in its protein sequence. A histidine-tagged recombinant FS6 produced in Escherichia coli exhibited extremophile properties, functioning effectively in a broad pH range between 3.5 and 8.5. The enzyme demonstrated optimal activity at temperatures between 25 and 50°C, with a melting temperature of 51.6°C. Kinetic parameters were determined using various substrates, and the reaction catalysed by FS6 with physiological substrates was at least 100-fold more efficient than with p-nitrophenyl phosphate. Furthermore, FS6 was found to be a decamer in solution, unlike the dimeric forms of crystallized proteins in its family.


Subjects
Acid Phosphatase , Extremophiles , Acid Phosphatase/metabolism , Extremophiles/genetics , Extremophiles/metabolism , Hydrolysis , Amino Acid Sequence , Substrate Specificity , Hydrogen-Ion Concentration
7.
BMC Bioinformatics ; 25(1): 145, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38580921

ABSTRACT

BACKGROUND: Drug targets in living beings play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Computational methods are therefore highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory. METHODS: In this study, we developed a novel deep learning-based model, DPI_CDF, for predicting DPs from protein sequence only. DPI_CDF utilizes evolutionary-based (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical-based (i.e., component protein sequence representation), and compositional-based (i.e., normalized qualitative characteristic) properties of the protein sequence to generate features. A hierarchical deep forest model then fuses these three encoding schemes to build the proposed model. RESULTS: Empirical results from 10-fold cross-validation demonstrate that the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared with current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will help the research community identify druggable proteins and accelerate the drug discovery process. AVAILABILITY: The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF .
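
For readers unfamiliar with the metrics reported in this abstract, the following Python sketch shows how accuracy and the Matthews correlation coefficient are computed from a binary confusion matrix. The function names and the example counts are illustrative assumptions, not taken from the DPI_CDF code.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are imbalanced; it ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction).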


Subjects
Proteins , Software , Amino Acid Sequence , Position-Specific Scoring Matrices , Biological Evolution , Computational Biology/methods
8.
Int J Mol Sci ; 25(7)2024 Mar 31.
Article in English | MEDLINE | ID: mdl-38612733

ABSTRACT

In the human genome, two short open reading frames (ORFs) separated by a transcriptional silencer and a small intervening sequence stem from the gene SMIM45. The two ORFs show different translational characteristics, and they also show divergent patterns of evolutionary development. The studies presented here describe the evolution of the components of SMIM45. One ORF consists of an ultra-conserved 68 amino acid (aa) sequence, whose origins can be traced beyond the evolutionary age of divergence of the elephant shark, ~462 MYA. The silencer also has ancient origins, but it has a complex and divergent pattern of evolutionary formation, as it overlaps both at the 68 aa ORF and the intervening sequence. The other ORF consists of 107 aa. It develops during primate evolution but is found to originate de novo from an ancestral non-coding genomic region with root origins within the Afrothere clade of placental mammals, whose evolutionary age of divergence is ~99 MYA. The formation of the complete 107 aa ORF during primate evolution is outlined, whereby sequence development is found to occur through biased mutations, with disruptive random mutations that also occur but lead to a dead-end. The 107 aa ORF is of particular significance, as there is evidence to suggest it is a protein that may function in human brain development. Its evolutionary formation presents a view of a human-specific ORF and its linked silencer that were predetermined in non-primate ancestral species. The genomic position of the silencer offers interesting possibilities for the regulation of transcription of the 107 aa ORF. A hypothesis is presented with respect to possible spatiotemporal expression of the 107 aa ORF in embryonic tissues.


Subjects
Genome, Human , Placenta , Female , Pregnancy , Animals , Humans , Open Reading Frames/genetics , Amino Acid Sequence , Primates , Mammals
9.
J Pineal Res ; 76(3): e12955, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38606787

ABSTRACT

Identifying the target cells of a hormone is a key step in understanding its function. Once the molecular nature of the receptors for a hormone has been established, researchers can use several techniques to detect these receptors. Here I review the different tools used over the years to localize melatonin receptors and the problems associated with each of these techniques. The radioligand 2-[125I]iodomelatonin was the first tool to allow localization of melatonin receptors on tissue sections. Once the MT1 and MT2 receptors were cloned, in situ hybridization could be used to detect the messenger RNA for these receptors. The deduced amino acid sequences of the MT1 and MT2 receptors allowed the production of peptide immunogens to generate antibodies against these receptors. Finally, transgenic reporters driven by the promoter elements of the MT1 and MT2 genes have been used to map the expression of MT1 and MT2 in the brain and the retina. Several issues have complicated the localization of melatonin receptors and the characterization of melatonin target cells over the last three decades. First, melatonin receptors are expressed at low levels, leading to sensitivity issues in their detection. Second, antibodies directed against the MT1 and MT2 melatonin receptors have specificity issues. These receptors are G protein-coupled receptors, and many antibodies directed against such receptors have been shown to present similar specificity problems. Despite these problems, which recent studies have begun to address seriously, antibodies will be important tools in the future to identify and phenotype melatonin target cells. However, we will have to be more stringent than before when establishing their specificity. The results obtained with these antibodies will have to be compared with, and be consistent with, results obtained by other techniques.


Subjects
Melatonin , Receptor, Melatonin, MT2 , Receptors, Melatonin/metabolism , Receptor, Melatonin, MT2/genetics , Receptor, Melatonin, MT2/metabolism , Melatonin/metabolism , Receptor, Melatonin, MT1/genetics , Receptor, Melatonin, MT1/metabolism , Brain/metabolism , Amino Acid Sequence
10.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38609330

ABSTRACT

Understanding protein structures is invaluable in various biomedical applications, such as vaccine development. Building protein structure models from experimental electron density maps is a time-consuming and labor-intensive task. To address this challenge, machine learning approaches have been proposed to automate the process. Currently, the majority of the experimental maps in the database lack atomic-resolution features, making it challenging for machine learning-based methods to precisely determine protein structures from cryogenic electron microscopy density maps. On the other hand, protein structure prediction methods, such as AlphaFold2, leverage evolutionary information from protein sequences and have recently achieved groundbreaking accuracy. However, these methods often require manual refinement, which is labor intensive and time consuming. In this study, we present DeepTracer-Refine, an automated method that refines AlphaFold-predicted structures by aligning them to DeepTracer's modeled structure. Our method was evaluated on 39 multi-domain proteins, and we improved the average residue coverage from 78.2 to 90.0% and the average local Distance Difference Test score from 0.67 to 0.71. We also compared DeepTracer-Refine with Phenix's AlphaFold refinement and demonstrated that our method not only performs better when the initial AlphaFold model is less precise but also surpasses Phenix in run-time performance.


Subjects
Biological Evolution , Machine Learning , Cryoelectron Microscopy , Amino Acid Sequence , Databases, Factual
11.
Mol Med ; 30(1): 48, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38594612

ABSTRACT

BACKGROUND: Immune-mediated arthritis is a group of autoinflammatory diseases, where the patient's own immune system attacks and destroys synovial joints. Sustained remission is not always achieved with available immunosuppressive treatments, warranting more detailed studies of T cell responses that perpetuate synovial inflammation in treatment-refractory patients. METHODS: In this study, we investigated CD4+ and CD8+ T lymphocytes from the synovial tissue and peripheral blood of patients with treatment-resistant immune-mediated arthritis using paired single-cell RNA and TCR-sequencing. To gain insights into the trafficking of clonal families, we compared the phenotypes of clones with the exact same TCRβ amino acid sequence between the two tissues. RESULTS: Our results show that both CD4+ and CD8+ T cells display a more activated and inflamed phenotype in the synovial tissue compared to peripheral blood both at the population level and within individual T cell families. Furthermore, we found that both cell subtypes exhibited clonal expansion in the synovial tissue. CONCLUSIONS: Our findings suggest that the local environment in the synovium drives the proliferation of activated cytotoxic T cells, and both CD4+ and CD8+ T cells may contribute to tissue destruction and disease pathogenesis.
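
The comparison of clones sharing an identical TCRβ amino acid sequence across two tissues can be sketched in Python as a simple grouping step. This is only an illustration under stated assumptions: the function name `shared_clonotypes`, the tissue labels, and the sequences in the usage example are hypothetical, and the study's actual single-cell processing is not described in the abstract.

```python
from collections import defaultdict


def shared_clonotypes(cells):
    """Return TCR-beta amino acid sequences observed in more than one tissue.

    `cells` is an iterable of (tissue, tcr_beta_aa) pairs, one per cell.
    """
    tissues_by_clone = defaultdict(set)
    for tissue, tcr_beta_aa in cells:
        tissues_by_clone[tcr_beta_aa].add(tissue)
    return {clone for clone, tissues in tissues_by_clone.items() if len(tissues) > 1}


# Hypothetical cells; the sequences are made up for illustration.
example_cells = [
    ("synovium", "CASSLGQAYEQYF"),
    ("blood", "CASSLGQAYEQYF"),
    ("synovium", "CASSPDRGNTEAFF"),
    ("synovium", "CASSPDRGNTEAFF"),
    ("blood", "CASRTGGYGYTF"),
]
shared = shared_clonotypes(example_cells)
```

Matching on the exact amino acid sequence is the strictest possible criterion: clones differing by a single residue are treated as unrelated.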


Subjects
Arthritis , CD8-Positive T-Lymphocytes , Humans , CD8-Positive T-Lymphocytes/metabolism , Arthritis/metabolism , Arthritis/pathology , Synovial Membrane , Clone Cells , Amino Acid Sequence , CD4-Positive T-Lymphocytes/metabolism
12.
Biochem Biophys Res Commun ; 709: 149839, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38564943

ABSTRACT

The single-domain VHH antibody is regarded as a promising antibody class for therapeutic and diagnostic applications. VHH antibodies have amino acids in framework region 2 that are distinct from those in conventional antibodies, such as the Val37Phe/Tyr (V37F/Y) substitution. Correlations between the residue type at position 37 and the conformation of CDR3 in VHH antigen recognition have been reported previously. However, few studies have focused on the significance of having two residue types at position 37 of VHH antibodies, and the specific role of Y37 has been little elucidated. Here, we investigated the functional states of position 37 in co-crystal structures and analyzed three model antibodies with either F or Y at position 37. Our analysis indicates that Y at position 37 enhances the dissociation rate, which is highly correlated with drug efficacy. Our findings help to explain the molecular mechanisms that distinguish VHH antibodies from conventional antibodies.


Subjects
Blood Group Antigens , Camelids, New World , Single-Domain Antibodies , Animals , Single-Domain Antibodies/chemistry , Amino Acid Sequence , Antibodies
13.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38557677

ABSTRACT

Protein design is central to nearly all protein engineering problems, as it can enable the creation of proteins with new biological functions, such as improving the catalytic efficiency of enzymes. One key facet of protein design, fixed-backbone protein sequence design, seeks to design new sequences that will conform to a prescribed protein backbone structure. Nonetheless, existing sequence design methods have limitations, such as low sequence diversity and shortcomings in experimental validation of the designed functional proteins. These inadequacies obstruct the goal of functional protein design. To address these limitations, we developed the Graphormer-based Protein Design (GPD) model. This model applies the Transformer to a graph-based representation of three-dimensional protein structures and adds Gaussian noise and random sequence masks to node features, thereby enhancing sequence recovery and diversity. The GPD model performed significantly better than the state-of-the-art ProteinMPNN model on multiple independent tests, especially for sequence diversity. We employed GPD to design CalB hydrolase and generated nine artificially designed CalB proteins. The results show a 1.7-fold increase in catalytic activity compared to that of the wild-type CalB and strong substrate selectivity on p-nitrophenyl acetate with different carbon chain lengths (C2-C16). Thus, the GPD method could be used for the de novo design of industrial enzymes and protein drugs. The code was released at https://github.com/decodermu/GPD.


Subjects
Protein Engineering , Proteins , Proteins/chemistry , Amino Acid Sequence , Protein Engineering/methods
14.
Sci Rep ; 14(1): 7736, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38565583

ABSTRACT

Evolution shapes protein sequences for their functions. Here, we studied the moonlighting functions of the N-linked sequon NXS/T, where X is not P, in human nucleocytosolic proteins. By comparing membrane and secreted proteins in which sequons are well known for N-glycosylation, we discovered that cyto-sequons can participate in nucleic acid binding, particularly in zinc finger proteins. Our global studies further discovered that sequon occurrence is largely proportional to protein length. The contribution of sequons to protein functions, including both N-glycosylation and nucleic acid binding, can be regulated through their density as well as the biased usage between NXS and NXT. In proteins where other PTMs or structural features are rich, such as phosphorylation, transmembrane α-helices, and disulfide bridges, sequon occurrence is scarce. The information acquired here should help understand the relationship between protein sequence and function and assist future protein design and engineering.
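
The sequon definition given in the abstract (N, then any residue except P, then S or T) maps directly onto a regular expression. The Python sketch below is a minimal illustration; the function names and the example sequence are hypothetical, and it uses a lookahead so that overlapping sequons are all reported.

```python
import re

# N-X-S/T with X != P; the lookahead lets overlapping sequons be found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of N, matched tripeptide) for each sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein)]


def sequon_density(protein: str) -> float:
    """Sequons per residue; 0.0 for an empty sequence."""
    return len(find_sequons(protein)) / len(protein) if protein else 0.0


# Hypothetical sequence: NGS, NNT and NTS match, NPT does not (X is P).
hits = find_sequons("MNGSANPTNNTS")
# hits == [(2, "NGS"), (9, "NNT"), (10, "NTS")]
```

Separating NXS from NXT hits (for example by checking the last letter of each match) would give the usage bias the abstract refers to.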


Subjects
Nucleic Acids , Proteins , Humans , Proteins/metabolism , Glycosylation , Amino Acid Sequence , Phosphorylation , Nucleic Acids/metabolism
15.
BMC Bioinformatics ; 25(1): 141, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38566002

ABSTRACT

Accurate and efficient prediction of drug-target interaction (DTI) is critical to advance drug development and reduce the cost of drug discovery. Recently, the employment of deep learning methods has enhanced DTI prediction precision and efficacy, but it still encounters several challenges. The first challenge lies in the efficient learning of drug and protein feature representations alongside their interaction features to enhance DTI prediction. Another important challenge is to improve the generalization capability of the DTI model within real-world scenarios. To address these challenges, we propose CAT-DTI, a model based on cross-attention and Transformer, possessing domain adaptation capability. CAT-DTI effectively captures the drug-target interactions while adapting to out-of-distribution data. Specifically, we use a convolution neural network combined with a Transformer to encode the distance relationship between amino acids within protein sequences and employ a cross-attention module to capture the drug-target interaction features. Generalization to new DTI prediction scenarios is achieved by leveraging a conditional domain adversarial network, aligning DTI representations under diverse distributions. Experimental results within in-domain and cross-domain scenarios demonstrate that CAT-DTI model overall improves DTI prediction performance compared with previous methods.


Subjects
Drug Development , Drug Discovery , Drug Interactions , Amino Acid Sequence , Amino Acids
16.
BMC Bioinformatics ; 25(1): 146, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38600441

ABSTRACT

BACKGROUND: The advent of high-throughput technologies has led to an exponential increase in uncharacterized bacterial protein sequences, surpassing the capacity of manual curation. A large number of bacterial protein sequences remain unannotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology, making it necessary to use auto annotation tools. These tools are now indispensable in the biological research landscape, bridging the gap between the vastness of unannotated sequences and meaningful biological insights. RESULTS: In this work, we propose a novel pipeline for KEGG orthology annotation of bacterial protein sequences that uses natural language processing and deep learning. To assess the effectiveness of our pipeline, we conducted evaluations using the genomes of two randomly selected species from the KEGG database. In our evaluation, we obtain competitive results on precision, recall, and F1 score, with values of 0.948, 0.947, and 0.947, respectively. CONCLUSIONS: Our experimental results suggest that our pipeline demonstrates performance comparable to traditional methods and excels in identifying distant relatives with low sequence identity. This demonstrates the potential of our pipeline to significantly improve the accuracy and comprehensiveness of KEGG orthology annotation, thereby advancing our understanding of functional relationships within biological systems.


Subjects
Bacterial Proteins , Natural Language Processing , Genome , Molecular Sequence Annotation , Amino Acid Sequence
17.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38600663

ABSTRACT

Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in the protein structure prediction, we explored whether the representation of structural sequence profile can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profile using ultrafast shape recognition. Given an input backbone structure, SPDesign utilizes ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. Combined with structural pre-trained knowledge and geometric features, they are further fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms the state-of-the-art methods, such as ProteinMPNN, Pifold and LM-Design, leading to 21.89%, 15.54% and 11.4% accuracy gains in sequence recovery rate on CATH 4.2 benchmark, respectively. Encouraging results also have been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis conducted by the PDBench tool suggests that SPDesign performs well in subdivided structures. More interestingly, we found that SPDesign can well reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, the structural modeling verification experiment indicates that the sequences designed by SPDesign can fold into the native structures more accurately.


Subjects
Neural Networks, Computer , Proteins , Sequence Alignment , Amino Acid Sequence , Proteins/chemistry , Sequence Analysis, Protein/methods
18.
Sci Rep ; 14(1): 8136, 2024 Apr 07.
Article in English | MEDLINE | ID: mdl-38584172

ABSTRACT

Computational approaches for predicting the pathogenicity of genetic variants have advanced in recent years. These methods enable researchers to determine the possible clinical impact of rare and novel variants. Historically, these prediction methods used hand-crafted features based on structural, evolutionary, or physicochemical properties of the variant. In this study, we propose a novel framework that leverages the power of pre-trained protein language models to predict variant pathogenicity. We show that our approach, VariPred (Variant impact Predictor), outperforms current state-of-the-art methods by using an end-to-end model that requires only the protein sequence as input. Using one of the best-performing protein language models (ESM-1b), we establish a robust classifier that requires no calculation of structural features or multiple sequence alignments. We compare the performance of VariPred with other representative models including 3Cnet, Polyphen-2, REVEL, MetaLR, FATHMM and ESM variant. VariPred performs as well as, or in most cases better than, these other predictors on six variant impact prediction benchmarks, despite requiring only sequence data and no pre-processing of the data.


Subjects
Mutation, Missense , Proteins , Virulence , Proteins/genetics , Amino Acid Sequence , Computational Biology/methods
19.
Int J Biol Macromol ; 265(Pt 1): 130854, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38484814

ABSTRACT

Monocarboxylate transporter-1 (MCT-1) inhibitors were screened from an Fv-antibody library in which complementarity-determining region 3 carried randomized amino acid sequences (11 residues) introduced by site-directed mutagenesis. Fv-antibodies against MCT-1 were screened from the autodisplayed Fv-antibody library. Two clones were selected, and their binding affinity (KD) for MCT-1 was estimated using flow cytometry. The selected Fv-antibodies were expressed as soluble fusion proteins (Fv-1 and Fv-2), and their KD for MCT-1 was estimated using an SPR biosensor. The inhibitory activity of the expressed Fv-antibodies was observed in HEK293T and Jurkat cell lines by measuring intracellular pH and lactate accumulation. Cell viability in HEK293T and Jurkat cell lines was decreased by the inhibitory activity of the expressed Fv-antibodies. The binding properties of the Fv-antibodies to MCT-1 were analyzed using molecular docking simulations. Overall, the results showed that the Fv-antibodies screened from the library had high binding affinity and inhibitory activity against MCT-1 and could be used as potential therapeutic candidates for MCT-1 inhibition.


Subjects
Antibodies , Carrier Proteins , Humans , Molecular Docking Simulation , HEK293 Cells , Amino Acid Sequence , Gene Library