Results 1 - 20 of 334
1.
Article in English | MEDLINE | ID: mdl-39152734

ABSTRACT

INTRODUCTION: Metaproteomics offers insights into the function of complex microbial communities and can also reveal microbe-microbe and host-microbe interactions. Data-independent acquisition (DIA) mass spectrometry is an emerging technology with great potential to achieve deep and accurate metaproteomics with higher reproducibility, yet it still faces a series of challenges due to the inherent complexity of metaproteomics and DIA data. AREAS COVERED: This review offers an overview of DIA metaproteomics approaches, covering aspects such as database construction, search strategy, and data analysis tools. Several current DIA metaproteomics studies are presented to illustrate the procedures. Important ongoing challenges are also highlighted. Future perspectives of DIA methods for metaproteomics analysis are further discussed. Cited references were searched for and collected from Google Scholar and PubMed. EXPERT OPINION: Considering the inherent complexity of DIA metaproteomics data, data analysis strategies specifically designed for its interpretation are imperative. From this point of view, we anticipate that deep learning methods and de novo sequencing methods will become more prevalent in the future, potentially improving protein coverage in metaproteomics. Moreover, the advancement of metaproteomics also depends on the development of sample preparation methods and data analysis strategies, among others. These factors are key to unlocking the full potential of metaproteomics.

2.
J Mol Biol ; : 168741, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39122168

ABSTRACT

The purpose of feature selection in protein sequence recognition problems is to select the optimal feature set and use it as training input for classifiers and discover key sequence features of specific proteins. In the feature selection process, relevant features associated with the target task will be retained, and irrelevant and redundant features will be removed. Therefore, in an ideal state, a feature combination with smaller feature dimensions and higher performance indicators is desired. This paper proposes an algorithm called IIFS2.0 based on the cache elimination strategy, which takes the local optimal combination of cached feature subsets as a breakthrough point. It searches for a new feature combination method through the cache elimination strategy to avoid the drawbacks of human factors and excessive reliance on feature sorting results. We validated and analyzed its effectiveness on the protein dataset, demonstrating that IIFS2.0 significantly reduces the dimensionality of feature combinations while also improving various evaluation indicators. In addition, we provide IIFS2.0 on http://112.124.26.17:8006/ for researchers to use.

3.
Yi Chuan ; 46(8): 661-669, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39140146

ABSTRACT

The identification of enzyme functions plays a crucial role in understanding the mechanisms of biological activities and advancing the development of life sciences. However, existing enzyme EC number prediction methods did not fully utilize protein sequence information and still had shortcomings in identification accuracy. To address this issue, we proposed an EC number prediction network using hierarchical features and global features (ECPN-HFGF). This method first utilized residual networks to extract generic features from protein sequences, and then employed hierarchical feature extraction modules and global feature extraction modules to further extract hierarchical and global features of protein sequences. Subsequently, the prediction results of both feature types were combined, and a multitask learning framework was utilized to achieve accurate prediction of enzyme EC numbers. Experimental results indicated that the ECPN-HFGF method performed best in the task of predicting EC numbers for protein sequences, achieving macro F1 and micro F1 scores of 95.5% and 99.0%, respectively. The ECPN-HFGF method effectively combined hierarchical and global features of protein sequences, allowing for rapid and accurate EC number prediction. Compared to current commonly used methods, this method offers significantly higher prediction accuracy, providing an efficient approach for the advancement of enzymology research and enzyme engineering applications.


Subjects
Computational Biology, Computational Biology/methods, Amino Acid Sequence, Proteins/chemistry, Algorithms, Protein Sequence Analysis/methods, Enzymes/chemistry, Enzymes/metabolism
4.
Bioresour Bioprocess ; 11(1): 69, 2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39014092

ABSTRACT

Gelatin, a natural biopeptide, is obtained through partial hydrolysis and thermal denaturation of collagen. It has irreplaceable biological functions and has been widely applied in biomedical science and tissue engineering. The amino acid sequence of recombinant human-like gelatin was constructed through a newly designed hexamer composed of six protein monomer sequences in series, with the minimum repeating unit being the characteristic Gly-X-Y sequence found in type III human collagen α1 chain. The nucleotide sequence was subsequently inserted into the genome of Pichia pastoris to enable soluble secretion expression of recombinant gelatin. At the shake flask fermentation level, the yield of recombinant gelatin reached up to 0.057 g/L, and its purity reached up to 95% after affinity purification. Molecular weight determination and amino acid analysis confirmed that the amino acid composition of the obtained recombinant gelatin is identical to that of the theoretical design. Furthermore, scanning electron microscopy revealed that the freeze-dried recombinant gelatin hydrogel exhibited a porous structure. After cells were cultured continuously within these gelatin microspheres for two days, fluorescence staining and confocal laser scanning microscopy showed that the cells clustered together within the gelatin matrix, exhibiting three-dimensional growth characteristics while maintaining good viability. This research presents promising prospects for developing recombinant gelatin as a biomedical material.

5.
Comput Struct Biotechnol J ; 23: 2637-2647, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39021584

ABSTRACT

Molecular phylogenetic research has relied on the analysis of the coding sequences of genes or of the amino acid sequences of the encoded proteins. Enumerating mismatches, as indicators of mutation, has been central to pertinent algorithms. Specific amino acids possess quantifiable characteristics that enable the conversion from "words" (strings of letters denoting amino acids or bases) to "waves" (strings of quantitative values representing the physico-chemical properties) or to matrices (coordinates representing the positions in a comprehensive property space). The application of such numerical representations to evolutionary analysis takes into account not only the occurrence of mutations but also their properties as influences that drive speciation, because selective pressures favor certain mutations over others, and this predilection is represented in the characteristics of the incorporated amino acids (it is not borne out solely by the mismatches). Besides being more discriminating sources for tree-generating algorithms than match/mismatch, the number strings can be examined for overall similarity with average mutual information, autocorrelation, and fractal dimension. Bivariate wavelet analysis aids in distinguishing hypermutable versus conserved domains of the protein. The matrix depiction is readily subjected to comparisons of distances, and it allows the generation of heat maps or graphs. This analysis preserves the accepted taxa order where tree construction with standard approaches yields conflicting results (for the protein S100A6). It also aids hypothesis generation about the origin of mitochondrial proteins. These analytical algorithms have been automated in R and are applicable to various processes that are describable in matrix format.
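
The "words to waves" conversion in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract says the authors' algorithms were automated in R and does not name a property scale, so the Kyte-Doolittle hydropathy index and the function names `to_wave`, `autocorrelation`, and `wave_distance` are hypothetical choices made here.

```python
import math

# Kyte-Doolittle hydropathy index. ASSUMPTION: used only as an example
# property scale; the abstract does not say which scales the authors used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_wave(seq):
    """Map a protein 'word' (letters) to a 'wave' (property values)."""
    return [KYTE_DOOLITTLE[aa] for aa in seq.upper()]


def autocorrelation(wave, lag):
    """Sample autocorrelation of a wave at the given lag (0.0 if undefined)."""
    n = len(wave)
    mean = sum(wave) / n
    var = sum((x - mean) ** 2 for x in wave)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((wave[i] - mean) * (wave[i + lag] - mean) for i in range(n - lag))
    return cov / var


def wave_distance(a, b):
    """Euclidean distance between two equal-length waves."""
    if len(a) != len(b):
        raise ValueError("waves must have equal length (align sequences first)")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Under this scale, an I-to-L substitution moves the wave far less than an I-to-D substitution, whereas a plain match/mismatch count scores both as one mismatch.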

6.
Comput Struct Biotechnol J ; 23: 2648-2660, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39027650

ABSTRACT

Therapeutic antibodies are an important class of biopharmaceuticals. With the rapid development of deep learning methods and the increasing amount of antibody data, antibody generative models have made great progress recently. They aim to solve the antibody space searching problems and are widely incorporated into the antibody development process. Therefore, a comprehensive introduction to the development methods in this field is imperative. Here, we collected 34 representative antibody generative models published recently and all generative models can be divided into three categories: sequence-generating models, structure-generating models, and hybrid models, based on their principles and algorithms. We further studied their performance and contributions to antibody sequence prediction, structure optimization, and affinity enhancement. Our manuscript will provide a comprehensive overview of the status of antibody generative models and also offer guidance for selecting different approaches.

7.
Int J Biol Macromol ; 277(Pt 1): 134147, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39059541

ABSTRACT

Heat shock proteins (HSPs) from different families and sub-types play a vital role in the folding and unfolding of proteins, in maintaining cellular health, and in preventing serious disorders. Previous computational methods for HSP classification have yielded promising performance. However, most of the existing methods rely heavily on amino acid composition features and still face challenges related to interpretability and accuracy. To overcome these issues, we introduce a novel frequent sequential pattern (FSP)-based analysis and classification method for the classification of HSPs, their families, and sub-types. The proposed method is called FSP4HSP, which stands for "FSP for HSP". It identifies FSPs of amino acids (FSPAAs) and utilizes them for analysis and classification. Besides FSPAAs, sequential rules among amino acids are also discovered. Both binary and multi-class classification scenarios are considered, with the utilization of eight integer-based and four string-based classifiers. The incorporation of FSPAAs in the classification/prediction task enhances the interpretability of FSP4HSP and a comprehensive performance comparison using various evaluation measures demonstrates that it surpasses existing methods for the classification/recognition of HSPs.

8.
Methods Mol Biol ; 2836: 253-281, 2024.
Article in English | MEDLINE | ID: mdl-38995545

ABSTRACT

Interactomics is bringing a deluge of data regarding protein-protein interactions (PPIs), which are involved in various molecular processes in all types of cells. However, this information does not easily translate into direct and precise molecular interfaces. This limits our understanding of each interaction network and prevents their efficient modulation. Many of the detected interactions involve recognition of short linear motifs (SLiMs) by a folded domain, while others rely on domain-domain interactions. Functional SLiMs hide among many spurious ones, making deeper analysis of interactomes tedious. Hence, actual contacts and direct interactions are difficult to identify. Consequently, there is a need for user-friendly bioinformatic tools enabling rapid molecular and structural analysis of SLiM-based PPIs in a protein network. In this chapter, we describe the use of the new webserver SLiMAn to help dig into SLiM-based PPIs in an interactive fashion.


Subjects
Computational Biology, Internet, Protein Interaction Mapping, Software, Protein Interaction Mapping/methods, Computational Biology/methods, Protein Interaction Domains and Motifs, Proteins/chemistry, Proteins/metabolism, Protein Interaction Maps, Amino Acid Motifs, Humans, Protein Databases, Protein Binding
9.
Comput Struct Biotechnol J ; 23: 2779-2797, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39050782

ABSTRACT

Recent breakthroughs in deep learning have revolutionized protein sequence and structure prediction. These advancements are built on decades of protein design efforts, and are overcoming traditional time and cost limitations. Diffusion models, at the forefront of these innovations, significantly enhance design efficiency by automating knowledge acquisition. In the field of de novo protein design, the goal is to create entirely novel proteins with predetermined structures. Given the arbitrary positions of proteins in 3-D space, graph representations and their properties are widely used in protein generation studies. A critical requirement in protein modelling is maintaining spatial relationships under transformations (rotations, translations, and reflections). This property, known as equivariance, ensures that predicted protein characteristics adapt seamlessly to changes in orientation or position. Equivariant graph neural networks offer a solution to this challenge. By incorporating equivariant graph neural networks to learn the score of the probability density function in diffusion models, one can generate proteins with robust 3-D structural representations. This review examines the latest deep learning advancements, specifically focusing on frameworks that combine diffusion models with equivariant graph neural networks for protein generation.
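
The equivariance property mentioned in this abstract can be checked numerically. A function f on coordinates is equivariant to a rigid motion if f(Rx + t) = R f(x) + t, and invariant if f(Rx + t) = f(x). The sketch below is not taken from any of the reviewed frameworks; the helper names are hypothetical, and only a rotation about the z axis plus a translation is exercised (reflections are omitted).

```python
import math


def rotate_z(points, theta):
    """Rotate 3-D points about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def translate(points, t):
    """Shift 3-D points by the vector t."""
    return [(x + t[0], y + t[1], z + t[2]) for x, y, z in points]


def centroid(points):
    """Mean position: an equivariant function of the coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def pairwise_distances(points):
    """All inter-point distances: an invariant function of the coordinates."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


# Toy "backbone" of three points (made-up coordinates).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.5)]
moved = translate(rotate_z(coords, 0.7), (1.0, -2.0, 3.0))

# Equivariance: moving the input moves the centroid in the same way.
expected = translate(rotate_z([centroid(coords)], 0.7), (1.0, -2.0, 3.0))[0]
centroid_ok = all(abs(a - b) < 1e-9 for a, b in zip(centroid(moved), expected))

# Invariance: distances between points do not change.
distances_ok = all(
    abs(a - b) < 1e-9
    for a, b in zip(pairwise_distances(coords), pairwise_distances(moved))
)
```

An equivariant graph neural network is built so that its coordinate outputs pass the first kind of check and its scalar features pass the second, for any rotation and translation.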

10.
Proteomics ; : e2400044, 2024 Jun 02.
Article in French | MEDLINE | ID: mdl-38824664

ABSTRACT

RNA-dependent liquid-liquid phase separation (LLPS) proteins play critical roles in cellular processes such as stress granule formation, DNA repair, RNA metabolism, germ cell development, and protein translation regulation. The abnormal behavior of these proteins is associated with various diseases, particularly neurodegenerative disorders like amyotrophic lateral sclerosis and frontotemporal dementia, making their identification crucial. However, conventional biochemistry-based methods for identifying these proteins are time-consuming and costly. Addressing this challenge, our study developed a robust computational model for their identification. We constructed a comprehensive dataset containing 137 RNA-dependent and 606 non-RNA-dependent LLPS protein sequences, which were then encoded using amino acid composition, composition of K-spaced amino acid pairs, Geary autocorrelation, and conjoined triad methods. Through a combination of correlation analysis, mutual information scoring, and incremental feature selection, we identified an optimal feature subset. This subset was used to train a random forest model, which achieved an accuracy of 90% when tested against an independent dataset. This study demonstrates the potential of computational methods as efficient alternatives for the identification of RNA-dependent LLPS proteins. To enhance the accessibility of the model, a user-centric web server has been established and can be accessed via the link: http://rpp.lin-group.cn.
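
Two of the four sequence encodings named in this abstract, amino acid composition (AAC) and composition of k-spaced amino acid pairs (CKSAAP), are simple enough to sketch. The abstract gives no implementation details, so the function names and the normalization by the number of pairs are assumptions based on the common convention; Geary autocorrelation and conjoint triad are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: 20 frequencies, one per standard residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def cksaap(seq, k):
    """Composition of k-spaced pairs: 400 frequencies for one gap size k.

    A pair is (seq[i], seq[i + k + 1]), i.e. two residues separated by k
    positions. Pairs containing non-standard letters are not counted.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(seq) - k - 1
    if total <= 0:
        raise ValueError("sequence too short for this gap size")
    for i in range(total):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:
            counts[pair] += 1
    return [counts[p] / total for p in pairs]
```

Concatenating `aac(seq)` with `cksaap(seq, k)` for several values of k gives a fixed-length feature vector regardless of sequence length, which is the form a feature-selection step and a random forest classifier expect.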

11.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38851299

ABSTRACT

Protein-protein interactions (PPIs) are the basis of many important biological processes, with protein complexes being the key forms implementing these interactions. Understanding protein complexes and their functions is critical for elucidating mechanisms of life processes, disease diagnosis and treatment and drug development. However, experimental methods for identifying protein complexes have many limitations. Therefore, it is necessary to use computational methods to predict protein complexes. Protein sequences can indicate the structure and biological functions of proteins, while also determining their binding abilities with other proteins, influencing the formation of protein complexes. Integrating these characteristics to predict protein complexes is very promising, but currently there is no effective framework that can utilize both protein sequence and PPI network topology for complex prediction. To address this challenge, we have developed HyperGraphComplex, a method based on a hypergraph variational autoencoder that can capture expressive features from protein sequences without feature engineering, while also considering topological properties in PPI networks, to predict protein complexes. Experimental results demonstrated that HyperGraphComplex achieves satisfactory predictive performance when compared with state-of-the-art methods. Further bioinformatics analysis shows that the predicted protein complexes have similar attributes to known ones. Moreover, case studies corroborated the remarkable predictive capability of our model in identifying protein complexes, including 3 that were not only experimentally validated by recent studies but also exhibited high-confidence structural predictions from AlphaFold-Multimer. We believe that the HyperGraphComplex algorithm and our provided proteome-wide high-confidence protein complex prediction dataset will help elucidate how proteins regulate cellular processes in the form of complexes, and facilitate disease diagnosis and treatment and drug development. Source codes are available at https://github.com/LiDlab/HyperGraphComplex.


Subjects
Computational Biology, Protein Interaction Mapping, Computational Biology/methods, Protein Interaction Mapping/methods, Proteins/metabolism, Proteins/chemistry, Algorithms, Protein Interaction Maps, Protein Databases, Humans, Protein Sequence Analysis/methods, Amino Acid Sequence
12.
Bioengineering (Basel) ; 11(6)2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38927853

ABSTRACT

The significant growth of the global protein drug market, including fusion proteins, emphasizes the crucial role of optimizing amino acid sequences to enhance the productivity and bioefficacy. Among these fusion proteins, RBP-IIIA-IB, comprising retinol-binding protein in conjunction with the albumin domains, IIIA and IB, has displayed efficacy in alleviating liver fibrosis by inhibiting the activation of hepatic stellate cells (HSCs). This study aimed to address the issue of the low productivity in RBP-IIIA-IB. To induce structural changes, the linking sequence, EVDD, between domain IIIA and IB in RBP-IIIA-IB was modified to DGPG, AAAA, and GGPA. Among these, RBP-IIIA-AAAA-IB demonstrated an increase in yield (>4-fold) and a heightened inhibition of HSC activation. Furthermore, we identified amino acid residues that could form disulfide bonds when substituted with cysteine. Through the mutation of N453S-V480S in RBP-IIIA-AAAA-IB, the productivity further increased by over 9-fold, accompanied by an increase in anti-fibrotic activity. Overall, there was a more than 30-fold increase in the fusion protein's yield. These findings demonstrate the effectiveness of modifying linker sequences and introducing extra disulfide bonds to improve both the production yield and biological efficacy of fusion proteins.

13.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38557677

ABSTRACT

Protein design is central to nearly all protein engineering problems, as it can enable the creation of proteins with new biological functions, such as improving the catalytic efficiency of enzymes. One key facet of protein design, fixed-backbone protein sequence design, seeks to design new sequences that will conform to a prescribed protein backbone structure. Nonetheless, existing sequence design methods present limitations, such as low sequence diversity and shortcomings in experimental validation of the designed functional proteins. These inadequacies obstruct the goal of functional protein design. To address these limitations, we developed the Graphormer-based Protein Design (GPD) model. This model applies the Transformer to a graph-based representation of three-dimensional protein structures and adds Gaussian noise and random sequence masks to node features, thereby enhancing sequence recovery and diversity. The performance of the GPD model was significantly better than that of the state-of-the-art ProteinMPNN model on multiple independent tests, especially for sequence diversity. We employed GPD to design CalB hydrolase and generated nine artificially designed CalB proteins. The results show a 1.7-fold increase in catalytic activity compared to that of the wild-type CalB and strong substrate selectivity on p-nitrophenyl acetate with different carbon chain lengths (C2-C16). Thus, the GPD method could be used for the de novo design of industrial enzymes and protein drugs. The code was released at https://github.com/decodermu/GPD.


Subjects
Protein Engineering, Proteins, Proteins/chemistry, Amino Acid Sequence, Protein Engineering/methods
14.
bioRxiv ; 2024 Mar 17.
Article in English | MEDLINE | ID: mdl-38559226

ABSTRACT

Long-read RNA sequencing has shed light on transcriptomic complexity, but questions remain about the functionality of downstream protein products. We introduce Biosurfer, a computational approach for comparing protein isoforms, while systematically tracking the transcriptional, splicing, and translational variations that underlie differences in the sequences of the protein products. Using Biosurfer, we analyzed the differences in 32,799 pairs of GENCODE annotated protein isoforms, finding a majority (70%) of variable N-termini are due to the alternative transcription start sites, while only 9% arise from 5' UTR alternative splicing. Biosurfer's detailed tracking of nucleotide-to-residue relationships helped reveal an uncommonly tracked source of single amino acid residue changes arising from the codon splits at junctions. For 17% of internal sequence changes, such split codon patterns lead to single residue differences, termed "ragged codons". Of variable C-termini, 72% involve splice- or intron retention-induced reading frameshifts. We found an unusual pattern of reading frame changes, in which the first frameshift is closely followed by a distinct second frameshift that restores the original frame, which we term a "snapback" frameshift. We analyzed long read RNA-seq-predicted proteome of a human cell line and found similar trends as compared to our GENCODE analysis, with the exception of a higher proportion of isoforms predicted to undergo nonsense-mediated decay. Biosurfer's comprehensive characterization of long-read RNA-seq datasets should accelerate insights of the functional role of protein isoforms, providing mechanistic explanation of the origins of the proteomic diversity driven by the alternative splicing. Biosurfer is available as a Python package at https://github.com/sheynkman-lab/biosurfer.

15.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38600663

ABSTRACT

Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in the protein structure prediction, we explored whether the representation of structural sequence profile can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profile using ultrafast shape recognition. Given an input backbone structure, SPDesign utilizes ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. Combined with structural pre-trained knowledge and geometric features, they are further fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms the state-of-the-art methods, such as ProteinMPNN, Pifold and LM-Design, leading to 21.89%, 15.54% and 11.4% accuracy gains in sequence recovery rate on CATH 4.2 benchmark, respectively. Encouraging results also have been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis conducted by the PDBench tool suggests that SPDesign performs well in subdivided structures. More interestingly, we found that SPDesign can well reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, the structural modeling verification experiment indicates that the sequences designed by SPDesign can fold into the native structures more accurately.


Subjects
Computer Neural Networks, Proteins, Sequence Alignment, Amino Acid Sequence, Proteins/chemistry, Protein Sequence Analysis/methods
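
Item 15 (SPDesign) relies on ultrafast shape recognition (USR) vectors to speed up structure search. The sketch below follows the commonly described USR recipe: distances from every point to four reference points, each distribution summarized by three moments, giving 12 numbers. It is an assumption-laden illustration rather than SPDesign's implementation: the moment convention (mean, standard deviation, signed cube root of the third central moment) is one of several in use, and the function names are hypothetical.

```python
import math


def _moments(values):
    """Mean, standard deviation, and signed cube root of the third moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1 / 3), third)]


def usr_descriptor(coords):
    """12-number shape descriptor for a list of (x, y, z) points."""
    n = len(coords)
    ctd = tuple(sum(p[i] for p in coords) / n for i in range(3))  # centroid
    d_ctd = [math.dist(p, ctd) for p in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # point closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # point farthest from the centroid
    d_fct = [math.dist(p, fct) for p in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # point farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor += _moments([math.dist(p, ref) for p in coords])
    return descriptor


def usr_similarity(a, b):
    """Similarity in (0, 1]; 1 means identical descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))
```

Because the descriptor is built only from internal distances, it does not change when a structure is rotated or translated, so two structures can be compared without superposition. That is what makes it usable as a fast pre-filter before a slower structure alignment.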
16.
Microorganisms ; 12(3)2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38543536

ABSTRACT

Listeria monocytogenes (L. monocytogenes) is a pathogen that is transmitted through contaminated food and causes the illness known as listeriosis. The virulence factor InlA plays a crucial role in the invasion of L. monocytogenes into the human intestinal epithelium. In addition, InlA enhances the pathogenicity of host strains, and different strains of L. monocytogenes carry different variants of InlA. Our study analyzed a total of 4393 published L. monocytogenes genomes from 511 sequence types (STs) of diverse origins. We identified 300 unique InlA protein sequence types (PSTs) and revealed 45 highly mutated amino acid sites. The leucine-rich repeat (LRR) region was found to be the most conserved region of InlA, while the protein A (PA) region experienced the highest mutation rate. Two new types of mutations were identified in the B-repeat region of InlA. Correspondence analysis (CA) was used to analyze correlations between the lineages or the 10 most common STs and amino acid (aa) sites. ST8 was strongly correlated with sites 192_F and 454_T. ST7 exhibited a strong correlation with sites 51_A, 573_E, 648_S, and 664_A, and it was also associated with ST6 and sites 544_N, 671_A, 738_B, 739_B, 740_B, and 774_Y. Additionally, strong correlations were demonstrated between ST1 and sites 142_S and 738_N; ST2 and sites 2_K, 142_S, and 738_N; and ST87 and sites 2_K and 738_N. Our findings contribute significantly to the understanding of the distribution, composition, and conservation of InlA in L. monocytogenes. These findings also suggest a potential role of InlA in supporting molecular epidemiological tracing efforts.

17.
Front Bioinform ; 4: 1321508, 2024.
Article in English | MEDLINE | ID: mdl-38343649

ABSTRACT

The current richness of sequence data calls for efficient methodologies to display and analyze the complexity of the information in a compact and readable manner. Traditionally, phylogenetic trees and sequence similarity networks have been used to display and analyze sequences of protein families. These methods aim to shed light on key computational biology problems such as sequence classification and functional inference. Here, we present a new methodology, AlignScape, based on self-organizing maps. AlignScape is applied to three large families of proteins: the kinases and GPCRs from human, and bacterial T6SS proteins. AlignScape provides a map of the similarity landscape and a tree representation of multiple sequence alignments. These representations are useful to display, cluster, and classify sequences as well as identify functional trends. The efficient GPU implementation of AlignScape allows the analysis of large MSAs in a few minutes. Furthermore, we show how the AlignScape analysis of proteins belonging to the T6SS complex can be used to predict coevolving partners.

18.
BMC Bioinformatics ; 25(1): 85, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38413857

ABSTRACT

PURPOSE: Despite much progress in alignment algorithms, aligning divergent protein sequences with less than 20-35% pairwise identity (the so-called "twilight zone") remains a difficult problem. Many alignment algorithms have used substitution matrices to generate alignments since the matrices were created in the 1970s; however, these matrices do not work well to score alignments within the twilight zone. We developed Protein Embedding based Alignments, or PEbA, to better align sequences with low pairwise identity. Similar to the traditional Smith-Waterman algorithm, PEbA uses a dynamic programming algorithm, but the matching score of amino acids is based on the similarity of their embeddings from a protein language model. METHODS: We tested PEbA on over twelve thousand benchmark pairwise alignments from BAliBASE, each one extracted from one of their multiple sequence alignments. Five different BAliBASE references were used, each with different sequence identities, motifs, and lengths, allowing PEbA to showcase how well it aligns under different circumstances. RESULTS: PEbA greatly outperformed BLOSUM substitution matrix-based pairwise alignments, achieving different levels of improvement in alignment quality for pairs of sequences with different levels of similarity (over four times as well for pairs of sequences with <10% identity). We also compared PEbA with embeddings generated by different protein language models (ProtT5 and ESM-2) and found that ProtT5-XL-U50 produced the most useful embeddings for aligning protein sequences. PEbA also outperformed DEDAL and vcMSA, two recently developed protein language model embedding-based alignment methods. CONCLUSION: Our results suggest that general purpose protein language models provide useful contextual information for generating more accurate protein alignments than typically used methods.


Subjects
Boronic Acids, Proteins, Proteins/chemistry, Amino Acid Sequence, Sequence Alignment, Algorithms
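
Item 18 (PEbA) describes a Smith-Waterman-style dynamic program in which the match score comes from the similarity of per-residue embeddings rather than a substitution matrix. A minimal sketch of that idea follows. The cosine similarity, the linear gap penalty, the default parameter values, and the function names are all assumptions made for illustration; the abstract does not specify PEbA's actual scoring or gap model, and the toy vectors in the note below stand in for real protein language model embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def local_alignment_score(emb_a, emb_b, gap=1.0, scale=1.0):
    """Best local alignment score between two lists of residue embeddings.

    Standard Smith-Waterman recurrence with a linear gap penalty; the only
    change is that the match term is scale * cosine(embedding_i, embedding_j).
    """
    rows, cols = len(emb_a), len(emb_b)
    table = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            match = table[i - 1][j - 1] + scale * cosine(emb_a[i - 1], emb_b[j - 1])
            table[i][j] = max(0.0, match, table[i - 1][j] - gap, table[i][j - 1] - gap)
            best = max(best, table[i][j])
    return best
```

For example, `local_alignment_score([[1, 0], [0, 1]], [[1, 0], [0, 1]])` aligns the two toy residues on the diagonal. Because cosine similarity is continuous, residues that differ in identity but sit in similar sequence contexts can still score well, which is the property the abstract credits for better alignments at low identity.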
19.
Int J Biol Macromol ; 264(Pt 2): 130444, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38417762

ABSTRACT

Silk, especially spider and insect silk, is a highly versatile biomaterial with potential applications in biomedicine, materials science, and biomimetic engineering. The primary structure of silk proteins is the basis for the mechanical properties of silk fibers. Biotechnologies such as single-molecule sequencing have facilitated an increasing number of reports on new silk genes and assembled silk proteins. Therefore, this review aims to provide a comprehensive overview of the recent advances in representative spider and insect silk proteins, focusing on identification methods, sequence characteristics, and de novo design and assembly. The review discusses three identification methods for silk genes: polymerase chain reaction (PCR)-based sequencing, PCR-free cloning and sequencing, and whole-genome sequencing. Moreover, it reveals the main spider and insect silk proteins and their sequences. Subsequent de novo assembly of artificial silk is covered and future research directions in the field of silk proteins, including new silk genes, customizable artificial silk, and the expansion of silk production and applications are discussed. This review provides a basis for the genetic aspects of silk production and the potential applications of artificial silk in material science and biomedical engineering.


Subjects
Silk, Spiders, Animals, Silk/chemistry, Spiders/chemistry, Biotechnology, Insect Proteins/genetics, Biomedical Engineering, Recombinant Proteins/metabolism
20.
J Biomol Struct Dyn ; : 1-10, 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38285683

ABSTRACT

Computational characterization of multiple histidine (His) post-translational modifications (PTMs) at enzyme active sites complements tedious experimental characterization in proteins of unknown function (PUFs) and domains of unknown function (DUFs). Only a handful of histidine PTM prediction tools exist, and each annotates only a single function. Here, we addressed the problem using artificial neural networks on a functional histidine dataset curated from enzyme (protein) sequences available in the UniProt database (sample size n = 1584). The convolutional neural network (CNN) model ('Hist-i-fy') performed the best, with 75% overall accuracy/F1-score. A case study was performed on histidine phosphorylation (n = 34) obtained from mass spectroscopy data. For the first time, we report a multiple His-PTM prediction tool (https://histify.streamlit.app/ and https://github.com/dibyansu24-maker/Histify) with optimal performance. The inputs to the tool are (i) a protein sequence containing histidine, and (ii) the histidine residue number. The prediction output is one of eight histidine functions: acetylation, ribosylation, glycosylation, hydroxylation, methylation, oxidation, phosphorylation, and protein splicing. Communicated by Ramaswamy H. Sarma.
