ABSTRACT
The fidelity of signal transduction requires the binding of regulatory molecules to their cognate targets. However, the crowded cell interior risks off-target interactions between proteins that are functionally unrelated. How such off-target interactions impact fitness is not generally known. Here, we use Saccharomyces cerevisiae to inducibly express tyrosine kinases. Because yeast lacks bona fide tyrosine kinases, the resulting tyrosine phosphorylation is biologically spurious. We engineered 44 yeast strains each expressing a tyrosine kinase, and quantitatively analysed their phosphoproteomes. This analysis resulted in ~30,000 phosphosites mapping to ~3500 proteins. The number of spurious pY sites generated correlates strongly with decreased growth, and we predict over 1000 pY events to be deleterious. However, we also find that many of the spurious pY sites have a negligible effect on fitness, possibly because of their low stoichiometry. This result is consistent with our evolutionary analyses demonstrating a lack of phosphotyrosine counter-selection in species with tyrosine kinases. Our results suggest that, alongside the risk for toxicity, the cell can tolerate a large degree of non-functional crosstalk as interaction networks evolve.
Subjects
Saccharomyces cerevisiae Proteins, Saccharomyces cerevisiae, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/growth & development, Phosphorylation, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae Proteins/genetics, Protein-Tyrosine Kinases/metabolism, Protein-Tyrosine Kinases/genetics, Signal Transduction, Genetic Fitness, Phosphoproteins/metabolism, Phosphoproteins/genetics
ABSTRACT
Recent years have seen an explosion of interest in understanding the physicochemical parameters that shape enzyme evolution, as well as substantial advances in computational enzyme design. This review discusses three areas where evolutionary information can be used as part of the design process: (i) using ancestral sequence reconstruction (ASR) to generate new starting points for enzyme design efforts; (ii) learning from how nature uses conformational dynamics in enzyme evolution to mimic this process in silico; and (iii) modular design of enzymes from smaller fragments, again mimicking the process by which nature appears to create new protein folds. Using showcase examples, we highlight the importance of incorporating evolutionary information to continue to push forward the boundaries of enzyme design studies.
Subjects
Molecular Evolution, Proteins, Computational Biology, Proteins/genetics
ABSTRACT
De novo mutations in the synaptic GTPase activating protein (SynGAP) are associated with neurological disorders such as intellectual disability, epilepsy, and autism. SynGAP is also implicated in Alzheimer's disease and cancer. Although pathogenic variants are highly penetrant in neurodevelopmental conditions, a substantial number of them are missense mutations, which are difficult to diagnose. Hence, in silico mutagenesis was performed to probe missense effects within the N-terminal region of the SynGAP structure. Through extensive molecular dynamics simulations, encompassing three 150-ns replicates for 211 variants, the impact of missense mutations on the protein fold was assessed. The effect of the mutations on folding stability was also quantitatively assessed using free energy calculations. The mutations were categorized as potentially pathogenic or benign based on their structural impacts. Finally, the study introduces wild-type SynGAP in complex with RasGTPase at the inner membrane, while considering the potential effects of mutations on these key interactions. This study provides a structural perspective on the clinical assessment of SynGAP missense variants and lays the foundation for future structure-based drug discovery.
Subjects
Molecular Dynamics Simulation, Missense Mutation, ras GTPase-Activating Proteins, Humans, ras GTPase-Activating Proteins/genetics, ras GTPase-Activating Proteins/chemistry, ras GTPase-Activating Proteins/metabolism, Protein Folding, Structure-Activity Relationship
ABSTRACT
In this review article, we explore the transformative impact of deep learning (DL) on structural bioinformatics, emphasizing its pivotal role in a scientific revolution driven by extensive data, accessible toolkits and robust computing resources. As big data continue to advance, DL is poised to become an integral component in healthcare and biology, revolutionizing analytical processes. Our comprehensive review provides detailed insights into DL, featuring specific demonstrations of its notable applications in bioinformatics. We address challenges tailored for DL, spotlight recent successes in structural bioinformatics and present a clear exposition of DL, from basic shallow neural networks to advanced models such as convolutional, recurrent, artificial and transformer neural networks. This paper discusses the emerging use of DL for understanding biomolecular structures, anticipating ongoing developments and applications in the realm of structural bioinformatics.
Subjects
Computational Biology, Deep Learning, Computational Biology/methods, Computer Neural Networks, Humans
ABSTRACT
Understanding the functional impact of genetic mutations on protein structures is essential for advancing cancer research and developing targeted therapies. The main challenge lies in accurately mapping these mutations to protein structures and analysing their effects on protein function. To address this, Mut-Map (https://genemutation.org/) is a comprehensive computational pipeline designed to integrate mutation data from the Catalogue Of Somatic Mutations In Cancer database with protein structural data from the Protein Data Bank and AlphaFold models. The pipeline begins by taking a UniProt ID and proceeds through mapping corresponding Protein Data Bank structures, renumbering residues, and assessing disorder percentages. It then overlays mutation data, categorizes mutations based on structural context, and visualizes them using advanced tools like MolStar. This approach allows for a detailed analysis of how mutations may disrupt protein function by affecting key regions such as DNA interfaces, ligand-binding sites, and dimer interactions. To validate the pipeline, a case study on the TP53 gene, a critical tumour suppressor often mutated in cancers, was conducted. The analysis highlighted the most frequent mutations occurring at the DNA-binding interface, providing insights into their potential role in cancer progression. Mut-Map offers a powerful resource for elucidating the structural implications of cancer-associated mutations, paving the way for more targeted therapeutic strategies and advancing our understanding of protein structure-function relationships.
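The categorization step described above (assigning each mutation to a structural context) can be sketched roughly as follows. This is a minimal illustration, not Mut-Map's actual code: the function names, the region labels and the residue numbers are all hypothetical, and a real pipeline would derive the region sets from the mapped structures.

```python
from collections import Counter

def categorize(position, regions):
    """Return the sorted names of all regions containing this residue, or ['other']."""
    hits = sorted(name for name, residues in regions.items() if position in residues)
    return hits or ["other"]

def summarize(mutation_positions, regions):
    """Count how many mutations fall into each structural category."""
    counts = Counter()
    for position in mutation_positions:
        for name in categorize(position, regions):
            counts[name] += 1
    return counts

# Hypothetical annotations: residue numbers are made up for illustration only.
example_regions = {
    "dna_interface": {10, 11},
    "ligand_site": {11, 20},
    "dimer_interface": {30},
}
example_summary = summarize([10, 11, 11, 99], example_regions)
```

A residue may belong to more than one region, so a single mutation can be counted under several categories; whether that is desirable depends on the analysis.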
Subjects
Computational Biology, Mutation, Neoplasms, Neoplasms/genetics, Humans, Computational Biology/methods, Protein Databases, Tumor Suppressor Protein p53/genetics, Tumor Suppressor Protein p53/chemistry, Software, Protein Conformation
ABSTRACT
MOTIVATION: Protein-protein interactions are essential for a variety of biological phenomena, including mediating biochemical reactions, cell signaling, and the immune response. Proteins seek to form interfaces which reduce overall system energy. Although determination of single polypeptide chain protein structures has been revolutionized by deep learning techniques, complex prediction has still not been perfected. Additionally, experimentally determining structures is extremely resource- and time-intensive. An alternative is the technique of computational docking, which takes the solved individual structures of proteins to produce candidate interfaces (decoys). Decoys are then scored using a mathematical function that assesses the quality of the system, known as a scoring function. Beyond docking, scoring functions are a critical component of assessing structures produced by many protein generative models. Scoring models are also used as a final filter in many generative deep learning models, including those that generate antibody binders and those that perform docking. RESULTS: In this work we present improved scoring functions for protein-protein interactions which utilize cutting-edge Euclidean graph neural network architectures to assess protein-protein interfaces. These Euclidean docking score models are known as EuDockScore and EuDockScore-Ab, with the latter being specific to antibody-antigen docking. Finally, we provide EuDockScore-AFM, a model trained on antibody-antigen outputs from AlphaFold-Multimer, which proves useful in re-ranking large numbers of AlphaFold-Multimer outputs. AVAILABILITY: The code for these models is available at https://gitlab.com/mcfeemat/eudockscore.
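To make the idea of scoring and re-ranking decoys concrete, here is a minimal sketch. It does not reproduce EuDockScore, which is a graph neural network; the `toy_score` function is a deliberately simple stand-in, and the `Decoy` fields and the clash penalty are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Decoy:
    name: str
    contacts: int  # hypothetical feature: number of interface contacts
    clashes: int   # hypothetical feature: number of steric clashes

def toy_score(decoy: Decoy) -> float:
    """Stand-in scoring function: reward contacts, penalize clashes (higher is better)."""
    return decoy.contacts - 5.0 * decoy.clashes

def rank_decoys(decoys: List[Decoy], score_fn: Callable[[Decoy], float]) -> List[Decoy]:
    """Sort decoys from best to worst according to the supplied scoring function."""
    return sorted(decoys, key=score_fn, reverse=True)

example_decoys = [Decoy("A", 40, 2), Decoy("B", 35, 0), Decoy("C", 50, 6)]
example_ranking = [d.name for d in rank_decoys(example_decoys, toy_score)]
```

Because the scoring function is passed in as an argument, a learned model could be swapped in for `toy_score` without changing the ranking code.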
ABSTRACT
Proteins are the key molecular machines that orchestrate all biological processes of the cell. Most proteins fold into three-dimensional shapes that are critical for their function. Studying the 3D shape of proteins can inform us of the mechanisms that underlie biological processes in living cells and can have practical applications in the study of disease mutations or the discovery of novel drug treatments. Here, we review the progress made in sequence-based prediction of protein structures with a focus on applications that go beyond the prediction of single monomer structures. This includes the application of deep learning methods for the prediction of structures of protein complexes, different conformations, the evolution of protein structures and the application of these methods to protein design. These developments create new opportunities for research that will have impact across many areas of biomedical research.
Subjects
Deep Learning, Proteins/metabolism, Protein Conformation
ABSTRACT
The type VI secretion system (T6SS) is an important mediator of microbe-microbe and microbe-host interactions. Gram-negative bacteria use the T6SS to inject T6SS effectors (T6Es), which are usually proteins with toxic activity, into neighboring cells. Antibacterial effectors have cognate immunity proteins that neutralize self-intoxication. Here, we applied novel structural bioinformatic tools to perform systematic discovery and functional annotation of T6Es and their cognate immunity proteins from a dataset of 17,920 T6SS-encoding bacterial genomes. Using structural clustering, we identified 517 putative T6E families, outperforming sequence-based clustering. We developed a logistic regression model to reliably quantify protein-protein interaction of new T6E-immunity pairs, yielding candidate immunity proteins for 231 out of the 517 T6E families. We used sensitive structure-based annotation which yielded functional annotations for 51% of the T6E families, again outperforming sequence-based annotation. Next, we validated four novel T6E-immunity pairs using basic experiments in E. coli. In particular, we showed that the Pfam domain DUF3289 is a homolog of Colicin M and that DUF943 acts as its cognate immunity protein. Furthermore, we discovered a novel T6E that is a structural homolog of SleB, a lytic transglycosylase, and identified a specific glutamate that acts as its putative catalytic residue. Overall, this study applies novel structural bioinformatic tools to T6E-immunity pair discovery, and provides an extensive database of annotated T6E-immunity pairs.
Subjects
Bacterial Proteins, Computational Biology, Type VI Secretion Systems, Computational Biology/methods, Type VI Secretion Systems/genetics, Type VI Secretion Systems/metabolism, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Bacterial Proteins/chemistry, Escherichia coli/genetics, Escherichia coli/metabolism, Escherichia coli/immunology, Gram-Negative Bacteria/immunology, Gram-Negative Bacteria/genetics, Bacterial Genome, Molecular Sequence Annotation
ABSTRACT
BACKGROUND: Machine learning (ML) has a rich history in structural bioinformatics, and modern approaches, such as deep learning, are revolutionizing our knowledge of the subtle relationships between biomolecular sequence, structure, function, dynamics and evolution. As with any advance that rests upon statistical learning approaches, the recent progress in biomolecular sciences is enabled by the availability of vast volumes of sufficiently-variable data. To be useful, such data must be well-structured, machine-readable, intelligible and manipulable. These and related requirements pose challenges that become especially acute at the computational scales typical in ML. Furthermore, in structural bioinformatics such data generally relate to protein three-dimensional (3D) structures, which are inherently more complex than sequence-based data. A significant and recurring challenge concerns the creation of large, high-quality, openly-accessible datasets that can be used for specific training and benchmarking tasks in ML pipelines for predictive modeling projects, along with reproducible splits for training and testing. RESULTS: Here, we report 'Prop3D', a platform that allows for the creation, sharing and extensible reuse of libraries of protein domains, featurized with biophysical and evolutionary properties that can range from detailed, atomically-resolved physicochemical quantities (e.g., electrostatics) to coarser, residue-level features (e.g., phylogenetic conservation). As a community resource, we also supply a 'Prop3D-20sf' protein dataset, obtained by applying our approach to CATH. We have developed and deployed the Prop3D framework, both in the cloud and on local HPC resources, to systematically and reproducibly create comprehensive datasets via the Highly Scalable Data Service (HSDS). Our datasets are freely accessible via a public HSDS instance, or they can be used with accompanying Python wrappers for popular ML frameworks.
CONCLUSION: Prop3D and its associated Prop3D-20sf dataset can be of broad utility in at least three ways. Firstly, the Prop3D workflow code can be customized and deployed on various cloud-based compute platforms, with scalability achieved largely by saving the results to distributed HDF5 files via HSDS. Secondly, the linked Prop3D-20sf dataset provides a hand-crafted, already-featurized dataset of protein domains for 20 highly-populated CATH families; importantly, provision of this pre-computed resource can aid the more efficient development (and reproducible deployment) of ML pipelines. Thirdly, Prop3D-20sf's construction explicitly takes into account (in creating datasets and data-splits) the enigma of 'data leakage', stemming from the evolutionary relationships between proteins.
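One common way to limit data leakage between evolutionarily related proteins is to split by group, so that all members of a family land on the same side of the train/test boundary. The sketch below illustrates that general idea only; it is not Prop3D's procedure, and the function name, the `(domain_id, group_id)` tuple layout and the default fraction are assumptions.

```python
import random

def group_split(domains, test_fraction=0.2, seed=0):
    """Split (domain_id, group_id) pairs so that no group spans both train and test.

    Whole groups are assigned to the test set; with very few groups the
    resulting item-level fraction can differ a lot from test_fraction.
    """
    groups = sorted({group for _, group in domains})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [d for d in domains if d[1] not in test_groups]
    test = [d for d in domains if d[1] in test_groups]
    return train, test

# Ten hypothetical domains spread over five hypothetical groups.
example_domains = [(f"d{i}", f"sf{i % 5}") for i in range(10)]
example_train, example_test = group_split(example_domains, test_fraction=0.2, seed=1)
```

Sorting the groups before the seeded shuffle keeps the split independent of the input order, which helps reproducibility.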
Subjects
Computational Biology, Proteins, Humans, Phylogeny, Computational Biology/methods, Workflow, Machine Learning
ABSTRACT
Protein-protein interactions (PPIs) are essential for many biological processes. Abnormal PPIs have been linked to several diseases, such as cancer and infectious and neurological disorders. Consequently, focusing on PPIs is a path toward disease treatment and a crucial tool for producing novel medications. Many methods exist to investigate PPIs, including low- and high-throughput studies. Although many PPIs have been discovered using in vitro and in vivo experimental approaches, the use of computational methods to predict PPIs has grown due to the expanding scale of PPI data and the intrinsic complexity of interacting mechanisms. Characterizing PPI networks offers a systematic means of predicting protein functions and the pathways in which they participate. These investigations can help uncover the underlying molecular mechanisms of complex phenotypes and clarify the biological processes related to health and disease. Therefore, our goal in this study is to provide an overview of the latest and most popular approaches for investigating PPIs. We also review some important clinical approaches based on PPIs and how these interactions can be targeted.
ABSTRACT
Respiratory tract infections (RTIs) have a significant impact on global health, especially among children and the elderly. The key bacterial pathogens Streptococcus pneumoniae, Haemophilus influenzae, Klebsiella pneumoniae, Staphylococcus aureus and non-fermenting Gram-negative bacteria such as Acinetobacter baumannii and Pseudomonas aeruginosa are most commonly associated with RTIs. These bacterial pathogens have evolved a diverse array of resistance mechanisms through horizontal gene transfer, often mediated by mobile genetic elements and environmental acquisition. Treatment failures are primarily due to antimicrobial resistance and inadequate bacterial engagement, which necessitates the development of alternative treatment strategies. To address this, our review focuses mainly on different virulence mechanisms and their resulting pathogenicity, highlighting different therapeutic interventions to combat resistance. With a view to averting the antimicrobial resistance crisis, we also focus on the application of artificial intelligence and machine learning to manage RTIs. Integrative approaches combining mechanistic insights are crucial for addressing the global challenge of antimicrobial resistance in respiratory infections.
Subjects
Anti-Bacterial Agents, Respiratory Tract Infections, Respiratory Tract Infections/microbiology, Respiratory Tract Infections/drug therapy, Humans, Anti-Bacterial Agents/pharmacology, Anti-Bacterial Agents/therapeutic use, Bacteria/genetics, Bacteria/drug effects, Bacteria/classification, Bacterial Drug Resistance, Bacterial Infections/microbiology, Bacterial Infections/drug therapy, Virulence
ABSTRACT
Gene expression signatures (GES) connect phenotypes to differential messenger RNA (mRNA) expression of genes, providing a powerful approach to define cellular identity, function, and the effects of perturbations. The use of GES has suffered from vague assessment criteria and limited reproducibility. Because the structure of proteins defines the functional capability of genes, we hypothesized that enrichment of structural features could be a generalizable representation of gene sets. We derive structural gene expression signatures (sGES) using features from multiple levels of protein structure (e.g., domain and fold) encoded by the mRNAs in GES. Comprehensive analyses of data from the Genotype-Tissue Expression Project (GTEx), the all RNA-seq and ChIP-seq sample and signature search (ARCHS4) database, and mRNA expression of drug effects on cardiomyocytes show that sGES are useful for characterizing biological phenomena. sGES enable phenotypic characterization across experimental platforms, facilitate interoperability of expression datasets, and describe drug action on cells.
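The abstract does not state which statistic was used to measure enrichment of structural features. As one plausible illustration, the sketch below uses a one-sided hypergeometric test, a common choice for asking whether a feature (for example a fold) is over-represented in a gene set. The function name and argument names are assumptions, not taken from the paper.

```python
from math import comb

def enrichment_p_value(k, population, annotated, drawn):
    """P(X >= k) under a hypergeometric model.

    population: total number of genes considered
    annotated:  genes in the population carrying the structural feature
    drawn:      size of the gene set (signature) being tested
    k:          genes in the signature carrying the feature (k >= 0)
    """
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / total

# Toy numbers: all 5 genes in a signature carry a feature found in 5 of 10 genes.
example_p = enrichment_p_value(5, 10, 5, 5)
```

A small p-value suggests the feature occurs in the signature more often than expected by chance; in practice such values would need correction for multiple testing across features.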
Subjects
Protein Conformation, Proteins/chemistry, Proteins/genetics, Transcriptome, Cell Line, Chromatin Immunoprecipitation Sequencing, Computational Biology, Gene Expression, Gene Expression Profiling, Humans, Cardiac Myocytes, Messenger RNA, RNA-Seq, Reproducibility of Results
ABSTRACT
Reliable and accurate methods of estimating the accuracy of predicted protein models are vital to understanding their respective utility. Discerning how the quaternary structure conforms can significantly improve our collective understanding of cell biology, systems biology, disease formation, and disease treatment. Accurately determining the quality of multimeric protein models is still computationally challenging, as the space of possible conformations is significantly larger when proteins form complexes with one another. Here, we present EGG (energy and graph-based architectures) to assess the accuracy of predicted multimeric protein models. We implemented message-passing and transformer layers to infer the overall fold and interface accuracy scores of predicted multimeric protein models. When evaluated with CASP15 targets, our methods achieved promising results against single-model predictors: fourth and third place for determining the highest-quality model when estimating overall fold accuracy and overall interface accuracy, respectively, and first place for determining the top three highest-quality models when estimating both overall fold accuracy and overall interface accuracy.
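Selecting the highest-quality model from a pool can be checked with a simple top-k test: is the truly best model among the k models the predictor scores highest? The sketch below illustrates that check only; it is not the official CASP15 evaluation, and the function name and the model identifiers are hypothetical.

```python
def top_k_hit(predicted, actual, k=1):
    """Return True if the model with the best actual score is among the
    k models with the highest predicted scores.

    predicted, actual: dicts mapping model id -> score (higher is better).
    """
    best_model = max(actual, key=actual.get)
    top_predicted = sorted(predicted, key=predicted.get, reverse=True)[:k]
    return best_model in top_predicted

# Made-up scores for three hypothetical models of one target.
example_predicted = {"m1": 0.9, "m2": 0.8, "m3": 0.1}
example_actual = {"m1": 0.6, "m2": 0.7, "m3": 0.2}
hit_at_1 = top_k_hit(example_predicted, example_actual, k=1)
hit_at_2 = top_k_hit(example_predicted, example_actual, k=2)
```

Averaging such hits over many targets gives a rough picture of how well a quality estimator ranks models, independent of how well calibrated its absolute scores are.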
Subjects
Molecular Models, Computer Neural Networks, Proteins, Proteins/chemistry, Proteins/metabolism, Computational Biology/methods, Protein Multimerization, Protein Conformation
ABSTRACT
Alzheimer's disease (AD) is the most common form of dementia, characterized by the pathological accumulation of amyloid-beta (Aβ) plaques and tau neurofibrillary tangles. Triggering receptor expressed on myeloid cells 2 (TREM2) is increasingly recognized as playing a central role in Aβ clearance and microglia activation in AD. The TREM2 gene transcriptional product is alternatively spliced to produce three different protein isoforms. The canonical TREM2 isoform binds to DAP12 to activate downstream pathways. However, little is known about the function or interaction partners of the alternative TREM2 isoforms. The present study utilized a computational approach in a systematic search for new interaction partners of the TREM2 isoforms by integrating several state-of-the-art structural bioinformatics tools from initial large-scale screening to one-on-one corroborative modeling and eventual all-atom visualization. CD9, a cell surface glycoprotein involved in cell-cell adhesion and migration, was identified as a new interaction partner for two TREM2 isoforms, and CALM, a calcium-binding protein involved in calcium signaling, was identified as an interaction partner for a third TREM2 isoform, highlighting the potential role of cell adhesion and calcium regulation in AD.
Subjects
Alternative Splicing, Alzheimer Disease, Membrane Glycoproteins, Protein Binding, Protein Isoforms, Immunologic Receptors, Membrane Glycoproteins/metabolism, Membrane Glycoproteins/genetics, Humans, Immunologic Receptors/metabolism, Immunologic Receptors/genetics, Protein Isoforms/metabolism, Protein Isoforms/genetics, Alzheimer Disease/metabolism, Alzheimer Disease/genetics, Computational Biology/methods
ABSTRACT
The study of rare diseases is important not only for the individuals affected but also for the advancement of medical knowledge and a deeper understanding of human biology and genetics. The wide repertoire of structural information now available from reliable and accurate prediction methods provides the opportunity to investigate the molecular origins of most of the rare diseases reviewed in the Orpha.net database. Thus, it has been possible to analyze the topology of the pathogenic missense variants found in the 2515 proteins involved in Mendelian rare diseases (MRDs), which form the database for our structural bioinformatics study. The amino acid substitutions responsible for MRDs showed different mutation site distributions at different three-dimensional protein depths. We then highlighted the depth-dependent effects of pathogenic variants for the 20,061 pathogenic variants that are present in our database. The results of this structural bioinformatics investigation are relevant, as they provide additional clues to mitigate the damage caused by MRD.
Subjects
Computational Biology, Rare Diseases, Humans, Computational Biology/methods, Rare Diseases/genetics, Missense Mutation, Genetic Databases, Proteins/chemistry, Proteins/genetics, Molecular Models, Amino Acid Substitution, Protein Conformation
ABSTRACT
BACKGROUND: Biotite is a program library for sequence and structural bioinformatics written for the Python programming language. It implements widely used computational methods in a consistent and accessible package. This allows for easy combination of various data analysis, modeling and simulation methods. RESULTS: This article presents major functionalities introduced into Biotite since its original publication. The fields of application are shown using concrete examples. We show that the computational performance of Biotite for bioinformatics tasks is comparable to individual, special-purpose software systems specifically developed for the respective single task. CONCLUSIONS: The results show that Biotite can be used as a program library both to answer specific bioinformatics questions and to allow the user to write entire, self-contained software applications with sufficient performance for general application.
Subjects
Computer Simulation, Molecular Models, Proteins, Software, Programming Languages, Sequence Alignment, Base Sequence, Proteins/chemistry, alpha-Globins/chemistry, Humans
ABSTRACT
BACKGROUND: High-throughput experiments in cancer and other areas of genomic research identify large numbers of sequence variants that need to be evaluated for phenotypic impact. While many tools exist to score the likely impact of single nucleotide polymorphisms (SNPs) based on sequence alone, the three-dimensional structural environment is essential for understanding the biological impact of a nonsynonymous mutation. RESULTS: We present a program, 3DVizSNP, that enables the rapid visualization of nonsynonymous missense mutations extracted from a variant call format (VCF) file using the web-based iCn3D visualization platform. The program, written in Python, leverages REST APIs and can be run locally without installing any other software or databases, or from a webserver hosted by the National Cancer Institute. It automatically selects the appropriate experimental structure from the Protein Data Bank, if available, or the predicted structure from the AlphaFold database, enabling users to rapidly screen SNPs based on their local structural environment. 3DVizSNP leverages iCn3D annotations and its structural analysis functions to assess changes in structural contacts associated with mutations. CONCLUSIONS: This tool enables researchers to efficiently make use of 3D structural information to prioritize mutations for further computational and experimental impact assessment. The program is available as a webserver at https://analysistools.cancer.gov/3dvizsnp or as a standalone Python program at https://github.com/CBIIT-CGBB/3DVizSNP.
Subjects
Computational Biology, Missense Mutation, Computational Biology/methods, Genomics/methods, Software, Mutation
ABSTRACT
Fungi, though mostly mesophilic, also include thermophilic and thermostable species. The thermostability of proteins observed in these fungi is most likely attributable to several molecular factors, such as the presence of salt bridges and hydrogen bond interactions between side chains. These factors cannot be generalized for all fungi. Identifying the factors that affect thermostability can show how fungal thermophilic proteins gain thermostability. We curated a dataset of proteins for 14 thermophilic fungi and their evolutionarily closely related mesophiles. Additionally, the proteome of Chaetomium thermophilum and its evolutionarily related mesophile Chaetomium globosum was analyzed. Using eggNOG, we categorized the proteomes into clusters of orthologous groups (COGs). While the individual count of proteins is over-represented in mesophiles (for COGs S, G, L, and Q), there are certain features that are significantly enriched in thermophiles (such as charged residues, exposed residues, polar residues, etc.). Since fungi are known to be cellulolytic and chitinolytic by nature, we selected 37 existing carbohydrate-active enzyme (CAZyme) families in Eurotiales, Mucorales, and Sordariales. We looked at closely similar sequences and their modeled structures for further comparison. Comparing solvent accessibilities of thermophilic and mesophilic proteins, exposed and intermediate residues are more frequent in thermophiles, whereas buried residues are more frequent in mesophiles. For five specific CAZyme families (GH7, GH11, GH18, GH45, and CBM1) we looked at position-specific substitutions between thermophiles and mesophiles. We also found that there are relatively more intramolecular interactions in thermophiles compared to mesophiles. Thus, we found factors such as surface-exposed residues and charged residues that are highly likely to impart thermostability in fungi, and this study sets the stage for further studies in the area of fungal thermostability.
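A sequence-level feature such as the fraction of charged residues is straightforward to compute and compare between a thermophilic protein and its mesophilic counterpart. The sketch below is an illustration, not the study's code: it assumes that "charged" means D, E, K and R (histidine is sometimes included as well), and the function name and the two toy sequences are made up.

```python
CHARGED = set("DEKR")  # assumption: Asp, Glu, Lys, Arg; histidine is excluded here

def charged_fraction(sequence):
    """Fraction of residues in a one-letter amino acid sequence that are charged."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return sum(residue in CHARGED for residue in sequence) / len(sequence)

# Made-up toy sequences, for illustration only.
thermo_fraction = charged_fraction("DEKRAAAA")
meso_fraction = charged_fraction("DEAAAAAA")
```

Applied across orthologous pairs, the per-protein differences could then be tested for a consistent shift rather than judged from single examples.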
ABSTRACT
The results of tertiary structure assessment at CASP15 are reported. For the first time, recognizing the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15, there was no single stand-out group, with most of the best-scoring groups (led by PEZYFoldings, UM-TBM, and Yang Server) employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, as well as lacking templates, were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, and should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These include those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic Atlas: the confidence estimates of the former were also notably accurate.
Subjects
Computational Biology, Furylfuramide, Computational Biology/methods, Molecular Models, Proteins/chemistry, Sequence Alignment
ABSTRACT
The rapid evolution of protein structure prediction tools has significantly broadened access to protein structural data. Although predicted structure models have the potential to accelerate and significantly impact fundamental and translational research, it is essential to note that they are not validated and cannot be considered the ground truth. Thus, challenges persist, particularly in capturing protein dynamics, predicting multi-chain structures, interpreting protein function, and assessing model quality. Interdisciplinary collaborations are crucial to overcoming these obstacles. Databases like the AlphaFold Protein Structure Database, the ESM Metagenomic Atlas, and initiatives like the 3D-Beacons Network provide FAIR access to these data, enabling their interpretation and application across a broader scientific community. Whilst substantial advancements have been made in protein structure prediction, further progress is required to address the remaining challenges. Developing training materials, nurturing collaborations, and ensuring open data sharing will be paramount in this pursuit. The continued evolution of these tools and methodologies will deepen our understanding of protein function and accelerate discoveries in disease pathogenesis and drug development.