Results 1-20 of 112
1.
Nucleic Acids Res ; 52(W1): W318-W323, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38634802

ABSTRACT

The 'structure assessment' web server is a one-stop shop for interactive evaluation and benchmarking of structural models of macromolecular complexes including proteins and nucleic acids. A user-friendly web dashboard links sequence with structure information and results from a variety of state-of-the-art tools, which facilitates the visual exploration and evaluation of structure models. The dashboard integrates stereochemistry information, secondary structure information, global and local model quality assessment of the tertiary structure of comparative protein models, as well as prediction of membrane location. In addition, a benchmarking mode is available where a model can be compared to a reference structure, providing easy access to scores that have been used in recent CASP experiments and CAMEO. The structure assessment web server is available at https://swissmodel.expasy.org/assess.
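The abstract refers to "scores that have been used in recent CASP experiments and CAMEO" without naming them. One widely used score of this kind is lDDT, which compares inter-atomic distances in the model and the reference and therefore needs no superposition. The sketch below is a simplified, CA-only illustration under stated assumptions (one CA atom per residue, the 15 Å inclusion radius and 0.5/1/2/4 Å tolerances of the original lDDT definition, no stereochemistry checks); it is not the web server's implementation.

```python
import numpy as np


def ca_lddt(model: np.ndarray, reference: np.ndarray,
            inclusion_radius: float = 15.0,
            thresholds=(0.5, 1.0, 2.0, 4.0)) -> float:
    """Simplified global lDDT on CA atoms only (no stereochemistry checks).

    model, reference: (N, 3) arrays of corresponding CA coordinates.
    """
    if model.shape != reference.shape:
        raise ValueError("model and reference must have the same shape")
    # Pairwise distance matrices: comparing distances makes the score
    # independent of any rotation or translation of the model.
    d_ref = np.linalg.norm(reference[:, None, :] - reference[None, :, :], axis=-1)
    d_mod = np.linalg.norm(model[:, None, :] - model[None, :, :], axis=-1)
    n = len(reference)
    # Only pairs of distinct residues that are close in the reference count.
    mask = (d_ref < inclusion_radius) & ~np.eye(n, dtype=bool)
    if not mask.any():
        return 0.0
    diff = np.abs(d_mod - d_ref)[mask]
    # Fraction of preserved distances at each tolerance, then averaged.
    return float(np.mean([(diff < t).mean() for t in thresholds]))


# Toy reference: four CA atoms on a line, 3.8 Å apart (made-up coordinates).
ref = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]])
pert = ref.copy()
pert[3, 1] = 3.0  # displace the last residue by 3 Å
score_perturbed = ca_lddt(pert, ref)
```

Because only distances enter the score, a rigidly rotated and translated copy of the reference still scores 1.0, whereas the displaced residue lowers the score at the tighter tolerances.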


Subjects
Internet, Molecular Models, Software, Proteins/chemistry, Benchmarking, Protein Conformation
2.
mSphere ; 9(4): e0079923, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38501831

ABSTRACT

BK polyomavirus (BKPyV) is a double-stranded DNA virus causing nephropathy, hemorrhagic cystitis, and urothelial cancer in transplant patients. The BKPyV-encoded capsid protein Vp1 and large T-antigen (LTag) are key targets of neutralizing antibodies and cytotoxic T-cells, respectively. Our single-center data suggested that variability in Vp1 and LTag may contribute to failing BKPyV-specific immune control and impact vaccine design. We, therefore, analyzed all available entries in GenBank (1516 VP1; 742 LTAG) and explored potential structural effects using computational approaches. BKPyV-genotype (gt)1 was found in 71.18% of entries, followed by BKPyV-gt4 (19.26%), BKPyV-gt2 (8.11%), and BKPyV-gt3 (1.45%), but rates differed according to country and specimen type. Vp1 mutations matched a serotype different from the assigned one or were serotype-independent in 43% of cases, and 18% affected more than one amino acid. Notable Vp1 mutations altered antibody-binding domains or interactions with sialic acid receptors, or were predicted to change conformation. LTag sequences were more conserved, with only 16 mutations detectable in more than one entry and without significant effects on LTag structure or interaction domains. However, LTag changes were predicted to affect HLA class I presentation of immunodominant 9mers to cytotoxic T-cells. These global data strengthen single-center observations, and specifically our earlier findings revealing mutant 9mer epitopes conferring immune escape from HLA-I cytotoxic T cells. We conclude that variability of BKPyV Vp1 and LTag may have important implications for diagnostic assays assessing BKPyV-specific immune control and for vaccine design.

IMPORTANCE: The type and rate of amino acid variations in BKPyV may provide important insights into BKPyV diversity in human populations and are an important step toward defining the determinants of BKPyV-specific immunity needed to protect vulnerable patients from BKPyV diseases. Our analysis of BKPyV sequences obtained from human specimens reveals unexpectedly high genetic variability for this double-stranded DNA virus, which relies strongly on the host cell DNA replication machinery with its proofreading and error-correction mechanisms. BKPyV variability and immune escape should be taken into account when designing further approaches to antivirals, monoclonal antibodies, and vaccines for patients at risk of BKPyV diseases.

3.
J Mol Biol ; : 168546, 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38508301

ABSTRACT

IHMCIF (github.com/ihmwg/IHMCIF) is a data information framework that supports archiving and disseminating macromolecular structures determined by integrative or hybrid modeling (IHM), and making them Findable, Accessible, Interoperable, and Reusable (FAIR). IHMCIF is an extension of the Protein Data Bank Exchange/macromolecular Crystallographic Information Framework (PDBx/mmCIF) that serves as the framework for the Protein Data Bank (PDB) to archive experimentally determined atomic structures of biological macromolecules and their complexes with one another and small molecule ligands (e.g., enzyme cofactors and drugs). IHMCIF serves as the foundational data standard for the PDB-Dev prototype system, developed for archiving and disseminating integrative structures. It utilizes a flexible data representation to describe integrative structures that span multiple spatiotemporal scales and structural states with definitions for restraints from a variety of experimental methods contributing to integrative structural biology. The IHMCIF extension was created with the benefit of considerable community input and recommendations gathered by the Worldwide Protein Data Bank (wwPDB) Task Force for Integrative or Hybrid Methods (wwpdb.org/task/hybrid). Herein, we describe the development of IHMCIF to support evolving methodologies and ongoing advancements in integrative structural biology. Ultimately, IHMCIF will facilitate the unification of PDB-Dev data and tools with the PDB archive so that integrative structures can be archived and disseminated through PDB.

4.
Bioinformatics ; 40(1)2024 01 02.
Article in English | MEDLINE | ID: mdl-38175775

ABSTRACT

MOTIVATION: Language models are routinely used for text classification and generative tasks. Recently, the same architectures were applied to protein sequences, unlocking powerful new approaches in the bioinformatics field. Protein language models (pLMs) generate high-dimensional embeddings on a per-residue level and encode a "semantic meaning" of each individual amino acid in the context of the full protein sequence. These representations have been used as a starting point for downstream learning tasks and, more recently, for identifying distant homologous relationships between proteins. RESULTS: In this work, we introduce a new method that generates embedding-based protein sequence alignments (EBA) and show how these capture structural similarities even in the twilight zone, outperforming both classical methods and other approaches based on pLMs. The method shows excellent accuracy despite requiring no training or parameter optimization. We demonstrate that combining pLMs with alignment methods is a valuable approach for detecting relationships between proteins in the twilight zone. AVAILABILITY AND IMPLEMENTATION: The code to run EBA and reproduce the analysis described in this article is available at: https://git.scicore.unibas.ch/schwede/EBA and https://git.scicore.unibas.ch/schwede/eba_benchmark.
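The general idea of aligning residue embeddings instead of residue letters can be sketched as follows. This is an illustration under stated assumptions, not the EBA code: residue pairs are scored by the cosine similarity of their embeddings, and a plain Needleman-Wunsch global alignment with a linear gap penalty (the value -0.5 is arbitrary) is run on that similarity matrix. Random vectors stand in for pLM embeddings.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """(m, d) and (n, d) residue embeddings -> (m, n) cosine similarities."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def global_align(sim: np.ndarray, gap: float = -0.5):
    """Needleman-Wunsch global alignment on a precomputed similarity matrix.

    Returns the optimal score and the aligned (i, j) residue index pairs.
    """
    m, n = sim.shape
    dp = np.zeros((m + 1, n + 1))
    dp[1:, 0] = gap * np.arange(1, m + 1)  # leading residues of a opposite gaps
    dp[0, 1:] = gap * np.arange(1, n + 1)  # leading residues of b opposite gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i, j] = max(dp[i - 1, j - 1] + sim[i - 1, j - 1],  # align i-1 with j-1
                           dp[i - 1, j] + gap,                    # residue of a vs gap
                           dp[i, j - 1] + gap)                    # residue of b vs gap
    # Trace back from the bottom-right cell, preferring diagonal moves.
    i, j, pairs = m, n, []
    while i > 0 and j > 0:
        if np.isclose(dp[i, j], dp[i - 1, j - 1] + sim[i - 1, j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(dp[i, j], dp[i - 1, j] + gap):
            i -= 1
        else:
            j -= 1
    return float(dp[m, n]), pairs[::-1]


# Toy example: random vectors stand in for per-residue pLM embeddings.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(6, 16))
emb_b = np.delete(emb_a, 2, axis=0)  # same "protein" with residue 2 deleted
score, pairs = global_align(cosine_similarity_matrix(emb_a, emb_b))
```

Because identical embeddings have a cosine similarity of 1, aligning the vectors against a copy with one row removed recovers the deletion as a single gap.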


Subjects
Amino Acids, Proteins, Proteins/chemistry, Amino Acid Sequence, Sequence Alignment, Language
5.
Proteins ; 92(1): 3-14, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37465978

ABSTRACT

Most proteins found in the outer membrane of gram-negative bacteria share a common domain: the transmembrane β-barrel. These outer membrane β-barrels (OMBBs) occur in multiple sizes and different families with a wide range of functions evolved independently by amplification from a pool of homologous ancestral ββ-hairpins. This is part of the reason why predicting their three-dimensional (3D) structure, especially by homology modeling, is a major challenge. Recently, DeepMind's AlphaFold v2 (AF2) became the first structure prediction method to reach close-to-experimental atomic accuracy in CASP even for difficult targets. However, membrane proteins, especially OMBBs, were not abundant in its training data, raising the question of how accurate the predictions are for these families. In this study, we assessed the performance of AF2 in the prediction of OMBBs and OMBB-like folds of various topologies using barrOs, an in-house-developed tool for the analysis of OMBB 3D structures. In agreement with previous studies on other membrane protein classes, our results indicate that AF2 predicts transmembrane β-barrel structures with high accuracy independently of the use of templates, even for novel topologies absent from the training set. These results provide confidence in the models generated by AF2 and open the door to the structural elucidation of novel transmembrane β-barrel topologies identified in high-throughput OMBB annotation studies or designed de novo.


Subjects
Furylfuramide, Membrane Proteins, Humans, Membrane Proteins/chemistry, Bacterial Outer Membrane Proteins/chemistry
6.
Proteins ; 91(12): 1539-1549, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37920879

ABSTRACT

Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical Assessment of Structure Prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest-quality result for about two thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures. Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning-derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein-ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.


Subjects
Computational Biology, Proteins, Protein Conformation, Molecular Models, Proteins/chemistry, Amino Acid Sequence, Computational Biology/methods
7.
Proteins ; 91(12): 1811-1821, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37795762

ABSTRACT

CASP15 introduced a new category, ligand prediction, where participants were provided with a protein or nucleic acid sequence, SMILES line notation, and stoichiometry for ligands and tasked with generating computational models for the three-dimensional structure of the corresponding protein-ligand complex. These models were subsequently compared with experimental structures determined by X-ray crystallography or cryo-EM. To assess these predictions, two novel scores were developed. The Binding-Site Superposed, Symmetry-Corrected Pose Root Mean Square Deviation (BiSyRMSD) evaluated the absolute deviations of the models from the experimental structures, while the Local Distance Difference Test for Protein-Ligand Interactions (lDDT-PLI) assessed the ability of models to reproduce the protein-ligand interactions in the experimental structures. The ligands evaluated in this challenge ranged from single-atom ions to large flexible organic molecules. More than 1800 submissions were evaluated for their ability to predict 23 different protein-ligand complexes. Overall, the best models could faithfully reproduce the geometries of more than half of the prediction targets. The size and flexibility of the ligands were the primary factors influencing the quality of the predictions. Small ions and organic molecules with limited flexibility were predicted with high fidelity, while reproducing the binding poses of larger, flexible ligands proved more challenging.
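The "symmetry-corrected" part of BiSyRMSD addresses ligands with chemically equivalent atoms (for example, the two oxygens of a carboxylate), so that a model is not penalized for swapping them. The sketch below assumes the coordinates are already in a common frame (e.g. after binding-site superposition) and that the allowed atom permutations are supplied by the caller; it simply takes the minimum RMSD over those permutations. Deriving the permutations from the ligand graph, and the superposition step itself, are omitted, so this is not the CASP15 assessment code.

```python
import numpy as np
from typing import Sequence


def symmetry_corrected_rmsd(model, reference,
                            permutations: Sequence[Sequence[int]]) -> float:
    """Minimum ligand RMSD over symmetry-equivalent atom orderings.

    model, reference: (N, 3) ligand coordinates already in a common frame;
    no superposition is performed here.
    permutations: allowed orderings, where perm[k] is the model atom that is
    compared with reference atom k; include the identity ordering.
    """
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    best = np.inf
    for perm in permutations:
        diff = model[list(perm)] - reference
        best = min(best, float(np.sqrt((diff ** 2).sum(axis=1).mean())))
    return best


# Toy carboxylate-like fragment C, O1, O2 (made-up coordinates in Å).
reference = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [-1.0, 1.0, 0.0]]
model = [[0.0, 0.0, 0.0], [-1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]  # O1 and O2 swapped
naive = symmetry_corrected_rmsd(model, reference, [[0, 1, 2]])
corrected = symmetry_corrected_rmsd(model, reference, [[0, 1, 2], [0, 2, 1]])
```

With only the identity ordering, the swapped oxygens yield an RMSD of about 1.63 Å; allowing the O1/O2 swap recovers a perfect match.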


Subjects
Molecular Models, Humans, Ligands, Binding Sites, Ions, Protein Binding, X-Ray Crystallography
8.
Proteins ; 91(12): 1850-1860, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37858934

ABSTRACT

Predicting model quality is a fundamental component of any modeling procedure, and blind assessment of these methods constitutes a crucial aspect of the Critical Assessment of Protein Structure Prediction (CASP) experiment. Historically, the main focus was on assessing methods that predict global and per-residue accuracies in tertiary structure models. This focus shifted with the community's increased efforts in modeling complexes and assemblies. We asked the community to process the models from the CASP15 assembly category and provide estimates of the accuracy of the predicted quaternary structure, both globally and at the local interface level. Besides finding that modeling groups were remarkably accurate in assessing their own predictions, we set up a benchmarking pipeline to highlight different aspects of quaternary structure models and introduced a simple consensus estimation of model accuracy (EMA) method as a baseline. While participating methods showed commendable performance, the baseline was difficult to surpass. Notably, prediction performance varied across individual CASP targets, highlighting potential areas of improvement and challenges ahead.
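The abstract does not describe how the consensus baseline works. A common form of consensus EMA, assumed here, scores each model by its mean similarity to all other models submitted for the same target, on the reasoning that structural features predicted by many groups are more likely to be correct. The pairwise similarity measure is left as an input, and the matrix below is hypothetical.

```python
import numpy as np


def consensus_scores(similarity) -> np.ndarray:
    """Mean similarity of each model to all *other* models of one target.

    similarity: symmetric (M, M) matrix; entry [i, j] is some pairwise
    model-model score (which score to use is an assumption of this sketch).
    """
    sim = np.asarray(similarity, dtype=float)
    m = sim.shape[0]
    if m < 2:
        raise ValueError("consensus needs at least two models")
    # Remove the self-comparison on the diagonal before averaging.
    return (sim.sum(axis=1) - np.diag(sim)) / (m - 1)


# Hypothetical similarities for four models of one assembly target.
sim = np.array([[1.00, 0.80, 0.70, 0.10],
                [0.80, 1.00, 0.75, 0.20],
                [0.70, 0.75, 1.00, 0.15],
                [0.10, 0.20, 0.15, 1.00]])
scores = consensus_scores(sim)
best_model = int(np.argmax(scores))  # the model that agrees most with the others
```

The outlier (model 3) receives the lowest consensus score, which is the behavior such a baseline relies on; it also shows the method's known weakness, since a correct but unusual model would be penalized in the same way.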


Subjects
Computational Biology, Proteins, Protein Conformation, Molecular Models, Computational Biology/methods, Proteins/chemistry, Benchmarking
9.
Proteins ; 91(12): 1912-1924, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37885318

ABSTRACT

The prediction of protein-ligand complexes (PLC), using both experimental and predicted structures, is an active and important area of research, underscored by the inclusion of the Protein-Ligand Interaction category in the latest round of the Critical Assessment of Protein Structure Prediction experiment, CASP15. The prediction task in CASP15 consisted of predicting both the three-dimensional structure of the receptor protein and the position and conformation of the ligand. This paper addresses the challenges and proposed solutions for devising automated benchmarking techniques for PLC prediction. The reliability of experimentally solved PLC as ground-truth reference structures is assessed using various validation criteria. The similarity of PLC to previously released complexes is used to judge PLC diversity and the difficulty of a PLC as a prediction target. We show that the commonly used PDBBind time-split test set is inappropriate for comprehensive PLC evaluation, with state-of-the-art tools showing conflicting results on a more representative, high-quality dataset constructed for benchmarking purposes. Using the two PLC-prediction-specific scoring metrics we created, we also show that redocking on crystal structures is a much simpler task than docking into predicted protein models. Finally, we introduce a fully automated pipeline that predicts PLC and evaluates the accuracy of the protein structure, ligand pose, and protein-ligand interactions.


Subjects
Benchmarking, Proteins, Binding Sites, Protein Binding, Ligands, Reproducibility of Results, Molecular Docking Simulation, Proteins/chemistry, Protein Conformation
10.
J Exp Med ; 220(12)2023 12 04.
Article in English | MEDLINE | ID: mdl-37773046

ABSTRACT

Targeted eradication of transformed or otherwise dysregulated cells using monoclonal antibodies (mAb), antibody-drug conjugates (ADC), T cell engagers (TCE), or chimeric antigen receptor (CAR) cells is very effective for hematologic diseases. In contrast to the breakthrough progress achieved for B cell malignancies, suitable antigens for myeloid malignancies are still urgently needed. CD123, the interleukin-3 (IL-3) receptor alpha-chain, is highly expressed in various hematological malignancies, including acute myeloid leukemia (AML). However, shared CD123 expression on healthy hematopoietic stem and progenitor cells (HSPCs) carries a risk of myelotoxicity. We demonstrate that epitope-engineered HSPCs were shielded from CD123-targeted immunotherapy but remained functional, while CD123-deficient HSPCs displayed a competitive disadvantage. Transplantation of genome-edited HSPCs could enable tumor-selective targeted immunotherapy while rebuilding a fully functional hematopoietic system. We envision that this approach is broadly applicable to other targets and cells, could render hitherto undruggable targets accessible to immunotherapy, and will allow continued posttransplant therapy, for instance, to treat minimal residual disease (MRD).


Subjects
Interleukin-3 Receptor alpha Subunit, Acute Myeloid Leukemia, Humans, Interleukin-3 Receptor alpha Subunit/metabolism, Epitopes, Acute Myeloid Leukemia/genetics, Acute Myeloid Leukemia/therapy, Immunotherapy, Hematopoietic Stem Cells/metabolism, Adoptive Immunotherapy
11.
Nature ; 622(7983): 646-653, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37704037

ABSTRACT

We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database [1]. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this 'dark matter' of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database [2] and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light on uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.
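A sequence similarity network links proteins whose pairwise similarity passes a cutoff, and clusters can then be read off as connected components. The abstract does not give the construction details of the AFDB90v4 network, so the union-find sketch below is only a generic illustration; the node names, similarity values and thresholds are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def ssn_clusters(nodes: Iterable[str],
                 edges: Iterable[Tuple[str, str, float]],
                 threshold: float) -> List[List[str]]:
    """Connected components of a thresholded sequence similarity network.

    edges: (a, b, similarity) triples, e.g. from an all-vs-all search;
    only edges with similarity >= threshold link their two nodes.
    """
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, sim in edges:
        if sim >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two components
    groups: Dict[str, List[str]] = {}
    for n in parent:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical proteins and pairwise similarities.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B", 0.9), ("B", "C", 0.6), ("D", "E", 0.8), ("C", "D", 0.3)]
loose = ssn_clusters(nodes, edges, threshold=0.5)
strict = ssn_clusters(nodes, edges, threshold=0.7)
```

Raising the threshold splits components apart, which is why the cutoff largely determines how many "families" such a network appears to contain.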


Subjects
Protein Databases, Deep Learning, Molecular Sequence Annotation, Protein Folding, Proteins, Protein Structural Homology, Amino Acid Sequence, Internet, Proteins/chemistry, Proteins/classification, Proteins/metabolism
12.
Proteins ; 91(12): 1550-1557, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37306011

ABSTRACT

Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.


Subjects
Computational Biology, Proteins, Protein Conformation, Proteins/chemistry, Molecular Models, Ligands
13.
Proteomics ; 23(17): e2200323, 2023 09.
Article in English | MEDLINE | ID: mdl-37365936

ABSTRACT

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. Two combined approaches were then created: a simple consensus score built from the best-performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
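The reported values of 0.93 and 0.94 are areas under the ROC curve. The sketch below computes ROC AUC through its rank interpretation (the probability that a randomly chosen physiological dimer outscores a randomly chosen non-physiological one, with ties counted as one half) and combines several scoring functions by averaging per-function z-scores. The z-score averaging is an assumption made for illustration; the abstract does not say how the consensus score was built, and the data below are hypothetical.

```python
import numpy as np


def roc_auc(scores, labels) -> float:
    """ROC AUC as the probability that a random positive (physiological)
    example outscores a random negative one, counting ties as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both positive and negative examples")
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def zscore_consensus(score_table) -> np.ndarray:
    """Average per-function z-scores. Rows are complexes, columns are scoring
    functions oriented so that higher means 'more physiological'.
    Assumes no column is constant (its standard deviation would be zero)."""
    t = np.asarray(score_table, dtype=float)
    z = (t - t.mean(axis=0)) / t.std(axis=0)
    return z.mean(axis=1)


# Hypothetical data: 4 dimers (1 = physiological) scored by 2 functions.
labels = [1, 1, 0, 0]
table = [[0.9, 10.0],
         [0.4, 8.0],
         [0.5, 3.0],
         [0.1, 1.0]]
auc_first = roc_auc([row[0] for row in table], labels)
auc_consensus = roc_auc(zscore_consensus(table), labels)
```

Standardizing each column first puts scoring functions with very different ranges on a common scale before they are averaged.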


Subjects
Proteins, Reproducibility of Results, Proteins/metabolism, Protein Binding
14.
J Mol Biol ; 435(14): 168021, 2023 07 15.
Article in English | MEDLINE | ID: mdl-36828268

ABSTRACT

ModelCIF (github.com/ihmwg/ModelCIF) is a data information framework developed for and by computational structural biologists to enable delivery of Findable, Accessible, Interoperable, and Reusable (FAIR) data to users worldwide. ModelCIF describes the specific set of attributes and metadata associated with macromolecular structures modeled by solely computational methods and provides an extensible data representation for deposition, archiving, and public dissemination of predicted three-dimensional (3D) models of macromolecules. It is an extension of the Protein Data Bank Exchange / macromolecular Crystallographic Information Framework (PDBx/mmCIF), which is the global data standard for representing experimentally-determined 3D structures of macromolecules and associated metadata. The PDBx/mmCIF framework and its extensions (e.g., ModelCIF) are managed by the Worldwide Protein Data Bank partnership (wwPDB, wwpdb.org) in collaboration with relevant community stakeholders such as the wwPDB ModelCIF Working Group (wwpdb.org/task/modelcif). This semantically rich and extensible data framework for representing computed structure models (CSMs) accelerates the pace of scientific discovery. Herein, we describe the architecture, contents, and governance of ModelCIF, and tools and processes for maintaining and extending the data standard. Community tools and software libraries that support ModelCIF are also described.


Subjects
Protein Databases, Macromolecular Substances/chemistry, Protein Conformation, Software
15.
Gigascience ; 112022 11 30.
Article in English | MEDLINE | ID: mdl-36448847

ABSTRACT

While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and the number of experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modeling approaches. While often powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis to identify which models can best address their specific use cases. To make data from a large array of model providers more easily accessible to the broader scientific community, we established 3D-Beacons, a collaborative initiative to create a federated network with unified data access mechanisms. The 3D-Beacons Network allows researchers to collate coordinate files and metadata for experimentally determined and theoretical protein models from state-of-the-art and specialist model providers and also from the Protein Data Bank.


Subjects
Metadata, Records, Amino Acid Sequence, Protein Databases, Computer Simulation
17.
Science ; 374(6573): 1319-1320, 2021 Dec 10.
Article in English | MEDLINE | ID: mdl-34882469

ABSTRACT

Deep learning provides an atomic snapshot of the yeast protein interactome.


Subjects
Deep Learning
18.
Proteins ; 89(12): 1607-1617, 2021 12.
Article in English | MEDLINE | ID: mdl-34533838

ABSTRACT

Critical assessment of structure prediction (CASP) is a community experiment to advance methods of computing three-dimensional protein structure from amino acid sequence. Core components are rigorous blind testing of methods and evaluation of the results by independent assessors. In the most recent experiment (CASP14), deep-learning methods from one research group consistently delivered computed structures rivaling the corresponding experimental ones in accuracy. In this sense, the results represent a solution to the classical protein-folding problem, at least for single proteins. The models have already been shown to be capable of providing solutions for problematic crystal structures, and there are broad implications for the rest of structural biology. Other research groups also substantially improved performance. Here, we describe these results and outline some of the many implications. Other related areas of CASP, including modeling of protein complexes, structure refinement, estimation of model accuracy, and prediction of inter-residue contacts and distances, are also described.


Subjects
Protein Conformation, Protein Folding, Proteins, Software, Amino Acid Sequence, Computational Biology, Statistical Models, Molecular Dynamics Simulation, Proteins/chemistry, Proteins/metabolism, Protein Sequence Analysis
19.
Proteins ; 89(12): 1647-1672, 2021 12.
Article in English | MEDLINE | ID: mdl-34561912

ABSTRACT

The biological and functional significance of selected Critical Assessment of Techniques for Protein Structure Prediction 14 (CASP14) targets is described by the authors of the structures. The authors highlight the most relevant features of the target proteins and discuss how well these features were reproduced in the respective submitted predictions. The overall ability to predict three-dimensional structures of proteins has improved remarkably in CASP14, and many difficult targets were modeled with impressive accuracy. For the first time in the history of CASP, the experimentalists not only highlighted that computational models can accurately reproduce the most critical structural features observed in their targets, but also envisaged that models could serve as guidance for further studies of biologically relevant properties of proteins.


Subjects
Molecular Models, Protein Conformation, Proteins/chemistry, Software, Amino Acid Sequence, Computational Biology, Cryoelectron Microscopy, X-Ray Crystallography, Protein Sequence Analysis
20.
Proteins ; 89(12): 1977-1986, 2021 12.
Article in English | MEDLINE | ID: mdl-34387007

ABSTRACT

The Continuous Automated Model EvaluatiOn (CAMEO) platform complements the biennial CASP experiment by conducting fully automated blind evaluations of three-dimensional protein prediction servers, based on the weekly prerelease of sequences of structures that are going to be published in the upcoming release of the Protein Data Bank. While CASP14 saw significant success in predicting the structures of individual protein chains with high accuracy, significant challenges remain in correctly predicting the structures of complexes. By implementing fully automated evaluation of predictions for protein-protein complexes, as well as for proteins in complex with ligands, peptides, nucleic acids, or proteins containing noncanonical amino acid residues, CAMEO will assist new developments in those challenging areas of active research.


Subjects
Computational Biology/methods, Protein Databases, Protein Conformation, Protein Sequence Analysis, Software, Benchmarking, Cluster Analysis, Molecular Models, Proteins/chemistry, Proteins/genetics