1.
Cancer Res Commun ; 4(1): 213-225, 2024 01 26.
Article in English | MEDLINE | ID: mdl-38282550

ABSTRACT

POLE driver mutations in the exonuclease domain (ExoD driver) are prevalent in several cancers, including colorectal cancer and endometrial cancer, and lead to ultra-high tumor mutation burden (TMB). To understand whether POLE mutations that are not classified as drivers (POLE Variant) contribute to mutagenesis, we assessed TMB in 447 POLE-mutated colorectal cancers, endometrial cancers, and ovarian cancers classified as TMB-high (≥10 mutations/Mb [mut/Mb]) or TMB-low (<10 mut/Mb). TMB was highest, and significantly so, in tumors with "POLE ExoD driver plus POLE Variant" (colorectal cancer and endometrial cancer, P < 0.001; ovarian cancer, P < 0.05). TMB increased with additional POLE variants (P < 0.001), but plateaued at 2, suggesting an association between the presence of these variants and TMB. Integrated analysis of AlphaFold2 POLE models and quantitative stability estimates predicted the impact of multiple POLE variants on POLE functionality. The prevalence of immunogenic neoepitopes was notably higher in the "POLE ExoD driver plus POLE Variant" tumors. Overall, this study reveals a novel correlation between POLE variants in POLE ExoD-driven tumors and ultra-high TMB. Currently, only select pathogenic ExoD mutations with a reliable association with ultra-high TMB inform clinical practice. Thus, these findings are hypothesis-generating, require functional validation, and could potentially inform tumor classification, treatment responses, and clinical outcomes. SIGNIFICANCE: Somatic POLE ExoD driver mutations cause proofreading deficiency that induces high TMB. This study suggests a novel modifier role for POLE variants in POLE ExoD-driven tumors, associated with ultra-high TMB. These data, in addition to future functional studies, may inform tumor classification, therapeutic response, and patient outcomes.
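The TMB-high/TMB-low split described in this abstract can be illustrated with a minimal sketch. Only the 10 mut/Mb cutoff comes from the abstract; the function names and the idea of computing TMB as mutation count divided by sequenced megabases are illustrative assumptions, not the authors' pipeline.

```python
# Illustrative sketch only; names are hypothetical, not from the study.
TMB_HIGH_THRESHOLD = 10.0  # mut/Mb, the cutoff quoted in the abstract


def tmb_from_counts(n_mutations: int, covered_megabases: float) -> float:
    """Return mutations per megabase for a given sequenced footprint."""
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    return n_mutations / covered_megabases


def classify_tmb(mutations_per_mb: float) -> str:
    """Label a tumor TMB-high (>= 10 mut/Mb) or TMB-low (< 10 mut/Mb)."""
    return "TMB-high" if mutations_per_mb >= TMB_HIGH_THRESHOLD else "TMB-low"
```

A tumor with 45 mutations over a hypothetical 1.5 Mb panel would come out at 30 mut/Mb and be labeled TMB-high under this rule.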


Subjects
Colorectal Neoplasms , Endometrial Neoplasms , Ovarian Neoplasms , Female , Humans , Mutagens , Exonucleases/genetics , Poly-ADP-Ribose Binding Proteins/genetics , DNA Polymerase II/genetics , Mutation/genetics , Endometrial Neoplasms/genetics , Mutagenesis , Ovarian Neoplasms/epidemiology , Colorectal Neoplasms/genetics
3.
bioRxiv ; 2023 Sep 03.
Article in English | MEDLINE | ID: mdl-37547017

ABSTRACT

Humans have 437 catalytically competent protein kinase domains with the typical kinase fold, similar to the structure of Protein Kinase A (PKA). Only 155 of these kinases are in the Protein Data Bank in their active form. The active form of a kinase must satisfy requirements for binding ATP, magnesium, and substrate. From structural bioinformatics analysis of 40 unique substrate-bound kinases, we derived several criteria for the active form of protein kinases. We include requirements on the DFG motif of the activation loop but also on the positions of the N-terminal and C-terminal segments of the activation loop that must be placed appropriately to bind substrate. Because the active form of catalytic kinases is needed for understanding substrate specificity and the effects of mutations on catalytic activity in cancer and other diseases, we used AlphaFold2 to produce models of all 437 human protein kinases in the active form. This was accomplished with templates in the active form from the PDB and shallow multiple sequence alignments of orthologs and close homologs of the query protein. We selected models for each kinase based on the pLDDT scores of the activation loop residues, demonstrating that the highest scoring models have the lowest or close to the lowest RMSD to 22 non-redundant substrate-bound structures in the PDB. A larger benchmark of all 130 active kinase structures with complete activation loops in the PDB shows that 80% of the highest-scoring AlphaFold2 models have RMSD < 1.0 Å and 90% have RMSD < 2.0 Å over the activation loop backbone atoms. Models for all 437 catalytic kinases are available at http://dunbrack.fccc.edu/kincore/activemodels. We believe they may be useful for interpreting mutations leading to constitutive catalytic activity in cancer as well as for templates for modeling substrate and inhibitor binding for molecules which bind to the active state.
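The model-selection step in this abstract (ranking candidate models by the pLDDT of the activation loop residues) might look roughly like the sketch below. The data layout (a dict mapping model name to per-residue pLDDT) and the use of the mean as the summary statistic are assumptions made for illustration; the abstract does not say how the loop scores were aggregated.

```python
from statistics import mean
from typing import Dict

# Hypothetical layout: model name -> {residue number: pLDDT}.
Models = Dict[str, Dict[int, float]]


def activation_loop_plddt(plddt: Dict[int, float], start: int, end: int) -> float:
    """Mean pLDDT over residues start..end (inclusive). The mean is an assumption."""
    return mean(plddt[i] for i in range(start, end + 1))


def select_best_model(models: Models, start: int, end: int) -> str:
    """Return the name of the model with the highest activation-loop pLDDT."""
    return max(models, key=lambda name: activation_loop_plddt(models[name], start, end))
```

Scoring only the loop, rather than the whole chain, keeps a well-predicted kinase core from masking a poorly placed activation loop.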

5.
Proteomics ; 23(17): e2200323, 2023 09.
Article in English | MEDLINE | ID: mdl-37365936

ABSTRACT

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
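The evaluation metric quoted here, area under the ROC curve, can be computed directly from scores and physiological/non-physiological labels. The sketch below uses the rank-based (Mann-Whitney) formulation; the averaging consensus is only one possible way to combine scores and is an assumption, since the abstract does not describe how the consensus score was constructed.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_consensus(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average several comparable scores per complex (hypothetical consensus rule)."""
    return [sum(column) / len(column) for column in zip(*score_lists)]
```

The pairwise formulation is quadratic in the number of complexes, which is acceptable at the scale of a 1677-structure benchmark but would be replaced by a sort-based version for much larger sets.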


Subjects
Proteins , Reproducibility of Results , Proteins/metabolism , Protein Binding
6.
J Biol Chem ; 299(8): 104965, 2023 08.
Article in English | MEDLINE | ID: mdl-37356718

ABSTRACT

Janus Kinase-1 (JAK1) plays key roles during neurodevelopment and following neuronal injury, while activatory JAK1 mutations are linked to leukemia. In mice, Jak1 genetic deletion results in perinatal lethality, suggesting non-redundant roles and/or regulation of JAK1 for which other JAKs cannot compensate. Proteomic studies reveal that JAK1 is more likely palmitoylated compared to other JAKs, implicating palmitoylation as a possible JAK1-specific regulatory mechanism. However, the importance of palmitoylation for JAK1 signaling has not been addressed. Here, we report that JAK1 is palmitoylated in transfected HEK293T cells and endogenously in cultured Dorsal Root Ganglion (DRG) neurons. We further use comprehensive screening in transfected non-neuronal cells and shRNA-mediated knockdown in DRG neurons to identify the related enzymes ZDHHC3 and ZDHHC7 as dominant protein acyltransferases (PATs) for JAK1. Surprisingly, we found palmitoylation minimally affects JAK1 localization in neurons, but is critical for JAK1's kinase activity in cells and even in vitro. We propose this requirement is likely because palmitoylation facilitates transphosphorylation of key sites in JAK1's activation loop, a possibility consistent with structural models of JAK1. Importantly, we demonstrate a leukemia-associated JAK1 mutation overrides the palmitoylation-dependence of JAK1 activity, potentially explaining why this mutation is oncogenic. Finally, we show that JAK1 palmitoylation is important for neuropoietic cytokine-dependent signaling and neuronal survival and that combined Zdhhc3/7 loss phenocopies loss of palmitoyl-JAK1. These findings provide new insights into the control of JAK signaling in both physiological and pathological contexts.


Subjects
Cytokines , Lipoylation , Neurons , Signal Transduction , Animals , Female , Humans , Mice , Pregnancy , Cytokines/metabolism , Ganglia, Spinal/metabolism , HEK293 Cells , Janus Kinase 1/genetics , Janus Kinase 1/metabolism , Neurons/cytology , Neurons/metabolism , Proteomics , Cell Survival
7.
Heliyon ; 9(4): e15032, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37035348

ABSTRACT

The human infectious disease COVID-19 caused by the SARS-CoV-2 virus has become a major threat to global public health. Developing a vaccine is the preferred prophylactic response to epidemics and pandemics. However, for individuals who have contracted the disease, the rapid design of antibodies that can target the SARS-CoV-2 virus fulfils a critical need. Further, discovering antibodies that bind multiple variants of SARS-CoV-2 can aid in the development of rapid antigen tests (RATs) which are critical for the identification and isolation of individuals currently carrying COVID-19. Here we provide a proof-of-concept study for the computational design of high-affinity antibodies that bind to multiple variants of the SARS-CoV-2 spike protein using RosettaAntibodyDesign (RAbD). Well characterized antibodies that bind with high affinity to the SARS-CoV-1 (but not SARS-CoV-2) spike protein were used as templates and re-designed to bind the SARS-CoV-2 spike protein with high affinity, resulting in a specificity switch. A panel of designed antibodies were experimentally validated. One design bound to a broad range of variants of concern including the Omicron, Delta, Wuhan, and South African spike protein variants.

8.
BMC Genomics ; 24(1): 212, 2023 Apr 24.
Article in English | MEDLINE | ID: mdl-37095444

ABSTRACT

BACKGROUND: Early-onset renal cell carcinoma (eoRCC) is typically associated with pathogenic germline variants (PGVs) in RCC familial syndrome genes. However, most eoRCC patients lack PGVs in familial RCC genes and their genetic risk remains undefined. METHODS: Here, we analyzed biospecimens from 22 eoRCC patients who were seen at our institution for genetic counseling and tested negative for PGVs in RCC familial syndrome genes. RESULTS: Analysis of whole-exome sequencing (WES) data found enrichment of candidate pathogenic germline variants in DNA repair and replication genes, including multiple DNA polymerases. Induction of DNA damage in peripheral blood mononuclear cells (PBMCs) significantly elevated numbers of γH2AX foci, a marker of double-stranded breaks, in PBMCs from eoRCC patients versus PBMCs from matched cancer-free controls. Knockdown of candidate variant genes in Caki RCC cells increased γH2AX foci. Immortalized patient-derived B cell lines bearing the candidate variants in DNA polymerase genes (POLD1, POLH, POLE, POLK) had DNA replication defects compared to control cells. Renal tumors carrying these DNA polymerase variants were microsatellite stable but had a high mutational burden. Direct biochemical analysis of the variant Pol δ and Pol η polymerases revealed defective enzymatic activities. CONCLUSIONS: Together, these results suggest that constitutional defects in DNA repair underlie a subset of eoRCC cases. Screening patient lymphocytes to identify these defects may provide insight into mechanisms of carcinogenesis in a subset of genetically undefined eoRCCs. Evaluation of DNA repair defects may also provide insight into the cancer initiation mechanisms for subsets of eoRCCs and lay the foundation for targeting DNA repair vulnerabilities in eoRCC.


Subjects
Carcinoma, Renal Cell , Kidney Neoplasms , Humans , Genetic Predisposition to Disease , DNA Replication , Germ-Line Mutation , Germ Cells
9.
bioRxiv ; 2023 Mar 17.
Article in English | MEDLINE | ID: mdl-36945596

ABSTRACT

The Ser/Thr protein phosphatase 2A (PP2A) is a highly conserved collection of heterotrimeric holoenzymes responsible for the dephosphorylation of many regulated phosphoproteins. Substrate recognition and the integration of regulatory cues are mediated by B regulatory subunits that are complexed to the catalytic subunit (C) by a scaffold protein (A). PP2A/B55 substrate recruitment was thought to be mediated by charge-charge interactions between the surface of B55α and its substrates. Challenging this view, we recently discovered a conserved SLiM [RK]-V-x-x-[VI]-R in a range of proteins, including substrates such as the retinoblastoma-related protein p107 and TAU (Fowle et al. eLife 2021;10:e63181). Here we report the identification of this SLiM in FAM122A, an inhibitor of B55α/PP2A. This conserved SLiM is necessary for FAM122A binding to B55α in vitro and in cells. Computational structure prediction with AlphaFold2 predicts an interaction consistent with the mutational and biochemical data and supports a mechanism whereby FAM122A uses the 'SLiM' in the form of a short α-helix to dock to the B55α top groove. In this model, FAM122A spatially constrains substrate access by occluding the catalytic subunit with a second α-helix immediately adjacent to helix 1. Consistently, FAM122A functions as a competitive inhibitor as it prevents binding of substrates in in vitro competition assays and the dephosphorylation of CDK substrates by B55α/PP2A in cell lysates. Ablation of FAM122A in human cell lines reduces the rate of proliferation, progression through cell cycle transitions and abrogates G1/S and intra-S phase cell cycle checkpoints. FAM122A-KO in HEK293 cells results in attenuation of CHK1 and CHK2 activation in response to replication stress. Overall, these data strongly suggest that FAM122A is a 'SLiM'-dependent, substrate-competitive inhibitor of B55α/PP2A that suppresses multiple functions of B55α in the DNA damage response and in timely progression through the cell cycle interphase.

10.
Nat Cell Biol ; 25(1): 159-169, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36635501

ABSTRACT

Oncogenic KRAS mutations occur in approximately 30% of lung adenocarcinoma. Despite several decades of effort, oncogenic KRAS-driven lung cancer remains difficult to treat, and our understanding of the regulators of RAS signalling is incomplete. Here, to uncover the impact of diverse KRAS-interacting proteins on lung cancer growth, we combined multiplexed somatic CRISPR/Cas9-based genome editing in genetically engineered mouse models with tumour barcoding and high-throughput barcode sequencing. Through a series of CRISPR/Cas9 screens in autochthonous lung cancer models, we show that HRAS and NRAS are suppressors of KRAS G12D-driven tumour growth in vivo and confirm these effects in oncogenic KRAS-driven human lung cancer cell lines. Mechanistically, RAS paralogues interact with oncogenic KRAS, suppress KRAS-KRAS interactions, and reduce downstream ERK signalling. Furthermore, HRAS and NRAS mutations identified in oncogenic KRAS-driven human tumours partially abolished this effect. By comparing the tumour-suppressive effects of HRAS and NRAS in oncogenic KRAS- and oncogenic BRAF-driven lung cancer models, we confirm that RAS paralogues are specific suppressors of KRAS-driven lung cancer in vivo. Our study outlines a technological avenue to uncover positive and negative regulators of oncogenic KRAS-driven cancer in a multiplexed manner in vivo and highlights the role of RAS paralogue imbalance in oncogenic KRAS-driven lung cancer.


Subjects
Lung Neoplasms , Proto-Oncogene Proteins p21(ras) , Mice , Animals , Humans , Proto-Oncogene Proteins p21(ras)/genetics , Proto-Oncogene Proteins p21(ras)/metabolism , Cell Transformation, Neoplastic/metabolism , Signal Transduction/genetics , Lung Neoplasms/genetics , Genes, ras , Mutation , Membrane Proteins/genetics , Membrane Proteins/metabolism , GTP Phosphohydrolases/genetics , GTP Phosphohydrolases/metabolism
11.
Nucleic Acids Res ; 51(D1): D466-D478, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36300618

ABSTRACT

Proteins often act through oligomeric interactions with other proteins. X-ray crystallography and cryo-electron microscopy provide detailed information on the structures of biological assemblies, defined as the most likely biologically relevant structures derived from experimental data. In crystal structures, the most relevant assembly may be ambiguously determined, since multiple assemblies observed in the crystal lattice may be plausible. It is estimated that 10-15% of PDB entries may have incorrect or ambiguous assembly annotations. Accurate assemblies are required for understanding functional data and training of deep learning methods for predicting assembly structures. As with any other kind of biological data, replication via multiple independent experiments provides important validation for the determination of biological assembly structures. Here we present the Protein Common Assembly Database (ProtCAD), which presents clusters of protein assembly structures observed in independent structure determinations of homologous proteins in the Protein Data Bank (PDB). ProtCAD is searchable by PDB entry, UniProt identifiers, or Pfam domain designations and provides downloads of coordinate files, PyMol scripts, and publicly available assembly annotations for each cluster of assemblies. About 60% of PDB entries contain assemblies in clusters of at least 2 independent experiments. All clusters and coordinates are available on the ProtCAD web site (http://dunbrack2.fccc.edu/protcad).


Subjects
Databases, Protein , Multiprotein Complexes , Proteins , Cryoelectron Microscopy , Crystallography, X-Ray , Proteins/chemistry , Multiprotein Complexes/chemistry
13.
Cancer Res ; 82(13): 2485-2498, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35536216

ABSTRACT

Mutations in RAS isoforms (KRAS, NRAS, and HRAS) are among the most frequent oncogenic alterations in many cancers, making these proteins high priority therapeutic targets. Effectively targeting RAS isoforms requires an exact understanding of their active, inactive, and druggable conformations. However, there is no structural catalog of RAS conformations to guide therapeutic targeting or to examine the structural impact of RAS mutations. Here we present an expanded classification of RAS conformations based on analyses of the catalytic switch 1 (SW1) and switch 2 (SW2) loops. From 721 human KRAS, NRAS, and HRAS structures available in the Protein Data Bank (206 RAS-protein cocomplexes, 190 inhibitor-bound, and 325 unbound, including 204 WT and 517 mutated structures), we created a broad conformational classification based on the spatial positions of Y32 in SW1 and Y71 in SW2. Clustering all well-modeled SW1 and SW2 loops using a density-based machine learning algorithm defined additional conformational subsets, some previously undescribed. Three SW1 conformations and nine SW2 conformations were identified, each associated with different nucleotide states (GTP-bound, nucleotide-free, and GDP-bound) and specific bound proteins or inhibitor sites. The GTP-bound SW1 conformation could be further subdivided on the basis of the hydrogen bond type made between Y32 and the GTP γ-phosphate. Further analysis clarified the catalytic impact of G12D and G12V mutations and the inhibitor chemistries that bind to each druggable RAS conformation. Overall, this study has expanded our understanding of RAS structural biology, which could facilitate future RAS drug discovery. SIGNIFICANCE: Analysis of >700 RAS structures helps define an expanded landscape of active, inactive, and druggable RAS conformations, the structural impact of common RAS mutations, and previously uncharacterized RAS inhibitor-binding modes.


Subjects
Proto-Oncogene Proteins p21(ras) , ras Proteins , Guanosine Triphosphate/metabolism , Humans , Mutation , Protein Conformation , Protein Isoforms/metabolism , Proto-Oncogene Proteins p21(ras)/genetics , Proto-Oncogene Proteins p21(ras)/metabolism , ras Proteins/genetics , ras Proteins/metabolism
14.
Nucleic Acids Res ; 50(D1): D654-D664, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34643709

ABSTRACT

The active form of kinases is shared across different family members, as are several commonly observed inactive forms. We previously performed a clustering of the conformation of the activation loop of all protein kinase structures in the Protein Data Bank (PDB) into eight classes based on the dihedral angles that place the Phe side chain of the DFG motif at the N-terminus of the activation loop. Our clusters are strongly associated with the placement of the activation loop, the C-helix, and other structural elements of kinases. We present Kincore, a web resource providing access to our conformational assignments for kinase structures in the PDB. While other available databases provide conformational states or drug type but not both, Kincore includes the conformational state and the inhibitor type (Type 1, 1.5, 2, 3, allosteric) for each kinase chain. The user can query and browse the database using these attributes or determine the conformational labels of a kinase structure using the web server or a standalone program. The database and labeled structure files can be downloaded from the server. Kincore will help in understanding the conformational dynamics of these proteins and guide development of inhibitors targeting specific states. Kincore is available at http://dunbrack.fccc.edu/kincore.


Subjects
Databases, Protein , Protein Kinase Inhibitors/classification , Protein Kinases/classification , Software , Protein Conformation , Protein Kinase Inhibitors/chemistry , Protein Kinases/chemistry
15.
Elife ; 10, 2021 10 18.
Article in English | MEDLINE | ID: mdl-34661528

ABSTRACT

Protein phosphorylation is a reversible post-translational modification essential in cell signaling. This study addresses a long-standing question as to how the most abundant serine/threonine protein phosphatase 2 (PP2A) holoenzyme, PP2A/B55α, specifically recognizes substrates and presents them to the enzyme active site. Here, we show how the PP2A regulatory subunit B55α recruits p107, a pRB-related tumor suppressor and B55α substrate. Using molecular and cellular approaches, we identified a conserved region 1 (R1, residues 615-626) encompassing the strongest p107 binding site. This enabled us to identify an 'HxRVxxV' short linear motif (SLiM; residues 619-625) in p107 as necessary for B55α binding and dephosphorylation of the proximal pSer-615 in vitro and in cells. Numerous B55α/PP2A substrates, including TAU, contain a related SLiM C-terminal from a proximal phosphosite, 'p[ST]-P-x(4,10)-[RK]-V-x-x-[VI]-R.' Mutation of conserved SLiM residues in TAU dramatically inhibits dephosphorylation by PP2A/B55α, validating its generality. A data-guided computational model details the interaction of residues from the conserved p107 SLiM, the B55α groove, and phosphosite presentation. Altogether, these data provide key insights into PP2A/B55α's mechanisms of substrate recruitment and active site engagement, and also facilitate identification and validation of new substrates, a key step towards understanding PP2A/B55α's role in multiple cellular processes.


Subjects
Protein Phosphatase 2/genetics , Retinoblastoma-Like Protein p107/genetics , HEK293 Cells , Holoenzymes/metabolism , Humans , Phosphorylation , Protein Phosphatase 2/metabolism , Retinoblastoma-Like Protein p107/metabolism
16.
PLoS One ; 16(7): e0253411, 2021.
Article in English | MEDLINE | ID: mdl-34228733

ABSTRACT

The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures. As of mid-2021, the database has almost 180,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions, including with binding partners such as ligands, nucleic acids, or other proteins, with mutations, and with post-translational modifications, thus enabling extensive comparative structure-function studies. However, these studies are made more difficult because authors are allowed by the PDB to number the amino acids in each protein sequence in any manner they wish. This results in the same protein being numbered differently in the available PDB entries. For instance, some authors may include N-terminal signal peptides or the N-terminal methionine in the sequence numbering and others may not. In addition to the coordinates, there are many fields that contain structural and functional information regarding specific residues numbered according to the author. Here we provide a webserver and Python3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program can take a list of PDB entries or a list of UniProt identifiers (e.g., "P04637" or "P53_HUMAN") and provide renumbered files in mmCIF format and the legacy PDB format for both asymmetric unit files and biological assembly files provided by PDBe.
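The renumbering idea in this abstract, replacing author residue numbers with UniProt numbers through a residue correspondence, can be sketched with a simplified segment-based mapping. This is not the authors' program and does not read SIFTS files; the `Segment` type, its fields, and the example offsets are hypothetical, and real mappings also have to handle insertion codes, gaps, and chimeric chains.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    """A contiguous run of author-numbered residues aligned to UniProt (hypothetical)."""
    author_start: int
    author_end: int
    uniprot_start: int


def to_uniprot_number(author_resnum: int, segments: List[Segment]) -> Optional[int]:
    """Map an author residue number to UniProt numbering, or None if unmapped."""
    for seg in segments:
        if seg.author_start <= author_resnum <= seg.author_end:
            return seg.uniprot_start + (author_resnum - seg.author_start)
    return None


def renumber(author_resnums: List[int], segments: List[Segment]) -> List[Optional[int]]:
    """Renumber a list of residues; unmapped residues (e.g. tags) become None."""
    return [to_uniprot_number(n, segments) for n in author_resnums]
```

For example, if an author numbered a mature chain from 1 after removing a 23-residue signal peptide, a single `Segment(1, 100, 24)` would shift residue 1 to UniProt position 24.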


Subjects
Amino Acid Sequence , Databases, Protein , Animals , Humans , Internet/organization & administration , Protein Conformation
17.
Sci Adv ; 7(15), 2021 04.
Article in English | MEDLINE | ID: mdl-33827808

ABSTRACT

During transcription initiation, the general transcription factor TFIIH marks RNA polymerase II by phosphorylating Ser5 of the carboxyl-terminal domain (CTD) of Rpb1, which is followed by extensive modifications coupled to transcription elongation, mRNA processing, and histone dynamics. We have determined a 3.5-Å resolution cryo-electron microscopy (cryo-EM) structure of the TFIIH kinase module (TFIIK in yeast), which is composed of Kin28, Ccl1, and Tfb3, yeast homologs of CDK7, cyclin H, and MAT1, respectively. The carboxyl-terminal region of Tfb3 lay at the edge of the catalytic cleft of Kin28, where a conserved Tfb3 helix served to stabilize the activation loop in its active conformation. By combining the structure of TFIIK with the previous cryo-EM structure of the preinitiation complex, we extend the previously proposed model of the CTD path to the active site of TFIIK.

18.
Biophysicist (Rockv) ; 2(1): 108-122, 2021 Apr.
Article in English | MEDLINE | ID: mdl-35128343

ABSTRACT

Biomolecular structure drives function, and computational capabilities have progressed such that the prediction and computational design of biomolecular structures is increasingly feasible. Because computational biophysics attracts students from many different backgrounds and with different levels of resources, teaching the subject can be challenging. One strategy to teach diverse learners is with interactive multimedia material that promotes self-paced, active learning. We have created a hands-on education strategy with a set of sixteen modules that teach topics in biomolecular structure and design, from fundamentals of conformational sampling and energy evaluation to applications like protein docking, antibody design, and RNA structure prediction. Our modules are based on PyRosetta, a Python library that encapsulates all computational modules and methods in the Rosetta software package. The workshop-style modules are implemented as Jupyter Notebooks that can be executed in the Google Colaboratory, allowing learners access with just a web browser. The digital format of Jupyter Notebooks allows us to embed images, molecular visualization movies, and interactive coding exercises. This multimodal approach may better reach students from different disciplines and experience levels as well as attract more researchers from smaller labs and cognate backgrounds to leverage PyRosetta in their science and engineering research. All materials are freely available at https://github.com/RosettaCommons/PyRosetta.notebooks.

19.
MAbs ; 12(1): 1840005, 2020.
Article in English | MEDLINE | ID: mdl-33180672

ABSTRACT

Antibody variable domains contain "complementarity-determining regions" (CDRs), the loops that form the antigen binding site. CDRs1-3 are recognized as the canonical CDRs. However, a fourth loop sits adjacent to CDR1 and CDR2 and joins the D and E strands on the antibody v-type fold. This "DE loop" is usually treated as a framework region, even though mutations in the loop affect the conformation of the CDRs and residues in the DE loop occasionally contact antigen. We analyzed the length, structure, and sequence features of all DE loops in the Protein Data Bank (PDB), as well as millions of sequences from HIV-1 infected and naïve patients. We refer to the DE loop as H4 and L4 in the heavy and light chains, respectively. Clustering the backbone conformations of the most common length of L4 (6 residues) reveals four conformations: two κ-only clusters, one λ-only cluster, and one mixed κ/λ cluster. Most H4 loops are length-8 and exist primarily in one conformation; a secondary conformation represents a small fraction of H4-8 structures. H4 sequence variability exceeds that of the antibody framework in naïve human high-throughput sequences, and both L4 and H4 sequence variability from λ and heavy germline sequences exceed that of germline framework regions. Finally, we identified dozens of structures in the PDB with insertions in the DE loop, all related to broadly neutralizing HIV-1 antibodies (bNabs), as well as antibody sequences from high-throughput sequencing studies of HIV-infected individuals, illuminating a possible role in humoral immunity to HIV-1.


Subjects
Antigen-Antibody Reactions/immunology , Complementarity Determining Regions/chemistry , Complementarity Determining Regions/immunology , Models, Molecular , Amino Acid Sequence , Broadly Neutralizing Antibodies/chemistry , Broadly Neutralizing Antibodies/immunology , HIV Antibodies/chemistry , HIV Antibodies/immunology , HIV Infections/immunology , HIV-1/immunology , Humans , Protein Conformation
20.
PLoS One ; 15(5): e0232528, 2020.
Article in English | MEDLINE | ID: mdl-32374785

ABSTRACT

Protein secondary structure prediction remains a vital topic with broad applications. Due to lack of a widely accepted standard in secondary structure predictor evaluation, a fair comparison of predictors is challenging. A detailed examination of factors that contribute to higher accuracy is also lacking. In this paper, we present: (1) new test sets, Test2018, Test2019, and Test2018-2019, consisting of proteins from structures released in 2018 and 2019 with less than 25% identity to any protein published before 2018; (2) a 4-layer convolutional neural network, SecNet, with an input window of ±14 amino acids which was trained on proteins ≤25% identical to proteins in Test2018 and the commonly used CB513 test set; (3) an additional test set that shares no homologous domains with the training set proteins, according to the Evolutionary Classification of Proteins (ECOD) database; (4) a detailed ablation study where we reverse one algorithmic choice at a time in SecNet and evaluate the effect on the prediction accuracy; (5) new 4- and 5-label prediction alphabets that may be more practical for tertiary structure prediction methods. The 3-label accuracy (helix, sheet, coil) of the leading predictors on both Test2018 and CB513 is 81-82%, while SecNet's accuracy is 84% for both sets. Accuracy on the non-homologous ECOD set is only 0.6 points (83.9%) lower than the results on the Test2018-2019 set (84.5%). The ablation study of features, neural network architecture, and training hyper-parameters suggests the best accuracy results are achieved with good choices for each of them while the neural network architecture is not as critical as long as it is not too simple. Protocols for generating and using unbiased test, validation, and training sets are provided. Our data sets, including input features and assigned labels, and SecNet software including third-party dependencies and databases, are downloadable from dunbrack.fccc.edu/ss and github.com/sh-maxim/ss.
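Two quantities in this abstract lend themselves to a short illustration: the 3-label accuracy (helix, sheet, coil) and the ±14-residue input window. The sketch below is not SecNet code; the function names, the single-letter labels H/E/C, and the padding character are assumptions chosen for the example.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-label state (e.g. H, E, C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def residue_window(sequence: str, center: int, half_width: int = 14, pad: str = "X") -> str:
    """Return the residues within +/- half_width of a 0-based center, padded at the ends."""
    if not 0 <= center < len(sequence):
        raise IndexError("center is outside the sequence")
    padded = pad * half_width + sequence + pad * half_width
    return padded[center : center + 2 * half_width + 1]
```

With the default half-width of 14, every residue is described by a 29-residue window, which is the context size the abstract quotes for the network input.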


Subjects
Neural Networks, Computer , Protein Structure, Secondary , Algorithms , Amino Acid Sequence , Amino Acids/chemistry , Databases, Protein/statistics & numerical data , Deep Learning , Proteins/chemistry , Software