1.
Sci Rep ; 14(1): 12260, 2024 May 28.
Article En | MEDLINE | ID: mdl-38806511

Salmonella enterica is a pathogenic bacterium known for causing severe typhoid fever in humans, making it important to study due to its potential health risks and significant impact on public health. This study provides an evolutionary classification of proteins from the Salmonella enterica pangenome. We classified 17,238 domains from 13,147 proteins from 79,758 Salmonella enterica strains and studied in detail domains of 272 proteins from 14 characterized Salmonella pathogenicity islands (SPIs). Among SPI-related proteins, 90 proteins function in the secretion machinery. 41% of the domains of SPI proteins have no previous sequence annotation. By comparing clinical and environmental isolates, we identified 3682 proteins that are overrepresented in the clinical group, which we consider potentially pathogenic. Among domains of potentially pathogenic proteins, only 50% were previously annotated by sequence methods. Moreover, 36% (1330 out of 3682) of potentially pathogenic proteins cannot be classified into the Evolutionary Classification of Protein Domains (ECOD) database. Among classified domains of potentially pathogenic proteins, the most populated homology groups include helix-turn-helix (HTH), immunoglobulin-related, and P-loop domain-related groups. Functional analysis revealed overrepresentation of these proteins in biological processes related to viral entry into host cell, antibiotic biosynthesis, DNA metabolism and conformation change, and underrepresentation in translational processes. Analysis of the potentially pathogenic proteins indicates that they form 119 clusters or novel potential pathogenicity islands (NPPIs) within the Salmonella genome, suggesting their potential contribution to the bacterium's virulence. One of the NPPIs revealed significant overrepresentation of potentially pathogenic proteins. Overall, our analysis revealed that the identified potentially pathogenic proteins are poorly studied.


Bacterial Proteins , Genome, Bacterial , Genomic Islands , Salmonella enterica , Genomic Islands/genetics , Salmonella enterica/genetics , Salmonella enterica/pathogenicity , Salmonella enterica/classification , Bacterial Proteins/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Humans , Protein Domains
2.
Proteins ; 2024 May 22.
Article En | MEDLINE | ID: mdl-38775337

A propeptide is removed from a precursor protein to generate its active or mature form. Propeptides play essential roles in protein folding, transportation, and activation and are present in about 2.3% of reviewed proteins in the UniProt database. They are often found in secreted or membrane-bound proteins including proteolytic enzymes, hormones, and toxins. We identified a variety of globular and nonglobular Pfam domains in protein sequences designated as propeptides, some of which form intramolecular interactions with other domains in the mature proteins. Propeptide-containing enzymes mostly function as proteases, as they are depleted in other enzyme classes such as hydrolases acting on DNA and RNA, isomerases, and lyases. We applied AlphaFold to generate structural models for over 7000 proteins with propeptides having no less than 20 residues. Analysis of residue contacts in these models revealed conformational changes for over 300 proteins before and after the cleavage of the propeptide. Examples of conformation change occur in several classes of proteolytic enzymes in the families of subtilisins, trypsins, aspartyl proteases, and thermolysin-like metalloproteases. In most of the observed cases, cleavage of the propeptide releases the constraints imposed by the covalent bond between the propeptide and the mature protein, and cleavage enables stronger interactions between the propeptide and the mature protein. These findings suggest that post-cleavage propeptides could play critical roles in regulating the activity of mature proteins.

3.
PLoS Comput Biol ; 20(2): e1011586, 2024 Feb.
Article En | MEDLINE | ID: mdl-38416793

Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprised of over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared in both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implication of these results for protein target selection for future classification strategies for very large protein sets.


Biological Evolution , Proteome , Protein Domains , Evolution, Molecular , Bacteria , Databases, Protein
4.
bioRxiv ; 2024 Jan 23.
Article En | MEDLINE | ID: mdl-38328056

During homeostasis, the endoplasmic reticulum (ER) maintains productive transmembrane and secretory protein folding that is vital for proper cellular function. The ER-resident HSP70 chaperone, BiP, plays a pivotal role in sensing ER stress to activate the unfolded protein response (UPR). BiP function is regulated by the bifunctional enzyme FicD that mediates AMPylation and deAMPylation of BiP in response to changes in ER stress. AMPylated BiP acts as a molecular rheostat to regulate UPR signaling, yet little is known about the molecular consequences of FicD loss. In this study, we investigate the role of FicD in mouse embryonic fibroblast (MEF) response to pharmacologically and metabolically induced ER stress. We find differential BiP AMPylation signatures when comparing robust chemical ER stress inducers to physiological glucose starvation stress and recovery. Wildtype MEFs respond to pharmacological ER stress by downregulating BiP AMPylation. Conversely, BiP AMPylation in wildtype MEFs increases upon metabolic stress induced by glucose starvation. Deletion of FicD results in widespread gene expression changes under baseline growth conditions. In addition, FicD null MEFs exhibit dampened UPR signaling, altered cell stress recovery response, and unconstrained protein secretion. Taken together, our findings indicate that FicD is important for tempering UPR signaling, stress recovery, and the maintenance of secretory protein homeostasis. Significance Statement: The chaperone BiP plays a key quality control role in the endoplasmic reticulum, the cellular location for the production, folding, and transport of secreted proteins. The enzyme FicD regulates BiP's activity through AMPylation and deAMPylation. Our study unveils the importance of FicD in regulating BiP and the unfolded protein response (UPR) during stress. We identify distinct BiP AMPylation signatures for different stressors, highlighting FicD's nuanced control.
Deletion of FicD causes widespread gene expression changes, disrupts UPR signaling, alters stress recovery, and perturbs protein secretion in cells. These observations underscore the pivotal contribution of FicD for preserving secretory protein homeostasis. Our findings deepen the understanding of FicD's role in maintaining cellular resilience and open avenues for therapeutic strategies targeting UPR-associated diseases.

5.
Proc Natl Acad Sci U S A ; 121(6): e2312291121, 2024 Feb 06.
Article En | MEDLINE | ID: mdl-38294943

A missense variant in patatin-like phospholipase domain-containing protein 3 [PNPLA3(I148M)] is the most impactful genetic risk factor for fatty liver disease (FLD). We previously showed that PNPLA3 is ubiquitylated and subsequently degraded by proteasomes and autophagosomes and that the PNPLA3(148M) variant interferes with this process. To define the machinery responsible for PNPLA3 turnover, we used small interfering (si)RNAs to inactivate components of the ubiquitin proteasome system. Inactivation of bifunctional apoptosis regulator (BFAR), a membrane-bound E3 ubiquitin ligase, reproducibly increased PNPLA3 levels in two lines of cultured hepatocytes. Conversely, overexpression of BFAR decreased levels of endogenous PNPLA3 in HuH7 cells. BFAR and PNPLA3 co-immunoprecipitated when co-expressed in cells. BFAR promoted ubiquitylation of PNPLA3 in vitro in a reconstitution assay using purified, epitope-tagged recombinant proteins. To confirm that BFAR targets PNPLA3, we inactivated Bfar in mice. Levels of PNPLA3 protein were increased twofold in hepatic lipid droplets of Bfar-/- mice with no associated increase in PNPLA3 mRNA levels. Taken together these data are consistent with a model in which BFAR plays a role in the post-translational degradation of PNPLA3. The identification of BFAR provides a potential target to enhance PNPLA3 turnover and prevent FLD.


Non-alcoholic Fatty Liver Disease , Ubiquitin , Mice , Animals , Ubiquitin-Protein Ligases/genetics , Non-alcoholic Fatty Liver Disease/metabolism , Hepatocytes/metabolism , Acyltransferases , Phospholipases A2, Calcium-Independent/genetics
6.
mSystems ; 8(6): e0079623, 2023 Dec 21.
Article En | MEDLINE | ID: mdl-38014954

IMPORTANCE: The pandemic Vpar strain RIMD causes seafood-borne illness worldwide. Previous comparative genomic studies have revealed pathogenicity islands in RIMD that contribute to the success of the strain in infection. However, not all virulence determinants have been identified, and many of the proteins encoded in known pathogenicity islands are of unknown function. Based on the ECOD database, we used evolution-based classification of structure models for the RIMD proteome to improve our functional understanding of virulence determinants acquired by the pandemic strain. We further identify and classify previously unknown mobile protein domains as well as fast-evolving residue positions in structure models that contribute to virulence and adaptation with respect to a pre-pandemic strain. Our work highlights key contributions of phage in mediating seafood-borne illness, suggesting this strain balances its avoidance of phage predators with its successful colonization of human hosts.


Vibrio parahaemolyticus , Humans , Virulence/genetics , Vibrio parahaemolyticus/genetics , Virulence Factors/genetics , Genomics
7.
Proc Natl Acad Sci U S A ; 120(12): e2214069120, 2023 Mar 21.
Article En | MEDLINE | ID: mdl-36917664

Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned using our automatic pipelines, thus demanding manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lack sequence-based annotation in InterPro. A quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AF models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website (http://prodata.swmed.edu/ecod/index_human.php).


Amino Acids , Proteome , Humans , Proteome/genetics , Sequence Alignment , Databases, Protein
8.
mBio ; 13(4): e0162922, 2022 Aug 30.
Article En | MEDLINE | ID: mdl-35862776

Vibrio parahaemolyticus is among the leading causes of bacterial seafood-borne acute gastroenteritis. Like many intracellular pathogens, V. parahaemolyticus invades host cells during infection by deamidating host small Rho GTPases. The Rho GTPase deamidating activity of VopC, a type 3 secretion system (T3SS) translocated effector, drives V. parahaemolyticus invasion. The intracellular pathogen uropathogenic Escherichia coli (UPEC) invades host cells by secreting a VopC homolog, the secreted toxin cytotoxic necrotizing factor 1 (CNF1). Because of the homology between VopC and CNF1, we hypothesized that topical application of CNF1 during V. parahaemolyticus infection could supplement VopC activity. Here, we demonstrate that CNF1 improves the efficiency of V. parahaemolyticus invasion, a bottleneck in V. parahaemolyticus infection, across a range of doses. CNF1 increases V. parahaemolyticus invasion independent of both VopC and the T3SS altogether but leaves a disproportionate fraction of intracellular bacteria unable to escape the endosome and complete their infection cycle. This phenomenon holds true in the presence or absence of VopC but is particularly pronounced in the absence of a T3SS. The native VopC, by contrast, promotes a far less efficient invasion but permits the majority of internalized bacteria to escape the endosome and complete their infection cycle. These studies highlight the significance of enzymatic specificity during infection, as virulence factors (VopC and CNF1 in this instance) with similarities in function (bacterial uptake), catalytic activity (deamidation), and substrates (Rho GTPases) are not sufficiently interchangeable for mediating a successful invasion for neighboring bacterial pathogens. IMPORTANCE Many species of intracellular bacterial pathogens target host small Rho GTPases to initiate invasion, including the human pathogens Vibrio parahaemolyticus and uropathogenic Escherichia coli (UPEC). 
The type three secretion system (T3SS) effector VopC of V. parahaemolyticus promotes invasion through the deamidation of Rac1 and CDC42 in the host, whereas the secreted toxin cytotoxic necrotizing factor 1 (CNF1) drives UPEC's internalization through the deamidation of Rac1, CDC42, and RhoA. Despite these similarities in the catalytic activity of CNF1 and VopC, we observed that the two enzymes were not interchangeable. Although CNF1 increased V. parahaemolyticus endosomal invasion, most intracellular V. parahaemolyticus aborted their infection cycle and remained trapped in endosomes. Our findings illuminate how the precise biochemical fine-tuning of T3SS effectors is essential for efficacious pathogenesis. Moreover, they pave the way for future investigations into the biochemical mechanisms underpinning V. parahaemolyticus endosomal escape and, more broadly, the regulation of successful pathogenesis.


Bacterial Infections , Escherichia coli Proteins , Uropathogenic Escherichia coli , Vibrio parahaemolyticus , Humans , Type III Secretion Systems/metabolism , Uropathogenic Escherichia coli/metabolism , Vibrio parahaemolyticus/genetics , Vibrio parahaemolyticus/metabolism , Virulence Factors , rho GTP-Binding Proteins
9.
Proc Natl Acad Sci U S A ; 119(24): e2203176119, 2022 Jun 14.
Article En | MEDLINE | ID: mdl-35648808

Bacterial signal transduction systems sense changes in the environment and transmit these signals to control cellular responses. The simplest one-component signal transduction systems include an input sensor domain and an output response domain encoded in a single protein chain. Alternatively, two-component signal transduction systems transmit signals by phosphorelay between input and output domains from separate proteins. The membrane-tethered periplasmic bile acid sensor that activates the Vibrio parahaemolyticus type III secretion system adopts an obligate heterodimer of two proteins encoded by partially overlapping VtrA and VtrC genes. This co-component signal transduction system binds bile acid using a lipocalin-like domain in VtrC and transmits the signal through the membrane to a cytoplasmic DNA-binding transcription factor in VtrA. Using the domain and operon organization of VtrA/VtrC, we identify a fast-evolving superfamily of co-component systems in enteric bacteria. Accurate machine learning-based fold predictions for the candidate co-components support their homology in the twilight zone of rapidly evolving sequences and provide mechanistic hypotheses about previously unrecognized lipid-sensing functions.


Bacterial Proteins , Gene Expression Regulation, Bacterial , Genomic Islands , Membrane Proteins , Type III Secretion Systems , Vibrio parahaemolyticus , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Bile Acids and Salts/metabolism , DNA-Binding Proteins/metabolism , Membrane Proteins/genetics , Membrane Proteins/metabolism , Protein Multimerization , Signal Transduction , Transcription Factors/metabolism , Type III Secretion Systems/genetics , Vibrio parahaemolyticus/genetics , Vibrio parahaemolyticus/pathogenicity , Virulence/genetics
10.
Proteins ; 89(12): 1618-1632, 2021 Dec.
Article En | MEDLINE | ID: mdl-34350630

An evolutionary-based definition and classification of target evaluation units (EUs) is presented for the 14th round of the critical assessment of structure prediction (CASP14). CASP14 targets included 84 experimental models submitted by various structural groups (designated T1024-T1101). Targets were split into EUs based on the domain organization of available templates and performance of server groups. Several targets required splitting (19 out of 25 multidomain targets) due in part to observed conformation changes. All in all, 96 CASP14 EUs were defined and assigned to tertiary structure assessment categories (Topology-based FM or High Accuracy-based TBM-easy and TBM-hard) considering their evolutionary relationship to existing ECOD fold space: 24 family level, 50 distant homologs (H-group), 12 analogs (X-group), and 10 new folds. Principal component analysis and heatmap visualization of sequence and structure similarity to known templates as well as performance of servers highlighted trends in CASP14 target difficulty. The assigned evolutionary levels (i.e., H-groups) and assessment classes (i.e., FM) displayed overlapping clusters of EUs. Many viral targets diverged considerably from their template homologs and thus were more difficult for prediction than other homology-related targets. On the other hand, some targets did not have sequence-identifiable templates, but were predicted better than expected due to relatively simple arrangements of secondary structural elements. An apparent improvement in overall server performance in CASP14 further complicated traditional classification, which ultimately assigned EUs into high-accuracy modeling (27 TBM-easy and 31 TBM-hard), topology (23 FM), or both (15 FM/TBM).


Models, Molecular , Protein Conformation , Proteins , Amino Acid Sequence , Computational Biology , Evolution, Molecular , Proteins/chemistry , Proteins/genetics , Sequence Analysis, Protein , Software
11.
Proteins ; 89(12): 1673-1686, 2021 Dec.
Article En | MEDLINE | ID: mdl-34240477

This report describes the tertiary structure prediction assessment of difficult modeling targets in the 14th round of the Critical Assessment of Structure Prediction (CASP14). We implemented an official ranking scheme that used the same scores as the previous CASP topology-based assessment, but combined these scores with one that emphasized physically realistic models. The top performing AlphaFold2 group outperformed the rest of the prediction community on all but two of the difficult targets considered in this assessment. They provided high quality models for most of the targets (86% over GDT_TS 70), including larger targets above 150 residues, and they correctly predicted the topology of almost all the rest. AlphaFold2 performance was followed by two manual Baker methods, a Feig method that refined Zhang-server models, two notable automated Zhang server methods (QUARK and Zhang-server), and a Zhang manual group. Despite the remarkable progress in protein structure prediction of difficult targets, both the prediction community and AlphaFold2, to a lesser extent, faced challenges with flexible regions and obligate oligomeric assemblies. The official ranking of top-performing methods was supported by performance generated PCA and heatmap clusters that gave insight into target difficulties and the most successful state-of-the-art structure prediction methodologies.


Computational Biology/methods , Models, Molecular , Protein Conformation , Protein Folding , Software , Databases, Protein , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein
12.
Science ; 373(6557): 871-876, 2021 Aug 20.
Article En | MEDLINE | ID: mdl-34282049

DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.


Deep Learning , Protein Conformation , Protein Folding , Proteins/chemistry , ADAM Proteins/chemistry , Amino Acid Sequence , Computer Simulation , Cryoelectron Microscopy , Crystallography, X-Ray , Databases, Protein , Membrane Proteins/chemistry , Models, Molecular , Multiprotein Complexes/chemistry , Neural Networks, Computer , Protein Subunits/chemistry , Proteins/physiology , Receptors, G-Protein-Coupled/chemistry , Sphingosine N-Acyltransferase/chemistry
13.
ACS Omega ; 6(24): 15698-15707, 2021 Jun 22.
Article En | MEDLINE | ID: mdl-34179613

Domain classifications are a useful resource for computational analysis of protein structure, but elements of their composition are often opaque to potential users. We perform a comparative analysis of our classification ECOD against the SCOPe, SCOP2, and CATH domain classifications with respect to their constituent domain boundaries and hierarchical organization. The coverage of these domain classifications with respect to ECOD and to the PDB was assessed by structure and by sequence. We also conducted domain pair analysis to determine broad differences in hierarchy between domains shared by ECOD and other classifications. Finally, we present domains from the major facilitator superfamily (MFS) of transporter proteins and provide evidence that supports their split into domains and for multiple conformations within these families. We find that ECOD and CATH provide the most extensive structural coverage of the PDB. ECOD and SCOPe have the most consistent domain boundary conditions, whereas CATH and SCOP2 both differ significantly.

14.
Sci Rep ; 11(1): 7996, 2021 Apr 12.
Article En | MEDLINE | ID: mdl-33846496

Bumble bees exhibit exceptional diversity in their segmental body coloration largely as a result of mimicry. In this study we sought to discover genes involved in this variation through studying a lab-generated mutant in bumble bee Bombus terrestris, in which the typical black coloration of the pleuron, scutellum, and first metasomal tergite is replaced by yellow, a color variant also found in sister lineages to B. terrestris. Utilizing a combination of RAD-Seq and whole-genome re-sequencing, we localized the color-generating variant to a single SNP in the protein-coding sequence of transcription factor cut. This mutation generates an amino acid change that modifies the conformation of a coiled-coil structure outside DNA-binding domains. We found that all sequenced Hymenoptera, including sister lineages, possess the non-mutant allele, indicating different mechanisms are involved in the same color transition in nature. Cut is important for multiple facets of development, yet this mutation generated no noticeable external phenotypic effects outside of setal characteristics. Reproductive capacity was reduced, however, as queens were less likely to mate and produce female offspring, exhibiting behavior similar to that of workers. Our research implicates a novel developmental player in pigmentation, and potentially caste, thus contributing to a better understanding of the evolution of diversity in both of these processes.


Bees/genetics , Genome, Insect , High-Throughput Nucleotide Sequencing , Mutation/genetics , Pigmentation/genetics , Whole Genome Sequencing , Amino Acid Sequence , Animals , Conserved Sequence/genetics , Genes, Insect , Genome-Wide Association Study , Insect Proteins/chemistry , Insect Proteins/genetics , Male , Phenotype , Polymorphism, Single Nucleotide/genetics , Protein Domains
15.
mSystems ; 6(1), 2021 Feb 09.
Article En | MEDLINE | ID: mdl-33563785

Diverse bacterial pathogens employ effector delivery systems to disrupt vital cellular processes in the host (N. M. Alto and K. Orth, Cold Spring Harbor Perspect Biol 4:a006114, 2012, https://doi.org/10.1101/cshperspect.a006114). The type III secretion system 1 of the marine pathogen Vibrio parahaemolyticus utilizes the sequential action of four effectors to induce a rapid, proinflammatory cell death uniquely characterized by a prosurvival host transcriptional response (D. L. Burdette, M. L. Yarbrough, A. Orvedahl, C. J. Gilpin, and K. Orth, Proc Natl Acad Sci USA 105:12497-12502, 2008, https://doi.org/10.1073/pnas.0802773105; N. J. De Nisco, M. Kanchwala, P. Li, J. Fernandez, C. Xing, and K. Orth, Sci Signal 10:eaal4501, 2017, https://doi.org/10.1126/scisignal.aal4501). Herein, we show that this prosurvival response is caused by the action of the channel-forming effector VopQ that targets the host V-ATPase, resulting in lysosomal deacidification and inhibition of lysosome-autophagosome fusion. Recent structural studies have shown how VopQ interacts with the V-ATPase and, while in the ER, a V-ATPase assembly intermediate can interact with VopQ, causing a disruption in membrane integrity. Additionally, we observed that VopQ-mediated disruption of the V-ATPase activates the IRE1 branch of the unfolded protein response (UPR), resulting in an IRE1-dependent activation of ERK1/2 MAPK signaling. We also find that this early VopQ-dependent induction of ERK1/2 phosphorylation is terminated by the VopS-mediated inhibitory AMPylation of Rho GTPase signaling. Since VopS dampens VopQ-induced IRE1-dependent ERK1/2 activation, we propose that IRE1 activates ERK1/2 phosphorylation at or above the level of Rho GTPases.
This study illustrates how temporally induced effectors can work in tandem as agonist/antagonist to manipulate host signaling and reveals new connections between V-ATPase function, UPR, and MAPK signaling. IMPORTANCE Vibrio parahaemolyticus is a seafood-borne pathogen that encodes two type 3 secretion systems (T3SS). The first system, T3SS1, is thought to be maintained in all strains of V. parahaemolyticus to maintain survival in the environment, whereas the second system, T3SS2, is linked to clinical isolates and disease in humans. Here, we found that the first system targets evolutionarily conserved signaling systems to manipulate host cells, eventually causing a rapid, orchestrated cell death within 3 h. We have found that the T3SS1 injects virulence factors that temporally manipulate host signaling. Within the first hour of infection, the effector VopQ acts first by activating host survival signals while diminishing the host cell apoptotic machinery. Less than an hour later, another effector, VopS, reverses activation and inhibition of these signaling systems, ultimately leading to death of the host cell. This work provides an example of how pathogens have evolved to manipulate the interplay between T3SS effectors to regulate host signaling pathways.

16.
J Mol Biol ; 433(4): 166788, 2021 Feb 19.
Article En | MEDLINE | ID: mdl-33387532

The Rossmann-like fold is the most prevalent and diversified doubly-wound superfold of ancient evolutionary origin. Rossmann-like domains are present in a variety of metabolic enzymes and are capable of binding diverse ligands. Discerning evolutionary relationships among these domains is challenging because of their diverse functions and ancient origin. We defined a minimal Rossmann-like structural motif (RLM), identified RLM-containing domains among known 3D structures (20%) and classified them according to their homologous relationships. New classifications were incorporated into our Evolutionary Classification of protein Domains (ECOD) database. We defined 156 homology groups (H-groups), which were further clustered into 123 possible homology groups (X-groups). Our analysis revealed that RLM-containing proteins constitute approximately 15% of the human proteome. We found that disease-causing mutations are more frequent within RLM domains than within non-RLM domains of these proteins, highlighting the importance of RLM-containing proteins for human health.


Amino Acid Motifs , Models, Molecular , Protein Conformation , Proteins/chemistry , Binding Sites , Biological Evolution , Databases, Protein , Humans , Molecular Dynamics Simulation , Mutation , Protein Binding , Protein Domains , Protein Folding , Protein Interaction Domains and Motifs , Protein Multimerization , Proteins/genetics , Proteins/metabolism
17.
Mol Biol Evol ; 38(5): 2166-2176, 2021 May 04.
Article En | MEDLINE | ID: mdl-33502509

Centuries of zoological studies have amassed billions of specimens in collections worldwide. Genomics of these specimens promises to reinvigorate biodiversity research. However, because DNA degrades with age in historical specimens, it is a challenge to obtain genomic data for them and analyze degraded genomes. We developed experimental and computational protocols to overcome these challenges and applied our methods to resolve a series of long-standing controversies involving a group of butterflies. We deduced the geographical origins of several historical specimens of uncertain provenance that are at the heart of these debates. Here, genomics tackles one of the greatest problems in zoology: countless old specimens that serve as irreplaceable embodiments of species concepts cannot be confidently assigned to extant species or population due to the lack of diagnostic morphological features and clear documentation of the collection locality. The ability to determine where they were collected will resolve many on-going disputes. More broadly, we show the utility of applying genomics to historical museum specimens to delineate the boundaries of species and populations, and to hypothesize about genotypic determinants of phenotypic traits.


Butterflies/genetics , DNA, Ancient/analysis , Genomics/methods , Adaptation, Biological/genetics , Altitude , Animals , Pigmentation/genetics
18.
Nat Struct Mol Biol ; 27(12): 1194-1201, 2020 12.
Article En | MEDLINE | ID: mdl-33106659

De novo formation of the double-membrane compartment autophagosome is seeded by small vesicles carrying membrane protein autophagy-related 9 (ATG9), the function of which remains unknown. Here we find that ATG9A scrambles phospholipids of membranes in vitro. Cryo-EM structures of human ATG9A reveal a trimer with a solvated central pore, which is connected laterally to the cytosol through the cavity within each protomer. Similarities to ABC exporters suggest that ATG9A could be a transporter that uses the central pore to function. Moreover, molecular dynamics simulation suggests that the central pore opens laterally to accommodate lipid headgroups, thereby enabling lipids to flip. Mutations in the pore reduce scrambling activity and yield markedly smaller autophagosomes, indicating that lipid scrambling by ATG9A is essential for membrane expansion. We propose ATG9A acts as a membrane-embedded funnel to facilitate lipid flipping and to redistribute lipids added to the outer leaflet of ATG9 vesicles, thereby enabling growth into autophagosomes.


Autophagosomes/chemistry , Autophagy-Related Proteins/chemistry , Membrane Proteins/chemistry , Phospholipids/chemistry , Proteolipids/chemistry , Vesicular Transport Proteins/chemistry , Animals , Autophagosomes/metabolism , Autophagy-Related Proteins/genetics , Autophagy-Related Proteins/metabolism , Binding Sites , Biological Transport , Cell Line , Cryoelectron Microscopy , Fibroblasts/metabolism , Fibroblasts/ultrastructure , Gene Expression , Green Fluorescent Proteins/chemistry , Green Fluorescent Proteins/genetics , Green Fluorescent Proteins/metabolism , HEK293 Cells , HeLa Cells , Humans , Lipid Bilayers/chemistry , Lipid Bilayers/metabolism , Luminescent Proteins/genetics , Luminescent Proteins/metabolism , Membrane Proteins/genetics , Membrane Proteins/metabolism , Mice , Molecular Dynamics Simulation , Phospholipids/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Multimerization , Proteolipids/metabolism , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism , Vesicular Transport Proteins/genetics , Vesicular Transport Proteins/metabolism , Red Fluorescent Protein
19.
Proteins ; 88(11): 1513-1527, 2020 11.
Article En | MEDLINE | ID: mdl-32543729

Protein domains exist by themselves or in combination with other domains to form complex multidomain proteins. Defining domain boundaries in proteins is essential for understanding their evolution and function but is not trivial. More specifically, partitioning domains that interact by forming a single β-sheet is known to be particularly troublesome for automatic structure-based domain decomposition pipelines. Here, we study edge-to-edge β-strand interactions between domains in a protein chain, to help define the boundaries for some more difficult cases where a single β-sheet spanning over two domains gives an appearance of one. We give a number of examples where β-strands belonging to a single β-sheet do not belong to a single domain and highlight the difficulties of automatic domain parsers on these examples. This work can be used as a baseline for defining domain boundaries in homologous proteins or proteins with similar domain interactions in the future.


Amino Acid Isomerases/chemistry , Penicillin-Binding Proteins/chemistry , Protein Interaction Domains and Motifs , Racemases and Epimerases/chemistry , Amino Acid Isomerases/metabolism , Amino Acid Sequence , Animals , Bacteria/chemistry , Binding Sites , Databases, Protein , Datasets as Topic , Humans , Models, Molecular , Penicillin-Binding Proteins/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Racemases and Epimerases/metabolism , Thermodynamics
20.
PLoS Comput Biol ; 16(5): e1007775, 2020 05.
Article En | MEDLINE | ID: mdl-32413045

The human genome harbors a variety of genetic variations. Single-nucleotide changes that alter amino acids in protein-coding regions are one of the major causes of human phenotypic variation and diseases. These single-amino acid variations (SAVs) are routinely found in whole genome and exome sequencing. Evaluating the functional impact of such genomic alterations is crucial for diagnosis of genetic disorders. We developed DeepSAV, a deep-learning convolutional neural network to differentiate disease-causing and benign SAVs based on a variety of protein sequence, structural and functional properties. Our method outperforms most stand-alone programs, and the version incorporating population and gene-level information (DeepSAV+PG) has predictive power similar to that of some of the best available methods. We transformed DeepSAV scores of rare SAVs in the human population into a quantity termed "mutation severity measure" for each human protein-coding gene. It reflects a gene's tolerance to deleterious missense mutations and serves as a useful tool to study gene-disease associations. Genes implicated in cancer, autism, and viral interaction are identified by this measure as intolerant to mutations, while genes associated with a number of other diseases are scored as tolerant. Among known disease-associated genes, those that are mutation-intolerant are likely to function in development and signal transduction pathways, while those that are mutation-tolerant tend to encode metabolic and mitochondrial proteins.


Disease/genetics , Forecasting/methods , Genome, Human/genetics , Alleles , Amino Acid Sequence/genetics , Computational Biology/methods , Deep Learning , Gene Regulatory Networks/genetics , Humans , Mutation/genetics , Mutation, Missense/genetics , Nerve Net , Open Reading Frames/genetics , Sequence Analysis/methods , Exome Sequencing/methods
...