ABSTRACT
Uncovering the regulators of cellular aging will unravel the complexity of aging biology and identify potential therapeutic interventions to delay the onset and progression of chronic, aging-related diseases. In this work, we systematically compared gene sets involved in regulating the lifespan of Saccharomyces cerevisiae (a powerful model organism for studying human cellular aging) with those showing expression changes under rapamycin treatment. Among the functionally uncharacterized genes in the overlap set, YBR238C stood out as the only one that is downregulated by rapamycin and whose deletion increases both chronological and replicative lifespan. We show that YBR238C and its paralog RMD9 have opposite effects on mitochondria and aging. YBR238C deletion increases cellular lifespan by enhancing mitochondrial function, whereas its overexpression accelerates cellular aging via mitochondrial dysfunction. We find that the phenotypic effect of YBR238C is largely explained by HAP4- and RMD9-dependent mechanisms. Furthermore, we find that genetic or chemical induction of mitochondrial dysfunction increases TORC1 (Target of Rapamycin Complex 1) activity, which subsequently accelerates cellular aging. Notably, TORC1 inhibition by rapamycin (or deletion of YBR238C) improves the shortened lifespan under these mitochondrial dysfunction conditions in yeast and human cells. Compared with wild type, the growth (a proxy of TORC1 activity) of mutant cells with enhanced mitochondrial function is sensitive to rapamycin, whereas the growth of mitochondria-defective mutants is largely resistant to it. Our findings demonstrate a feedback loop between TORC1 and mitochondria (the TORC1-MItochondria-TORC1 (TOMITO) signaling process) that regulates cellular aging. Thus, YBR238C is an effector of TORC1 that modulates mitochondrial function.
Subjects
Cellular Senescence , Mitochondria , Saccharomyces cerevisiae Proteins , Saccharomyces cerevisiae , Signal Transduction , Gene Deletion , Gene Expression Regulation, Fungal , Mechanistic Target of Rapamycin Complex 1/metabolism , Mechanistic Target of Rapamycin Complex 1/genetics , Mitochondria/metabolism , Mitochondria/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Sirolimus/pharmacology , Transcription Factors/metabolism , Transcription Factors/genetics
ABSTRACT
CD24 is frequently overexpressed in ovarian cancer and promotes immune evasion by interacting with its receptor Siglec10 on tumor-associated macrophages; this interaction provides a "don't eat me" signal that prevents macrophage targeting and phagocytosis. Factors promoting CD24 expression could represent novel immunotherapeutic targets for ovarian cancer. Here, using a genome-wide CRISPR knockout screen, we identify GPAA1 (glycosylphosphatidylinositol anchor attachment 1), a factor that catalyzes the attachment of a glycosylphosphatidylinositol (GPI) lipid anchor to substrate proteins, as a positive regulator of CD24 cell surface expression. Genetic ablation of GPAA1 abolishes CD24 cell surface expression, enhances macrophage-mediated phagocytosis, and inhibits ovarian tumor growth in mice. GPAA1 shares structural similarities with aminopeptidases. Exploiting this similarity, we show that bestatin, a clinically advanced aminopeptidase inhibitor, binds to GPAA1 and blocks GPI attachment, resulting in reduced CD24 cell surface expression, increased macrophage-mediated phagocytosis, and suppressed growth of ovarian tumors. Our study highlights the potential of targeting GPAA1 as an immunotherapeutic approach for CD24+ ovarian cancers.
Subjects
Acyltransferases , CD24 Antigen , Ovarian Neoplasms , Phagocytosis , Animals , Female , Humans , Mice , Acyltransferases/metabolism , Amidohydrolases/metabolism , Amidohydrolases/genetics , CD24 Antigen/metabolism , Cell Line, Tumor , Glycosylphosphatidylinositols/metabolism , Macrophages/metabolism , Macrophages/immunology , Ovarian Neoplasms/immunology , Ovarian Neoplasms/metabolism , Ovarian Neoplasms/pathology , Ovarian Neoplasms/therapy
ABSTRACT
There was an error in the original publication [...].
ABSTRACT
BACKGROUND: Heart failure (HF) and diabetes are each associated with increased incidence and worse prognosis of the other. The prognostic value of global longitudinal strain (GLS) measured by cardiovascular magnetic resonance (CMR) has not been established in HF patients with diabetes. METHODS: In this prospective, observational study, consecutive patients (n = 315) with HF underwent CMR at 3T, including GLS, late gadolinium enhancement (LGE), native T1, and extracellular volume fraction (ECV) mapping. Plasma biomarker concentrations were measured, including N-terminal pro B-type natriuretic peptide (NT-proBNP), high-sensitivity troponin T (hs-TnT), growth differentiation factor 15 (GDF-15), soluble ST2 (sST2), and galectin 3 (Gal-3). The primary outcome was a composite of all-cause mortality or HF hospitalisation. RESULTS: Compared with those without diabetes (n = 156), the diabetes group (n = 159) had a higher LGE prevalence (76 vs. 60%, p < 0.05), higher native T1 (1285 ± 42 vs. 1269 ± 42 ms, p < 0.001), and higher ECV (30.5 ± 3.5 vs. 28.8 ± 4.1%, p < 0.001). The diabetes group also had higher NT-proBNP, hs-TnT, GDF-15, sST2, and Gal-3 concentrations. Diabetes conferred a worse prognosis (hazard ratio (HR) 2.33 [95% confidence interval (CI) 1.43-3.79], p < 0.001). In multivariable Cox regression analysis including clinical markers and plasma biomarkers, only sST2 remained independently associated with the primary outcome (HR per 1 ng/mL 1.04 [95% CI 1.02-1.07], p = 0.001). In multivariable Cox regression models in the diabetes group, both GLS and sST2 remained prognostic (GLS: HR 1.12 [95% CI 1.03-1.21], p = 0.01; sST2: HR per 1 ng/mL 1.03 [95% CI 1.00-1.06], p = 0.02). CONCLUSIONS: Compared with HF patients without diabetes, those with diabetes have worse plasma and CMR markers of fibrosis and a more adverse prognosis. GLS by CMR is a powerful and independent prognostic marker in HF patients with diabetes.
Subjects
Diabetes Mellitus , Heart Failure , Humans , Growth Differentiation Factor 15 , Global Longitudinal Strain , Contrast Media , Prospective Studies , Gadolinium , Biomarkers , Prognosis , Heart Failure/diagnosis , Diabetes Mellitus/diagnosis
ABSTRACT
Research on drug-based anti-aging interventions could offer an answer to age-related diseases, with the aim of closing the gap between lifespan and healthspan. Here, we present two methods for assaying chronological lifespan in human cells: (1) a version of the classical outgrowth assay with quantitative assessment of surviving cells and (2) a version of the PICLS method (propidium iodide fluorescence-based measurement of cell death). Both methods are fast, simple to conduct, and cost-effective; they produce quantitative data for further analysis and can be used with diverse human cell lines. Whereas the first method is ideal for validation and for testing the post-intervention reproductive potential of surviving cells, the second method has true high-throughput screening potential. The new methods were validated with known anti-aging compounds (2,5-anhydro-D-mannitol and rapamycin). Using the high-throughput screening method, we screened a library of 162 chemical entities and identified three compounds that extend the longevity of human cells.
Subjects
High-Throughput Screening Assays , Longevity , Humans , Cell Line , Mannitol , Reproduction
ABSTRACT
The dense alignment surface (DAS) transmembrane (TM) prediction method was first published more than 25 years ago. DAS was one of the earliest tools to discriminate TM proteins from globular ones and to predict the sequence positions of TM helices with high accuracy from the amino acid sequence alone. The algorithmic improvements that followed in 2002 (DAS-TMfilter) made it one of the best-performing tools among those relying on local sequence information for TM prediction. Since then, much more experimental data on membrane proteins (including thousands of 3D structures) have accumulated, but there has been no significant improvement in the performance of TM helix prediction tools. Here, we report a new implementation of the DAS-TMfilter prediction web server. We reevaluated the performance of the method using an updated test dataset five times larger than the original. We found that, despite the much larger dataset, the method performs with essentially the same accuracy as before, without any change to its parametrization. Thus, the approach captures the physico-chemistry of TM helices well, essentially solving this scientific problem.
Subjects
Algorithms , Membrane Proteins , Protein Structure, Secondary , Membrane Proteins/chemistry , Amino Acid Sequence
ABSTRACT
BACKGROUND: Although the genome of Saccharomyces cerevisiae (S. cerevisiae) was the first eukaryotic genome to be fully sequenced (in 1996), a complete understanding of the biomolecular mechanisms it encodes has not yet been achieved. Here, we quantify how far we still are from the goal of a full list of S. cerevisiae gene functions. RESULTS: The scientific literature about S. cerevisiae protein-coding genes has been mapped onto the yeast genome via mentions of genomic region names in scientific publications. The match was quantified as the ratio of a given gene name's occurrences to the occurrences of all gene names in the article. We find that ~230 elite genes with ≥ 75 full publication equivalents (FPEs; FPE = 1 is an idealized publication referring to just a single gene) command ~45% of all literature. At the same time, about two thirds of the genes (each with fewer than 10 FPEs) are described in just 12% of the literature (on average, each such gene has just ~1.5% of the literature of an elite gene). About 600 genes have not been mentioned in any dedicated article. Compared with other groups of genes, literature growth rates were highest for uncharacterized or understudied genes until the late 1990s. Thereafter, these growth rates deteriorated and became negative. Thus, yeast function discovery for previously uncharacterized genes has returned to the level of ~1980. At the same time, the literature on already well-studied genes (at the threshold T10, i.e., ≥ 10 FPEs, and higher) continues to grow steadily. CONCLUSIONS: Did the early full genome sequencing of yeast boost gene function discovery? The data show that the publication of the full genome in fact coincides with the onset of a decline in gene function discovery for previously uncharacterized genes. If the current status of literature about yeast molecular mechanisms can be extrapolated into the future, it will take about another 50 years to complete the yeast gene function list. We found that a small group of scientific journals contributed extraordinarily to publishing early reports relevant to yeast gene function discoveries.
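The FPE measure described above can be sketched as follows. This is a minimal illustration, assuming that gene-name mentions have already been extracted per article; the study's name matching, synonym handling and article filtering are not reproduced, and the function names and example genes/counts are hypothetical. Each article distributes one unit of FPE among the genes it mentions in proportion to their mention counts, so the FPE total equals the number of articles that mention any gene.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def full_publication_equivalents(articles: Iterable[Mapping[str, int]]) -> dict[str, float]:
    """Sum, over articles, each gene's share of all gene-name mentions in that article."""
    fpe: dict[str, float] = defaultdict(float)
    for mentions in articles:
        total = sum(mentions.values())
        if total == 0:
            continue  # an article without gene-name mentions contributes nothing
        for gene, count in mentions.items():
            fpe[gene] += count / total
    return dict(fpe)


def literature_share(fpe: Mapping[str, float], genes: Iterable[str]) -> float:
    """Fraction of all FPEs held by the given subset of genes."""
    total = sum(fpe.values())
    if total == 0:
        raise ValueError("no literature to share")
    return sum(fpe.get(gene, 0.0) for gene in genes) / total


# Hypothetical mention counts for three articles.
articles = [
    {"CDC28": 4},               # dedicated to one gene: 1 FPE for CDC28
    {"CDC28": 1, "CLN2": 3},    # shared: 0.25 FPE for CDC28, 0.75 FPE for CLN2
    {"YBR238C": 2, "RMD9": 2},  # shared equally: 0.5 FPE each
]
fpe = full_publication_equivalents(articles)
print(fpe)  # {'CDC28': 1.25, 'CLN2': 0.75, 'YBR238C': 0.5, 'RMD9': 0.5}
print(literature_share(fpe, ["CDC28"]))
```

Because every article contributes exactly one FPE in total, an "elite" threshold such as ≥ 75 FPEs can be read as "the equivalent of 75 articles devoted entirely to this gene".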
Subjects
Genomics , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Base Sequence , Phenotype
ABSTRACT
Chronic metabolic diseases arise from changes in metabolic fluxes through biomolecular pathways and gene networks accumulated over an individual's lifetime. While clinical and biochemical profiles present only real-time snapshots of a patient's health, efficient computational models of the pathological disturbance of biomolecular processes are required to achieve individualized mechanistic insights into disease progression. Here, we describe generalized metabolic flux analysis (GMFA) to address this gap. Suitably grouping individual metabolites/fluxes into pools simplifies the analysis of the resulting coarser-grained network. We also map non-metabolic clinical modalities onto the network with additional edges. Instead of using the time coordinate, the system status (metabolite concentrations and fluxes) is quantified as a function of a generalized extent variable (a coordinate in the space of generalized metabolites) that represents the system's position along its evolution path and evaluates the degree of change between any two states on that path. We applied GMFA to analyze Type 2 Diabetes Mellitus (T2DM) patients from two cohorts: EVAS (289 patients from Singapore) and NHANES (517 patients from the USA). Personalized systems biology models (digital twins) were constructed. For each patient, we deduced disease dynamics from the individually parameterized metabolic network and predicted the evolution path of the metabolic health state. Our predictive models achieve an ROC-AUC in the range 0.79-0.95 (sensitivity 80-92%, specificity 62-94%) in identifying phenotypes at baseline and in predicting the future development of diabetic retinopathy and cataract progression among T2DM patients within 3 years of baseline. The GMFA method is a step towards the ultimate goal of developing practical predictive computational models for diagnostics based on systems biology. This tool has potential use in chronic disease management in medical practice. Supplementary Information: The online version contains supplementary material available at 10.1007/s13755-023-00218-x.
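The ROC-AUC, sensitivity and specificity reported above are standard classification metrics. As a hedged illustration only (this is not the GMFA model itself), the sketch below computes them from hypothetical risk scores: the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half), and sensitivity/specificity at an assumed decision threshold. The scores, the threshold and the function names are made up for illustration.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties counted as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_specificity(
    pos_scores: Sequence[float], neg_scores: Sequence[float], threshold: float
) -> tuple[float, float]:
    """Call score >= threshold positive; return (sensitivity, specificity)."""
    true_pos = sum(s >= threshold for s in pos_scores)
    true_neg = sum(s < threshold for s in neg_scores)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)


# Hypothetical risk scores for patients who did / did not develop a complication.
progressed = [0.9, 0.8, 0.4]
stable = [0.5, 0.3, 0.2]
print(roc_auc(progressed, stable))  # 8 of 9 positive/negative pairs are ranked correctly
print(sensitivity_specificity(progressed, stable, 0.5))
```

The pairwise loop is quadratic and meant only to make the definition explicit; for large cohorts a rank-based formula gives the same value faster.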
ABSTRACT
BACKGROUND: Although Escherichia coli (E. coli) is the most studied prokaryotic organism in the history of the life sciences, many molecular mechanisms and gene functions encoded in its genome remain to be discovered. This work aims to quantify how well the scientific literature illuminates the E. coli gene function space and how close we are to the goal of a complete list of E. coli gene functions. RESULTS: The scientific literature about E. coli protein-coding genes has been mapped onto the genome via mentions of genomic region names in scientific articles, both for the strain K-12 MG1655 and for the 95%-threshold softcore genome of 1324 E. coli strains with known complete genomes. The article match was quantified as the ratio of a given gene name's occurrences to the mentions of all gene names in the paper. The various genome regions have extremely uneven literature coverage. A group of elite genes with ≥ 100 full publication equivalents (FPEs; FPE = 1 is an idealized publication devoted to just a single gene) attracts the lion's share of the papers. For K-12, ~65% of the literature covers just 342 elite genes; for the softcore genome, ~68% of the FPEs concern only 342 elite gene families (GFs). We also find that most genes/GFs are mentioned at least once in a dedicated scientific article (with the exception of at least 137 protein-coding transcripts for K-12 and 26 GFs from the softcore genome). Whereas literature growth rates were highest for uncharacterized or understudied genes until 2005-2010 compared with other groups of genes, they became negative thereafter. At the same time, the literature on already well-studied genes above the threshold T10 (≥ 10 FPEs) started to grow explosively. Typically, a body of ~20 actual articles generated over ~15 years of research effort was necessary to reach T10. Lineage-specific co-occurrence analysis of genes belonging to the accessory genome of E. coli, together with genomic co-localization and sequence-analytic exploration, suggests that the previously completely uncharacterized genes yahV and yddL are associated with osmotic stress response/motility mechanisms. CONCLUSION: If the numbers of scientific articles about uncharacterized and understudied genes remain at least at present levels, full gene function lists for the strain K-12 MG1655 and the E. coli softcore genome are within reach in the next 25-30 years. Once the literature body for a gene crosses 10 FPEs, most of the critical fundamental research risk appears to be overcome and steady incremental research becomes possible.
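The thresholds named above (T10 at ≥ 10 FPEs, elite genes at ≥ 100 FPEs) suggest a simple way to bin genes by how well the literature illuminates them. The sketch below is an illustration under stated assumptions: treating FPE = 0 as "uncharacterized" simplifies the study's criterion (no mention in a dedicated article), the class labels and function names are hypothetical, and the gene names and FPE values are invented. It reports the fraction of total literature (in FPEs) held by each class.

```python
from collections import defaultdict
from typing import Mapping

T10 = 10.0     # threshold named in the abstract (>= 10 FPEs)
ELITE = 100.0  # elite-gene threshold used for E. coli (>= 100 FPEs)


def literature_class(fpe: float) -> str:
    """Assign a hypothetical illumination class from a gene's FPE value."""
    if fpe < 0:
        raise ValueError("FPE cannot be negative")
    if fpe == 0:
        return "uncharacterized"  # simplification: no literature at all
    if fpe < T10:
        return "understudied"
    if fpe < ELITE:
        return "well-studied (T10)"
    return "elite"


def literature_share_by_class(fpe_by_gene: Mapping[str, float]) -> dict[str, float]:
    """Fraction of all FPEs attributed to each class."""
    total = sum(fpe_by_gene.values())
    if total == 0:
        raise ValueError("no literature to share")
    shares: dict[str, float] = defaultdict(float)
    for fpe in fpe_by_gene.values():
        shares[literature_class(fpe)] += fpe / total
    return dict(shares)


# Hypothetical FPE values for five genes (total = 200 FPEs).
example = {"geneA": 0.0, "geneB": 2.0, "geneC": 8.0, "geneD": 40.0, "geneE": 150.0}
print(literature_share_by_class(example))
```

In this toy set, a single elite gene holds three quarters of the literature, which mirrors (in exaggerated form) the skew described in the abstract.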
Subjects
Escherichia coli , Lighting , Escherichia coli/genetics , Genomics
ABSTRACT
Although aging is the biggest risk factor for chronic human diseases (cancer, diabetes, cardiovascular and neurodegenerative diseases), few interventions that directly address aging are known besides caloric restriction and a small number of drugs (with substantial side effects). Thus, there is an urgent need for new options that can generally delay aging processes and prevent age-related diseases. Cellular aging is at the basis of aging processes. The chronological lifespan (CLS) of the yeast Saccharomyces cerevisiae is a well-established model system for investigating interventions against human post-mitotic cellular aging. CLS is defined as the number of days cells remain viable in stationary phase. We developed a new, cheap, and fast quantitative method for measuring CLS in cell cultures incubated with various chemical agents and controls on 96-well plates. Our PICLS protocol, with (1) the use of propidium iodide for fluorescence-based cell survival readout in a microplate reader and (2) total cell count measurement via OD600nm absorbance from the same plate, provides true high-throughput capacity. Depending on logistics, large numbers of plates can be processed in parallel, so that screening thousands of compounds becomes feasible in a short time. The method was validated by measuring the effect of rapamycin and calorie restriction on yeast CLS. We then used this approach for chemical agent screening and discovered the anti-aging/geroprotective potential of 2,5-anhydro-D-mannitol (2,5-AM); we suggest its use individually or in combination with other anti-aging interventions.
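The two PICLS readouts described above (propidium iodide fluorescence for dead cells and OD600 for total cells from the same well) can be combined into a survival estimate. The abstract does not spell out the arithmetic, so the normalization below is an assumption: PI signal per OD600 unit, scaled against a fully killed (e.g., heat-killed) reference well. `Well`, `pi_per_od` and `survival_fraction` are hypothetical names, and the readings are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Well:
    """One microplate well (hypothetical container for the two readouts)."""
    pi_fluorescence: float  # propidium iodide signal, arbitrary fluorescence units
    od600: float            # absorbance at 600 nm, proxy for total cell count


def pi_per_od(well: Well, blank_fluorescence: float = 0.0) -> float:
    """Background-corrected PI signal per OD600 unit (assumed normalization)."""
    if well.od600 <= 0:
        raise ValueError("OD600 must be positive")
    return (well.pi_fluorescence - blank_fluorescence) / well.od600


def survival_fraction(
    sample: Well, dead_reference: Well, blank_fluorescence: float = 0.0
) -> float:
    """Assumed estimate: 1 - (sample PI/OD) / (fully dead reference PI/OD), clipped to [0, 1]."""
    dead_fraction = pi_per_od(sample, blank_fluorescence) / pi_per_od(
        dead_reference, blank_fluorescence
    )
    return min(1.0, max(0.0, 1.0 - dead_fraction))


# Hypothetical readings: a treated culture versus a heat-killed reference well.
killed = Well(pi_fluorescence=1000.0, od600=1.0)
treated = Well(pi_fluorescence=150.0, od600=0.5)  # 300 PI units per OD600 unit
print(survival_fraction(treated, killed))  # roughly 0.7
```

Normalizing by OD600 is what lets wells with different cell densities be compared on the same plate; comparing survival across days for each well would then trace a CLS curve.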
Subjects
High-Throughput Screening Assays , Saccharomyces cerevisiae , Humans , Mannitol/pharmacology , Aging , Cellular Senescence
ABSTRACT
BACKGROUND: Escherichia coli (E. coli) has been one of the most studied model organisms in the history of the life sciences. Initially thought to be merely a commensal bacterium, E. coli has shown wide phenotypic diversity, including pathogenic isolates of great relevance to public health. Although pangenome analysis has been attempted several times, there is no systematic functional characterization of E. coli subgroups according to their gene profiles. RESULTS: Systematically scanning for optimal parametrization, we built the E. coli pangenome from 1324 complete genomes. The pangenome size is estimated at ~25,000 gene families (GFs). Whereas the core genome shrinks as more genomes are added, the softcore genome (GFs present in ≥95% of strains) is stable at ~3000 GFs regardless of the total number of genomes. Apparently, the softcore genome (with a 92% or 95% generation threshold) can define the genome of a bacterial species, listing the critically relevant, evolutionarily most conserved or important classes of GFs. Unsupervised clustering of common E. coli sequence types using the presence/absence GF matrix reveals distinct characteristics of E. coli phylogroups B1, B2, and E. We highlight the bi-lineage nature of B1, the variation of the secretion and iron acquisition systems in ST11 (E), and the incorporation of a highly conserved prophage into the genome of ST131 (B2). The tail structure of the prophage is evolutionarily related to R2-pyocin (a tailocin) from Pseudomonas aeruginosa PAO1. We hypothesize that this molecular machinery is highly likely to play an important role in protecting its own colonies, thus contributing to the rapid rise of pandemic E. coli ST131. CONCLUSIONS: This study explored optimized pangenome development in E. coli. We provide complete GF lists and the pangenome matrix as supplementary data for further studies. We identified biological characteristics of different E. coli subtypes, specifically for phylogroups B1, B2, and E. We found an operon-like genome region coding for a tailocin specific to ST131 strains. The latter is a potential killer weapon that provides pandemic E. coli ST131 with an advantage in inter-bacterial competition and may explain their dominance as human pathogens among E. coli strains.
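The core/softcore distinction above can be made concrete with a presence/absence matrix. The sketch below assumes the matrix is stored as a mapping from each gene family to the set of genomes containing it; the function name, GF labels and strain IDs are hypothetical, and the study's clustering and parameter scan are not reproduced. Core GFs occur in every genome, softcore GFs in at least the chosen percentage (95% here, as in the abstract), so the core is a subset of the softcore.

```python
from typing import AbstractSet, Mapping


def partition_pangenome(
    presence: Mapping[str, AbstractSet[str]],
    n_genomes: int,
    softcore_percent: int = 95,
) -> dict[str, set[str]]:
    """Split gene families (GFs) into core, softcore and accessory sets.

    presence maps each GF to the set of genome IDs in which it was found.
    core: present in every genome; softcore: present in >= softcore_percent % of
    genomes (so core is a subset of softcore); accessory: everything else.
    """
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    parts: dict[str, set[str]] = {"core": set(), "softcore": set(), "accessory": set()}
    for gf, genomes in presence.items():
        count = len(genomes)
        if count == n_genomes:
            parts["core"].add(gf)
        # Integer comparison avoids floating-point edge cases at the threshold.
        if count * 100 >= softcore_percent * n_genomes:
            parts["softcore"].add(gf)
        else:
            parts["accessory"].add(gf)
    return parts


genomes = [f"strain{i:02d}" for i in range(20)]  # 20 hypothetical genomes
presence = {
    "GF_a": set(genomes),       # in 20/20 -> core and softcore
    "GF_b": set(genomes[:19]),  # in 19/20 = 95% -> softcore only
    "GF_c": set(genomes[:10]),  # in 10/20 -> accessory
}
print(partition_pangenome(presence, len(genomes)))
```

One plausible reason the softcore is more stable than the strict core: a single genome with a missing or misannotated gene removes a GF from the core, whereas the softcore tolerates a few such absences.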
Subjects
Escherichia coli Infections , Escherichia coli Proteins , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Infections/epidemiology , Escherichia coli Infections/microbiology , Escherichia coli Proteins/genetics , Genome, Bacterial , Humans , Pandemics , Phylogeny , Prophages
ABSTRACT
The paradigm shift associated with the introduction of the pan-genome concept has drawn attention away from single reference genomes toward the actual sequence diversity within organism populations, strain collections, clades, etc. A single genome is no longer sufficient to describe bacteria of interest; instead, the genomic repertoire of all existing strains is the key to the metabolic, evolutionary, or pathogenic potential of a species. The classification of orthologous genes derived from a collection of taxonomically related genome sequences is central to bacterial pan-genome computational analysis. In this work, we review methods for computing pan-genome gene clusters, including their comparative analysis for the case of Streptococcus pyogenes strain genomes. We exhaustively scanned the parametrization space of the homologue search procedures and found optimal parameters (sequence identity (60%) and coverage (50-60%) in the pairwise alignment) for the orthologous clustering of gene sequences. We find that the sequence identity threshold influences the number of gene families ~3 times more strongly than the sequence coverage threshold.
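To show how identity and coverage thresholds like those above decide which genes end up in the same gene family, here is a minimal single-linkage sketch using union-find over pairwise alignment hits. Single linkage is a simplification chosen for clarity (many pan-genome pipelines use more elaborate graph clustering, e.g. Markov clustering), the exact coverage definition is assumed, and `Hit`, `cluster_gene_families` and the example genes are hypothetical. Defaults follow the 60% identity and 50% coverage values from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hit:
    """A pairwise alignment hit between two genes (hypothetical record)."""
    query: str
    subject: str
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the sequence covered by the alignment (definition assumed)


def cluster_gene_families(
    genes: Iterable[str],
    hits: Iterable[Hit],
    min_identity: float = 60.0,
    min_coverage: float = 50.0,
) -> list[set[str]]:
    """Single-linkage clustering: genes joined by any passing hit share a family."""
    parent = {gene: gene for gene in genes}  # every hit gene must appear in genes

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for hit in hits:
        if hit.identity >= min_identity and hit.coverage >= min_coverage:
            root_q, root_s = find(hit.query), find(hit.subject)
            if root_q != root_s:
                parent[root_s] = root_q

    families: dict[str, set[str]] = defaultdict(set)
    for gene in parent:
        families[find(gene)].add(gene)
    return sorted(families.values(), key=lambda family: sorted(family))


genes = ["a", "b", "c", "d"]
hits = [
    Hit("a", "b", identity=70.0, coverage=80.0),
    Hit("b", "c", identity=65.0, coverage=55.0),
    Hit("c", "d", identity=50.0, coverage=90.0),  # fails the identity threshold
]
print(cluster_gene_families(genes, hits))
print(cluster_gene_families(genes, hits, min_identity=70.0))
```

Raising either threshold can only remove edges, so stricter settings split families and increase the family count; scanning both thresholds over a grid is one way to compare their influence.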
Subjects
Genome, Bacterial , Streptococcus pyogenes , Cluster Analysis , Genomics/methods , Multigene Family , Phylogeny , Streptococcus pyogenes/genetics
ABSTRACT
Aging is the greatest challenge to humankind worldwide. Aging is associated with a progressive loss of physiological integrity due to a decline in cellular metabolism and function. Such metabolic changes lead to age-related diseases, thereby compromising human health for the remainder of life. Thus, there is an urgent need to identify geroprotectors that regulate metabolic functions to target the biological processes of aging. Nutrients are the major regulators of the metabolic activities that coordinate cell growth and development. Iron is an important nutrient involved in several biological functions, including metabolism. In this study, using yeast as an aging model organism, we show that iron supplementation delays aging and increases cellular lifespan. To determine how iron supplementation increases lifespan, we analyzed the expression of genes of mitochondria, the main cellular hub of iron utilization. Quantitative analysis of the gene expression data reveals that iron supplementation upregulates the expression of mitochondrial tricarboxylic acid (TCA) cycle and electron transport chain (ETC) genes. Furthermore, in agreement with these expression profiles, ATP levels are elevated by iron supplementation, an increase that is required for the extension of cellular lifespan. To confirm this, we tested the role of iron supplementation in the AMPK knockout mutant. AMPK is a highly conserved controller of mitochondrial metabolism and energy homeostasis. Remarkably, iron supplementation rescued the short lifespan of the AMPK knockout mutant, confirming its anti-aging role through the enhancement of mitochondrial functions. Thus, our results suggest a potential therapeutic use of iron supplementation to delay aging and prolong healthspan.
Subjects
Iron , Longevity , AMP-Activated Protein Kinases/metabolism , Aging/metabolism , Dietary Supplements , Humans , Iron/metabolism , Mitochondria/metabolism , Saccharomyces cerevisiae/metabolism
ABSTRACT
The vertebrate left-right axis is specified during embryogenesis by a transient organ: the left-right organizer (LRO). Species including fish, amphibians, rodents and humans deploy motile cilia in the LRO to break bilateral symmetry, while reptiles, birds, even-toed mammals and cetaceans are believed to have LROs without motile cilia. We searched for genes whose loss during vertebrate evolution follows this pattern and identified five genes encoding extracellular proteins, including a putative protease with hitherto unknown functions that we named ciliated left-right organizer metallopeptide (CIROP). Here, we show that CIROP is specifically expressed in ciliated LROs. In zebrafish and Xenopus, CIROP is required solely on the left side, downstream of the leftward flow, but upstream of DAND5, the first asymmetrically expressed gene. We further ascertained 21 human patients with loss-of-function CIROP mutations presenting with recessive situs anomalies. Our findings posit the existence of an ancestral genetic module that has twice disappeared during vertebrate evolution but remains essential for distinguishing left from right in humans.
Subjects
Biological Evolution , Body Patterning , Gene Regulatory Networks , Metalloproteases , Animals , Humans , Body Patterning/genetics , Body Patterning/physiology , Cilia/genetics , Loss of Function Mutation , Metalloproteases/genetics , Metalloproteases/physiology , Proteins/genetics , Proteins/physiology , Vertebrates/genetics
ABSTRACT
BACKGROUND: Echocardiography is the diagnostic modality for assessing cardiac systolic and diastolic function to diagnose and manage heart failure. However, manual interpretation of echocardiograms can be time-consuming and subject to human error. Therefore, we developed a fully automated deep learning workflow to classify, segment, and annotate two-dimensional (2D) videos and Doppler modalities in echocardiograms. METHODS: We developed the workflow using a training dataset of 1145 echocardiograms and an internal test set of 406 echocardiograms from the prospective heart failure research platform (Asian Network for Translational Research and Cardiovascular Trials; ATTRaCT) in Asia, with previous manual tracings by expert sonographers. We validated the workflow against manual measurements in a curated dataset from Canada (Alberta Heart Failure Etiology and Analysis Research Team; HEART; n=1029 echocardiograms), a real-world dataset from Taiwan (n=31 241), the US-based EchoNet-Dynamic dataset (n=10 030), and in an independent prospective assessment of the Asian (ATTRaCT) and Canadian (Alberta HEART) datasets (n=142) with repeated independent measurements by two expert sonographers. FINDINGS: In the ATTRaCT test set, the automated workflow classified 2D videos and Doppler modalities with accuracies (number of correct predictions divided by the total number of predictions) ranging from 0·91 to 0·99. Segmentations of the left ventricle and left atrium were accurate, with a mean Dice similarity coefficient greater than 93% for all. 
In the external datasets (n=1029 to 10 030 echocardiograms used as input), automated measurements showed good agreement with locally measured values, with a mean absolute error range of 9-25 mL for left ventricular volumes, 6-10% for left ventricular ejection fraction (LVEF), and 1·8-2·2 for the ratio of the mitral inflow E wave to the tissue Doppler e' wave (E/e' ratio); and reliably classified systolic dysfunction (LVEF <40%, area under the receiver operating characteristic curve [AUC] range 0·90-0·92) and diastolic dysfunction (E/e' ratio ≥13, AUC range 0·91-0·91), with narrow 95% CIs for AUC values. Independent prospective evaluation confirmed less variance of automated compared with human expert measurements, with all individual equivalence coefficients being less than 0 for all measurements. INTERPRETATION: Deep learning algorithms can automatically annotate 2D videos and Doppler modalities with similar accuracy to manual measurements by expert sonographers. Use of an automated workflow might accelerate access, improve quality, and reduce costs in diagnosing and managing heart failure globally. FUNDING: A*STAR Biomedical Research Council and A*STAR Exploit Technologies.
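The evaluation metrics named above have simple definitions: classification accuracy (correct predictions divided by all predictions), the Dice similarity coefficient for segmentation masks (2|A ∩ B| / (|A| + |B|)), and the mean absolute error between automated and manual measurements. The sketch below implements these textbook definitions on flattened masks and plain lists; it is not the study's evaluation code, and the view labels, masks and volumes are hypothetical. Scoring two empty masks as Dice = 1 is an assumed convention.

```python
from typing import Sequence


def dice_coefficient(pred_mask: Sequence[bool], true_mask: Sequence[bool]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    size_sum = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * overlap / size_sum


def accuracy(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Number of correct predictions divided by the total number of predictions."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need equally long, non-empty label lists")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def mean_absolute_error(automated: Sequence[float], manual: Sequence[float]) -> float:
    """Mean of |automated - manual| over paired measurements."""
    if len(automated) != len(manual) or not manual:
        raise ValueError("need equally long, non-empty measurement lists")
    return sum(abs(a - m) for a, m in zip(automated, manual)) / len(manual)


# Hypothetical examples: a 4-pixel mask, three view labels, two LV volumes (mL).
print(dice_coefficient([True, True, False, False], [True, False, False, False]))  # 2*1/(2+1)
print(accuracy(["A4C", "A2C", "PLAX"], ["A4C", "A2C", "A4C"]))
print(mean_absolute_error([52.0, 61.0], [40.0, 70.0]))  # 10.5
```

Dice rewards overlap relative to the combined mask sizes, so it is insensitive to the large background area that would inflate a plain pixel accuracy.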
Subjects
Cardiovascular Diseases/diagnostic imaging , Deep Learning , Echocardiography/methods , Heart/diagnostic imaging , Image Interpretation, Computer-Assisted/methods , Cohort Studies , Humans
ABSTRACT
Large enzyme families, such as the zinc-dependent alcohol dehydrogenases (ADHs), long-chain alcohol oxidases (AOxs), or amine dehydrogenases (AmDHs), with sometimes more than one million sequences in the non-redundant protein database and hundreds of experimentally characterized enzymes, are excellent cases for protein engineering efforts aimed at refining and modifying substrate specificity. Yet the downside of this wealth of information is that it becomes technically difficult to rationally select optimal sequence targets and sequence positions for mutagenesis studies. In all three cases, we approach the problem by starting with a group of experimentally well-studied family members (including those with available 3D structures) and creating a structure-guided multiple sequence alignment and a modified phylogenetic tree (aka binding site tree) based only on a selection of potential substrate-binding residue positions derived from experimental information (not from the full-length sequence alignment). The remaining, mostly uncharacterized enzyme sequences can then be mapped onto this tree; as a trend, sequence grouping in the tree branches follows substrate specificity. We show that this information can be used in target selection for protein engineering to narrow the choice down to single suitable sequences and just a few relevant candidate positions for directed evolution towards activity on desired organic compound substrates. We also demonstrate how to find the closest thermophile example in the dataset if the engineering aims at the most robust enzymes.
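The key idea above, comparing sequences only at selected substrate-binding alignment columns rather than over the full length, can be illustrated with a restricted p-distance. This is a minimal sketch, assuming pre-aligned sequences of equal length and 0-based column indices; the study builds a full binding site tree from such information, whereas this example only assigns a query to its nearest characterized reference. All names, sequences and column choices are hypothetical.

```python
from typing import Mapping, Sequence


def binding_site_distance(seq_a: str, seq_b: str, positions: Sequence[int]) -> float:
    """Fraction of selected alignment columns at which two aligned sequences differ.

    Gap characters are compared like residues (an assumption), so a gap against
    a residue counts as a mismatch.
    """
    if not positions:
        raise ValueError("select at least one binding-site column")
    mismatches = sum(seq_a[i] != seq_b[i] for i in positions)
    return mismatches / len(positions)


def nearest_characterized(
    query: str, references: Mapping[str, str], positions: Sequence[int]
) -> tuple[str, float]:
    """Return the characterized reference closest to the query over the selected columns."""
    return min(
        ((name, binding_site_distance(query, seq, positions)) for name, seq in references.items()),
        key=lambda item: (item[1], item[0]),
    )


# Hypothetical aligned sequences of equal length.
references = {"ref_enzyme_1": "MKVLAGHE", "ref_enzyme_2": "PQALSRWT"}
query = "PQVLARHT"
binding_columns = [2, 4, 6]  # hypothetical substrate-binding columns (0-based)
print(nearest_characterized(query, references, binding_columns))  # ('ref_enzyme_1', 0.0)
print(nearest_characterized(query, references, range(len(query))))  # ('ref_enzyme_2', 0.375)
```

The two calls show why restricting the comparison matters: over the full length the query looks most like `ref_enzyme_2`, but at the binding columns it matches `ref_enzyme_1` exactly, which is the grouping expected to track substrate specificity.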
ABSTRACT
BACKGROUND: The human proteins TMTC1, TMTC2, TMTC3 and TMTC4 have been experimentally shown to be components of a new O-mannosylation pathway. Their own mannosyl-transferase activity has been suspected, but their actual enzymatic potential has not yet been demonstrated. So far, sequence analysis of TMTCs has been compromised by evolutionary sequence divergence within their membrane-embedded N-terminal region, sequence inaccuracies in the protein databases, and the difficulty of interpreting the large functional variety of known homologous proteins (mostly sugar transferases, some with known 3D structures). RESULTS: Evolutionarily conserved molecular function among TMTCs is only possible with a conserved membrane topology within their membrane-embedded N-terminal regions, which places homologous long intermittent loops on the same side of the membrane. Using this criterion, we demonstrate that all TMTCs have 11 transmembrane regions. The sequence segment homologous to Pfam model DUF1736 is actually just a loop between TM7 and TM8 that is located in the ER lumen and contains a small hydrophobic, but not membrane-embedded, helix. The membrane-embedded N-terminal regions of TMTCs not only share a common fold and 3D structural similarity with subgroups of GT-C sugar transferases; the conservation of residues critical for catalysis, for binding a divalent metal ion and for binding the phosphate group of a lipid-linked sugar moiety throughout enzymatically and structurally well-studied GT-Cs and TMTC sequences also indicates that TMTCs are actually sugar-transferring enzymes. We present credible 3D structural models of all four TMTCs (derived from their closest known homologues 5ezm/5f15) and find that the observed conserved sequence motifs can be rationalized as binding sites for a metal ion and for a dolichyl-phosphate-mannose moiety. CONCLUSIONS: With the results of both careful sequence analysis and structural modelling, we can conclusively say that the TMTCs are enzymatically active sugar transferases belonging to the GT-C/PMT superfamily. The DUF1736 segment, the loop between TM7 and TM8, is critical for catalysis and lipid-linked sugar moiety binding. Together with the available indirect experimental data, we conclude that the TMTCs are not only part of an O-mannosylation pathway in the endoplasmic reticulum of higher eukaryotes but are, in fact, the sought-after mannosyl-transferases.