ABSTRACT
Multiple transcriptomic predictors of tumour cell radiosensitivity (RS) have been proposed, but they have not been benchmarked against one another or against control models. To address this, we present RadSigBench, a comprehensive benchmarking framework for RS signatures. The approach compares candidate models to those developed from randomly resampled control signatures and from cellular processes integral to the radiation response. Robust evaluation of signature accuracy, both overall and for individual tissues, is performed. The NCI60 and Cancer Cell Line Encyclopedia datasets are integrated into our workflow. Prediction of two measures of RS is assessed: survival fraction after 2 Gy and mean inactivation dose. We apply the RadSigBench framework to seven prominent published signatures of radiation sensitivity and test for equivalence to control signatures. The mean out-of-sample R² for the published models on test data was very poor at 0.01 (range: -0.05 to 0.09) for the Cancer Cell Line Encyclopedia and 0.00 (range: -0.19 to 0.19) for the NCI60 data. The accuracy of both published and cellular process signatures investigated was equivalent to the resampled controls, suggesting that these signatures contain limited radiation-specific information. Enhanced modelling strategies are needed for effective prediction of intrinsic RS to inform clinical treatment regimes. We make recommendations for methodological improvements, for example the inclusion of perturbation data, multiomics, advanced machine learning and mechanistic modelling. Our validation framework provides for robust performance assessment of ongoing developments in intrinsic RS prediction.
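The comparison against resampled controls can be illustrated with a minimal Python sketch. It is not the RadSigBench implementation: the function names, the gene pool and the simple "fraction of controls at least as good" summary are assumptions made for illustration, and the equivalence testing described in the abstract is not reproduced here.

```python
import random
from statistics import mean


def r_squared(y_true, y_pred):
    """Out-of-sample R^2 = 1 - SS_res / SS_tot.

    The value is negative when predictions are worse than always
    predicting the mean of the observed values.
    """
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def resample_control_signatures(gene_pool, signature_size, n_controls, seed=0):
    """Draw random gene sets of the same size as a candidate signature
    (hypothetical helper; the sampling scheme is an assumption)."""
    rng = random.Random(seed)
    return [rng.sample(gene_pool, signature_size) for _ in range(n_controls)]


def fraction_of_controls_at_least(candidate_score, control_scores):
    """Share of control models scoring at least as well as the candidate."""
    hits = sum(1 for s in control_scores if s >= candidate_score)
    return hits / len(control_scores)
```

A candidate signature whose test-set R² sits in the middle of the control distribution carries little information beyond that of a random gene set of the same size.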
Subject(s)
Benchmarking , Neoplasms , Genomics , Humans , Neoplasms/genetics , Neoplasms/radiotherapy , Radiation Tolerance/genetics , Transcriptome
ABSTRACT
Achilles' heel relationships arise when the status of one gene exposes a cell's vulnerability to perturbation of a second gene, such as chemical inhibition, providing therapeutic opportunities for precision oncology. SynLeGG (www.overton-lab.uk/synlegg) identifies and visualizes mutually exclusive loss signatures in 'omics data to enable discovery of genetic dependency relationships (GDRs) across 783 cancer cell lines and 30 tissues. While there is significant focus on genetic approaches, transcriptome data has advantages for investigation of GDRs and remains relatively underexplored. SynLeGG depends upon the MultiSEp algorithm for unsupervised assignment of cell lines into gene expression clusters, which provide the basis for analysis of CRISPR scores and mutational status in order to propose candidate GDRs. Benchmarking against SynLethDB demonstrates favourable performance for MultiSEp against competing approaches, finding significantly higher area under the Receiver Operator Characteristic curve and between 2.8-fold and 8.5-fold greater coverage. In addition to pan-cancer analysis, SynLeGG offers investigation of tissue-specific GDRs and recovers established relationships, including synthetic lethality for SMARCA2 with SMARCA4. Proteomics, Gene Ontology, protein-protein interactions and paralogue information are provided to assist interpretation and candidate drug target prioritization. SynLeGG predictions are significantly enriched in dependencies validated by a recently published CRISPR screen.
Subject(s)
Genes, Neoplasm , Neoplasms/genetics , Software , Synthetic Lethal Mutations , CRISPR-Cas Systems , Cell Line, Tumor , Gene Expression , Gene Expression Profiling , Humans , Mutation , Proteomics
ABSTRACT
BACKGROUND: Metastatic clear cell renal cell cancer (mccRCC) portends a poor prognosis and urgently requires better clinical tools for prognostication as well as for prediction of response to treatment. Considerable investment in molecular risk stratification has sought to overcome the performance ceiling encountered by methods restricted to traditional clinical parameters. However, replication of results has proven challenging, and intratumoural heterogeneity (ITH) may confound attempts at tissue-based stratification. METHODS: We investigated the influence of confounding ITH on the performance of a novel molecular prognostic model, enabled by pathologist-guided multiregion sampling (n = 183) of geographically separated mccRCC cohorts from the SuMR trial (development, n = 22) and the SCOTRRCC study (validation, n = 22). Tumour protein levels quantified by reverse phase protein array (RPPA) were investigated alongside clinical variables. Regularised wrapper selection identified features for Cox multivariate analysis with overall survival as the primary endpoint. RESULTS: The optimal subset of variables in the final stratification model consisted of N-cadherin, EPCAM, Age, mTOR (NEAT). Risk groups from NEAT had a markedly different prognosis in the validation cohort (log-rank p = 7.62 × 10⁻⁷; hazard ratio (HR) 37.9, 95% confidence interval 4.1-353.8) and 2-year survival rates (accuracy = 82%, Matthews correlation coefficient = 0.62). Comparisons with established clinico-pathological scores suggest favourable performance for NEAT (Net reclassification improvement 7.1% vs International Metastatic Database Consortium score, 25.4% vs Memorial Sloan Kettering Cancer Center score). Limitations include the relatively small cohorts and associated wide confidence intervals on predictive performance. Our multiregion sampling approach enabled investigation of NEAT validation when limiting the number of samples analysed per tumour, which significantly degraded performance. Indeed, sample selection could change risk group assignment for 64% of patients, and prognostication with one sample per patient performed only slightly better than random expectation (median logHR = 0.109). Low grade tissue was associated with 3.5-fold greater variation in predicted risk than high grade (p = 0.044). CONCLUSIONS: This case study in mccRCC quantitatively demonstrates the critical importance of tumour sampling for the success of molecular biomarker research where ITH is a factor. The NEAT model shows promise for mccRCC prognostication and warrants follow-up in larger cohorts. Our work evidences actionable parameters to guide sample collection (tumour coverage, size, grade) to inform the development of reproducible molecular risk stratification methods.
Subject(s)
Biomarkers, Tumor/genetics , Carcinoma, Renal Cell/genetics , Genetic Heterogeneity , Kidney Neoplasms/genetics , Adult , Aged , Carcinoma, Renal Cell/physiopathology , Cohort Studies , Female , Humans , Kidney Neoplasms/pathology , Kidney Neoplasms/physiopathology , Male , Middle Aged , Neoplasm Proteins , Prognosis , Proportional Hazards Models , Protein Array Analysis , Survival Rate
ABSTRACT
RNA editing by deamination of specific adenosine bases to inosines during pre-mRNA processing generates edited isoforms of proteins. Recoding RNA editing is more widespread in Drosophila than in vertebrates. Editing levels rise strongly at metamorphosis, and Adar(5G1) null mutant flies lack editing events in hundreds of CNS transcripts; mutant flies have reduced viability, severely defective locomotion and age-dependent neurodegeneration. On the other hand, overexpressing an adult dADAR isoform with high enzymatic activity ubiquitously during larval and pupal stages is lethal. We took advantage of this to screen for genetic modifiers; Adar overexpression lethality is rescued by reduced dosage of the Rdl (Resistant to dieldrin) gene, encoding a subunit of inhibitory GABA receptors. Reduced dosage of the Gad1 gene encoding the GABA synthetase also rescues Adar overexpression lethality. Drosophila Adar(5G1) mutant phenotypes are ameliorated by feeding GABA modulators. We demonstrate that neuronal excitability is linked to dADAR expression levels in individual neurons; Adar-overexpressing larval motor neurons show reduced excitability whereas Adar(5G1) null mutant or targeted Adar knockdown motor neurons exhibit increased excitability. GABA inhibitory signalling is impaired in human epileptic and autistic conditions, and vertebrate ADARs may have a relevant evolutionarily conserved control over neuronal excitability.
Subject(s)
Adenosine Deaminase/metabolism , Drosophila Proteins/metabolism , Drosophila melanogaster/enzymology , Motor Neurons/enzymology , Action Potentials , Adenosine Deaminase/genetics , Animals , Chromosomes, Insect , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Drosophila melanogaster/physiology , Genes, Lethal , Genotype , Larva/enzymology , Motor Neurons/physiology , Phenotype , RNA Editing , Receptors, GABA-A/genetics , Signal Transduction , gamma-Aminobutyric Acid/metabolism
ABSTRACT
Tissue microarrays (TMAs) allow multiplexed analysis of tissue samples and are frequently used to estimate biomarker protein expression in tumour biopsies. TMA Navigator (www.tmanavigator.org) is an open access web application for analysis of TMA data and related information, accommodating categorical, semi-continuous and continuous expression scores. Non-biological variation, or batch effects, can hinder data analysis and may be mitigated using the ComBat algorithm, which is incorporated with enhancements for automated application to TMA data. Unsupervised grouping of samples (patients) is provided according to Gaussian mixture modelling of marker scores, with cardinality selected by Bayesian information criterion regularization. Kaplan-Meier survival analysis is available, including comparison of groups identified by mixture modelling using the Mantel-Cox log-rank test. TMA Navigator also supports network inference approaches useful for TMA datasets, which often constitute comparatively few markers. Tissue and cell-type specific networks derived from TMA expression data offer insights into the molecular logic underlying pathophenotypes, towards more effective and personalized medicine. Output is interactive, and results may be exported for use with external programs. Private anonymous access is available, and user accounts may be generated for easier data management.
Subject(s)
Neoplasms/mortality , Software , Tissue Array Analysis/methods , Biomarkers, Tumor/analysis , Breast Neoplasms/mortality , Female , Humans , Internet , Survival Analysis
ABSTRACT
The utilisation of synthetic torpor for interplanetary travel once seemed farfetched. However, mounting evidence points to torpor-induced protection against the main hazards of space travel, namely exposure to radiation and microgravity. To determine the radio-protective effects of an induced torpor-like state, we exploited the ectothermic nature of Danio rerio (zebrafish), reducing their body temperatures to replicate the hypothermic states seen during natural torpor. We also administered melatonin as a sedative to reduce physical activity. Zebrafish were then exposed to low-dose radiation (0.3 Gy) to simulate radiation exposure on long-term space missions. Transcriptomic analysis found that radiation exposure led to an upregulation of inflammatory and immune signatures and a differentiation and regeneration phenotype driven by STAT3 and MYOD1 transcription factors. In addition, DNA repair processes were downregulated in the muscle two days post-irradiation. The effects of hypothermia led to an increase in mitochondrial translation, including genes involved in oxidative phosphorylation, and a downregulation of extracellular matrix (ECM) and developmental genes. Upon radiation exposure, increases in endoplasmic reticulum stress genes were observed in a torpor+radiation group with downregulation of immune-related and ECM genes. Exposing hypothermic zebrafish to radiation also resulted in a downregulation of ECM and developmental genes; however, immune/inflammatory related pathways were downregulated, in contrast to that observed in the radiation only group. A cross-species comparison was performed with the muscle of hibernating Ursus arctos horribilis (brown bear) to define shared mechanisms of cold tolerance. Shared responses show an upregulation of protein translation and metabolism of amino acids, as well as a hypoxia response with the shared downregulation of glycolysis, ECM, and developmental genes.
Subject(s)
Hypothermia , Torpor , Animals , Zebrafish/genetics , Torpor/physiology , Gene Expression Profiling , Muscles
ABSTRACT
Transcriptomic personalisation of radiation therapy has gained considerable interest in recent years. However, independent model testing on in vitro data has shown poor performance. In this work, we assess the reproducibility in clinical applications of radiosensitivity signatures. Agreement between radiosensitivity predictions from published signatures using different microarray normalisation methods was assessed. Control signatures developed from resampled in vitro data were benchmarked in clinical cohorts. Survival analysis was performed using each gene in the clinical transcriptomic data, and gene set enrichment analysis was used to determine pathways related to model performance in predicting survival and recurrence. The normalisation approach impacted calculated radiosensitivity index (RSI) values. Indeed, the limits of agreement exceeded 20% with different normalisation approaches. No published signature significantly improved on the resampled controls for prediction of clinical outcomes. Functional annotation of gene models suggested that many overlapping biological processes are associated with cancer outcomes in RT treated and non-RT treated patients, including proliferation and immune responses. In summary, different normalisation methods should not be used interchangeably. The utility of published signatures remains unclear given the large proportion of genes relating to cancer outcome. The overlap in biological processes influencing outcome for patients treated with or without radiation suggests that existing signatures may lack specificity.
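As an illustration of how limits of agreement between two sets of paired predictions are commonly computed, the sketch below follows the Bland-Altman convention (mean difference ± 1.96 standard deviations of the differences). The abstract does not state which procedure was used, so the 1.96 multiplier, the function name and the input layout are assumptions.

```python
from statistics import mean, stdev


def limits_of_agreement(values_a, values_b):
    """Return (lower, bias, upper) for paired measurements.

    values_a and values_b are hypothetical paired scores for the same
    samples, e.g. obtained under two normalisation methods.
    """
    diffs = [a - b for a, b in zip(values_a, values_b)]
    bias = mean(diffs)              # systematic offset between the two methods
    spread = 1.96 * stdev(diffs)    # assumes roughly normal differences
    return bias - spread, bias, bias + spread
```

Wide limits relative to the range of the score indicate that the two processing routes cannot be substituted for one another, even when the average bias is small.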
ABSTRACT
Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates, leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. This error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five "incorrect" targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
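The target:decoy idea referred to above can be sketched in a few lines of Python. This is a simplified estimator (decoy hits divided by target hits above a score threshold), not the statistical models evaluated in the study; the data layout and function names are assumptions for illustration.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate FDR among PSMs scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) pairs (hypothetical layout).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def accepted_targets_at_fdr(psms, max_fdr):
    """Largest number of target PSMs accepted at any threshold whose
    estimated FDR does not exceed `max_fdr`."""
    best = 0
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, threshold) <= max_fdr:
            n_targets = sum(1 for s, d in psms if s >= threshold and not d)
            best = max(best, n_targets)
    return best
```

In this simplified picture, additional high-scoring random matches push up the estimated FDR at a given score, so fewer target PSMs survive a fixed FDR cut-off, in line with the effect the abstract reports for six-frame databases.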
Subject(s)
Databases, Protein , Genomics , Nucleotides/chemistry , Proteomics , Base Sequence , Expressed Sequence Tags , Mass Spectrometry , Probability
ABSTRACT
Selection of protein targets for study is central to structural biology and may be influenced by numerous factors. A key aim is to maximise returns for effort invested by identifying proteins with the balance of biophysical properties that are conducive to success at all stages (e.g. solubility, crystallisation) in the route towards a high resolution structural model. Selected targets can be optimised through construct design (e.g. to minimise protein disorder), switching to a homologous protein, and selection of experimental methodology (e.g. choice of expression system) to prime for efficient progress through the structural proteomics pipeline. Here we discuss computational techniques in target selection and optimisation, with more detailed focus on tools developed within the Scottish Structural Proteomics Facility (SSPF); namely XANNpred, ParCrys, OB-Score (target selection) and TarO (target optimisation). TarO runs a large number of algorithms, searching for homologues and annotating the pool of possible alternative targets. This pool of putative homologues is presented in a ranked, tabulated format and results are also visualised as an automatically generated and annotated multiple sequence alignment. The target selection algorithms each predict the propensity of a selected protein target to progress through the experimental stages leading to diffracting crystals. This single predictor approach has advantages for target selection, when compared with an approach using two or more predictors that each predict for success at a single experimental stage. The tools described here helped SSPF achieve a high (21%) success rate in progressing cloned targets to diffraction-quality crystals.
Subject(s)
Computational Biology/methods , Crystallization/methods , Proteins , Algorithms , Amino Acid Sequence , Computer Simulation , Databases, Protein , Humans , Models, Molecular , Molecular Sequence Data , Protein Conformation , Proteins/analysis , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Software
ABSTRACT
The therapeutic activation of antitumour immunity by immune checkpoint inhibitors (ICIs) is a significant advance in cancer medicine, not least due to the prospect of long-term remission. However, many patients are unresponsive to ICI therapy and may experience serious side effects; companion biomarkers are urgently needed to help inform ICI prescribing decisions. We present the IMMUNETS networks of gene coregulation in five key immune cell types and their application to interrogate control of nivolumab response in advanced melanoma cohorts. The results evidence a role for each of the IMMUNETS cell types in ICI response and in driving tumour clearance with independent cohorts from TCGA. As expected, 'immune hot' status, including T cell proliferation, correlates with response to first-line ICI therapy. Genes regulated in NK, dendritic, and B cells are the most prominent discriminators of nivolumab response in patients that had previously progressed on another ICI. Multivariate analysis controlling for tumour stage and age highlights CIITA and IKZF3 as candidate prognostic biomarkers. IMMUNETS provide a resource for network biology, enabling context-specific analysis of immune components in orthogonal datasets. Overall, our results illuminate the relationship between the tumour microenvironment and clinical trajectories, with potential implications for precision medicine.
ABSTRACT
BACKGROUND: Integration of data from multiple domains can greatly enhance the quality and applicability of knowledge generated in analysis workflows. However, working with health data is challenging, requiring careful preparation in order to support meaningful interpretation and robust results. Ontologies encapsulate relationships between variables that can enrich the semantic content of health datasets to enhance interpretability and inform downstream analyses. FINDINGS: We developed an R package for electronic health data preparation, "eHDPrep," demonstrated upon a multimodal colorectal cancer dataset (661 patients, 155 variables; Colo-661); a further demonstrator is taken from The Cancer Genome Atlas (459 patients, 94 variables; TCGA-COAD). eHDPrep offers user-friendly methods for quality control, including internal consistency checking and redundancy removal with information-theoretic variable merging. Semantic enrichment functionality is provided, enabling generation of new informative "meta-variables" according to ontological common ancestry between variables, demonstrated with SNOMED CT and the Gene Ontology in the current study. eHDPrep also facilitates numerical encoding, variable extraction from free text, completeness analysis, and user review of modifications to the dataset. CONCLUSIONS: eHDPrep provides effective tools to assess and enhance data quality, laying the foundation for robust performance and interpretability in downstream analyses. Application to multimodal colorectal cancer datasets resulted in improved data quality, structuring, and robust encoding, as well as enhanced semantic information. We make eHDPrep available as an R package from CRAN (https://cran.r-project.org/package=eHDPrep) and GitHub (https://github.com/overton-group/eHDPrep).
Subject(s)
Colorectal Neoplasms , Semantics , Humans , Gene Ontology , Data Accuracy , Quality Control , Colorectal Neoplasms/genetics
ABSTRACT
Loss of PTEN function is one of the most common events driving aggressive prostate cancers. Biochemically, PTEN is a lipid phosphatase that opposes the activation of the oncogenic PI3K-AKT signalling network. However, PTEN also has additional potential mechanisms of action, including protein phosphatase activity. Using a mutant enzyme, PTEN Y138L, which selectively lacks protein phosphatase activity, we characterised genetically modified mice lacking either the full function of PTEN in the prostate gland or only lacking protein phosphatase activity. The phenotypes of mice carrying a single allele of either wild-type Pten or PtenY138L in the prostate were similar, with common prostatic intraepithelial neoplasia (PIN) and similar gene expression profiles. However, the latter group, lacking PTEN protein phosphatase activity, additionally showed lymphocyte infiltration around PIN and an increased immune cell gene expression signature. Prostate adenocarcinoma, elevated proliferation and AKT activation were only frequently observed when PTEN was fully deleted. We also identify a common gene expression signature of PTEN loss conserved in other studies (including Nkx3.1, Tnf and Cd44). We provide further insight into tumour development in the prostate driven by loss of PTEN function and show that PTEN protein phosphatase activity is not required for tumour suppression.
Subject(s)
PTEN Phosphohydrolase , Prostatic Neoplasms , Animals , Male , Mice , Lipids , Phosphatidylinositol 3-Kinases/metabolism , Phosphoprotein Phosphatases , Prostate/metabolism , Prostatic Neoplasms/metabolism , Proto-Oncogene Proteins c-akt/genetics , Proto-Oncogene Proteins c-akt/metabolism , PTEN Phosphohydrolase/genetics , PTEN Phosphohydrolase/metabolism
ABSTRACT
Production of diffracting crystals is a critical step in determining the three-dimensional structure of a protein by X-ray crystallography. Computational techniques to rank proteins by their propensity to yield diffraction-quality crystals can improve efficiency in obtaining structural data by guiding both protein selection and construct design. XANNpred comprises a pair of artificial neural networks that each predict the propensity of a selected protein sequence to produce diffraction-quality crystals by current structural biology techniques. Blind tests show XANNpred has accuracy and Matthews correlation values ranging from 75% to 81% and 0.50 to 0.63 respectively; values of area under the receiver operator characteristic (ROC) curve range from 0.81 to 0.88. On blind test data XANNpred outperforms the other available algorithms XtalPred, PXS, OB-Score, and ParCrys. XANNpred also guides construct design by presenting graphs of predicted propensity for diffraction-quality crystals against residue sequence position. The XANNpred-SG algorithm is likely to be most useful to target selection in structural genomics consortia, while the XANNpred-PDB algorithm is more suited to the general structural biology community. XANNpred predictions that include sliding window graphs are freely available from http://www.compbio.dundee.ac.uk/xannpred
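For readers unfamiliar with the metrics quoted above, accuracy and the Matthews correlation coefficient (MCC) can both be computed from a binary confusion matrix. The Python sketch below uses the standard definitions; the example counts in the usage note are invented for illustration and are not taken from the XANNpred evaluation.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

With 40 true positives, 40 true negatives, 10 false positives and 10 false negatives, accuracy is 0.8 and MCC is 0.6. Unlike accuracy, MCC remains informative when the positive and negative classes are unbalanced.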
Subject(s)
Computational Biology/methods , Crystallization/methods , Neural Networks, Computer , Proteins/chemistry , Software , Databases, Protein , ROC Curve
ABSTRACT
The Scottish Structural Proteomics Facility was funded to develop a laboratory scale approach to high throughput structure determination. The effort was successful in that over 40 structures were determined. These structures and the methods harnessed to obtain them are reported here. This report reflects on the value of automation but also on the continued requirement for a high degree of scientific and technical expertise. The efficiency of the process poses challenges to the current paradigm of structural analysis and publication. In the 5 year period we published ten peer-reviewed papers reporting structural data arising from the pipeline. Nevertheless, the number of structures solved exceeded our ability to analyse and publish each new finding. By reporting the experimental details and depositing the structures we hope to maximize the impact of the project by allowing others to follow up the relevant biology.
Subject(s)
Laboratories/organization & administration , Proteins/chemistry , Proteins/metabolism , Proteomics/organization & administration , Computational Biology , Crystallization , Humans , Proteins/genetics , Scotland
ABSTRACT
TarO (http://www.compbio.dundee.ac.uk/taro) offers a single point of reference for key bioinformatics analyses relevant to selecting proteins or domains for study by structural biology techniques. The protein sequence is analysed by 17 algorithms and compared to 8 databases. TarO gathers putative homologues, including orthologues, and then obtains predictions of properties for these sequences including crystallisation propensity, protein disorder and post-translational modifications. Analyses are run on a high-performance computing cluster, the results integrated, stored in a database and accessed through a web-based user interface. Output is in tabulated format and in the form of an annotated multiple sequence alignment (MSA) that may be edited interactively in the program Jalview. TarO also simplifies the gathering of additional annotations via the Distributed Annotation System, both from the MSA in Jalview and through links to Dasty2. Routes to other information gateways are included, for example to relevant pages from UniProt, COG and the Conserved Domains Database. Open access to TarO is available from a guest account with private accounts for academic use available on request. Future development of TarO will include further analysis steps and integration with the Protein Information Management System (PIMS), a sister project in the BBSRC 'Structural Proteomics of Rational Targets' initiative.
Subject(s)
Proteins/chemistry , Sequence Analysis, Protein , Software , Algorithms , Databases, Protein , Internet , Protein Structure, Tertiary , Proteins/physiology , Sequence Alignment , Sequence Homology, Amino Acid , User-Computer Interface
ABSTRACT
Cell identity is governed by gene expression, regulated by transcription factor (TF) binding at cis-regulatory modules. Decoding the relationship between TF binding patterns and gene regulation is nontrivial, remaining a fundamental limitation in understanding cell decision-making. We developed the NetNC software to predict functionally active regulation of TF targets, demonstrated on nine datasets for the TFs Snail, Twist, and modENCODE Highly Occupied Target (HOT) regions. Snail and Twist are canonical drivers of epithelial to mesenchymal transition (EMT), a cell programme important in development, tumour progression and fibrosis. Predicted "neutral" (non-functional) TF binding always accounted for the majority (50% to 95%) of candidate target genes from statistically significant peaks, and HOT regions had higher functional binding than most of the Snail and Twist datasets examined. Our results illuminated conserved gene networks that control epithelial plasticity in development and disease. We identified new gene functions and network modules including crosstalk with Notch signalling and regulation of chromatin organisation, evidencing networks that reshape Waddington's epigenetic landscape during epithelial remodelling. Expression of orthologous functional TF targets discriminated breast cancer molecular subtypes and predicted novel tumour biology, with implications for precision medicine. Predicted invasion roles were validated using a tractable cell model, supporting our approach.
ABSTRACT
The ability to rank proteins by their likely success in crystallization is useful in current Structural Biology efforts and in particular in high-throughput Structural Genomics initiatives. We present ParCrys, a Parzen Window approach to estimate a protein's propensity to produce diffraction-quality crystals. The Protein Data Bank (PDB) provided training data whilst the databases TargetDB and PepcDB were used to define feature selection data as well as test data independent of feature selection and training. ParCrys outperforms the OB-Score, SECRET and CRYSTALP on the data examined, with accuracy and Matthews correlation coefficient values of 79.1% and 0.582, respectively (74.0% and 0.227, respectively, on data with a 'real-world' ratio of positive:negative examples). ParCrys predictions and associated data are available from www.compbio.dundee.ac.uk/parcrys.
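The general idea of a Parzen window estimator can be shown with a short Python sketch. It is not the ParCrys implementation: the single numeric feature, the Gaussian kernel, the bandwidth and the way the two class densities are combined into a score are all assumptions chosen to keep the example small.

```python
from math import exp, pi, sqrt


def parzen_density(x, samples, bandwidth):
    """Gaussian-kernel Parzen window density estimate at x (1-D)."""
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2.0 * pi))
    return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def parzen_score(x, positives, negatives, bandwidth=1.0):
    """Relative support for the positive class, in [0, 1].

    `positives` and `negatives` are hypothetical feature values for
    proteins that did and did not yield crystals.
    """
    p = parzen_density(x, positives, bandwidth)
    n = parzen_density(x, negatives, bandwidth)
    total = p + n
    return p / total if total > 0.0 else 0.5  # no evidence either way
```

Scores above 0.5 indicate that the query value lies closer to the positive training examples than to the negative ones under the chosen kernel.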
Subject(s)
Algorithms , Crystallization/methods , Models, Chemical , Models, Molecular , Proteins/chemistry , Proteins/ultrastructure , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Molecular Sequence Data , Protein Conformation , Software
ABSTRACT
Sphingolipid biosynthesis commences with the condensation of L-serine and palmitoyl-CoA to produce 3-ketodihydrosphingosine (KDS). This reaction is catalysed by the PLP-dependent enzyme serine palmitoyltransferase (SPT; EC 2.3.1.50), which is a membrane-bound heterodimer (SPT1/SPT2) in eukaryotes such as humans and yeast and a cytoplasmic homodimer in the Gram-negative bacterium Sphingomonas paucimobilis. Unusually, the outer membrane of S. paucimobilis contains glycosphingolipid (GSL) instead of lipopolysaccharide (LPS), and SPT catalyses the first step of the GSL biosynthetic pathway in this organism. We report here the crystal structure of the holo-form of S. paucimobilis SPT at 1.3 Å resolution. The enzyme is a symmetrical homodimer with two active sites and a monomeric tertiary structure consisting of three domains. The PLP cofactor is bound covalently to a lysine residue (Lys265) as an internal aldimine/Schiff base and the active site is composed of residues from both subunits, located at the bottom of a deep cleft. Models of the human SPT1/SPT2 heterodimer were generated from the bacterial structure by bioinformatics analysis. Mutations in the human SPT1-encoding subunit have been shown to cause a neuropathological disease known as hereditary sensory and autonomic neuropathy type I (HSAN1). Our models provide an understanding of how these mutations may affect the activity of the enzyme.
Subject(s)
Bacterial Proteins/chemistry , Models, Molecular , Serine C-Palmitoyltransferase/chemistry , Sphingolipids/biosynthesis , Sphingomonas/enzymology , Amino Acid Sequence , Bacterial Proteins/physiology , Binding Sites , Computational Biology , Dimerization , Holoenzymes/chemistry , Humans , Molecular Sequence Data , Mutation , Protein Conformation , Protein Subunits/chemistry , Protein Subunits/genetics , Protein Subunits/physiology , Serine C-Palmitoyltransferase/genetics , Serine C-Palmitoyltransferase/physiology
ABSTRACT
Birds have played a central role in many biological disciplines, particularly ecology, evolution, and behavior. The chicken, as a model vertebrate, also represents an important experimental system for developmental biologists, immunologists, cell biologists, and geneticists. However, genomic resources for the chicken have lagged behind those for other model organisms, with only 1845 nonredundant full-length chicken cDNA sequences currently deposited in the EMBL databank. We describe a large-scale expressed-sequence-tag (EST) project aimed at gene discovery in chickens (http://www.chick.umist.ac.uk). In total, 339,314 ESTs have been sequenced from 64 cDNA libraries generated from 21 different embryonic and adult tissues. These were clustered and assembled into 85,486 contiguous sequences (contigs). We find that a minimum of 38% of the contigs have orthologs in other organisms and define an upper limit of 13,000 new chicken genes. The remaining contigs may include novel avian specific or rapidly evolving genes. Comparison of the contigs with known chicken genes and orthologs indicates that 30% include cDNAs that contain the start codon and 20% of the contigs represent full-length cDNA sequences. Using this dataset, we estimate that chickens have approximately 35,000 genes in total, suggesting that this number may be a characteristic feature of vertebrates.
Subject(s)
Chickens/genetics , DNA, Complementary/genetics , Animals , Chick Embryo , Chromosome Mapping/methods , Expressed Sequence Tags , Sequence Homology, Nucleic Acid
ABSTRACT
Sar2676, a pantothenate synthetase with a molecular weight of 31 419 Da from methicillin-resistant Staphylococcus aureus, has been expressed, purified and crystallized at 293 K. The protein crystallizes in a primitive triclinic lattice, with unit-cell parameters a = 45.3, b = 60.5, c = 117.6 Å, α = 87.2, β = 81.2, γ = 68.4°. A complete data set has been collected to 2.3 Å resolution at the ESRF. Consideration of the likely solvent content suggested the asymmetric unit to contain four molecules. This has been confirmed by molecular-replacement phasing calculations, which give a solution with four monomers using a monomer of pantothenate synthetase from Escherichia coli (PDB code 1iho), which is 41% identical to Sar2676, as a search model.