ABSTRACT
The transcription-related DNA damage response was analyzed on a genome-wide scale with great spatial and temporal resolution. Upon UV irradiation, a slowdown of transcript elongation and restriction of gene activity to the promoter-proximal ~25 kb is observed. This is associated with a shift from expression of long mRNAs to shorter isoforms, incorporating alternative last exons (ALEs) that are more proximal to the transcription start site. Notably, this includes a shift from a protein-coding ASCC3 mRNA to a shorter ALE isoform of which the RNA, rather than an encoded protein, is critical for the eventual recovery of transcription. The non-coding ASCC3 isoform counteracts the function of the protein-coding isoform, indicating crosstalk between them. Thus, the ASCC3 gene expresses both coding and non-coding transcript isoforms with opposite effects on transcription recovery after UV-induced DNA damage.
Subject(s)
Alternative Splicing/radiation effects , DNA Helicases/genetics , RNA, Untranslated/genetics , Transcription, Genetic , Ultraviolet Rays , Cell Line , Exons , Humans , RNA Polymerase II/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Transcription Elongation, Genetic/radiation effects , Transcription Initiation, Genetic/radiation effects
ABSTRACT
Image-based profiling of the cellular response to drug compounds has proven effective at characterizing the morphological changes resulting from perturbation experiments. As data availability increases, however, there are growing demands for novel deep-learning methods. We applied the SwinV2 computer vision architecture to predict the mechanism of action of 10 kinase inhibitor compounds directly from Cell Painting images. This method outperforms the standard approach of using image-based profiles (IBPs), the multidimensional feature-set representations generated by bioimaging software. Furthermore, our fusion approach, cell-vision fusion, which combines three data modalities (images, IBPs, and chemical structures), achieved 69.79% accuracy and a 70.56% F1 score, 4.20% and 5.49% higher, respectively, than the best-performing IBP method. We provide three techniques, specific to Cell Painting images, which enable deep-learning architectures to train effectively, and we demonstrate approaches to combat the significant batch effects present in large Cell Painting datasets.
ABSTRACT
The intracellular bacterial pathogen Chlamydia trachomatis replicates within a membrane-bound compartment called the inclusion. Upon infection with several chlamydiae, each bacterium creates its own inclusion, resulting in multiple inclusions within each host cell. Ultimately, these inclusions fuse together in a process that requires the chlamydial protein IncA. Here, we show that inclusions form unique contact sites (inclusion contact sites, ICSs) prior to fusion, that serve as fusogenic platforms in which specific lipids and chlamydial proteins concentrate. Fusion depends on IncA clustering within ICSs and is regulated by PI(3,4)P2 and sphingolipids. As IncA concentrates within ICSs, its C-terminus likely interacts in trans with IncA on the apposing membrane, securing a high concentration of IncA at fusion sites. This regulatory mechanism contrasts with eukaryotic or viral fusion systems that are either composed of multiple proteins or use a change in pH to initiate membrane fusion. Thus, our study demonstrates that Chlamydia-mediated membrane fusion is primarily regulated by specific structural domains in IncA and its local organization on the inclusion membrane, which is affected by the host cell lipid composition.
Subject(s)
Bacterial Proteins , Chlamydia trachomatis , Inclusion Bodies , Membrane Fusion , Chlamydia trachomatis/metabolism , Chlamydia trachomatis/physiology , Inclusion Bodies/metabolism , HeLa Cells , Humans , Bacterial Proteins/metabolism , Bacterial Proteins/genetics , Chlamydia Infections/microbiology , Chlamydia Infections/metabolism , Sphingolipids/metabolism , Membrane Proteins
ABSTRACT
MOTIVATION: Generation of structural models and recognition of homologous relationships for unannotated protein sequences are fundamental problems in bioinformatics. Improving the sensitivity and selectivity of methods designed for these two tasks therefore has downstream benefits for many other bioinformatics applications. RESULTS: We describe the latest implementation of the GenTHREADER method for structure prediction on a genomic scale. The method combines profile-profile alignments with secondary-structure specific gap-penalties, classic pair- and solvation potentials using a linear combination optimized with a regression SVM model. We find this combination significantly improves both detection of useful templates and accuracy of sequence-structure alignments relative to other competitive approaches. We further present a second implementation of the protocol designed for the task of discriminating superfamilies from one another. This method, pDomTHREADER, is the first to incorporate both sequence and structural data directly in this task and improves sensitivity and selectivity over the standard version of pGenTHREADER and three other standard methods for remote homology detection.
Subject(s)
Computational Biology/methods , Protein Structure, Tertiary , Proteins/classification , Software , Protein Folding , Proteins/chemistry , Sequence Analysis, Protein/methods
ABSTRACT
Widespread mammographic screening programs and improved self-monitoring allow for breast cancer to be detected earlier than ever before. Breast-conserving surgery is a successful treatment for select women. However, up to 40% of women develop local recurrence after surgery despite apparently tumor-free margins. This suggests that morphologically normal breast tissue may harbor early alterations that contribute to increased risk of cancer recurrence. We conducted a comprehensive transcriptomic and proteomic analysis to characterize 57 fresh-frozen tissues from breast cancers and matched histologically normal tissues resected proximal to (<2 cm) and distant from (5-10 cm) the primary tumor, using tissues from cosmetic reduction mammoplasties as baseline. Four distinct transcriptomic subtypes are identified within matched normal tissues: metabolic, immune, matrisome/epithelial-mesenchymal transition, and non-coding enriched. Key components of the subtypes are supported by proteomic and tissue composition analyses. We find that the metabolic subtype is associated with poor prognosis (p < 0.001, HR 6.1). Examination of genes representing the metabolic signature identifies several genes able to prognosticate outcome from histologically normal tissues. A subset of these have been reported for their predictive ability in cancer but, to the best of our knowledge, have not been reported as altered in matched normal tissues. This study takes an important first step toward characterizing matched normal tissues resected at pre-defined margins from the primary tumor. Unlocking the predictive potential of unexcised tissue could prove key to driving the realization of personalized medicine for breast cancer patients, allowing for more biologically driven analyses of tissue margins than morphology alone.
ABSTRACT
Many intracellular bacteria, including Chlamydia, establish a parasitic membrane-bound organelle inside the host cell that is essential for the bacteria's survival. Chlamydia trachomatis forms inclusions that are decorated with poorly characterized membrane proteins known as Incs. The prototypical Inc, called IncA, enhances Chlamydia pathogenicity by promoting the homotypic fusion of inclusions and shares structural and functional similarity to eukaryotic SNAREs. Here, we present the atomic structure of the cytoplasmic domain of IncA, which reveals a non-canonical four-helix bundle. Structure-based mutagenesis, molecular dynamics simulation, and functional cellular assays identify an intramolecular clamp that is essential for IncA-mediated homotypic membrane fusion during infection.
Subject(s)
Bacterial Proteins/ultrastructure , Chlamydia Infections/microbiology , Chlamydia trachomatis/pathogenicity , Inclusion Bodies/microbiology , Membrane Fusion , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Chlamydia trachomatis/genetics , Chlamydia trachomatis/metabolism , Crystallography, X-Ray , Gene Knockout Techniques , HeLa Cells , Humans , Molecular Dynamics Simulation , Mutagenesis , Protein Conformation, alpha-Helical , Protein Domains/genetics , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Recombinant Proteins/ultrastructure , SNARE Proteins/chemistry
ABSTRACT
Natively unstructured regions are a common feature of eukaryotic proteomes. Between 30% and 60% of proteins are predicted to contain long stretches of disordered residues, and not only have many of these regions been confirmed experimentally, but they have also been found to be essential for protein function. In this study, we directly address the potential contribution of protein disorder in predicting protein function using standard Gene Ontology (GO) categories. Initially we analyse the occurrence of protein disorder in the human proteome and report ontology categories that are enriched in disordered proteins. Pattern analysis of the distributions of disordered regions in human sequences demonstrated that the functions of intrinsically disordered proteins are both length- and position-dependent. These dependencies were then encoded in feature vectors to quantify the contribution of disorder in human protein function prediction using Support Vector Machine classifiers. The prediction accuracies of 26 GO categories relating to signalling and molecular recognition are improved using the disorder features. The most significant improvements were observed for kinase, phosphorylation, growth factor, and helicase categories. Furthermore, we provide predicted GO term assignments using these classifiers for a set of unannotated and orphan human proteins. In this study, the importance of capturing protein disorder information and its value in function prediction is demonstrated. The GO category classifiers generated can be used to provide more reliable predictions and further insights into the behaviour of orphan and unannotated proteins.
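The encoding of length- and position-dependent disorder into feature vectors, as described in the abstract above, might look roughly like the following. This is a minimal sketch, not the authors' encoding: the 'D'/'O' per-residue mask format, the length bins (30 and 50 residues), the three equal sequence segments, and the function name `disorder_features` are all assumptions made for illustration.

```python
from itertools import groupby


def disorder_features(mask, length_bins=(30, 50), n_segments=3):
    """Encode a per-residue disorder mask as a small feature vector.

    mask: string with 'D' for a disordered residue and 'O' for an ordered one
          (hypothetical input format; a disorder predictor would supply it).
    length_bins: assumed cut-offs separating short, medium and long regions.
    n_segments: assumed number of equal segments (N-terminal to C-terminal).
    """
    n = len(mask)
    if n == 0:
        raise ValueError("mask must not be empty")

    # Collect the lengths of consecutive runs of disordered residues.
    region_lengths = [
        len(list(run)) for state, run in groupby(mask) if state == "D"
    ]

    # Length-dependent features: number of disordered regions per length bin.
    length_counts = [0] * (len(length_bins) + 1)
    for run_len in region_lengths:
        bin_index = sum(run_len >= cutoff for cutoff in length_bins)
        length_counts[bin_index] += 1

    # Position-dependent features: fraction of disordered residues per segment.
    position_fractions = []
    for s in range(n_segments):
        segment = mask[s * n // n_segments:(s + 1) * n // n_segments]
        fraction = segment.count("D") / len(segment) if segment else 0.0
        position_fractions.append(fraction)

    return length_counts + position_fractions


# Toy sequence: a 40-residue disordered N-terminus and a 10-residue
# disordered C-terminal tail around an ordered core.
features = disorder_features("D" * 40 + "O" * 50 + "D" * 10)
```

A vector of this kind could then be passed, alongside other sequence features, to any Support Vector Machine implementation; the classifier itself is left out here.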
Subject(s)
Models, Biological , Models, Chemical , Models, Molecular , Pattern Recognition, Automated/methods , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein/methods , Amino Acid Sequence , Artificial Intelligence , Computer Simulation , Molecular Sequence Data , Protein Denaturation , Protein Folding , Proteins/ultrastructure
ABSTRACT
NODAL/Activin signaling orchestrates key processes during embryonic development via SMAD2. How SMAD2 activates programs of gene expression that are modulated over time, however, is not known. Here we delineate the sequence of events that occur from SMAD2 binding to transcriptional activation, and the mechanisms underlying them. NODAL/Activin signaling induces dramatic chromatin landscape changes and a dynamic transcriptional network regulated by SMAD2, acting via multiple mechanisms. Crucially, we have discovered two modes of SMAD2 binding. SMAD2 can bind pre-acetylated nucleosome-depleted sites. However, it also binds to unacetylated, closed chromatin, independently of pioneer factors, where it induces nucleosome displacement and histone acetylation. For a subset of genes, this requires SMARCA4. We find that long-term modulation of the transcriptional responses requires continued NODAL/Activin signaling. Thus, SMAD2 binding does not linearly equate with transcriptional kinetics, and our data suggest that SMAD2 recruits multiple co-factors during sustained signaling to shape the downstream transcriptional program.
Subject(s)
Activins/metabolism , Chromatin/metabolism , Gene Expression Regulation, Developmental , Nodal Protein/metabolism , Signal Transduction , Smad2 Protein/metabolism , Transcription, Genetic , Animals , Mice , Protein Binding
ABSTRACT
BACKGROUND: CatSper1 and CatSper2 are two recently identified channel-like proteins, which show sperm-specific expression patterns. Through targeted mutagenesis in the mouse, CatSper1 has been shown to be required for fertility, sperm motility and for cAMP-induced Ca2+ current in sperm. Both channels resemble a single pore-forming repeat from a four-repeat voltage-dependent Ca2+/Na+ channel. However, neither CatSper1 nor CatSper2 has been shown to function as a cation channel when transfected into cells, singly or in conjunction. As the pore-forming units of voltage-gated cation channels form a tetramer, it has been suggested that the known CatSper proteins require additional subunits and/or interaction partners to function. RESULTS: Using in silico gene identification and prediction techniques, we have identified two further members of the CatSper family, CatSper3 and CatSper4. Each carries a single channel-forming domain with the predicted pore-loop containing the consensus sequence TxDxW. Each of the new CatSper genes has evidence for expression in the testis. Furthermore, we identified coiled-coil protein-protein interaction domains in the C-terminal tails of each of the CatSper channels, implying that CatSper channels 1, 2, 3 and 4 may interact directly or indirectly to form a functional tetramer. CONCLUSIONS: The topological and sequence relationship of CatSper1 and CatSper2 to the four-repeat Ca2+/Na+ channels suggested other members of this family may exist. We have identified a further two novel CatSper genes, conserved in both the human and mouse genomes. Furthermore, all four of the CatSper proteins are predicted to contain a common coiled-coil protein-protein interaction domain in their C-terminal tail. Coupled with expression data, this leads to the hypothesis that the CatSper proteins form a functional hetero-tetrameric channel in sperm.
Subject(s)
Calcium Channels/genetics , Genes , Seminal Plasma Proteins/genetics , Amino Acid Sequence , Animals , Biopolymers , Calcium/metabolism , Calcium Channels/biosynthesis , Calcium Channels/chemistry , Calcium Channels/isolation & purification , Chromosome Mapping , Chromosomes, Human, Pair 1/genetics , Chromosomes, Human, Pair 5/genetics , Consensus Sequence , Expressed Sequence Tags , Gene Expression , Humans , Ion Channels/chemistry , Ion Transport , Male , Mice , Molecular Sequence Data , Multigene Family , Phylogeny , Protein Conformation , Protein Interaction Mapping , Protein Structure, Tertiary , RNA, Messenger/genetics , Seminal Plasma Proteins/biosynthesis , Seminal Plasma Proteins/chemistry , Seminal Plasma Proteins/isolation & purification , Sequence Alignment , Sequence Homology, Amino Acid , Species Specificity , Testis/metabolism
ABSTRACT
SUMMARY: Complex human diseases can show significant heterogeneity between patients with the same phenotypic disorder. An outlier detection strategy was developed to identify variants at the level of gene transcription that are of potential biological and phenotypic importance. Here we describe a graphical software package, z-score outlier detection (ZODET), that enables identification and visualisation of gross abnormalities in gene expression (outliers) in individuals, using whole-genome microarray data. The mean and standard deviation of expression in a healthy control cohort are used to detect both over- and under-expressed probes in individual test subjects. We compared the potential of ZODET to detect outlier genes in gene expression datasets with a previously described statistical method, gene tissue index (GTI), using a simulated expression dataset and a publicly available monocyte-derived macrophage microarray dataset. Taken together, these results support ZODET as a novel approach to identify outlier genes of potential pathogenic relevance in complex human diseases. The algorithm is implemented using R packages and Java. AVAILABILITY: The software is freely available from http://www.ucl.ac.uk/medicine/molecular-medicine/publications/microarray-outlier-analysis.
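The z-score idea summarised in the ZODET abstract above can be illustrated with a short sketch. This is not the ZODET implementation (which the abstract says uses R packages and Java); the function name `zscore_outliers`, the dictionary-based input format, the use of the sample standard deviation, and the cut-off of |z| >= 3 are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def zscore_outliers(control, test, threshold=3.0):
    """Flag probes whose expression in one test subject is an outlier.

    control: dict mapping probe ID -> list of expression values measured
             in the healthy control cohort.
    test: dict mapping probe ID -> expression value in one test subject.
    threshold: assumed |z| cut-off; the abstract does not state one.
    Returns a dict mapping probe ID -> (z-score, "over" or "under").
    """
    outliers = {}
    for probe, values in control.items():
        # A standard deviation needs at least two control measurements.
        if probe not in test or len(values) < 2:
            continue
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation (n - 1)
        if sd == 0:
            continue  # z-score undefined when controls do not vary
        z = (test[probe] - mu) / sd
        if abs(z) >= threshold:
            outliers[probe] = (z, "over" if z > 0 else "under")
    return outliers


# Toy values with hypothetical probe IDs, not real microarray data.
control_cohort = {
    "probe_a": [10, 11, 9, 10, 10],
    "probe_b": [5.0, 5.5, 4.5, 5.0, 5.0],
    "probe_c": [20, 21, 19, 20, 20],
}
subject = {"probe_a": 15, "probe_b": 5.1, "probe_c": 15}
result = zscore_outliers(control_cohort, subject)
```

In this toy case, `probe_a` is flagged as over-expressed and `probe_c` as under-expressed, while `probe_b` stays within the control range.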
Subject(s)
Gene Expression Regulation , Genes , Oligonucleotide Array Sequence Analysis/methods , Software , Base Sequence , Humans , Macrophages/metabolism , Molecular Sequence Data , Monocytes/cytology , ROC Curve , Reproducibility of Results , User-Computer Interface
ABSTRACT
BACKGROUND: Disordered proteins need to be expressed to carry out specified functions; however, their accumulation in the cell can potentially cause major problems through protein misfolding and aggregation. Gene expression levels, mRNA decay rates, microRNA (miRNA) targeting and ubiquitination have critical roles in the degradation and disposal of human proteins and transcripts. Here, we describe a study examining these features to gain insights into the regulation of disordered proteins. RESULTS: In comparison with ordered proteins, disordered proteins have a greater proportion of predicted ubiquitination sites. The transcripts encoding disordered proteins also have higher proportions of predicted miRNA target sites and higher mRNA decay rates, both of which are indicative of the observed lower gene expression levels. The results suggest that the disordered proteins and their transcripts are present in the cell at low levels and/or for a short time before being targeted for disposal. Surprisingly, we find that for a significant proportion of highly disordered proteins, all four of these trends are reversed. Predicted estimates for miRNA targets, ubiquitination and mRNA decay rate are low in the highly disordered proteins that are constitutively and/or highly expressed. CONCLUSIONS: Mechanisms are in place to protect the cell from these potentially dangerous proteins. The evidence suggests that the enrichment of signals for miRNA targeting and ubiquitination may help prevent the accumulation of disordered proteins in the cell. Our data also provide evidence for a mechanism by which a significant proportion of highly disordered proteins (with high expression levels) can escape rapid degradation to allow them to successfully carry out their function.
Subject(s)
Gene Expression Profiling , Protein Stability , Proteome/metabolism , Humans , MicroRNAs/metabolism , Protein Folding , RNA Stability , Ubiquitination
ABSTRACT
We have merged four different views of the human plasma proteome, based on different methodologies, into a single nonredundant list of 1,175 distinct gene products. The methodologies used were 1) literature search for proteins reported to occur in plasma or serum; 2) multidimensional chromatography of proteins followed by two-dimensional electrophoresis and mass spectrometry (MS) identification of resolved proteins; 3) tryptic digestion and multidimensional chromatography of peptides followed by MS identification; and 4) tryptic digestion and multidimensional chromatography of peptides from low-molecular-mass plasma components followed by MS identification. Of 1,175 nonredundant gene products, 195 were included in more than one of the four input datasets. Only 46 appeared in all four. Predictions of signal sequence and transmembrane domain occurrence, as well as Gene Ontology annotation assignments, allowed characterization of the nonredundant list and comparison of the data sources. The "nonproteomic" literature (468 input proteins) is strongly biased toward signal sequence-containing extracellular proteins, while the three proteomics methods showed a much higher representation of cellular proteins, including nuclear, cytoplasmic, and kinesin complex proteins. Cytokines and protein hormones were almost completely absent from the proteomics data (presumably due to low abundance), while categories like DNA-binding proteins were almost entirely absent from the literature data (perhaps unexpected and therefore not sought). Most major categories of proteins in the human proteome are represented in plasma, with the distribution at successively deeper layers shifting from mostly extracellular to a distribution more like the whole (primarily cellular) proteome. The resulting nonredundant list confirms the presence of a number of interesting candidate marker proteins in plasma and serum.
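The merging step described in the abstract above (four input lists combined into one nonredundant list, then counted by how many sources report each gene product) can be sketched as follows. This is only an illustration under assumptions: the source names, the placeholder identifiers G1 to G5, and the helper names `merge_sources` and `summarize` are hypothetical, and the sketch assumes identifiers have already been mapped to a common gene-product ID, which is usually the hard part of such a merge.

```python
from collections import Counter


def merge_sources(sources):
    """Count, for each gene product, how many input datasets contain it.

    sources: dict mapping source name -> iterable of gene-product IDs.
    Duplicates within one source are collapsed before counting.
    """
    counts = Counter()
    for ids in sources.values():
        counts.update(set(ids))
    return counts


def summarize(counts, n_sources):
    """Report the size of the merged list and its overlap statistics."""
    return {
        "nonredundant": len(counts),
        "in_more_than_one": sum(1 for c in counts.values() if c > 1),
        "in_all": sum(1 for c in counts.values() if c == n_sources),
    }


# Toy input with placeholder identifiers, not the published datasets.
toy_sources = {
    "literature": ["G1", "G2", "G3"],
    "protein_2de_ms": ["G1", "G4"],
    "peptide_ms": ["G1", "G2"],
    "low_mass_peptide_ms": ["G1", "G5", "G5"],
}
summary = summarize(merge_sources(toy_sources), len(toy_sources))
```

For the toy input, the summary reports five nonredundant entries, two found in more than one source, and one found in all four, mirroring the kind of figures quoted in the abstract (1,175, 195 and 46).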