ABSTRACT
S-phase entry and exit are regulated by hundreds of protein complexes that assemble "just in time," orchestrated by a multitude of distinct events. To help understand their interplay, we have created a tailored visualization based on the Minardo layout, highlighting over 80 essential events. This complements our earlier visualization of M-phase, and both can be displayed together, giving a comprehensive overview of the events regulating the cell division cycle. To view this SnapShot, open or download the PDF.
Subject(s)
Cell Cycle/genetics , Mitosis/genetics , Multiprotein Complexes/genetics , S Phase/genetics , Cell Division/genetics , Cyclin B/genetics , Cyclin D/genetics , Cyclin-Dependent Kinases/genetics , G2 Phase/genetics , Humans , Phosphorylation/genetics , Proteasome Endopeptidase Complex/genetics
ABSTRACT
During mitosis, a cell divides its duplicated genome into two identical daughter cells. This process must occur without errors to prevent proliferative diseases (e.g., cancer). A key mechanism controlling mitosis is the precise timing of more than 32,000 phosphorylation and dephosphorylation events by a network of kinases and counterbalancing phosphatases. The identity, magnitude, and temporal regulation of these events have emerged recently, largely from advances in mass spectrometry. Here, we show phosphoevents currently believed to be key regulators of mitosis. For an animated version of this SnapShot, please see http://www.cell.com/cell/enhanced/odonoghue2.
Subject(s)
Mitosis , Protein Kinases/metabolism , Animals , Humans , Phosphorylation
ABSTRACT
The insulin/IGF1 signaling pathway (ISP) plays an essential role in long-term health. Some perturbations in this pathway are associated with diseases such as type 2 diabetes; other perturbations extend lifespan in worms, flies, and mice. The ISP regulates many biological processes, including energy storage, apoptosis, transcription, and cellular homeostasis. Such regulation involves precise rewiring of temporal events in protein phosphorylation networks. For an animated version of this Enhanced SnapShot, please visit http://www.cell.com/cell/enhanced/odonoghue.
Subject(s)
Insulin-Like Growth Factor I/metabolism , Insulin/metabolism , Signal Transduction , Animals , Humans , Phosphorylation , Proteins/metabolism
ABSTRACT
We modeled 3D structures of all SARS-CoV-2 proteins, generating 2,060 models that span 69% of the viral proteome and provide details not available elsewhere. We found that ~6% of the proteome mimicked human proteins, while ~7% was implicated in hijacking mechanisms that reverse post-translational modifications, block host translation, and disable host defenses; a further ~29% self-assembled into heteromeric states that provided insight into how the viral replication and translation complex forms. To make these 3D models more accessible, we devised a structural coverage map, a novel visualization method to show what is, and is not, known about the 3D structure of the viral proteome. We integrated the coverage map into an accompanying online resource (https://aquaria.ws/covid) that can be used to find and explore models corresponding to the 79 structural states identified in this work. The resulting Aquaria-COVID resource helps scientists use emerging structural data to understand the mechanisms underlying coronavirus infection and draws attention to the 31% of the viral proteome that remains structurally unknown or dark.
Subject(s)
Angiotensin-Converting Enzyme 2/metabolism , Host-Pathogen Interactions/genetics , Protein Processing, Post-Translational , SARS-CoV-2/metabolism , Spike Glycoprotein, Coronavirus/metabolism , Amino Acid Transport Systems, Neutral/chemistry , Amino Acid Transport Systems, Neutral/genetics , Amino Acid Transport Systems, Neutral/metabolism , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/genetics , Binding Sites , COVID-19/genetics , COVID-19/metabolism , COVID-19/virology , Computational Biology/methods , Coronavirus Envelope Proteins/chemistry , Coronavirus Envelope Proteins/genetics , Coronavirus Envelope Proteins/metabolism , Coronavirus Nucleocapsid Proteins/chemistry , Coronavirus Nucleocapsid Proteins/genetics , Coronavirus Nucleocapsid Proteins/metabolism , Humans , Mitochondrial Membrane Transport Proteins/chemistry , Mitochondrial Membrane Transport Proteins/genetics , Mitochondrial Membrane Transport Proteins/metabolism , Mitochondrial Precursor Protein Import Complex Proteins , Models, Molecular , Molecular Mimicry , Neuropilin-1/chemistry , Neuropilin-1/genetics , Neuropilin-1/metabolism , Phosphoproteins/chemistry , Phosphoproteins/genetics , Phosphoproteins/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Protein Multimerization , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Viral Matrix Proteins/chemistry , Viral Matrix Proteins/genetics , Viral Matrix Proteins/metabolism , Viroporin Proteins/chemistry , Viroporin Proteins/genetics , Viroporin Proteins/metabolism , Virus Replication
ABSTRACT
A three-dimensional chromatin state underpins the structural and functional basis of the genome by bringing regulatory elements and genes into close spatial proximity to ensure proper, cell-type-specific gene expression profiles. Here, we performed Hi-C chromosome conformation capture sequencing to investigate how three-dimensional chromatin organization is disrupted in the context of copy-number variation, long-range epigenetic remodeling, and atypical gene expression programs in prostate cancer. We find that cancer cells retain the ability to segment their genomes into megabase-sized topologically associated domains (TADs); however, these domains are generally smaller due to establishment of additional domain boundaries. Interestingly, a large proportion of the new cancer-specific domain boundaries occur at regions that display copy-number variation. Notably, a common deletion on 17p13.1 in prostate cancer spanning the TP53 tumor suppressor locus results in bifurcation of a single TAD into two distinct smaller TADs. Change in domain structure is also accompanied by novel cancer-specific chromatin interactions within the TADs that are enriched at regulatory elements such as enhancers, promoters, and insulators, and associated with alterations in gene expression. We also show that differential chromatin interactions across regulatory regions occur within long-range epigenetically activated or silenced regions of concordant gene activation or repression in prostate cancer. Finally, we present a novel visualization tool that enables integrated exploration of Hi-C interaction data, the transcriptome, and epigenome. This study provides new insights into the relationship between long-range epigenetic and genomic dysregulation and changes in higher-order chromatin interactions in cancer.
Subject(s)
Chromatin/genetics , Epigenesis, Genetic , Neoplasms/genetics , CCCTC-Binding Factor , Cell Line, Tumor , Enhancer Elements, Genetic , Gene Expression Regulation, Neoplastic , Genome, Human , Histones/metabolism , Humans , Molecular Sequence Annotation , Neoplasms/metabolism , Protein Binding , Protein Processing, Post-Translational , Repressor Proteins/physiology
ABSTRACT
Despite substantial and successful projects for structural genomics, many proteins remain for which neither experimental structures nor homology-based models are known for any part of the amino acid sequence. These have been called "dark proteins," in contrast to non-dark proteins, in which at least part of the sequence has a known or inferred structure. It has been hypothesized that non-dark proteins may be more abundantly expressed than dark proteins, which are known to have far fewer sequence relatives. Surprisingly, this was not observed: human dark and non-dark proteins had quite similar levels of expression, in terms of both mRNA and protein abundance. Such high levels of expression strongly indicate that dark proteins, as a group, are important for cellular function. This is remarkable, given how carefully structural biologists have focused on proteins crucial for function, and highlights the important challenge posed by dark proteins in future research.
Subject(s)
Databases, Protein , Proteome/analysis , Computational Biology , Protein Conformation
ABSTRACT
We surveyed the "dark" proteome, that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44-54% of the proteome in eukaryotes and viruses was dark, compared with only ~14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology.
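The residue-level notion of darkness described above can be made concrete with a minimal sketch; this is not the survey's pipeline. It assumes each protein is given as its length plus a hypothetical list of 1-based, inclusive residue intervals that match a known or modeled structure. Residues outside every interval count as dark, and a protein with no covered residue counts as a dark protein. The protein names and intervals are invented for illustration.

```python
# Hypothetical input format (an assumption, not the survey's data model):
# {name: (length, [(start, end), ...])}, with 1-based inclusive intervals
# marking residues that match a known or modeled structure.

def covered_residues(length, intervals):
    """Return the set of residue positions covered by at least one interval."""
    covered = set()
    for start, end in intervals:
        # Clamp each interval to the protein; overlapping intervals merge via the set.
        covered.update(range(max(1, start), min(length, end) + 1))
    return covered

def darkness(length, intervals):
    """Fraction of a protein's residues with no structural coverage (0.0 to 1.0)."""
    return 1.0 - len(covered_residues(length, intervals)) / length

def proteome_summary(proteome):
    """Residue-weighted dark fraction of the proteome, plus fully dark proteins."""
    total = dark = 0
    dark_proteins = []
    for name, (length, intervals) in proteome.items():
        n_covered = len(covered_residues(length, intervals))
        total += length
        dark += length - n_covered
        if n_covered == 0:
            dark_proteins.append(name)  # no residue matches any structure
    return dark / total, sorted(dark_proteins)

# Invented example proteome.
example_proteome = {
    "P1": (100, [(1, 60)]),
    "P2": (50, []),
    "P3": (200, [(10, 50), (40, 120)]),
}
fraction, names = proteome_summary(example_proteome)
print(f"dark fraction: {fraction:.3f}; dark proteins: {names}")
```

Counting dark residues as integers and dividing once keeps the proteome-wide fraction weighted by sequence length, which matches the "fraction of the proteome" framing rather than an average over proteins.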
Subject(s)
Computational Biology/methods , Databases, Protein , Proteins/metabolism , Proteome/metabolism , Algorithms , Animals , Archaea/genetics , Archaea/metabolism , Bacteria/genetics , Bacteria/metabolism , Eukaryota/metabolism , Humans , Models, Molecular , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteome/chemistry , Proteome/genetics , Viruses/genetics , Viruses/metabolism
ABSTRACT
BACKGROUND: Sarcomas are rare, phenotypically heterogeneous cancers that disproportionately affect the young. Outside rare syndromes, the nature, extent, and clinical significance of their genetic origins are not known. We aimed to investigate the genetic basis for bone and soft-tissue sarcoma seen in routine clinical practice. METHODS: In this genetic study, we included 1162 patients with sarcoma from four cohorts (the International Sarcoma Kindred Study [ISKS], 966 probands; Project GENESIS, 48 probands; Asan Bio-Resource Center, 138 probands; and kConFab, ten probands), who were older than 15 years at the time of consent and had a histologically confirmed diagnosis of sarcoma, recruited from specialist sarcoma clinics without regard to family history. Detailed clinical, pathological, and pedigree information was collected, and cancer diagnoses in probands and relatives were independently verified. Targeted exon sequencing using blood (n=1114) or saliva (n=48) samples was done on 72 genes (selected due to associations with increased cancer risk) and rare variants were stratified into classes approximating the International Agency for Research on Cancer (IARC) clinical classification for genetic variation. We did a case-control rare variant burden analysis using 6545 Caucasian controls included from three cohorts (ISKS, 235 controls; LifePool, 2010 controls; and National Heart, Lung, and Blood Institute Exome Sequencing Project [ESP], 4300 controls). FINDINGS: The median age at cancer diagnosis in 1162 sarcoma probands was 46 years (IQR 29-58), 170 (15%) of 1162 probands had multiple primary cancers, and 155 (17%) of 911 families with informative pedigrees fitted recognisable cancer syndromes. Using a case-control rare variant burden analysis, 638 (55%) of 1162 sarcoma probands bore an excess of pathogenic germline variants (combined odds ratio [OR] 1·43, 95% CI 1·24-1·64, p<0·0001), with 227 known or expected pathogenic variants occurring in 217 individuals. 
All classes of pathogenic variants (known, expected, or predicted) were associated with earlier age of cancer onset. In addition to TP53, ATM, ATR, and BRCA2, an unexpected excess of functionally pathogenic variants was seen in ERCC2. Probands were more likely than controls to have multiple pathogenic variants, both relative to the combined control cohort group and relative to the LifePool control cohort (OR 2·22, 95% CI 1·57-3·14, p=1·2 × 10⁻⁶), and the cumulative burden of multiple variants correlated with earlier age at cancer diagnosis (Mantel-Cox log-rank test for trend, p=0·0032). 66 of 1162 probands carried notifiable variants following expert clinical review (those recognised to be clinically significant to health and about which patients should be advised), whereas 293 (25%) probands carried variants with potential therapeutic significance. INTERPRETATION: About half of patients with sarcoma have putatively pathogenic monogenic and polygenic variation in known and novel cancer genes, with implications for risk management and treatment. FUNDING: Rainbows for Kate Foundation, Johanna Sewell Research Foundation, Australian National Health and Medical Research Council, Cancer Australia, Sarcoma UK, National Cancer Institute, Liddy Shriver Sarcoma Initiative.
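The findings above are reported as odds ratios with 95% confidence intervals. As a minimal sketch of how such figures are conventionally obtained from a single 2x2 case-control table, the code below computes an odds ratio with a Woolf (log-scale Wald) interval. The counts are invented, and the study's combined OR may well have been derived with a different method (for example, by pooling across cohorts).

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959963984540054):
    """Odds ratio and Woolf (log-scale Wald) confidence interval for a 2x2 table.

    a: cases carrying a pathogenic variant     b: cases without one
    c: controls carrying a pathogenic variant  d: controls without one
    z defaults to the two-sided 95% standard-normal quantile.
    """
    if min(a, b, c, d) == 0:
        # The log-scale interval is undefined with an empty cell; adding 0.5
        # to every cell (Haldane-Anscombe correction) is a common workaround.
        raise ValueError("empty cell in 2x2 table")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Invented counts, for illustration only (not the study's data).
or_, lower, upper = odds_ratio_ci(a=30, b=70, c=20, d=80)
print(f"OR {or_:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

Because the interval is built on the log scale, it is symmetric around log(OR) but asymmetric around the OR itself, which is why published intervals such as 1·24-1·64 around 1·43 are not centred on the point estimate.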
Subject(s)
Biomarkers, Tumor/genetics , Exome/genetics , Mutation/genetics , Saliva/chemistry , Sarcoma/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/blood , Case-Control Studies , Child , Child, Preschool , Female , Follow-Up Studies , High-Throughput Nucleotide Sequencing/methods , Humans , Infant , Infant, Newborn , International Agencies , Male , Middle Aged , Neoplasm Staging , Pedigree , Prognosis , Risk Factors , Sarcoma/blood , Young Adult
ABSTRACT
As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies (e.g., next-generation sequencing) are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools attempt to characterize the function of gene lists. Such systems typically rely on enrichment analysis and aim to give quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can effectively analyze gene lists quickly and identify aggregated annotations through computerized resources. In this article we survey some of the many tools and methods that have been developed to automatically interpret the biological functions underlying gene lists. We review current functional annotation approaches from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known', formally structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or, perhaps more adventurously, on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural language). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for.
In particular, future methods need to (1) provide more holistic insight into the underlying molecular systems; (2) suggest better options for follow-up experimental testing and treatment; and (3) better manage gene lists derived from organisms that are not well studied. We discuss some promising approaches that may help achieve these advances, especially the use of extended dictionaries of biomedical concepts and molecular mechanisms, as well as greater use of annotation benchmarks.
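Most of the enrichment-based tools discussed above ask whether an annotation term (a GO term or a literature-derived keyword) occurs in a gene list more often than expected by chance. One common formulation, sketched here as an assumption since individual tools differ in their statistics, is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The gene counts in the example are hypothetical.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background (e.g. all annotated genes of the organism)
    K: background genes annotated with the term
    n: genes in the list being interpreted
    k: genes in the list annotated with the term
    """
    # Probability that a random list of n genes contains k or more annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)  # exact integer arithmetic until this final division

# Hypothetical example: 12 of 200 listed genes carry a term that annotates
# 150 of 20,000 background genes (200 * 150 / 20,000 = 1.5 expected by chance).
p = enrichment_p(k=12, n=200, K=150, N=20000)
print(f"p = {p:.2e}")
```

Because such tools test many terms against the same list, the resulting p-values are normally corrected for multiple testing (e.g. with the Benjamini-Hochberg procedure) before terms are reported.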
Subject(s)
Data Mining/methods , Databases, Genetic , Gene Ontology , Animals , Data Mining/trends , Databases, Genetic/trends , Gene Ontology/trends , Humans
ABSTRACT
BACKGROUND: To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. RESULTS: To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. CONCLUSIONS: The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria.
Subject(s)
Amyloid beta-Protein Precursor/chemistry , Computational Biology/methods , Computer Graphics , Sequence Analysis, Protein/methods , Software , src-Family Kinases/chemistry , Amyloid beta-Protein Precursor/metabolism , B-Lymphocytes/metabolism , Databases, Protein , Humans , Protein Conformation , Protein Processing, Post-Translational , src-Family Kinases/metabolism
ABSTRACT
Data visualisation is usually a crucial first step in analysing and exploring large-scale complex data. The visualisation of proteomics time-course data on post-translational modifications presents a particular challenge that is largely unmet by existing tools and methods. To this end, we present Minardo, a novel visualisation strategy tailored for such proteomics data, in which data layout is driven by both cellular topology and temporal order. In this work, we utilised the Minardo strategy to visualise a dataset showing phosphorylation events in response to insulin. We evaluated the visualisation together with experts in diabetes and obesity, which led to new insights into the insulin response pathway. Based on this success, we outline how this layout strategy could be automated into a web-based tool for visualising a broad range of proteomics time-course data. We also discuss how the approach could be extended to include protein 3D structure information, as well as higher-dimensional data, such as a range of experimental conditions. Finally, we discuss our entry of Minardo in the international DREAM8 competition.
Subject(s)
Protein Processing, Post-Translational , Proteins/chemistry , Proteomics/methods , Signal Transduction , Computational Biology/methods , Humans , Imaging, Three-Dimensional/methods , Proteins/metabolism , Reproducibility of Results , Software
ABSTRACT
Methods and tools for visualizing biological data have improved considerably over the last decades, but they are still inadequate for some high-throughput data sets. For most users, a key challenge is to benefit from the deluge of data without being overwhelmed by it. This challenge is still largely unmet and will require the development of truly integrated and highly usable tools.
Subject(s)
Image Processing, Computer-Assisted , Systems Integration , User-Computer Interface
ABSTRACT
High-throughput studies of biological systems are rapidly accumulating a wealth of 'omics'-scale data. Visualization is a key aspect of both the analysis and understanding of these data, and users now have many visualization methods and tools to choose from. The challenge is to create clear, meaningful and integrated visualizations that give biological insight, without being overwhelmed by the intrinsic complexity of the data. In this review, we discuss how visualization tools are being used to help interpret protein interaction, gene expression and metabolic profile data, and we highlight emerging new directions.
Subject(s)
Genomics , Image Processing, Computer-Assisted , Metabolomics , Proteomics , Systems Biology , Mass Spectrometry , Nuclear Magnetic Resonance, Biomolecular , Protein Binding
ABSTRACT
Structural biology is rapidly accumulating a wealth of detailed information about protein function, binding sites, RNA, large assemblies and molecular motions. These data are increasingly of interest to a broader community of life scientists, not just structural experts. Visualization is a primary means for accessing and using these data, yet visualization is also a stumbling block that prevents many life scientists from benefiting from three-dimensional structural data. In this review, we focus on key biological questions where visualizing three-dimensional structures can provide insight and describe available methods and tools.
Subject(s)
Image Processing, Computer-Assisted , Macromolecular Substances , Crystallography, X-Ray , Internet , Models, Molecular , Molecular Conformation
ABSTRACT
Introduction: When visualizing complex data, the layout method chosen can greatly affect the ability to identify outliers, spot incorrect modeling assumptions, or recognize unexpected patterns. Additionally, visual layout can play a crucial role in communicating results to peers. Methods: In this paper, we compared the effectiveness of three visual layouts (the adjacency matrix, a half-matrix layout, and a circular layout) for visualizing spatial connectivity data, e.g., contacts derived from chromatin conformation capture experiments. To assess these visual layouts, we conducted a study comprising 150 participants from Amazon's Mechanical Turk, as well as a second expert study comprising 30 biomedical research scientists. Results: The Mechanical Turk study found that the circular layout was the most accurate and intuitive, while the expert study found that the circular and half-matrix layouts were more accurate than the matrix layout. Discussion: We concluded that the circular layout may be a good default choice for visualizing smaller datasets with relatively few spatial contacts, while, for larger datasets, the half-matrix layout may be a better choice. Our results also demonstrated how crowdsourcing methods could be used to determine which visual layouts are best for addressing specific data challenges in bioinformatics.
ABSTRACT
Life scientists are often interested in comparing two gene sets to gain insight into differences between two distinct, but related, phenotypes or conditions. Several tools have been developed for comparing gene sets, most of which find Gene Ontology (GO) terms that are significantly over-represented in one gene set. However, such tools often return GO terms that are too generic or too few to be informative. Here, we present Martini, an easy-to-use tool for comparing gene sets. Martini is based not on GO but on keywords extracted from Medline abstracts; Martini also supports a much wider range of species than comparable tools. To evaluate Martini, we created a benchmark based on the human cell cycle, and we tested several comparable tools (CoPub, FatiGO, Marmite and ProfCom). Martini had the best benchmark performance, delivering a more detailed and accurate description of function. Martini also gave the best or equal-best performance with three other datasets (related to Arabidopsis, melanoma and ovarian cancer), suggesting that Martini represents an advance in the automated comparison of gene sets. In agreement with previous studies, our results further suggest that literature-derived keywords are a richer source of gene-function information than GO annotations. Martini is freely available at http://martini.embl.de.
Subject(s)
Genes , Software , Terminology as Topic , Arabidopsis/genetics , Cell Cycle/genetics , Dictionaries as Topic , Genes, Neoplasm , Genes, Plant , Humans , MEDLINE , Melanoma/genetics
ABSTRACT
Mevalonate kinase deficiency (MKD) is characterized by recurrent fevers and flares of systemic inflammation, caused by biallelic loss-of-function mutations in MVK. The underlying disease mechanisms and triggers of inflammatory flares are poorly understood because of the lack of in vivo models. We describe genetically modified mice bearing the hypomorphic mutation p.Val377Ile (the commonest variant in patients with MKD) and amorphic, frameshift mutations in Mvk. Compound heterozygous mice recapitulated the characteristic biochemical phenotype of MKD, with increased plasma mevalonic acid and clear buildup of unprenylated GTPases in PBMCs, splenocytes, and bone marrow. The inflammatory response to LPS was enhanced in compound heterozygous mice and treatment with the NLRP3 inflammasome inhibitor MCC950 prevented the elevation of circulating IL-1β, thus identifying a potential inflammasome target for future therapeutic approaches. Furthermore, lines of mice with a range of deficiencies in mevalonate kinase and abnormal prenylation mirrored the genotype-phenotype relationship in human MKD. Importantly, these mice allowed the determination of a threshold level of residual enzyme activity, below which protein prenylation is impaired. Elevated temperature dramatically but reversibly exacerbated the deficit in the mevalonate pathway and the defective prenylation in vitro and in vivo, highlighting increased body temperature as a likely trigger of inflammatory flares.
Subject(s)
Mevalonate Kinase Deficiency , Animals , Body Temperature , Fever , GTP Phosphohydrolases/genetics , Humans , Inflammasomes/genetics , Inflammasomes/metabolism , Lipopolysaccharides/metabolism , Mevalonate Kinase Deficiency/drug therapy , Mevalonate Kinase Deficiency/genetics , Mevalonate Kinase Deficiency/metabolism , Mevalonic Acid/metabolism , Mice , NLR Family, Pyrin Domain-Containing 3 Protein/genetics , Phosphotransferases (Alcohol Group Acceptor)/genetics , Protein Prenylation
ABSTRACT
Temporal changes in omics events can now be routinely measured; however, current analysis methods are often inadequate, especially for multiomics experiments. We report a novel analysis method that can infer event ordering at better temporal resolution than the experiment, and integrates omic events into two concise visualizations (event maps and sparklines). Testing our method gave results well-correlated with prior knowledge and indicated it streamlines analysis of time-series data.
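To illustrate how event ordering can be inferred at a finer temporal resolution than the sampling, here is one simple approach, offered only as an illustration and not as the published method: interpolate the time at which each profile first reaches half of its maximal change from baseline, then sort events by that estimate. The function names, time points, and profiles below are hypothetical.

```python
def half_max_time(times, values):
    """Interpolated time at which |value - baseline| first reaches half its maximum.

    The baseline is the first measurement. Returns None if the profile never changes.
    """
    deltas = [abs(v - values[0]) for v in values]
    target = max(deltas) / 2
    if target == 0:
        return None  # flat profile: no event to place in time
    for i in range(1, len(times)):
        if deltas[i] >= target:
            # deltas[i - 1] < target here, so the denominator below is positive.
            t0, t1 = times[i - 1], times[i]
            d0, d1 = deltas[i - 1], deltas[i]
            return t0 + (target - d0) * (t1 - t0) / (d1 - d0)
    return None  # not reached when target > 0; kept for clarity

def order_events(times, profiles):
    """Event names sorted by interpolated half-maximal time; flat profiles are dropped."""
    estimates = {name: half_max_time(times, vals) for name, vals in profiles.items()}
    timed = [name for name, t in estimates.items() if t is not None]
    return sorted(timed, key=lambda name: estimates[name])

times = [0, 1, 2, 4]  # hypothetical sampling times (minutes)
profiles = {
    "A": [0, 2, 4, 4],
    "B": [1, 1, 3, 5],
    "C": [5, 2, 1, 1],  # decreasing profile, e.g. a dephosphorylation
    "D": [3, 3, 3, 3],  # no change
}
print(order_events(times, profiles))  # ['C', 'A', 'B']
```

Event C is placed at about 0.67 min, between the first two sampled time points, which is the kind of sub-sampling resolution the abstract refers to; real data would also need replicates and noise handling that this sketch omits.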
Subject(s)
Computational Biology/methods , Proteomics/methods , Algorithms , Computer Simulation , Data Interpretation, Statistical , Software , Spatio-Temporal Analysis
ABSTRACT
In this paper, we present a benchmark dataset to evaluate the currently available analysis methods and visualizations for epiproteomic data. The benchmark dataset is a subset of a high-throughput time-series study of phosphoevents occurring upon insulin stimulation. Our dataset is provided in multiple formats for use with four currently available tools. We also provide a file containing the kinase assignments for the sites, as well as a simple Kappa model of phosphorylation changes in insulin signalling. The tools, their analysis methods, and the visualizations generated using the input files described here are discussed in detail in the accompanying review titled "Visualization and analysis of epiproteome dynamics" [1].