Results 1 - 11 of 11
1.
Mol Cell ; 46(6): 884-92, 2012 Jun 29.
Article in English | MEDLINE | ID: mdl-22749401

ABSTRACT

Alternative splicing plays a key role in the expansion of proteomic and regulatory complexity, yet the functions of the vast majority of differentially spliced exons are not known. In this study, we observe that brain and other tissue-regulated exons are significantly enriched in flexible regions of proteins that likely form conserved interaction surfaces. These proteins participate in significantly more interactions in protein-protein interaction (PPI) networks than other proteins. Using LUMIER, an automated PPI assay, we observe that approximately one-third of analyzed neural-regulated exons affect PPIs. Inclusion of these exons stimulated and repressed different partner interactions at comparable frequencies. This assay further revealed functions of individual exons, including a role for a neural-specific exon in promoting an interaction between Bridging Integrator 1 (Bin1)/Amphiphysin II and Dynamin 2 (Dnm2) that facilitates endocytosis. Collectively, our results provide evidence that regulated alternative exons frequently remodel interactions to establish tissue-dependent PPI networks.
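The abstract describes regulated exons as "significantly enriched" in flexible protein regions but does not name the statistical test. A common choice for this kind of overlap question is a one-sided hypergeometric test. The minimal sketch below assumes that test; the function names and the counts in the usage lines are hypothetical illustrations, not the study's data or code.

```python
from math import comb


def hypergeom_enrichment_p(total: int, in_region: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    total:     number of exons considered (the population)
    in_region: exons that overlap the feature of interest (e.g. flexible regions)
    selected:  exons in the tested class (e.g. tissue-regulated exons)
    overlap:   tested exons that also overlap the feature
    """
    upper = min(in_region, selected)
    # Sum the probability mass of every overlap at least as large as the observed one.
    tail = sum(
        comb(in_region, k) * comb(total - in_region, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total, selected)


def fold_enrichment(total: int, in_region: int, selected: int, overlap: int) -> float:
    """Observed overlap divided by the overlap expected under random sampling."""
    expected = selected * in_region / total
    return overlap / expected


# Hypothetical counts: 1000 exons, 200 in flexible regions, 50 tissue-regulated, 25 overlapping.
print(fold_enrichment(1000, 200, 50, 25))  # 2.5
print(hypergeom_enrichment_p(1000, 200, 50, 25))
```

If SciPy is available, `scipy.stats.hypergeom.sf(overlap - 1, total, in_region, selected)` gives the same tail probability; the exact integer arithmetic above simply avoids the dependency.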


Subject(s)
Alternative Splicing , Protein Interaction Maps , Proteins/metabolism , Adaptor Proteins, Signal Transducing/genetics , Adaptor Proteins, Signal Transducing/metabolism , Binding Sites , Cells, Cultured , Dynamin II/genetics , Dynamin II/metabolism , Exons , HEK293 Cells , Humans , Luciferases, Renilla/genetics , Luciferases, Renilla/metabolism , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Proteins/genetics , Proteomics , Tumor Suppressor Proteins/genetics , Tumor Suppressor Proteins/metabolism
2.
Bioinformatics ; 32(10): 1589-91, 2016 05 15.
Article in English | MEDLINE | ID: mdl-26801957

ABSTRACT

UNLABELLED: ELASPIC is a novel ensemble machine-learning approach that predicts the effects of mutations on protein folding and protein-protein interactions. Here, we present the ELASPIC webserver, which makes the ELASPIC pipeline available through a fast and intuitive interface. The webserver can be used to evaluate the effect of mutations on any protein in the UniProt database, and allows all predicted results, including modeled wild-type and mutated structures, to be managed and viewed online and downloaded if needed. It is backed by a database which contains improved structural domain definitions, and a list of curated domain-domain interactions for all known proteins, as well as homology models of domains and domain-domain interactions for the human proteome. Homology models for proteins of other organisms are calculated on the fly, and mutations are evaluated within minutes once the homology model is available. AVAILABILITY AND IMPLEMENTATION: The ELASPIC webserver is available online at http://elaspic.kimlab.org CONTACT: pm.kim@utoronto.ca or pi@kimlab.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteome , Humans , Mutation , Protein Binding , Protein Folding , Protein Stability , Software
3.
Bioinformatics ; 32(2): 203-10, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26411870

ABSTRACT

MOTIVATION: Rapid advances in genotyping and genome-wide association studies have enabled the discovery of many new genotype-phenotype associations at the resolution of individual markers. However, these associations explain only a small proportion of the theoretically estimated heritability of most diseases. In this work, we propose an integrative mixture model called JBASE: joint Bayesian analysis of subphenotypes and epistasis. JBASE explores two major sources of missing heritability: interactions between genetic variants, a phenomenon known as epistasis, and phenotypic heterogeneity, addressed via subphenotyping. RESULTS: Our extensive simulations in a wide range of scenarios repeatedly demonstrate that JBASE can identify true underlying subphenotypes, including their associated variants and their interactions, with high precision. In the presence of phenotypic heterogeneity, JBASE has higher power and lower Type 1 error than five state-of-the-art approaches. We applied our method to a sample of individuals from Mexico with Type 2 diabetes and discovered two novel epistatic modules, each comprising two loci, that define two subphenotypes characterized by differences in body mass index and waist-to-hip ratio. We successfully replicated these subphenotypes and epistatic modules in an independent dataset from Mexico genotyped with a different platform. AVAILABILITY AND IMPLEMENTATION: JBASE is implemented in C++, supported on Linux, and available at http://www.cs.toronto.edu/~goldenberg/JBASE/jbase.tar.gz. The genotype data underlying this study are available upon approval by the ethics review board of the Medical Centre Siglo XXI. Please contact Dr Miguel Cruz at mcruzl@yahoo.com for assistance with the application. CONTACT: anna.goldenberg@utoronto.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Epistasis, Genetic , Phenotype , Bayes Theorem , Body Mass Index , Diabetes Mellitus, Type 2/genetics , Genome-Wide Association Study , Genotype , Genotyping Techniques , Humans , Mexico , Waist-Hip Ratio
4.
PLoS Comput Biol ; 9(4): e1003030, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23633940

ABSTRACT

Intrinsically disordered regions have been associated with various cellular processes and are implicated in several human diseases, but their exact roles remain unclear. We previously defined two classes of conserved disordered regions in budding yeast, referred to as "flexible" and "constrained" conserved disorder. In flexible disorder, the property of disorder has been positionally conserved during evolution, whereas in constrained disorder, both the amino acid sequence and the property of disorder have been conserved. Here, we show that flexible and constrained disorder are widespread in the human proteome, and are particularly common in proteins with regulatory functions. Both classes of disordered sequences are highly enriched in regions of proteins that undergo tissue-specific (TS) alternative splicing (AS), but not in regions of proteins that undergo general (i.e., not tissue-regulated) AS. Flexible disorder is more highly enriched in TS alternative exons, whereas constrained disorder is more highly enriched in exons that flank TS alternative exons. These latter regions are also significantly more enriched in potential phosphosites and other short linear motifs associated with cell signaling. We further show that cancer driver mutations are significantly enriched in regions of proteins associated with TS and general AS. Collectively, our results point to distinct roles for TS alternative exons and flanking exons in the dynamic regulation of protein interaction networks in response to signaling activity, and they further suggest that alternatively spliced regions of proteins are often functionally altered by mutations responsible for cancer.


Subject(s)
Alternative Splicing , Proteomics/methods , Algorithms , Amino Acid Motifs , Computational Biology/methods , Evolution, Molecular , Exons , Humans , Muscles/metabolism , Mutation , Neoplasms/metabolism , Phosphorylation , Protein Folding , Protein Interaction Mapping/methods , Protein Interaction Maps , Proteome , Signal Transduction
5.
Bioinformatics ; 26(18): i625-31, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20823331

ABSTRACT

MOTIVATION: Recent genomic studies have confirmed that cancer is phenotypically highly complex, varying greatly in terms of subtypes and evolutionary stages. When classifying cancer tissue samples, subnetwork marker approaches have proven superior to single-gene marker approaches, most importantly in cross-platform evaluation schemes. However, prior subnetwork-based approaches do not explicitly address the great phenotypical complexity of cancer. RESULTS: We explicitly address this complexity by employing density-constrained biclustering to compute subnetwork markers, which reflect pathways that are dysregulated in many, but not necessarily all, samples under consideration. In breast cancer, we achieve substantial improvements over all cross-platform applicable approaches when predicting TP53 mutation status in a well-established non-cross-platform setting. In colon cancer, in well-established cross-platform evaluation schemes, we raise prediction accuracy in the most difficult instances from 87% to 93% for cancer versus non-cancer and from 83% to 92% for samples with versus without liver metastasis. AVAILABILITY: Software is available on request.
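The abstract characterizes a subnetwork marker as a set of connected genes that is dysregulated in many, but not necessarily all, samples. The sketch below only checks whether one candidate gene set meets a density constraint and is supported by enough samples; it is not the authors' search procedure, which the abstract does not describe. The thresholds, the gene and sample names, and the data layout (a set of dysregulated genes per sample) are hypothetical.

```python
from itertools import combinations


def subgraph_density(genes, edges):
    """Fraction of gene pairs in the set that are linked in the interaction network."""
    genes = sorted(set(genes))
    if len(genes) < 2:
        return 0.0
    edge_set = {frozenset(e) for e in edges}
    pairs = list(combinations(genes, 2))
    linked = sum(frozenset(p) in edge_set for p in pairs)
    return linked / len(pairs)


def supporting_samples(genes, dysregulated):
    """Samples in which every gene of the set is dysregulated.

    dysregulated: sample id -> set of genes called dysregulated in that sample
    """
    gene_set = set(genes)
    return {s for s, hits in dysregulated.items() if gene_set <= hits}


def is_marker_candidate(genes, edges, dysregulated, min_density, min_samples):
    """Density constraint on the genes plus a minimum number of supporting samples."""
    return (
        subgraph_density(genes, edges) >= min_density
        and len(supporting_samples(genes, dysregulated)) >= min_samples
    )


# Hypothetical network and per-sample dysregulation calls.
edges = [("G1", "G2"), ("G2", "G3")]
dysregulated = {
    "s1": {"G1", "G2", "G3"},
    "s2": {"G1", "G2"},
    "s3": {"G1", "G2", "G3", "G4"},
}
print(sorted(supporting_samples(["G1", "G2", "G3"], dysregulated)))  # ['s1', 's3']
print(is_marker_candidate(["G1", "G2", "G3"], edges, dysregulated, 0.6, 2))  # True
```

Sample `s2` does not support the set, yet the set still qualifies with two of three samples, which mirrors the "many, but not necessarily all" idea.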


Subject(s)
Biomarkers, Tumor , Computational Biology/methods , Gene Regulatory Networks , Neoplasms/genetics , Algorithms , Benchmarking , Breast Neoplasms/genetics , Colonic Neoplasms/genetics , Female , Gene Expression Profiling , Genes, p53 , Humans , Neoplasms/classification , Software
6.
Proteomics ; 8(11): 2196-8, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18452226

ABSTRACT

High-throughput experiments, most significantly DNA microarrays, provide us with system-scale profiles. Connecting these data with existing biological networks to uncover facts about a cell's proteome poses a formidable challenge. Studies and tools with this purpose are limited to networks with simple structure, such as protein-protein interaction graphs, or do not go much beyond simply displaying values on the network. We have built a microarray data analysis tool, named PATIKAmad, which can be used to associate microarray data with pathway models in mechanistic detail, and which provides facilities for visualization, clustering, querying, and navigation of biological graphs related to loaded microarray experiments. PATIKAmad is freely available to noncommercial users as a new module of PATIKAweb at http://web.patika.org.


Subject(s)
Oligonucleotide Array Sequence Analysis/methods , Algorithms , Cluster Analysis , Computational Biology/methods , Data Interpretation, Statistical , Gene Expression Regulation , Internet , MAP Kinase Signaling System , Oligonucleotide Array Sequence Analysis/instrumentation , Pattern Recognition, Automated , Protein Interaction Mapping , Proteome , Proteomics/methods , Software , User-Computer Interface
7.
Cell Rep ; 12(2): 183-9, 2015 Jul 14.
Article in English | MEDLINE | ID: mdl-26146086

ABSTRACT

Alternative splicing acts on transcripts from almost all human multi-exon genes. Notwithstanding its ubiquity, fundamental ramifications of splicing on protein expression remain unresolved. The number and identity of spliced transcripts that form stably folded proteins remain the sources of considerable debate, due largely to low coverage of experimental methods and the resulting absence of negative data. We circumvent this issue by developing a semi-supervised learning algorithm, positive unlabeled learning for splicing elucidation (PULSE; http://www.kimlab.org/software/pulse), which uses 48 features spanning various categories. We validated its accuracy on sets of bona fide protein isoforms and directly on mass spectrometry (MS) spectra for an overall AU-ROC of 0.85. We predict that around 32% of "exon skipping" alternative splicing events produce stable proteins, suggesting that the process engenders a significant number of previously uncharacterized proteins. We also provide insights into the distribution of positive isoforms in various functional classes and into the structural effects of alternative splicing.
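The abstract reports an overall AU-ROC of 0.85. To make that number concrete, the sketch below computes the area under the ROC curve from its pairwise (Mann-Whitney) definition: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The scores and labels are made-up examples, and how PULSE itself computed the metric is not stated in the abstract.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: truthy for positives, falsy for negatives.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for isoforms (1 = treated as a stable protein).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))  # 8/9, about 0.889
```

The quadratic pairwise loop keeps the definition visible; a rank-based computation, or `sklearn.metrics.roc_auc_score` when scikit-learn is available, scales better to large sets.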


Subject(s)
Alternative Splicing , Proteins/metabolism , Supervised Machine Learning , Area Under Curve , Exons , Humans , Protein Isoforms/chemistry , Protein Isoforms/genetics , Protein Isoforms/metabolism , Protein Structure, Tertiary , Proteins/chemistry , Proteins/genetics , ROC Curve
8.
PLoS One ; 9(9): e107353, 2014.
Article in English | MEDLINE | ID: mdl-25243403

ABSTRACT

Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT) algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing protein instability, and help us better understand the molecular causes of diseases.
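The abstract reports correlation coefficients of 0.77 (stability) and 0.75 (affinity) without naming the coefficient. Assuming Pearson's r between predicted and experimentally measured changes (for example, ΔΔG values), the minimal sketch below shows the computation; the input values are hypothetical and do not come from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance and variances, up to the common 1/n factor that cancels out.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical predicted vs. measured stability changes (kcal/mol).
predicted = [0.5, 1.2, 2.0, -0.3, 1.8]
measured = [0.7, 1.0, 2.4, 0.1, 1.5]
print(round(pearson_r(predicted, measured), 3))
```

On Python 3.10 or later, `statistics.correlation(x, y)` computes the same coefficient from the standard library.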


Subject(s)
Mutation , Protein Folding , Protein Stability , Proteins/metabolism , Artificial Intelligence , Databases, Protein , Humans , Models, Molecular , Protein Binding , Protein Conformation , Sequence Analysis, Protein , Software
9.
Mol Biosyst ; 8(1): 185-93, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22101230

ABSTRACT

Disordered regions within proteins have increasingly been associated with various cellular functions. Identifying the specific roles played by disorder in these functions has proved difficult. However, the development of reliable prediction algorithms has expanded the study of disorder from a few anecdotal examples to a proteome-wide scale. Moreover, the recent omics revolution has provided the sequences of numerous organisms as well as thousands of genome-wide data sets including several types of interactomes. Here, we review the literature regarding genome-wide studies of disorder and examine how these studies give rise to new characterizations and categories of this elusive phenomenon.


Subject(s)
Protein Folding , Proteins/chemistry , Proteins/metabolism , Proteomics , Animals , Evolution, Molecular , Gene Regulatory Networks , Humans , Protein Binding
10.
Science ; 338(6114): 1587-93, 2012 Dec 21.
Article in English | MEDLINE | ID: mdl-23258890

ABSTRACT

How species with similar repertoires of protein-coding genes differ so markedly at the phenotypic level is poorly understood. By comparing organ transcriptomes from vertebrate species spanning ~350 million years of evolution, we observed significant differences in alternative splicing complexity between vertebrate lineages, with the highest complexity in primates. Within 6 million years, the splicing profiles of physiologically equivalent organs diverged such that they are more strongly related to the identity of a species than they are to organ type. Most vertebrate species-specific splicing patterns are cis-directed. However, a subset of pronounced splicing changes are predicted to remodel protein interactions involving trans-acting regulators. These events likely further contributed to the diversification of splicing and other transcriptomic changes that underlie phenotypic differences among vertebrate species.


Subject(s)
Alternative Splicing , Evolution, Molecular , Transcriptome , Vertebrates/genetics , Animals , Biological Evolution , Chickens/genetics , Exons , Introns , Lizards/genetics , Mice/genetics , Mice, Inbred C57BL/genetics , Opossums/genetics , Phenotype , Platypus/genetics , Primates/genetics , RNA Splice Sites , Regulatory Sequences, Ribonucleic Acid , Species Specificity , Xenopus/genetics
11.
PLoS One ; 5(10): e13348, 2010 Oct 25.
Article in English | MEDLINE | ID: mdl-21049092

ABSTRACT

BACKGROUND: Computational prediction of functionally related groups of genes (functional modules) from large-scale data is an important issue in computational biology. Gene expression experiments and interaction networks are well-studied large-scale data sources, available for many not yet exhaustively annotated organisms. It has been well established that, when these two data sources are analyzed jointly, modules are often reflected by highly interconnected (dense) regions in the interaction networks whose participating genes are co-expressed. However, the tractability of the problem had remained unclear, and methods to exhaustively search for such constellations had not been presented. METHODOLOGY/PRINCIPAL FINDINGS: We provide an algorithmic framework, referred to as Densely Connected Biclustering (DECOB), by which the aforementioned search problem becomes tractable. To benchmark the predictive power inherent to the approach, we computed all co-expressed, dense regions in physical protein and genetic interaction networks from human and yeast. An automated filtering procedure reduces our output to smaller collections of modules, comparable to those of state-of-the-art approaches. Our results performed favorably in a fair benchmarking competition that adheres to standard criteria. We demonstrate the usefulness of an exhaustive module search by using the unreduced output to more quickly perform GO term-related function prediction tasks. We point out the advantages of our exhaustive output by predicting functional relationships using two examples. CONCLUSION/SIGNIFICANCE: We demonstrate that the computation of all densely connected and co-expressed regions in interaction networks is an approach to module discovery of considerable value. Beyond confirming the well-settled hypothesis that such co-expressed, densely connected interaction network regions reflect functional modules, we open up novel computational ways to comprehensively analyze the modular organization of an organism based on prevalent and largely available large-scale datasets. AVAILABILITY: Software and data sets are available at http://www.sfu.ca/~ester/software/DECOB.zip.


Subject(s)
Computational Biology , Gene Regulatory Networks