Results 1 - 20 of 52
1.
Bioinformatics ; 34(13): i429-i437, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949959

ABSTRACT

Motivation: Alternative splice site selection is inherently competitive, and the probability that a given splice site is used also depends on the strength of neighboring sites. Here, we present a new model named the competitive splice site model (COSSMO), which explicitly accounts for these competitive effects and predicts the percent selected index (PSI) distribution over any number of putative splice sites. We model an alternative splicing event as the choice of a 3' acceptor site conditional on a fixed upstream 5' donor site or the choice of a 5' donor site conditional on a fixed 3' acceptor site. We build four different architectures that use convolutional layers, communication layers, long short-term memory and residual networks, respectively, to learn relevant motifs from sequence alone. We also construct a new dataset from genome annotations and RNA-Seq read data that we use to train our model. Results: COSSMO is able to predict the most frequently used splice site with an accuracy of 70% on unseen test data, and achieves an R² of 0.6 in modeling the PSI distribution. We visualize the motifs that COSSMO learns from sequence and show that COSSMO recognizes the consensus splice site sequences and many known splicing factors with high specificity. Availability and implementation: Model predictions, our training dataset, and code are available from http://cossmo.genes.toronto.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
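
The competitive effect described in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each putative splice site has already been given a scalar strength score and that the PSI distribution is obtained by softmax normalization over the competing sites; the abstract does not state the output layer, and the function name and scores below are hypothetical.

```python
import math

def psi_distribution(site_scores):
    """Map per-site strength scores to a distribution that sums to 1.

    Assumption: softmax normalization stands in for the model's output
    layer; the scores themselves would come from a sequence model.
    """
    top = max(site_scores)                      # subtract max for numerical stability
    exps = [math.exp(s - top) for s in site_scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical competing acceptor sites for one fixed donor site.
baseline = psi_distribution([2.0, 0.5, -1.0])
# Strengthening a neighboring site lowers the share of the first site,
# even though the first site's own score is unchanged.
stronger_neighbor = psi_distribution([2.0, 1.8, -1.0])
```

The point of the sketch is only that a site's predicted usage depends on its neighbors' scores as well as its own, which is the competitive behavior the abstract describes.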


Subject(s)
Alternative Splicing , Deep Learning , RNA Splice Sites , Sequence Analysis, RNA/methods , Computational Biology/methods , Humans , Models, Genetic , Probability , Software
2.
Bioinformatics ; 34(17): 2889-2898, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29648582

ABSTRACT

Motivation: Processing of transcripts at the 3'-end involves cleavage at a polyadenylation site followed by the addition of a poly(A)-tail. By selecting which site is cleaved, the process of alternative polyadenylation enables genes to produce transcript isoforms with different 3'-ends. To facilitate the identification and treatment of disease-causing mutations that affect polyadenylation and to understand the sequence determinants underlying this regulatory process, a computational model that can accurately predict polyadenylation patterns from genomic features is desirable. Results: Previous works have focused on identifying candidate polyadenylation sites and classifying tissue-specific sites. By training on how multiple sites in genes are competitively selected for polyadenylation from 3'-end sequencing data, we developed a deep learning model that can predict the tissue-specific strength of a polyadenylation site in the 3' untranslated region of the human genome given only its genomic sequence. We demonstrate the model's broad utility on multiple tasks, without any application-specific training. The model can be used to predict which polyadenylation site is more likely to be selected in genes with multiple sites. It can be used to scan the 3' untranslated region to find candidate polyadenylation sites. It can be used to classify the pathogenicity of variants near annotated polyadenylation sites in ClinVar. It can also be used to anticipate the effect of antisense oligonucleotide experiments to redirect polyadenylation. We provide analysis on how different features affect the model's predictive performance and a method to identify sensitive regions of the genome at single-base resolution that can affect polyadenylation regulation. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Polyadenylation , 3' Untranslated Regions , Gene Expression Regulation , Genome, Human , Genomics , Humans , Poly A
3.
Nature ; 498(7453): 241-5, 2013 Jun 13.
Article in English | MEDLINE | ID: mdl-23739326

ABSTRACT

Previous investigations of the core gene regulatory circuitry that controls the pluripotency of embryonic stem (ES) cells have largely focused on the roles of transcription, chromatin and non-coding RNA regulators. Alternative splicing represents a widely acting mode of gene regulation, yet its role in regulating ES-cell pluripotency and differentiation is poorly understood. Here we identify the muscleblind-like RNA binding proteins, MBNL1 and MBNL2, as conserved and direct negative regulators of a large program of cassette exon alternative splicing events that are differentially regulated between ES cells and other cell types. Knockdown of MBNL proteins in differentiated cells causes switching to an ES-cell-like alternative splicing pattern for approximately half of these events, whereas overexpression of MBNL proteins in ES cells promotes differentiated-cell-like alternative splicing patterns. Among the MBNL-regulated events is an ES-cell-specific alternative splicing switch in the forkhead family transcription factor FOXP1 that controls pluripotency. Consistent with a central and negative regulatory role for MBNL proteins in pluripotency, their knockdown significantly enhances the expression of key pluripotency genes and the formation of induced pluripotent stem cells during somatic cell reprogramming.


Subject(s)
Alternative Splicing , Cellular Reprogramming , DNA-Binding Proteins/metabolism , Embryonic Stem Cells/cytology , Embryonic Stem Cells/metabolism , RNA-Binding Proteins/metabolism , Alternative Splicing/genetics , Amino Acid Motifs , Animals , Cell Differentiation/genetics , Cell Line , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/deficiency , DNA-Binding Proteins/genetics , Fibroblasts/cytology , Fibroblasts/metabolism , Forkhead Transcription Factors/metabolism , Gene Knockdown Techniques , HEK293 Cells , HeLa Cells , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Kinetics , Mice , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics , Repressor Proteins/metabolism
4.
Nature ; 499(7457): 172-7, 2013 Jul 11.
Article in English | MEDLINE | ID: mdl-23846655

ABSTRACT

RNA-binding proteins are key regulators of gene expression, yet only a small fraction have been functionally characterized. Here we report a systematic analysis of the RNA motifs recognized by RNA-binding proteins, encompassing 205 distinct genes from 24 diverse eukaryotes. The sequence specificities of RNA-binding proteins display deep evolutionary conservation, and the recognition preferences for a large fraction of metazoan RNA-binding proteins can thus be inferred from their RNA-binding domain sequence. The motifs that we identify in vitro correlate well with in vivo RNA-binding data. Moreover, we can associate them with distinct functional roles in diverse types of post-transcriptional regulation, enabling new insights into the functions of RNA-binding proteins both in normal physiology and in human disease. These data provide an unprecedented overview of RNA-binding proteins and their targets, and constitute an invaluable resource for determining post-transcriptional regulatory mechanisms in eukaryotes.


Subject(s)
Gene Expression Regulation/genetics , Nucleotide Motifs/genetics , RNA-Binding Proteins/metabolism , Autistic Disorder/genetics , Base Sequence , Binding Sites/genetics , Conserved Sequence/genetics , Eukaryotic Cells/metabolism , Humans , Molecular Sequence Data , Protein Structure, Tertiary/genetics , RNA Splicing Factors , RNA Stability/genetics , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/genetics
5.
Crit Rev Biochem Mol Biol ; 51(2): 102-9, 2016.
Article in English | MEDLINE | ID: mdl-26806341

ABSTRACT

High Content Screening (HCS) technologies that combine automated fluorescence microscopy with high throughput biotechnology have become powerful systems for studying cell biology and drug screening. These systems can produce more than 100 000 images per day, making their success dependent on automated image analysis. In this review, we describe the steps involved in quantifying microscopy images and different approaches for each step. Typically, individual cells are segmented from the background using a segmentation algorithm. Each cell is then quantified by extracting numerical features, such as area and intensity measurements. As these feature representations are typically high dimensional (>500), modern machine learning algorithms are used to classify, cluster and visualize cells in HCS experiments. Machine learning algorithms that learn feature representations, in addition to the classification or clustering task, have recently advanced the state of the art on several benchmarking tasks in the computer vision community. These techniques have also recently been applied to HCS image analysis.


Subject(s)
Image Processing, Computer-Assisted , Microscopy, Fluorescence , Algorithms , Biotechnology , Machine Learning , Software , Vision, Ocular
6.
Mol Syst Biol ; 13(4): 924, 2017 04 18.
Article in English | MEDLINE | ID: mdl-28420678

ABSTRACT

Existing computational pipelines for quantitative analysis of high-content microscopy data rely on traditional machine learning approaches that fail to accurately classify more than a single dataset without substantial tuning and training, requiring extensive analysis. Here, we demonstrate that the application of deep learning to biological image data can overcome the pitfalls associated with conventional machine learning classifiers. Using a deep convolutional neural network (DeepLoc) to analyze yeast cell images, we show improved performance over traditional approaches in the automated classification of protein subcellular localization. We also demonstrate the ability of DeepLoc to classify highly divergent image sets, including images of pheromone-arrested cells with abnormal cellular morphology, as well as images generated in different genetic backgrounds and in different laboratories. We offer an open-source implementation that enables updating DeepLoc on new microscopy datasets. This study highlights deep learning as an important tool for the expedited analysis of high-content microscopy data.


Subject(s)
Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/ultrastructure , Systems Biology/methods , Machine Learning , Microscopy , Neural Networks, Computer , Saccharomyces cerevisiae/metabolism
7.
Bioinformatics ; 32(12): i52-i59, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307644

ABSTRACT

MOTIVATION: High-content screening (HCS) technologies have enabled large scale imaging experiments for studying cell biology and for drug screening. These systems produce hundreds of thousands of microscopy images per day and their utility depends on automated image analysis. Recently, deep learning approaches that learn feature representations directly from pixel intensity values have dominated object recognition challenges. These tasks typically have a single centered object per image and existing models are not directly applicable to microscopy datasets. Here we develop an approach that combines deep convolutional neural networks (CNNs) with multiple instance learning (MIL) in order to classify and segment microscopy images using only whole image level annotations. RESULTS: We introduce a new neural network architecture that uses MIL to simultaneously classify and segment microscopy images with populations of cells. We base our approach on the similarity between the aggregation function used in MIL and pooling layers used in CNNs. To facilitate aggregating across large numbers of instances in CNN feature maps we present the Noisy-AND pooling function, a new MIL operator that is robust to outliers. Combining CNNs with MIL enables training CNNs using whole microscopy images with image level labels. We show that training end-to-end MIL CNNs outperforms several previous methods on both mammalian and yeast datasets without requiring any segmentation steps. AVAILABILITY AND IMPLEMENTATION: Torch7 implementation available upon request. CONTACT: oren.kraus@mail.utoronto.ca.
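
The abstract names a Noisy-AND pooling function but does not give its formula. The sketch below assumes a sigmoid-thresholded mean of the per-instance probabilities, rescaled so that a mean of 0 maps to 0 and a mean of 1 maps to 1; the slope `a`, the threshold `b`, and the function names are assumptions made for illustration, not details taken from the abstract.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def noisy_and(instance_probs, a=10.0, b=0.5):
    """Pool per-instance probabilities into one bag-level probability.

    Assumed form: sigmoid of the mean instance probability around a
    threshold b with slope a, rescaled to cover the full [0, 1] range.
    """
    mean_p = sum(instance_probs) / len(instance_probs)
    low = sigmoid(-a * b)            # value of the raw sigmoid at mean_p = 0
    high = sigmoid(a * (1.0 - b))    # value of the raw sigmoid at mean_p = 1
    return (sigmoid(a * (mean_p - b)) - low) / (high - low)

# A bag where most instances are positive and one is an outlier.
mostly_positive = noisy_and([0.9] * 9 + [0.0])
# A bag where every instance is weakly positive.
mostly_negative = noisy_and([0.2] * 10)
```

Because the pooled value depends on the mean rather than the maximum, a single outlying instance moves the output only slightly, which is one way to read the abstract's claim of robustness to outliers.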


Subject(s)
Image Interpretation, Computer-Assisted , Machine Learning , Microscopy , Algorithms , Humans , Neural Networks, Computer , Yeasts/cytology
8.
Nature ; 465(7294): 53-9, 2010 May 06.
Article in English | MEDLINE | ID: mdl-20445623

ABSTRACT

Alternative splicing has a crucial role in the generation of biological complexity, and its misregulation is often involved in human disease. Here we describe the assembly of a 'splicing code', which uses combinations of hundreds of RNA features to predict tissue-dependent changes in alternative splicing for thousands of exons. The code determines new classes of splicing patterns, identifies distinct regulatory programs in different tissues, and identifies mutation-verified regulatory sequences. Widespread regulatory strategies are revealed, including the use of unexpectedly large combinations of features, the establishment of low exon inclusion levels that are overcome by features in specific tissues, the appearance of features deeper into introns than previously appreciated, and the modulation of splice variant levels by transcript structure characteristics. The code detected a class of exons whose inclusion silences expression in adult tissues by activating nonsense-mediated messenger RNA decay, but whose exclusion promotes expression during embryogenesis. The code facilitates the discovery and detailed characterization of regulated alternative splicing events on a genome-wide scale.


Subject(s)
Alternative Splicing/genetics , Gene Expression Regulation , Genetic Code/genetics , Models, Genetic , RNA, Messenger/metabolism , Animals , Gene Silencing , Humans , Mice , Reproducibility of Results
9.
Bioinformatics ; 30(12): i121-9, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931975

ABSTRACT

MOTIVATION: Alternative splicing (AS) is a regulated process that directs the generation of different transcripts from single genes. A computational model that can accurately predict splicing patterns based on genomic features and cellular context is highly desirable, both in understanding this widespread phenomenon, and in exploring the effects of genetic variations on AS. METHODS: Using a deep neural network, we developed a model inferred from mouse RNA-Seq data that can predict splicing patterns in individual tissues and differences in splicing patterns across tissues. Our architecture uses hidden variables that jointly represent features in genomic sequences and tissue types when making predictions. A graphics processing unit was used to greatly reduce the training time of our models with millions of parameters. RESULTS: We show that the deep architecture surpasses the performance of the previous Bayesian method for predicting AS patterns. With the proper optimization procedure and selection of hyperparameters, we demonstrate that deep architectures can be beneficial, even with a moderately sparse dataset. An analysis of what the model has learned in terms of the genomic features is presented.


Subject(s)
Alternative Splicing , Artificial Intelligence , Algorithms , Animals , Bayes Theorem , Genomics/methods , Humans , Mice , Neural Networks, Computer , Sequence Analysis, RNA
10.
Bioinformatics ; 29(7): 821-9, 2013 Apr 01.
Article in English | MEDLINE | ID: mdl-23419374

ABSTRACT

MOTIVATION: Tandem mass spectrometry (MS/MS) is a dominant approach for large-scale high-throughput post-translational modification (PTM) profiling. Although current state-of-the-art blind PTM spectral analysis algorithms can predict thousands of modified peptides (PTM predictions) in an MS/MS experiment, a significant percentage of these predictions have inaccurate modification mass estimates and false modification site assignments. This problem can be addressed by post-processing the PTM predictions with a PTM refinement algorithm. We developed a novel PTM refinement algorithm, iPTMClust, which extends a recently introduced PTM refinement algorithm PTMClust and uses a non-parametric Bayesian model to better account for uncertainties in the quantity and identity of PTMs in the input data. The use of this new modeling approach enables iPTMClust to provide a confidence score per modification site that allows fine-tuning and interpreting resulting PTM predictions. RESULTS: The primary goal behind iPTMClust is to improve the quality of the PTM predictions. First, to demonstrate that iPTMClust produces sensible and accurate cluster assignments, we compare it with k-means clustering, mixtures of Gaussians (MOG) and PTMClust on a synthetically generated PTM dataset. Second, in two separate benchmark experiments using PTM data taken from a phosphopeptide and a yeast proteome study, we show that iPTMClust outperforms state-of-the-art PTM prediction and refinement algorithms, including PTMClust. Finally, we illustrate the general applicability of our new approach on a set of human chromatin protein complex data, where we are able to identify putative novel modified peptides and modification sites that may be involved in the formation and regulation of protein complexes. Our method facilitates accurate PTM profiling, which is an important step in understanding the mechanisms behind many biological processes and should be an integral part of any proteomic study. 
AVAILABILITY: Our algorithm is implemented in Java and is freely available for academic use from http://genes.toronto.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Protein Processing, Post-Translational , Tandem Mass Spectrometry , Bayes Theorem , Cluster Analysis , Fungal Proteins/metabolism , Humans , Phosphopeptides/chemistry , Protein Interaction Mapping , Proteome/metabolism , Proteomics/methods , Statistics, Nonparametric
11.
Nat Genet ; 37(9): 991-6, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16127451

ABSTRACT

Recent mammalian microarray experiments detected widespread transcription and indicated that there may be many undiscovered multiple-exon protein-coding genes. To explore this possibility, we hybridized labeled cDNA from unamplified, polyadenylation-selected RNA samples from 37 mouse tissues to microarrays encompassing 1.14 million exon probes. We analyzed these data using GenRate, a Bayesian algorithm that uses a genome-wide scoring function in a factor graph to infer genes. At a stringent exon false detection rate of 2.7%, GenRate detected 12,145 gene-length transcripts and confirmed 81% of the 10,000 most highly expressed known genes. Notably, our analysis showed that most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified. GenRate also detected tens of thousands of potential new exons and reconciled discrepancies in current cDNA databases by 'stitching' new transcribed regions into previously annotated genes.


Subject(s)
Computational Biology , DNA, Complementary/chemistry , Databases as Topic , Exons/genetics , Genome , Transcription, Genetic , Algorithms , Animals , Gene Expression Profiling , Humans , Mice , Microarray Analysis , RNA, Messenger/chemistry , RNA, Messenger/metabolism
12.
BMC Bioinformatics ; 13 Suppl 6: S11, 2012 Apr 19.
Article in English | MEDLINE | ID: mdl-22537040

ABSTRACT

Transcript quantification is a long-standing problem in genomics and estimating the relative abundance of alternatively-spliced isoforms from the same transcript is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments which are quickly generating large amounts of data. However, much of the signal present in this data is corrupted or obscured by biases resulting in non-uniform and non-proportional representation of sequences from different transcripts. Many existing analyses attempt to deal with these and other biases with various task-specific approaches, which makes direct comparison between them difficult. However, two popular tools for isoform quantification, MISO and Cufflinks, have adopted a general probabilistic framework to model and mitigate these biases in a more general fashion. These advances motivate the need to investigate the effects of RNA-seq biases on the accuracy of different approaches for isoform quantification. We conduct the investigation by building models of increasing sophistication to account for noise introduced by the biases and compare their accuracy to the established approaches. We focus on methods that estimate the expression of alternatively-spliced isoforms with the percent-spliced-in (PSI) metric for each exon skipping event. To improve their estimates, many methods use evidence from RNA-seq reads that align to exon bodies. However, the methods we propose focus on reads that span only exon-exon junctions. As a result, our approaches are simpler and less sensitive to exon definitions than existing methods, which enables us to distinguish their strengths and weaknesses more easily. We present several probabilistic models of position-specific read counts with increasing complexity and compare them to each other and to the current state-of-the-art methods in isoform quantification, MISO and Cufflinks. On a validation set with RT-PCR measurements for 26 cassette events, some of our methods are more accurate and some are significantly more consistent than these two popular tools. This comparison demonstrates the challenges in estimating the percent inclusion of alternatively spliced junctions and illuminates the tradeoffs between different approaches.
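
As a point of reference for the junction-only approach described here, the simplest PSI estimate for a cassette exon is a ratio of junction read counts. The sketch below is that naive baseline, not one of the probabilistic models the abstract proposes; the function name, arguments, and read counts are hypothetical.

```python
def junction_psi(inc_upstream, inc_downstream, exclusion):
    """Naive percent-spliced-in estimate from exon-exon junction reads.

    inc_upstream / inc_downstream: reads spanning the two junctions that
    include the cassette exon; exclusion: reads spanning the junction
    that skips it. The two inclusion junctions are averaged so that
    inclusion is not counted twice relative to exclusion.
    """
    inclusion = (inc_upstream + inc_downstream) / 2.0
    total = inclusion + exclusion
    if total == 0:
        return None  # no junction evidence, so PSI is undefined
    return inclusion / total

# Hypothetical counts: 30 and 50 inclusion-junction reads, 10 skipping reads.
psi = junction_psi(30, 50, 10)
```

A plain ratio like this ignores position-specific read-count biases, which is the kind of noise the models in the abstract are built to account for.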


Subject(s)
Alternative Splicing , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Exons , Gene Expression Profiling , HeLa Cells , Humans , Models, Statistical , Reverse Transcriptase Polymerase Chain Reaction
13.
Bioinformatics ; 27(6): 797-806, 2011 Mar 15.
Article in English | MEDLINE | ID: mdl-21258065

ABSTRACT

MOTIVATION: A post-translational modification (PTM) is a chemical modification of a protein that occurs naturally. Many of these modifications, such as phosphorylation, are known to play pivotal roles in the regulation of protein function. Hence, PTM perturbations have been linked to diverse diseases like Parkinson's, Alzheimer's, diabetes and cancer. To discover PTMs on a genome-wide scale, there is a recent surge of interest in analyzing tandem mass spectrometry data, and several unrestrictive (so-called 'blind') PTM search methods have been reported. However, these approaches are subject to noise in mass measurements and in the predicted modification site (amino acid position) within peptides, which can result in false PTM assignments. RESULTS: To address these issues, we devised a machine learning algorithm, PTMClust, that can be applied to the output of blind PTM search methods to improve prediction quality, by suppressing noise in the data and clustering peptides with the same underlying modification to form PTM groups. We show that our technique outperforms two standard clustering algorithms on a simulated dataset. Additionally, we show that our algorithm significantly improves sensitivity and specificity when applied to the output of three different blind PTM search engines, SIMS, InsPecT and MODmap. PTMClust also markedly outperforms another PTM refinement algorithm, PTMFinder. We demonstrate that our technique is able to reduce false PTM assignments, improve overall detection coverage and facilitate novel PTM discovery, including terminus modifications. We applied our technique to a large-scale yeast MS/MS proteome profiling dataset and found numerous known and novel PTMs. Accurately identifying modifications in protein sequences is a critical first step for PTM profiling, and thus our approach may benefit routine proteomic analysis. AVAILABILITY: Our algorithm is implemented in Matlab and is freely available for academic use. The software is available online from http://genes.toronto.edu.


Subject(s)
Artificial Intelligence , Computational Biology/methods , Protein Processing, Post-Translational , Tandem Mass Spectrometry , Algorithms , Amino Acid Sequence , Bayes Theorem , Cluster Analysis , Models, Statistical , Proteomics/methods , Sequence Analysis, Protein/methods , Software
14.
Bioinformatics ; 27(18): 2554-62, 2011 Sep 15.
Article in English | MEDLINE | ID: mdl-21803804

ABSTRACT

MOTIVATION: Alternative splicing is a major contributor to cellular diversity in mammalian tissues and relates to many human diseases. An important goal in understanding this phenomenon is to infer a 'splicing code' that predicts how splicing is regulated in different cell types by features derived from RNA, DNA and epigenetic modifiers. METHODS: We formulate the assembly of a splicing code as a problem of statistical inference and introduce a Bayesian method that uses an adaptively selected number of hidden variables to combine subgroups of features into a network, allows different tissues to share feature subgroups and uses a Gibbs sampler to hedge predictions and ascertain the statistical significance of identified features. RESULTS: Using data for 3665 cassette exons, 1014 RNA features and 4 tissue types derived from 27 mouse tissues (http://genes.toronto.edu/wasp), we benchmarked several methods. Our method outperforms all others, and achieves relative improvements of 52% in splicing code quality and up to 22% in classification error, compared with the state of the art. Novel combinations of regulatory features and novel combinations of tissues that share feature subgroups were identified using our method. CONTACT: frey@psi.toronto.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Alternative Splicing/genetics , RNA Isoforms/genetics , RNA/genetics , Algorithms , Animals , Base Sequence , Bayes Theorem , Exons , Gene Expression , Gene Expression Regulation , Humans , Mice , Models, Genetic , RNA Splicing , Transcription, Genetic
15.
Bioinformatics ; 26(12): i325-33, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529924

ABSTRACT

MOTIVATION: Transcripts from approximately 95% of human multi-exon genes are subject to alternative splicing (AS). The growing interest in AS is propelled by its prominent contribution to transcriptome and proteome complexity and the role of aberrant AS in numerous diseases. Recent technological advances enable thousands of exons to be simultaneously profiled across diverse cell types and cellular conditions, but require accurate identification of condition-specific splicing changes. It is necessary to accurately identify such splicing changes to elucidate the underlying regulatory programs or link the splicing changes to specific diseases. RESULTS: We present a probabilistic model tailored for high-throughput AS data, where observed isoform levels are explained as combinations of condition-specific AS signals. According to our formulation, given an AS dataset our tasks are to detect common signals in the data and identify the exons relevant to each signal. Our model can incorporate prior knowledge about underlying AS signals, measurement quality and gene expression level effects. Using a large-scale multi-tissue AS dataset, we demonstrate the advantage of our method over standard alternative approaches. In addition, we describe newly found tissue-specific AS signals which were verified experimentally, and discuss associated regulatory features. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Alternative Splicing/genetics , Models, Statistical , Algorithms , Exons , Gene Expression Profiling , RNA Splicing
16.
Nat Methods ; 4(12): 1045-9, 2007 Dec.
Article in English | MEDLINE | ID: mdl-18026111

ABSTRACT

We demonstrate that paired expression profiles of microRNAs (miRNAs) and mRNAs can be used to identify functional miRNA-target relationships with high precision. We used a Bayesian data analysis algorithm, GenMiR++, to identify a network of 1,597 high-confidence target predictions for 104 human miRNAs, which was supported by RNA expression data across 88 tissues and cell types, sequence complementarity and comparative genomics data. We experimentally verified our predictions by investigating the result of let-7b downregulation in retinoblastoma using quantitative reverse transcriptase (RT)-PCR and microarray profiling: some of our verified let-7b targets include CDC25A and BCL7A. Compared to sequence-based predictions, our high-scoring GenMiR++ predictions had much more consistent Gene Ontology annotations and were more accurate predictors of which mRNA levels respond to changes in let-7b levels.


Subject(s)
Gene Expression Profiling/methods , Gene Targeting/methods , MicroRNAs/genetics , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, RNA/methods , Base Sequence , Humans , Molecular Sequence Data
17.
NPJ Genom Med ; 5: 16, 2020.
Article in English | MEDLINE | ID: mdl-32284880

ABSTRACT

Wilson disease is a recessive genetic disorder caused by pathogenic loss-of-function variants in the ATP7B gene. It is characterized by disrupted copper homeostasis resulting in liver disease and/or neurological abnormalities. The variant NM_000053.3:c.1934T > G (Met645Arg) has been reported as compound heterozygous, and is highly prevalent among Wilson disease patients of Spanish descent. Accordingly, it is classified as pathogenic by leading molecular diagnostic centers. However, functional studies suggest that the amino acid change does not alter protein function, leading one ClinVar submitter to question its pathogenicity. Here, we used a minigene system and gene-edited HepG2 cells to demonstrate that c.1934T > G causes ~70% skipping of exon 6. Exon 6 skipping results in frameshift and stop-gain, leading to loss of ATP7B function. The elucidation of the mechanistic effect for this variant resolves any doubt about its pathogenicity and enables the development of genetic medicines for restoring correct splicing.

18.
Trends Genet ; 21(2): 73-7, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15661351

ABSTRACT

In this article, we provide evidence that a frequent source of diversity between mammalian transcripts occurs as a consequence of species-specific alternative splicing (AS) of conserved exons. Using a highly predictive computational method, we estimate that >11% of human and mouse cassette alternative exons undergo skipping in one species but constitutive splicing in the other. These species-specific AS events are predicted to modify conserved domains in proteins more frequently than other classes of AS events. The results thus provide evidence that species-specific AS of conserved exons constitutes an additional potential source of complexity and species-specific differences between mammals.


Subject(s)
Alternative Splicing , Animals , Exons , Expressed Sequence Tags , Genome , Humans , Mice , Models, Genetic , Protein Structure, Tertiary , Software , Species Specificity
19.
Nat Biotechnol ; 36(9): 829-838, 2018 10.
Article in English | MEDLINE | ID: mdl-30188539

ABSTRACT

Deep learning is beginning to impact biological research and biomedical applications as a result of its ability to integrate vast datasets, learn arbitrarily complex relationships and incorporate existing knowledge. Already, deep learning models can predict, with varying degrees of success, how genetic variation alters cellular processes involved in pathogenesis, which small molecules will modulate the activity of therapeutically relevant proteins, and whether radiographic images are indicative of disease. However, the flexibility of deep learning creates new challenges in guaranteeing the performance of deployed systems and in establishing trust with stakeholders, clinicians and regulators, who require a rationale for decision making. We argue that these challenges will be overcome using the same flexibility that created them; for example, by training deep models so that they can output a rationale for their predictions. Significant research in this direction will be needed to realize the full potential of deep learning in biomedicine.


Subject(s)
Deep Learning , Algorithms , Humans
20.
J Comput Biol ; 14(5): 550-63, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17683260

ABSTRACT

MicroRNAs (miRNAs) regulate a large proportion of mammalian genes by hybridizing to targeted messenger RNAs (mRNAs) and down-regulating their translation into protein. Although much work has been done in the genome-wide computational prediction of miRNA genes and their target mRNAs, an open question is how to efficiently obtain functional miRNA targets from a large number of candidate miRNA targets predicted by existing computational algorithms. In this paper, we propose a novel Bayesian model and learning algorithm, GenMiR++ (Generative model for miRNA regulation), that accounts for patterns of gene expression using miRNA expression data and a set of candidate miRNA targets. A set of high-confidence functional miRNA targets are then obtained from the data using a Bayesian learning algorithm. Our model scores 467 high-confidence miRNA targets out of 1,770 targets obtained from TargetScanS in mouse at a false detection rate of 2.5%: several confirmed miRNA targets appear in our high-confidence set, such as the interactions between miR-92 and the signal transduction gene MAP2K4, as well as the relationship between miR-16 and BCL2, an anti-apoptotic gene which has been implicated in chronic lymphocytic leukemia. We present results on the robustness of our model showing that our learning algorithm is not sensitive to various perturbations of the data. Our high-confidence targets represent a significant increase in the number of miRNA targets and represent a starting point for a global understanding of gene regulation.


Subject(s)
Bayes Theorem , Gene Expression Profiling , Gene Targeting , MicroRNAs/genetics , Models, Genetic , Sequence Analysis, DNA , Sequence Analysis, RNA , Animals , Gene Expression Profiling/trends , Gene Targeting/trends , Humans , Sequence Analysis, DNA/trends , Sequence Analysis, RNA/trends