Results 1 - 20 of 94
1.
Cell ; 155(5): 1075-87, 2013 Nov 21.
Article in English | MEDLINE | ID: mdl-24210918

ABSTRACT

Pervasive transcription of eukaryotic genomes stems to a large extent from bidirectional promoters that synthesize mRNA and divergent noncoding RNA (ncRNA). Here, we show that ncRNA transcription in the yeast S. cerevisiae is globally restricted by early termination that relies on the essential RNA-binding factor Nrd1. Depletion of Nrd1 from the nucleus results in 1,526 Nrd1-unterminated transcripts (NUTs) that originate from nucleosome-depleted regions (NDRs) and can deregulate mRNA synthesis by antisense repression and transcription interference. Transcriptome-wide Nrd1-binding maps reveal divergent NUTs at most promoters and antisense NUTs in most 3' regions of genes. Nrd1 and its partner Nab3 preferentially bind RNA motifs that are depleted in mRNAs and enriched in ncRNAs and some mRNAs whose synthesis is controlled by transcription attenuation. These results define a global mechanism for transcriptome surveillance that selectively terminates ncRNA synthesis to provide promoter directionality and to suppress antisense transcription.


Subject(s)
RNA, Fungal/genetics , RNA, Untranslated/genetics , RNA-Binding Proteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Transcription Termination, Genetic , Transcriptome , Down-Regulation , Nuclear Proteins/metabolism , Promoter Regions, Genetic , RNA, Antisense/metabolism , Saccharomyces cerevisiae/genetics
2.
Nat Methods ; 21(1): 28-31, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38049697

ABSTRACT

Single-cell ATAC sequencing coverage in regulatory regions is typically binarized as an indicator of open chromatin. Here we show that binarization is an unnecessary step that neither improves goodness of fit, clustering, cell type identification nor batch integration. Fragment counts, but not read counts, should instead be modeled, which preserves quantitative regulatory information. These results have immediate implications for single-cell ATAC sequencing analysis.


Subject(s)
Chromatin Immunoprecipitation Sequencing , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Chromatin/genetics , Single-Cell Analysis
3.
Am J Hum Genet ; 110(12): 2056-2067, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38006880

ABSTRACT

Detection of aberrantly spliced genes is an important step in RNA-seq-based rare-disease diagnostics. We recently developed FRASER, a denoising autoencoder-based method that outperformed alternative methods of detecting aberrant splicing. However, because FRASER's three splice metrics are partially redundant and tend to be sensitive to sequencing depth, we introduce here a more robust intron-excision metric, the intron Jaccard index, that combines the alternative donor, alternative acceptor, and intron-retention signal into a single value. Moreover, we optimized model parameters and filter cutoffs by using candidate rare-splice-disrupting variants as independent evidence. On 16,213 GTEx samples, our improved algorithm, FRASER 2.0, called typically 10 times fewer splicing outliers while increasing the proportion of candidate rare-splice-disrupting variants by 10-fold and substantially decreasing the effect of sequencing depth on the number of reported outliers. To lower the multiple-testing correction burden, we introduce an option to select the genes to be tested for each sample instead of a transcriptome-wide approach. This option can be particularly useful when prior information, such as candidate variants or genes, is available. Application on 303 rare-disease samples confirmed the relative reduction in the number of outlier calls for a slight loss of sensitivity; FRASER 2.0 recovered 22 out of 26 previously identified pathogenic splicing cases with default cutoffs and 24 when multiple-testing correction was limited to OMIM genes containing rare variants. Altogether, these methodological improvements contribute to more effective RNA-seq-based rare diagnostics by drastically reducing the amount of splicing outlier calls per sample at minimal loss of sensitivity.
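The idea of folding donor, acceptor, and intron-retention signal into one Jaccard-style value can be sketched as follows. This is a simplified illustration, not the FRASER 2.0 implementation: the function and argument names are hypothetical, and the exact read-counting rules used by the authors should be taken from the paper and the package itself.

```python
def intron_jaccard_index(split_intron, split_donor, split_acceptor,
                         nonsplit_donor, nonsplit_acceptor):
    """Jaccard-style ratio for one intron (hypothetical argument names).

    split_intron      split reads spanning exactly this donor-acceptor pair
    split_donor       all split reads using the donor (includes split_intron)
    split_acceptor    all split reads using the acceptor (includes split_intron)
    nonsplit_donor    unspliced reads covering the donor site
    nonsplit_acceptor unspliced reads covering the acceptor site
    """
    # Reads touching either splice site; the intron's own reads are counted
    # in both split totals, so they are subtracted once.
    union = (split_donor + split_acceptor - split_intron
             + nonsplit_donor + nonsplit_acceptor)
    if union == 0:
        return float("nan")  # no coverage at either site
    return split_intron / union
```

A value near 1 means almost every read at the two splice sites supports this intron; alternative donor or acceptor usage and intron retention all pull the value down, which is why a single metric can stand in for the three separate signals.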


Subject(s)
Alternative Splicing , RNA Splicing , Humans , Alternative Splicing/genetics , Introns/genetics , RNA Splicing/genetics , RNA-Seq , Algorithms
4.
Mol Syst Biol ; 20(5): 506-520, 2024 May.
Article in English | MEDLINE | ID: mdl-38491213

ABSTRACT

Codon optimality is a major determinant of mRNA translation and degradation rates. However, whether and through which mechanisms its effects are regulated remains poorly understood. Here we show that codon optimality associates with up to 2-fold change in mRNA stability variations between human tissues, and that its effect is attenuated in tissues with high energy metabolism and amplifies with age. Mathematical modeling and perturbation data through oxygen deprivation and ATP synthesis inhibition reveal that cellular energy variations non-uniformly alter the effect of codon usage. This new mode of codon effect regulation, independent of tRNA regulation, provides a fundamental mechanistic link between cellular energy metabolism and eukaryotic gene expression.


Subject(s)
Codon , Energy Metabolism , RNA Stability , RNA, Messenger , Humans , Energy Metabolism/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Codon/genetics , Codon Usage , Protein Biosynthesis , RNA, Transfer/genetics , RNA, Transfer/metabolism , Adenosine Triphosphate/metabolism , Gene Expression Regulation
5.
Nat Rev Genet ; 20(7): 389-403, 2019 07.
Article in English | MEDLINE | ID: mdl-30971806

ABSTRACT

As a data-driven science, genomics largely utilizes machine learning to capture dependencies in data and derive novel biological hypotheses. However, the ability to extract new insights from the exponentially increasing volume of genomics data requires more expressive machine learning models. By effectively leveraging large data sets, deep learning has transformed fields such as computer vision and natural language processing. Now, it is becoming the method of choice for many genomics modelling tasks, including predicting the impact of genetic variation on gene regulatory mechanisms such as DNA accessibility and splicing.


Subject(s)
Deep Learning , Genomics/methods , Models, Genetic , Neural Networks, Computer , Base Sequence , Computer Simulation , Humans , Supervised Machine Learning , Unsupervised Machine Learning
6.
Nucleic Acids Res ; 51(4): e21, 2023 02 28.
Article in English | MEDLINE | ID: mdl-36617985

ABSTRACT

Transposon screens are powerful in vivo assays used to identify loci driving carcinogenesis. These loci are identified as Common Insertion Sites (CISs), i.e. regions with more transposon insertions than expected by chance. However, the identification of CISs is affected by biases in the insertion behaviour of transposon systems. Here, we introduce Transmicron, a novel method that differs from previous methods by (i) modelling neutral insertion rates based on chromatin accessibility, transcriptional activity and sequence context and (ii) estimating oncogenic selection for each genomic region using Poisson regression to model insertion counts while controlling for neutral insertion rates. To assess the benefits of our approach, we generated a dataset applying two different transposon systems under comparable conditions. Benchmarking for enrichment of known cancer genes showed improved performance of Transmicron against state-of-the-art methods. Modelling neutral insertion rates allowed for better control of false positives and stronger agreement of the results between transposon systems. Moreover, using Poisson regression to consider intra-sample and inter-sample information proved beneficial in small and moderately-sized datasets. Transmicron is open-source and freely available. Overall, this study contributes to the understanding of transposon biology and introduces a novel approach to use this knowledge for discovering cancer driver genes.
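The notion of "more insertions than expected by chance" can be illustrated with a much simpler stand-in than the Poisson regression described above: a one-sided Poisson test of the observed insertion count in a region against a neutral expectation. The sketch below is not Transmicron's model; the function name is hypothetical, and the neutral expectation is assumed to be given rather than predicted from chromatin accessibility, transcription, and sequence context.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(Y >= observed) for Y ~ Poisson(expected), with expected > 0.

    `expected` plays the role of the neutral insertion rate for a region
    (an assumed input here, not something this sketch estimates).
    """
    if observed <= 0:
        return 1.0
    # Sum P(Y = i) for i < observed in log space for numerical stability.
    below = sum(
        math.exp(-expected + i * math.log(expected) - math.lgamma(i + 1))
        for i in range(observed)
    )
    # Clamp: rounding can push the difference marginally below zero.
    return max(0.0, 1.0 - below)
```

A region with 10 insertions where 2 are expected under neutrality gets a small tail probability, whereas the same 10 insertions in a highly accessible region with a neutral expectation of 9 would not stand out. That contrast is the reason for controlling for neutral insertion rates.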


Subject(s)
DNA Transposable Elements , Neoplasms , Software , Humans , Base Sequence , Carcinogenesis , Mutagenesis, Insertional , Oncogenes , Neoplasms/genetics
7.
Bioinformatics ; 39(2), 2023 02 03.
Article in English | MEDLINE | ID: mdl-36708003

ABSTRACT

MOTIVATION: Identifying regulatory regions in the genome is of great interest for understanding the epigenomic landscape in cells. One fundamental challenge in this context is to find the target genes whose expression is affected by the regulatory regions. A recent successful method is the Activity-By-Contact (ABC) model which scores enhancer-gene interactions based on enhancer activity and the contact frequency of an enhancer to its target gene. However, it describes regulatory interactions entirely from a gene's perspective, and does not account for all the candidate target genes of an enhancer. In addition, the ABC model requires two types of assays to measure enhancer activity, which limits the applicability. Moreover, there is neither implementation available that could allow for an integration with transcription factor (TF) binding information nor an efficient analysis of single-cell data. RESULTS: We demonstrate that the ABC score can yield a higher accuracy by adapting the enhancer activity according to the number of contacts the enhancer has to its candidate target genes and also by considering all annotated transcription start sites of a gene. Further, we show that the model is comparably accurate with only one assay to measure enhancer activity. We combined our generalized ABC model with TF binding information and illustrated an analysis of a single-cell ATAC-seq dataset of the human heart, where we were able to characterize cell type-specific regulatory interactions and predict gene expression based on TF affinities. All executed processing steps are incorporated into our new computational pipeline STARE. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/schulzlab/STARE. CONTACT: marcel.schulz@em.uni-frankfurt.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
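For orientation, the standard ABC score multiplies each enhancer's activity by its contact frequency with the gene and normalises by the sum of that product over all candidate enhancers of the gene. The sketch below shows only this baseline form with hypothetical names; it does not implement the generalized score or any part of the STARE pipeline.

```python
def abc_scores(activity, contact):
    """Baseline Activity-By-Contact scores for one gene.

    activity: dict mapping enhancer id -> activity signal
    contact:  dict mapping enhancer id -> contact frequency with the gene
    Both dicts are assumed to share the same keys (hypothetical inputs).
    """
    products = {e: activity[e] * contact[e] for e in activity}
    total = sum(products.values())
    if total == 0:
        return {e: 0.0 for e in products}
    # Each score is the enhancer's share of the gene's total activity x contact.
    return {e: p / total for e, p in products.items()}
```

Because the normalisation runs over the enhancers of a single gene, the scores describe the interaction from that gene's perspective only, which is the limitation the abstract points out.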


Subject(s)
Gene Expression Regulation , Transcription Factors , Humans , Transcription Factors/metabolism , Regulatory Sequences, Nucleic Acid , Software , Protein Binding
8.
Mol Genet Metab ; 142(3): 108511, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38878498

ABSTRACT

The diagnosis of Mendelian disorders has notably advanced with integration of whole exome and genome sequencing (WES and WGS) in clinical practice. However, challenges in variant interpretation and variants not covered by WES still leave a substantial percentage of patients undiagnosed. In this context, integrating RNA sequencing (RNA-seq) improves diagnostic workflows, particularly for WES inconclusive cases. Additionally, functional studies are often necessary to elucidate the impact of prioritized variants on gene expression and protein function. Our study focused on three unrelated male patients (P1-P3) with ATP6AP1-CDG (congenital disorder of glycosylation), presenting with intellectual disability and varying degrees of hepatopathy, glycosylation defects, and an initially inconclusive diagnosis through WES. Subsequent RNA-seq was pivotal in identifying the underlying genetic causes in P1 and P2, detecting ATP6AP1 underexpression and aberrant splicing. Molecular studies in fibroblasts confirmed these findings and identified the rare intronic variants c.289-233C > T and c.289-289G > A in P1 and P2, respectively. Trio-WGS also revealed the variant c.289-289G > A in P3, which was a de novo change in both patients. Functional assays expressing the mutant alleles in HAP1 cells demonstrated the pathogenic impact of these variants by reproducing the splicing alterations observed in patients. Our study underscores the role of RNA-seq and WGS in enhancing diagnostic rates for genetic diseases such as CDG, providing new insights into ATP6AP1-CDG molecular bases by identifying the first two deep intronic variants in this X-linked gene. Additionally, our study highlights the need to integrate RNA-seq and WGS, followed by functional validation, in routine diagnostics for a comprehensive evaluation of patients with an unidentified molecular etiology.


Subject(s)
Introns , RNA, Messenger , Humans , Male , Introns/genetics , RNA, Messenger/genetics , Vacuolar Proton-Translocating ATPases/genetics , Congenital Disorders of Glycosylation/genetics , Congenital Disorders of Glycosylation/diagnosis , Congenital Disorders of Glycosylation/pathology , Mutation , Whole Genome Sequencing , Exome Sequencing , Sequence Analysis, RNA , Intellectual Disability/genetics , Intellectual Disability/diagnosis , Intellectual Disability/pathology , Child , RNA Splicing/genetics , Child, Preschool
9.
Basic Res Cardiol ; 117(1): 6, 2022 02 17.
Article in English | MEDLINE | ID: mdl-35175464

ABSTRACT

The majority of risk loci identified by genome-wide association studies (GWAS) are in non-coding regions, hampering their functional interpretation. Instead, transcriptome-wide association studies (TWAS) identify gene-trait associations, which can be used to prioritize candidate genes in disease-relevant tissue(s). Here, we aimed to systematically identify susceptibility genes for coronary artery disease (CAD) by TWAS. We trained prediction models of nine CAD-relevant tissues using EpiXcan based on two genetics-of-gene-expression panels, the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task (STARNET) and the Genotype-Tissue Expression (GTEx). Based on these prediction models, we imputed gene expression of respective tissues from individual-level genotype data on 37,997 CAD cases and 42,854 controls for the subsequent gene-trait association analysis. Transcriptome-wide significant association (i.e. P < 3.85e-6) was observed for 114 genes. Of these, 96 resided within previously identified GWAS risk loci and 18 were novel. Stepwise analyses were performed to study their plausibility, biological function, and pathogenicity in CAD, including analyses for colocalization, damaging mutations, pathway enrichment, phenome-wide associations with human data and expression-traits correlations using mouse data. Finally, CRISPR/Cas9-based gene knockdown of two newly identified TWAS genes, RGS19 and KPTN, in a human hepatocyte cell line resulted in reduced secretion of APOB100 and lipids in the cell culture medium. Our CAD TWAS work (i) prioritized candidate causal genes at known GWAS loci, (ii) identified 18 novel genes to be associated with CAD, and (iii) suggested potential tissues and pathways of action for these TWAS CAD genes.


Subject(s)
Coronary Artery Disease , Genome-Wide Association Study , Animals , Coronary Artery Disease/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Mice , Polymorphism, Single Nucleotide , Transcriptome
10.
PLoS Comput Biol ; 17(5): e1008982, 2021 05.
Article in English | MEDLINE | ID: mdl-33970899

ABSTRACT

The 5' untranslated region plays a key role in regulating mRNA translation and consequently protein abundance. Therefore, accurate modeling of 5'UTR regulatory sequences shall provide insights into translational control mechanisms and help interpret genetic variants. Recently, a model was trained on a massively parallel reporter assay to predict mean ribosome load (MRL)-a proxy for translation rate-directly from 5'UTR sequence with a high degree of accuracy. However, this model is restricted to sequence lengths investigated in the reporter assay and therefore cannot be applied to the majority of human sequences without a substantial loss of information. Here, we introduced frame pooling, a novel neural network operation that enabled the development of an MRL prediction model for 5'UTRs of any length. Our model shows state-of-the-art performance on fixed length randomized sequences, while offering better generalization performance on longer sequences and on a variety of translation-related genome-wide datasets. Variant interpretation is demonstrated on a 5'UTR variant of the gene HBB associated with beta-thalassemia. Frame pooling could find applications in other bioinformatics predictive tasks. Moreover, our model, released open source, could help pinpoint pathogenic genetic variants.
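One way to read "frame pooling" is as a pooling step that groups positions by reading frame and summarises each frame separately, so that the output size no longer depends on sequence length. The sketch below assumes exactly that: per-position features are split into three frames by distance from the 3' end of the 5'UTR (taken here to sit just upstream of the start codon), and each frame is reduced by global max and mean. The function name and the choice of pooling operations are assumptions for illustration; the released model should be consulted for the actual operation.

```python
def frame_pooling(features):
    """Pool per-position scores into a fixed-length vector of 6 values.

    features: list of numbers along the 5'UTR in 5'->3' order; the last
    element is assumed to be the position closest to the start codon.
    Returns [max_f0, mean_f0, max_f1, mean_f1, max_f2, mean_f2].
    """
    n = len(features)
    pooled = []
    for frame in range(3):
        # Positions whose distance to the 3' end is congruent to `frame` mod 3.
        values = [features[i] for i in range(n) if (n - 1 - i) % 3 == frame]
        if values:
            pooled.extend([max(values), sum(values) / len(values)])
        else:
            pooled.extend([0.0, 0.0])  # frame absent in very short inputs
    return pooled
```

Whatever the input length, the result has six entries, which is what lets a downstream dense layer accept 5'UTRs of any length while still distinguishing signals that are in frame with the start codon from those that are not.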


Subject(s)
5' Untranslated Regions , Deep Learning , Ribosomes/metabolism , Humans , RNA, Messenger/genetics
11.
Int J Mol Sci ; 23(20), 2022 Oct 15.
Article in English | MEDLINE | ID: mdl-36293220

ABSTRACT

Peroxisomal biogenesis disorders (PBDs) are a heterogeneous group of genetic diseases. Multiple peroxisomal pathways are impaired, and very long chain fatty acids (VLCFA) are the first line biomarkers for the diagnosis. The clinical presentation of PBDs may range from severe, lethal multisystemic disorders to milder, late-onset disease. The vast majority of PBDs belong to Zellweger Spectrum Disorders (ZSDs) and represent a continuum of overlapping clinical symptoms, with Zellweger syndrome being the most severe and Heimler syndrome the least severe disease. Mild clinical conditions frequently present normal or slight biochemical alterations, making the diagnosis of these patients challenging. In the present study we used a combined WES and RNA-seq strategy to diagnose a patient presenting with retinal dystrophy as the main clinical symptom. Results showed the patient was compound heterozygous for mutations in PEX1. VLCFA were normal, but retrospective analysis showed that lysophosphatidylcholines (LPC) containing C22:0-C26:0 species were altered. This simple test could avoid the diagnostic odyssey of patients with a mild phenotype, such as the individual described here, who was diagnosed very late in adult life. We provide functional data in cell line models that may explain the mild phenotype of the patient by demonstrating the hypomorphic nature of a deep intronic variant altering PEX1 mRNA processing.


Subject(s)
Deafness , Hearing Loss, Sensorineural , Zellweger Syndrome , Humans , ATPases Associated with Diverse Cellular Activities/metabolism , RNA-Seq , Retrospective Studies , Membrane Proteins/genetics , Membrane Proteins/metabolism , Zellweger Syndrome/diagnosis , Zellweger Syndrome/genetics , Hearing Loss, Sensorineural/genetics , Biomarkers , RNA, Messenger , Fatty Acids
12.
Am J Hum Genet ; 103(6): 907-917, 2018 12 06.
Article in English | MEDLINE | ID: mdl-30503520

ABSTRACT

RNA sequencing (RNA-seq) is gaining popularity as a complementary assay to genome sequencing for precisely identifying the molecular causes of rare disorders. A powerful approach is to identify aberrant gene expression levels as potential pathogenic events. However, existing methods for detecting aberrant read counts in RNA-seq data either lack assessments of statistical significance, so that establishing cutoffs is arbitrary, or rely on subjective manual corrections for confounders. Here, we describe OUTRIDER (Outlier in RNA-Seq Finder), an algorithm developed to address these issues. The algorithm uses an autoencoder to model read-count expectations according to the gene covariation resulting from technical, environmental, or common genetic variations. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. The model is automatically fitted to achieve the best recall of artificially corrupted data. Precision-recall analyses using simulated outlier read counts demonstrated the importance of controlling for covariation and significance-based thresholds. OUTRIDER is open source and includes functions for filtering out genes not expressed in a dataset, for identifying outlier samples with too many aberrantly expressed genes, and for detecting aberrant gene expression on the basis of false-discovery-rate-adjusted p values. Overall, OUTRIDER provides an end-to-end solution for identifying aberrantly expressed genes and is suitable for use by rare-disease diagnostic platforms.
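The final step described above, calling a read count an outlier when it deviates significantly from a negative binomial with a gene-specific dispersion, can be sketched with the standard library alone. The expected count `mu` would come from the autoencoder in the real method; here it is simply an input. The two-sided p-value shown (twice the smaller tail, capped at 1) is a common convention and is assumed rather than taken from the OUTRIDER source code; all names are hypothetical.

```python
import math


def nb_pmf(k, mu, theta):
    """Negative binomial P(X = k) with mean mu > 0 and dispersion (size) theta > 0."""
    log_p = (math.lgamma(k + theta) - math.lgamma(k + 1) - math.lgamma(theta)
             + theta * math.log(theta / (theta + mu))
             + k * math.log(mu / (theta + mu)))
    return math.exp(log_p)


def nb_two_sided_pvalue(k, mu, theta):
    """Two-sided p-value for observing count k given expectation mu."""
    lower = sum(nb_pmf(i, mu, theta) for i in range(k + 1))  # P(X <= k)
    # P(X >= k); clamped because rounding can push it marginally below zero,
    # so very extreme upper-tail p-values lose precision in this sketch.
    upper = max(0.0, 1.0 - (lower - nb_pmf(k, mu, theta)))
    return min(1.0, 2.0 * min(lower, upper))
```

In practice these raw p-values would still need multiple-testing adjustment across genes, as the abstract notes with its false-discovery-rate-adjusted p values.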


Subject(s)
Gene Expression/genetics , Genetic Variation/genetics , RNA/metabolism , Sequence Analysis, RNA/methods , Algorithms , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Humans
13.
Am J Hum Genet ; 100(1): 151-159, 2017 Jan 05.
Article in English | MEDLINE | ID: mdl-27989324

ABSTRACT

MDH2 encodes mitochondrial malate dehydrogenase (MDH), which is essential for the conversion of malate to oxaloacetate as part of the proper functioning of the Krebs cycle. We report bi-allelic pathogenic mutations in MDH2 in three unrelated subjects presenting with early-onset generalized hypotonia, psychomotor delay, refractory epilepsy, and elevated lactate in the blood and cerebrospinal fluid. Functional studies in fibroblasts from affected subjects showed both an apparently complete loss of MDH2 levels and MDH2 enzymatic activity close to null. Metabolomics analyses demonstrated a significant concomitant accumulation of the MDH substrate, malate, and fumarate, its immediate precursor in the Krebs cycle, in affected subjects' fibroblasts. Lentiviral complementation with wild-type MDH2 cDNA restored MDH2 levels and mitochondrial MDH activity. Additionally, introduction of the three missense mutations from the affected subjects into Saccharomyces cerevisiae provided functional evidence to support their pathogenicity. Disruption of the Krebs cycle is a hallmark of cancer, and MDH2 has been recently identified as a novel pheochromocytoma and paraganglioma susceptibility gene. We show that loss-of-function mutations in MDH2 are also associated with severe neurological clinical presentations in children.


Subject(s)
Brain Diseases/genetics , Citric Acid Cycle , Malate Dehydrogenase/genetics , Mutation , Age of Onset , Alleles , Amino Acid Sequence , Child , Child, Preschool , Citric Acid Cycle/genetics , Fibroblasts/enzymology , Fibroblasts/metabolism , Fumarates/metabolism , Genetic Complementation Test , Humans , Infant , Infant, Newborn , Malate Dehydrogenase/chemistry , Malate Dehydrogenase/metabolism , Malates/metabolism , Male , Metabolomics , Models, Molecular
14.
Mol Syst Biol ; 15(2): e8513, 2019 02 18.
Article in English | MEDLINE | ID: mdl-30777893

ABSTRACT

Despite their importance in determining protein abundance, a comprehensive catalogue of sequence features controlling protein-to-mRNA (PTR) ratios and a quantification of their effects are still lacking. Here, we quantified PTR ratios for 11,575 proteins across 29 human tissues using matched transcriptomes and proteomes. We estimated by regression the contribution of known sequence determinants of protein synthesis and degradation in addition to 45 mRNA and 3 protein sequence motifs that we found by association testing. While PTR ratios span more than 2 orders of magnitude, our integrative model predicts PTR ratios at a median precision of 3.2-fold. A reporter assay provided functional support for two novel UTR motifs, and an immobilized mRNA affinity competition-binding assay identified motif-specific bound proteins for one motif. Moreover, our integrative model led to a new metric of codon optimality that captures the effects of codon frequency on protein synthesis and degradation. Altogether, this study shows that a large fraction of PTR ratio variation in human tissues can be predicted from sequence, and it identifies many new candidate post-transcriptional regulatory elements.
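A statement such as "predicts PTR ratios at a median precision of 3.2-fold" is easiest to interpret as a median fold error between predicted and observed ratios. The sketch below computes that quantity under this assumed reading; whether the study defines its precision in exactly this way is not stated in the abstract, and the function name is hypothetical.

```python
import math
import statistics


def median_fold_error(observed, predicted):
    """Median of the fold differences between paired positive values.

    A fold difference of 2.0 means the prediction is off by a factor of two,
    regardless of whether it is too high or too low.
    """
    folds = [
        10 ** abs(math.log10(p) - math.log10(o))
        for o, p in zip(observed, predicted)
    ]
    return statistics.median(folds)
```

For example, observed PTR ratios of 100, 1000, and 10 with predictions of 200, 500, and 10 give fold errors of 2, 2, and 1, so the median fold error is 2. Working on the log scale treats over- and under-prediction symmetrically, which matters when the ratios themselves span orders of magnitude.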


Subject(s)
Proteins/genetics , Proteome/genetics , Tissue Distribution/genetics , Transcriptome/genetics , Gene Expression Regulation/genetics , Genome, Human/genetics , Humans , Mass Spectrometry/methods , Proteomics/methods , RNA, Messenger/genetics , Sequence Analysis, RNA/methods
15.
Mol Syst Biol ; 15(2): e8503, 2019 02 18.
Article in English | MEDLINE | ID: mdl-30777892

ABSTRACT

Genome-, transcriptome- and proteome-wide measurements provide insights into how biological systems are regulated. However, fundamental aspects relating to which human proteins exist, where they are expressed and in which quantities are not fully understood. Therefore, we generated a quantitative proteome and transcriptome abundance atlas of 29 paired healthy human tissues from the Human Protein Atlas project representing human genes by 18,072 transcripts and 13,640 proteins including 37 without prior protein-level evidence. The analysis revealed that hundreds of proteins, particularly in testis, could not be detected even for highly expressed mRNAs, that few proteins show tissue-specific expression, that strong differences between mRNA and protein quantities within and across tissues exist and that protein expression is often more stable across tissues than that of transcripts. Only 238 of 9,848 amino acid variants found by exome sequencing could be confidently detected at the protein level showing that proteogenomics remains challenging, needs better computational methods and requires rigorous validation. Many uses of this resource can be envisaged including the study of gene/protein expression regulation and biomarker specificity evaluation.


Subject(s)
Genome, Human/genetics , Proteome/genetics , Tissue Distribution/genetics , Transcriptome/genetics , Gene Expression Regulation/genetics , Humans , Mass Spectrometry/methods , Proteomics/methods , RNA, Messenger/genetics , Sequence Analysis, RNA/methods
16.
Hum Mutat ; 40(9): 1243-1251, 2019 09.
Article in English | MEDLINE | ID: mdl-31070280

ABSTRACT

Pathogenic genetic variants often primarily affect splicing. However, it remains difficult to quantitatively predict whether and how genetic variants affect splicing. In 2018, the fifth edition of the Critical Assessment of Genome Interpretation proposed two splicing prediction challenges based on experimental perturbation assays: Vex-seq, assessing exon skipping, and MaPSy, assessing splicing efficiency. We developed a modular modeling framework, MMSplice, the performance of which was among the best on both challenges. Here we provide insights into the modeling assumptions of MMSplice and its individual modules. We furthermore illustrate how MMSplice can be applied in practice for individual genome interpretation, using the MMSplice VEP plugin and the Kipoi variant interpretation plugin, which are directly applicable to VCF files.


Subject(s)
Computational Biology/methods , Genetic Variation , RNA Splicing , Congresses as Topic , Exons , Genetic Predisposition to Disease , Humans , Introns , Models, Genetic , Software
17.
Hum Mutat ; 40(9): 1215-1224, 2019 09.
Article in English | MEDLINE | ID: mdl-31301154

ABSTRACT

Precision medicine and sequence-based clinical diagnostics seek to predict disease risk or to identify causative variants from sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. In the past, few CAGI challenges have addressed the impact of sequence variants on splicing. In CAGI5, two challenges (Vex-seq and MaPSY) involved prediction of the effect of variants, primarily single-nucleotide changes, on splicing. Although there are significant differences between these two challenges, both involved prediction of results from high-throughput exon inclusion assays. Here, we discuss the methods used to predict the impact of these variants on splicing, their performance, strengths, and weaknesses, and prospects for predicting the impact of sequence variation on splicing and disease phenotypes.


Subject(s)
Alternative Splicing , Computational Biology/methods , Mutation , Proteins/genetics , Animals , Congresses as Topic , Genetic Fitness , Humans , Models, Genetic , Sequence Homology, Nucleic Acid
18.
RNA ; 23(11): 1648-1659, 2017 11.
Article in English | MEDLINE | ID: mdl-28802259

ABSTRACT

The stability of mRNA is one of the major determinants of gene expression. Although a wealth of sequence elements regulating mRNA stability has been described, their quantitative contributions to half-life are unknown. Here, we built a quantitative model for Saccharomyces cerevisiae based on functional mRNA sequence features that explains 59% of the half-life variation between genes and predicts half-life at a median relative error of 30%. The model revealed a new destabilizing 3' UTR motif, ATATTC, which we functionally validated. Codon usage proves to be the major determinant of mRNA stability. Nonetheless, single-nucleotide variations have the largest effect when occurring on 3' UTR motifs or upstream AUGs. Analyzing mRNA half-life data of 34 knockout strains showed that the effect of codon usage not only requires functional decapping and deadenylation, but also the 5'-to-3' exonuclease Xrn1, the nonsense-mediated decay genes, but not no-go decay. Altogether, this study quantitatively delineates the contributions of mRNA sequence features on stability in yeast, reveals their functional dependencies on degradation pathways, and allows accurate prediction of half-life from mRNA sequence.


Subject(s)
RNA Stability/genetics , RNA, Fungal/genetics , RNA, Fungal/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , 3' Untranslated Regions/genetics , Base Sequence , Codon/genetics , Codon/metabolism , Gene Knockout Techniques , Genes, Fungal , Half-Life , Models, Biological , Nonsense Mediated mRNA Decay/genetics , Peptide Chain Initiation, Translational , Regulatory Elements, Transcriptional , Schizosaccharomyces/genetics , Schizosaccharomyces/metabolism
19.
Bioinformatics ; 34(8): 1261-1269, 2018 04 15.
Article in English | MEDLINE | ID: mdl-29155928

ABSTRACT

Motivation: Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results: Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation: Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact: avsec@in.tum.de or gagneur@in.tum.de. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics/methods , Models, Genetic , Neural Networks, Computer , Regulatory Sequences, Nucleic Acid , DNA , Hep G2 Cells , Humans , K562 Cells , Machine Learning , Protein Binding , Proteins/metabolism , RNA , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Software
20.
BMC Bioinformatics ; 19(1): 247, 2018 06 27.
Article in English | MEDLINE | ID: mdl-29945559

ABSTRACT

BACKGROUND: GenoGAM (Genome-wide generalized additive models) is a powerful statistical modeling tool for the analysis of ChIP-Seq data with flexible factorial design experiments. However, large runtime and memory requirements of its current implementation prohibit its application to gigabase-scale genomes such as mammalian genomes. RESULTS: Here we present GenoGAM 2.0, a scalable and efficient implementation that is 2 to 3 orders of magnitude faster than the previous version. This is achieved by exploiting the sparsity of the model using the SuperLU direct solver for parameter fitting, and sparse Cholesky factorization together with the sparse inverse subset algorithm for computing standard errors. Furthermore, the HDF5 library is employed to store data efficiently on hard drive, reducing memory footprint while keeping I/O low. Whole-genome fits for human ChIP-seq datasets (ca. 300 million parameters) could be obtained in less than 9 hours on a standard 60-core server. GenoGAM 2.0 is implemented as an open source R package and currently available on GitHub. A Bioconductor release of the new version is in preparation. CONCLUSIONS: We have vastly improved the performance of the GenoGAM framework, opening up its application to all types of organisms. Moreover, our algorithmic improvements for fitting large GAMs could be of interest to the statistical community beyond the genomics field.


Subject(s)
Genomics/methods , Humans