Results 1 - 20 of 32
1.
Nucleic Acids Res ; 51(2): 712-727, 2023 01 25.
Article in English | MEDLINE | ID: mdl-36537210

ABSTRACT

Various genetic diseases associated with microcephaly and developmental defects are due to pathogenic variants in the U4atac small nuclear RNA (snRNA), a component of the minor spliceosome essential for the removal of U12-type introns from eukaryotic mRNAs. While it has been shown that a few RNU4ATAC mutations result in impaired binding of essential protein components, the molecular defects of the vast majority of variants are still unknown. Here, we used lymphoblastoid cells derived from RNU4ATAC compound heterozygous (g.108_126del;g.111G>A) twin patients with MOPD1 phenotypes to analyze the molecular consequences of the mutations on small nuclear ribonucleoproteins (snRNPs) formation and on splicing. We found that the U4atac108_126del mutant is unstable and that the U4atac111G>A mutant as well as the minor di- and tri-snRNPs are present at reduced levels. Our results also reveal the existence of 3'-extended snRNA transcripts in patients' cells. Moreover, we show that the mutant cells have alterations in splicing of INTS7 and INTS10 minor introns, contain lower levels of the INTS7 and INTS10 proteins and display changes in the assembly of Integrator subunits. Altogether, our results show that compound heterozygous g.108_126del;g.111G>A mutations induce splicing defects and affect the homeostasis and function of the Integrator complex.


Subjects
Ribonucleoproteins, Small Nuclear; Spliceosomes; Spliceosomes/genetics; Spliceosomes/metabolism; Ribonucleoproteins, Small Nuclear/genetics; Mutation; Introns/genetics; RNA Splicing/genetics; RNA, Small Nuclear/metabolism; Homeostasis/genetics
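
The variant names in this abstract (g.108_126del and g.111G>A) use 1-based, inclusive genomic coordinates. The following is a minimal sketch of how each would act on a sequence. The function name `apply_simple_hgvs` and the 130-nt `toy` string are illustrative assumptions; the toy string is a placeholder and not the real RNU4ATAC sequence. Because the patients are compound heterozygous, each variant is applied to its own copy of the sequence.

```python
import re

def apply_simple_hgvs(seq: str, variant: str) -> str:
    """Apply one 'g.START_ENDdel' or 'g.POSREF>ALT' variant (1-based, inclusive)."""
    deletion = re.fullmatch(r"g\.(\d+)_(\d+)del", variant)
    if deletion:
        start, end = int(deletion.group(1)), int(deletion.group(2))
        # Drop positions start..end; Python slices are 0-based and end-exclusive.
        return seq[:start - 1] + seq[end:]
    substitution = re.fullmatch(r"g\.(\d+)([ACGT])>([ACGT])", variant)
    if substitution:
        pos = int(substitution.group(1))
        ref, alt = substitution.group(2), substitution.group(3)
        if seq[pos - 1] != ref:
            raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
        return seq[:pos - 1] + alt + seq[pos:]
    raise ValueError(f"unsupported variant syntax: {variant}")

# Placeholder 130-nt sequence with a G at position 111 (NOT the real gene sequence).
toy = "A" * 110 + "G" + "C" * 19
deleted = apply_simple_hgvs(toy, "g.108_126del")      # one allele
substituted = apply_simple_hgvs(toy, "g.111G>A")      # the other allele
print(len(toy) - len(deleted))  # number of nucleotides removed by the deletion
print(substituted[110])         # base now found at position 111
```

The sketch only covers the two syntaxes that appear in the abstract; full HGVS parsing is considerably broader.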
2.
RNA ; 25(9): 1130-1149, 2019 09.
Article in English | MEDLINE | ID: mdl-31175170

ABSTRACT

Minor intron splicing plays a central role in human embryonic development and survival. Indeed, biallelic mutations in RNU4ATAC, transcribed into the minor spliceosomal U4atac snRNA, are responsible for three rare autosomal recessive multimalformation disorders named Taybi-Linder (TALS/MOPD1), Roifman (RFMN), and Lowry-Wood (LWS) syndromes, which associate numerous overlapping signs of varying severity. Although RNA-seq experiments have been conducted on a few RFMN patient cells, none have been performed in TALS, and more generally no in-depth transcriptomic analysis of the ∼700 human genes containing a minor (U12-type) intron had been published as yet. We thus sequenced RNA from cells derived from five skin, three amniotic fluid, and one blood biosamples obtained from seven unrelated TALS cases and from age- and sex-matched controls. This allowed us to describe for the first time the mRNA expression and splicing profile of genes containing U12-type introns, in the context of a functional minor spliceosome. Concerning RNU4ATAC-mutated patients, we show that as expected, they display distinct U12-type intron splicing profiles compared to controls, but that rather unexpectedly mRNA expression levels are mostly unchanged. Furthermore, although U12-type intron missplicing concerns most of the expressed U12 genes, the level of U12-type intron retention is surprisingly low in fibroblasts and amniocytes, and much more pronounced in blood cells. Interestingly, we found several occurrences of introns that can be spliced using either U2, U12, or a combination of both types of splice site consensus sequences, with a shift towards splicing using preferentially U2 sites in TALS patients' cells compared to controls.


Subjects
Dwarfism/genetics; Fetal Growth Retardation/genetics; Microcephaly/genetics; Osteochondrodysplasias/genetics; RNA Splicing/genetics; Transcriptome/genetics; Adult; Aged; Base Sequence/genetics; Child, Preschool; Consensus Sequence/genetics; Female; Gene Expression Profiling/methods; Humans; Infant; Introns/genetics; Male; Middle Aged; RNA/genetics; RNA, Messenger/genetics; RNA, Small Nuclear/genetics; Spliceosomes/genetics; Young Adult
3.
PLoS Genet ; 14(11): e1007758, 2018 11.
Article in English | MEDLINE | ID: mdl-30419019

ABSTRACT

Genome-wide association study (GWAS) methods applied to bacterial genomes have shown promising results for genetic marker discovery or detailed assessment of marker effect. Recently, alignment-free methods based on k-mer composition have proven their ability to explore the accessory genome. However, they lead to redundant descriptions and results which are sometimes hard to interpret. Here we introduce DBGWAS, an extended k-mer-based GWAS method producing interpretable genetic variants associated with distinct phenotypes. Relying on compacted De Bruijn graphs (cDBG), our method gathers cDBG nodes, identified by the association model, into subgraphs defined from their neighbourhood in the initial cDBG. DBGWAS is alignment-free and only requires a set of contigs and phenotypes. In particular, it does not require prior annotation or reference genomes. It produces subgraphs representing phenotype-associated genetic variants such as local polymorphisms and mobile genetic elements (MGE). It offers a graphical framework which helps interpret GWAS results. Importantly, it is also computationally efficient: experiments took one hour and a half on average. We validated our method using antibiotic resistance phenotypes for three bacterial species. DBGWAS recovered known resistance determinants such as mutations in core genes in Mycobacterium tuberculosis, and genes acquired by horizontal transfer in Staphylococcus aureus and Pseudomonas aeruginosa, along with their MGE context. It also enabled us to formulate new hypotheses involving genetic variants not yet described in the antibiotic resistance literature. An open-source tool implementing DBGWAS is available at https://gitlab.com/leoisl/dbgwas.


Subjects
Genome, Bacterial; Genome-Wide Association Study/methods; Computer Graphics; DNA, Bacterial/genetics; Databases, Genetic; Drug Resistance, Bacterial/genetics; Genetic Variation; Genome-Wide Association Study/statistics & numerical data; Interspersed Repetitive Sequences; Models, Genetic; Mycobacterium tuberculosis/drug effects; Mycobacterium tuberculosis/genetics; Phenotype; Pseudomonas aeruginosa/drug effects; Pseudomonas aeruginosa/genetics; Sequence Analysis, DNA; Software; Staphylococcus aureus/drug effects; Staphylococcus aureus/genetics
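
To illustrate what a compacted De Bruijn graph is, here is a toy sketch that builds a De Bruijn graph from two short sequences and merges non-branching paths into single nodes (unitigs), then records which input contains each unitig. It is not the DBGWAS implementation. The functions `build_dbg` and `compact`, the two toy genomes, and the choice k = 5 are all assumptions made for illustration, and the sketch ignores reverse complements, which real tools typically handle.

```python
def build_dbg(seqs, k):
    """Node-centric De Bruijn graph: nodes are k-mers, edges are (k-1)-overlaps."""
    nodes = {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}
    succ = {n: [n[1:] + c for c in "ACGT" if n[1:] + c in nodes] for n in nodes}
    pred = {n: [c + n[:-1] for c in "ACGT" if c + n[:-1] in nodes] for n in nodes}
    return nodes, succ, pred

def compact(nodes, succ, pred):
    """Merge maximal non-branching paths into unitigs (isolated cycles are skipped)."""
    def starts_unitig(n):
        # A k-mer is interior only if its single predecessor has it as single successor.
        return len(pred[n]) != 1 or len(succ[pred[n][0]]) != 1

    unitigs, visited = [], set()
    for n in sorted(nodes):
        if not starts_unitig(n):
            continue
        path, cur = n, n
        visited.add(n)
        while len(succ[cur]) == 1:
            nxt = succ[cur][0]
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            path += nxt[-1]          # extend the unitig by one nucleotide
            visited.add(nxt)
            cur = nxt
        unitigs.append(path)
    return unitigs

# Two hypothetical genomes differing by a single substitution (G vs T).
genomes = ["ACGTTGCATGGACCTAGGCTA", "ACGTTGCATGTACCTAGGCTA"]
nodes, succ, pred = build_dbg(genomes, k=5)
unitigs = compact(nodes, succ, pred)

# Presence/absence pattern of each unitig across genomes: the kind of feature
# an association model could then test against a phenotype.
patterns = {u: [int(u in g) for g in genomes] for u in unitigs}
for unitig, pattern in patterns.items():
    print(unitig, pattern)
```

In this toy case the two shared flanks appear in both genomes, while each allele of the substitution yields its own unitig present in only one genome.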
4.
Nucleic Acids Res ; 44(19): e148, 2016 Nov 02.
Article in English | MEDLINE | ID: mdl-27458203

ABSTRACT

SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases, transcriptome sequencing can be used as a cheaper alternative which already enables to identify SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing, if not enough material is available from one individual. Using pooled human RNA-seq data, we clarify the precision and recall of our method and discuss them with respect to other methods which use a reference genome or an assembled transcriptome. We then validate experimentally the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We further enable to test for the association of the identified SNPs with a phenotype of interest.


Subjects
Base Sequence; Genome; Polymorphism, Single Nucleotide; Sequence Analysis, RNA; Algorithms; Amino Acid Sequence; Animals; Computational Biology/methods; Genetic Markers; Genomics/methods; Genotype; Humans; Phenotype; Reproducibility of Results; Sequence Analysis, DNA/methods; Sequence Analysis, RNA/methods; Transcriptome
5.
Nucleic Acids Res ; 43(2): e11, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25404127

ABSTRACT

Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundamental limitations: they can only call SNPs between exactly two datasets, and/or they require a prohibitive amount of computational resources. The method we propose, discoSnp, detects both heterozygous and homozygous isolated SNPs from any number of read datasets, without a reference genome, and with very low memory and time footprints (billions of reads can be analyzed with a standard desktop computer). To facilitate downstream genotyping analyses, discoSnp ranks predictions and outputs quality and coverage per allele. Compared to finding isolated SNPs using a state-of-the-art assembly and mapping approach, discoSnp requires significantly less computational resources, shows similar precision/recall values, and highly ranked predictions are less likely to be false positives. An experimental validation was conducted on an arthropod species (the tick Ixodes ricinus) on which de novo sequencing was performed. Among the predicted SNPs that were tested, 96% were successfully genotyped and truly exhibited polymorphism.


Subjects
Genotyping Techniques/methods; Polymorphism, Single Nucleotide; Algorithms; Animals; Chromosomes, Human, Pair 1; Escherichia coli/genetics; Genomics/methods; Humans; Ixodes/genetics; Mice; Mice, Inbred C57BL; Saccharomyces cerevisiae/genetics
7.
Genome Res ; 22(7): 1231-42, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22588898

ABSTRACT

Chimeric RNAs comprise exons from two or more different genes and have the potential to encode novel proteins that alter cellular phenotypes. To date, numerous putative chimeric transcripts have been identified among the ESTs isolated from several organisms and using high throughput RNA sequencing. The few corresponding protein products that have been characterized mostly result from chromosomal translocations and are associated with cancer. Here, we systematically establish that some of the putative chimeric transcripts are genuinely expressed in human cells. Using high throughput RNA sequencing, mass spectrometry experimental data, and functional annotation, we studied 7424 putative human chimeric RNAs. We confirmed the expression of 175 chimeric RNAs in 16 human tissues, with an abundance varying from 0.06 to 17 RPKM (Reads Per Kilobase per Million mapped reads). We show that these chimeric RNAs are significantly more tissue-specific than non-chimeric transcripts. Moreover, we present evidence that chimeras tend to incorporate highly expressed genes. Despite the low expression level of most chimeric RNAs, we show that 12 novel chimeras are translated into proteins detectable in multiple shotgun mass spectrometry experiments. Furthermore, we confirm the expression of three novel chimeric proteins using targeted mass spectrometry. Finally, based on our functional annotation of exon organization and preserved domains, we discuss the potential features of chimeric proteins with illustrative examples and suggest that chimeras significantly exploit signal peptides and transmembrane domains, which can alter the cellular localization of cognate proteins. Taken together, these findings establish that some chimeric RNAs are translated into potentially functional proteins in humans.


Subjects
Genome, Human; Mutant Chimeric Proteins/genetics; Protein Biosynthesis; Amino Acid Sequence; Cell Membrane/genetics; Cell Membrane/metabolism; Databases, Nucleic Acid; Exons; Gene Expression Regulation; High-Throughput Nucleotide Sequencing; Humans; Mass Spectrometry/methods; Molecular Sequence Annotation; Molecular Sequence Data; Mutant Chimeric Proteins/metabolism; Organ Specificity; Protein Sorting Signals; Protein Structure, Secondary; Protein Structure, Tertiary; Proteomics/methods; RNA, Messenger/genetics; RNA, Messenger/metabolism; Sequence Analysis, RNA/methods; Structure-Activity Relationship
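
The abundance unit quoted in this abstract, RPKM (Reads Per Kilobase per Million mapped reads), follows the standard formula: read count × 10^9 / (transcript length in bp × total mapped reads). A minimal sketch is shown below. The function name `rpkm` and the input numbers are hypothetical and were chosen only so that the result lands on the upper bound of the reported range; they are not data from the study.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

# Hypothetical example: 34 reads on a 2,000 bp transcript in a library of
# 1,000,000 mapped reads.
example = rpkm(34, 2000, 1_000_000)
print(example)
```

Because the value is normalised by both transcript length and library size, it allows rough comparison between transcripts and between samples, which is how a range such as 0.06 to 17 RPKM across tissues can be read.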
8.
Bioinformatics ; 30(1): 61-70, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-24167155

ABSTRACT

MOTIVATION: The increasing availability of metabolomics data enables to better understand the metabolic processes involved in the immediate response of an organism to environmental changes and stress. The data usually come in the form of a list of metabolites whose concentrations significantly changed under some conditions, and are thus not easy to interpret without being able to precisely visualize how such metabolites are interconnected. RESULTS: We present a method that enables to organize the data from any metabolomics experiment into metabolic stories. Each story corresponds to a possible scenario explaining the flow of matter between the metabolites of interest. These scenarios may then be ranked in different ways depending on which interpretation one wishes to emphasize for the causal link between two affected metabolites: enzyme activation, enzyme inhibition or domino effect on the concentration changes of substrates and products. Equally probable stories under any selected ranking scheme can be further grouped into a single anthology that summarizes, in a unique subnetwork, all equivalently plausible alternative stories. An anthology is simply a union of such stories. We detail an application of the method to the response of yeast to cadmium exposure. We use this system as a proof of concept for our method, and we show that we are able to find a story that reproduces very well the current knowledge about the yeast response to cadmium. We further show that this response is mostly based on enzyme activation. We also provide a framework for exploring the alternative pathways or side effects this local response is expected to have in the rest of the network. We discuss several interpretations for the changes we see, and we suggest hypotheses that could in principle be experimentally tested. Noticeably, our method requires simple input data and could be used in a wide variety of applications. 
AVAILABILITY AND IMPLEMENTATION: The code for the method presented in this article is available at http://gobbolino.gforge.inria.fr.


Subjects
Cadmium/pharmacology; Metabolomics/methods; Saccharomyces cerevisiae/drug effects; Saccharomyces cerevisiae/metabolism; Enzyme Activation; Glutathione/biosynthesis
9.
Nucleic Acids Res ; 41(Database issue): D142-51, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23143107

ABSTRACT

Chimeric RNAs that comprise two or more different transcripts have been identified in many cancers and among the Expressed Sequence Tags (ESTs) isolated from different organisms; they might represent functional proteins and produce different disease phenotypes. The ChiTaRS database of Chimeric Transcripts and RNA-Sequencing data (http://chitars.bioinfo.cnio.es/) collects more than 16 000 chimeric RNAs from humans, mice and fruit flies, 233 chimeras confirmed by RNA-seq reads and ∼2000 cancer breakpoints. The database indicates the expression and tissue specificity of these chimeras, as confirmed by RNA-seq data, and it includes mass spectrometry results for some human entries at their junctions. Moreover, the database has advanced features to analyze junction consistency and to rank chimeras based on the evidence of repeated junction sites. Finally, 'Junction Search' screens through the RNA-seq reads found at the chimeras' junction sites to identify putative junctions in novel sequences entered by users. Thus, ChiTaRS is an extensive catalog of human, mouse and fruit fly chimeras that will extend our understanding of the evolution of chimeric transcripts in eukaryotes and can be advantageous in the analysis of human cancer breakpoints.


Subjects
Databases, Genetic; Mutant Chimeric Proteins/genetics; RNA/chemistry; Animals; Chromosome Breakpoints; Computer Graphics; Drosophila/genetics; Gene Fusion; Humans; Internet; Mice; Mutant Chimeric Proteins/metabolism; Neoplasms/genetics; RNA/metabolism; Sequence Analysis, RNA
10.
Nucleic Acids Res ; 40(20): 10073-83, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22962361

ABSTRACT

High-throughput sequencing of cDNA libraries constructed from cellular RNA complements (RNA-Seq) naturally provides a digital quantitative measurement for every expressed RNA molecule. Nature, impact and mutual interference of biases in different experimental setups are, however, still poorly understood, mostly due to the lack of data from intermediate protocol steps. We analysed multiple RNA-Seq experiments, involving different sample preparation protocols and sequencing platforms: we broke them down into their common (and currently indispensable) technical components (reverse transcription, fragmentation, adapter ligation, PCR amplification, gel segregation and sequencing), investigating how such different steps influence abundance and distribution of the sequenced reads. For each of those steps, we developed universally applicable models, which can be parameterised by empirical attributes of any experimental protocol. Our models are implemented in a computer simulation pipeline called the Flux Simulator, and we show that read distributions generated by different combinations of these models closely reproduce the evidence obtained from the corresponding experimental setups. We further demonstrate that our in silico RNA-Seq provides insights about hidden precursors that determine the final configuration of reads along gene bodies; enhancing or compensatory effects that explain apparently controversial observations can be observed. Moreover, our simulations identify hitherto unreported sources of systematic bias from RNA hydrolysis, a fragmentation technique currently employed by most RNA-Seq protocols.


Subjects
Computer Simulation; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Sequence Analysis, RNA; Hydrolysis; RNA/metabolism
11.
BMC Genomics ; 14: 309, 2013 May 08.
Article in English | MEDLINE | ID: mdl-23651581

ABSTRACT

BACKGROUND: Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often enables to characterize pathogens. There is therefore a strong interest in understanding the variability of this trait and the possible correlations with life-style. Two kinds of events affect genome organization: on one hand translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but they alter the gene neighborhoods by breaking the syntenic regions. A complete picture about genome organization evolution therefore requires to account for both kinds of events. RESULTS: We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In a first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of a same species allowed to identify genomes with diverging gene order with respect to their conspecific. The inter-species analysis indicates that pathogens are more often unstable with respect to non-pathogens. In a second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. CONCLUSION: In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. 
By studying species with multiple sequenced genomes available, we were able to explore genome organization stability at different time-scales and to find significant differences for pathogen and non-pathogen species. The output of our framework also allows to identify the conserved gene clusters and/or partial occurrences thereof, making possible to explore how gene clusters assembled during evolution.


Subjects
Genome, Archaeal/genetics; Genome, Bacterial/genetics; Genomic Instability; Models, Genetic; Species Specificity
12.
BMC Bioinformatics ; 13 Suppl 6: S5, 2012 Apr 19.
Article in English | MEDLINE | ID: mdl-22537044

ABSTRACT

BACKGROUND: In this paper, we address the problem of identifying and quantifying polymorphisms in RNA-seq data when no reference genome is available, without assembling the full transcripts. Based on the fundamental idea that each polymorphism corresponds to a recognisable pattern in a De Bruijn graph constructed from the RNA-seq reads, we propose a general model for all polymorphisms in such graphs. We then introduce an exact algorithm, called KISSPLICE, to extract alternative splicing events. RESULTS: We show that KISSPLICE enables to identify more correct events than general purpose transcriptome assemblers. Additionally, on a 71 M reads dataset from human brain and liver tissues, KISSPLICE identified 3497 alternative splicing events, out of which 56% are not present in the annotations, which confirms recent estimates showing that the complexity of alternative splicing has been largely underestimated so far. CONCLUSIONS: We propose new models and algorithms for the detection of polymorphism in RNA-seq data. This opens the way to a new kind of studies on large HTS RNA-seq datasets, where the focus is not the global reconstruction of full-length transcripts, but local assembly of polymorphic regions. KISSPLICE is available for download at http://alcovna.genouest.org/kissplice/.


Subjects
Algorithms; Alternative Splicing; Models, Statistical; Sequence Analysis, RNA; Genome; Humans; Polymorphism, Single Nucleotide; Reference Standards; Tandem Repeat Sequences; Transcriptome
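
The idea that a polymorphism leaves a recognisable pattern in a De Bruijn graph can be seen on the simplest case, an isolated substitution: the two alleles share their flanking k-mers but each contributes k allele-specific k-mers, which form the two branches of a "bubble". The sketch below only demonstrates this counting property on toy sequences. It is not the KISSPLICE algorithm, and the names `kmers`, `allele_specific_kmers`, the two toy alleles and k = 5 are assumptions for illustration.

```python
def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def allele_specific_kmers(allele_a: str, allele_b: str, k: int):
    """k-mers found in only one of the two alleles (the two bubble branches)."""
    a, b = kmers(allele_a, k), kmers(allele_b, k)
    return a - b, b - a

# Hypothetical alleles differing at a single position (G vs T),
# with at least k-1 nucleotides of shared sequence on each side.
upper = "ACGTTGCATGGACCTAGGCTA"
lower = "ACGTTGCATGTACCTAGGCTA"
k = 5
only_upper, only_lower = allele_specific_kmers(upper, lower, k)
print(sorted(only_upper))
print(sorted(only_lower))
```

Each branch contains exactly k k-mers here because every k-mer overlapping the variable position is unique to its allele; repeats near the variant would break this simple picture.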
13.
BMC Genomics ; 13: 438, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22938206

ABSTRACT

BACKGROUND: A large number of genome-scale metabolic networks is now available for many organisms, mostly bacteria. Previous works on minimal gene sets, when analysing host-dependent bacteria, found small common sets of metabolic genes. When such analyses are restricted to bacteria with similar lifestyles, larger portions of metabolism are expected to be shared and their composition is worth investigating. Here we report a comparative analysis of the small molecule metabolism of symbiotic bacteria, exploring common and variable portions as well as the contribution of different lifestyle groups to the reduction of a common set of metabolic capabilities. RESULTS: We found no reaction shared by all the bacteria analysed. Disregarding those with the smallest genomes, we still do not find a reaction core, however we did find a core of biochemical capabilities. While obligate intracellular symbionts have no core of reactions within their group, extracellular and cell-associated symbionts do have a small core composed of disconnected fragments. In agreement with previous findings in Escherichia coli, their cores are enriched in biosynthetic processes whereas the variable metabolisms have similar ratios of biosynthetic and degradation reactions. Conversely, the variable metabolism of obligate intracellular symbionts is enriched in anabolism. CONCLUSION: Even when removing the symbionts with the most reduced genomes, there is no core of reactions common to the analysed symbiotic bacteria. The main reason is the very high specialisation of obligate intracellular symbionts, however, host-dependence alone is not an explanation for such absence. The composition of the metabolism of cell-associated and extracellular bacteria shows that while they have similar needs in terms of the building blocks of their cells, they have to adapt to very distinct environments. 
On the other hand, in obligate intracellular bacteria, catabolism has largely disappeared, whereas synthetic routes appear to have been selected for depending on the nature of the symbiosis. As more genomes are added, we expect, based on our simulations, that the core of cell-associated and extracellular bacteria continues to diminish, converging to approximately 60 reactions.


Subjects
Bacteria/genetics; Bacteria/metabolism; Evolution, Molecular; Genome, Bacterial/genetics; Metabolic Networks and Pathways/genetics; Symbiosis/genetics; Models, Genetic; Species Specificity
14.
BMC Genomics ; 12: 303, 2011 Jun 10.
Article in English | MEDLINE | ID: mdl-21663614

ABSTRACT

BACKGROUND: Folding and intermingling of chromosomes has the potential of bringing close to each other loci that are very distant genomically or even on different chromosomes. On the other hand, genomic rearrangements also play a major role in the reorganisation of loci proximities. Whether the same loci are involved in both mechanisms has been studied in the case of somatic rearrangements, but never from an evolutionary standpoint. RESULTS: In this paper, we analysed the correlation between two datasets: (i) whole-genome chromatin contact data obtained in human cells using the Hi-C protocol; and (ii) a set of breakpoint regions resulting from evolutionary rearrangements which occurred since the split of the human and mouse lineages. Surprisingly, we found that two loci distant in the human genome but adjacent in the mouse genome are significantly more often observed in close proximity in the human nucleus than expected. Importantly, we show that this result holds for loci located on the same chromosome regardless of the genomic distance separating them, and the signal is stronger in gene-rich and open-chromatin regions. CONCLUSIONS: These findings strongly suggest that part of the 3D organisation of chromosomes may be conserved across very large evolutionary distances. To characterise this phenomenon, we propose to use the notion of spatial synteny which generalises the notion of genomic synteny to the 3D case.


Subjects
Chromosome Breakpoints; Evolution, Molecular; Synteny/genetics; Animals; Chromatin/genetics; Genetic Loci/genetics; Genome, Human/genetics; Genomics; Humans; Mice
15.
NAR Genom Bioinform ; 2(4): lqaa095, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33575639

ABSTRACT

Influenza A viruses (IAVs) use diverse mechanisms to interfere with cellular gene expression. Although many RNA-seq studies have documented IAV-induced changes in host mRNA abundance, few were designed to allow an accurate quantification of changes in host mRNA splicing. Here, we show that IAV infection of human lung cells induces widespread alterations of cellular splicing, with an overall increase in exon inclusion and decrease in intron retention. Over half of the mRNAs that show differential splicing undergo no significant changes in abundance or in their 3' end termination site, suggesting that IAVs can specifically manipulate cellular splicing. Among a randomly selected subset of 21 IAV-sensitive alternative splicing events, most are specific to IAV infection as they are not observed upon infection with VSV, induction of interferon expression or induction of an osmotic stress. Finally, the analysis of splicing changes in RED-depleted cells reveals a limited but significant overlap with the splicing changes in IAV-infected cells. This observation suggests that hijacking of RED by IAVs to promote splicing of the abundant viral NS1 mRNAs could partially divert RED from its target mRNAs. All our RNA-seq datasets and analyses are made accessible for browsing through a user-friendly Shiny interface (http://virhostnet.prabi.fr:3838/shinyapps/flu-splicing or https://github.com/cbenoitp/flu-splicing).

16.
PLoS One ; 15(7): e0235655, 2020.
Article in English | MEDLINE | ID: mdl-32628740

ABSTRACT

Biallelic variants in RNU4ATAC, a non-coding gene transcribed into the minor spliceosome component U4atac snRNA, are responsible for three rare recessive developmental diseases, namely Taybi-Linder/MOPD1, Roifman and Lowry-Wood syndromes. Next-generation sequencing of clinically heterogeneous cohorts (children with either a suspected genetic disorder or a congenital microcephaly) recently identified mutations in this gene, illustrating how profoundly these technologies are modifying genetic testing and assessment. As RNU4ATAC has a single non-coding exon, the bioinformatic prediction algorithms assessing the effect of sequence variants on splicing or protein function are irrelevant, which makes variant interpretation challenging to molecular diagnostic laboratories. In order to facilitate and improve clinical diagnostic assessment and genetic counseling, we present i) an update of the previously reported RNU4ATAC mutations and an analysis of the genetic variations affecting this gene using the Genome Aggregation Database (gnomAD) resource; ii) the pathogenicity prediction performances of scores computed based on an RNA structure prediction tool and of those produced by the Combined Annotation Dependent Depletion tool for the 285 RNU4ATAC variants identified in patients or in large-scale sequencing projects; iii) a method, based on a cellular assay, that allows to measure the effect of RNU4ATAC variants on splicing efficiency of a minor (U12-type) reporter intron. Lastly, the concordance of bioinformatic predictions and cellular assay results was investigated.


Subjects
RNA, Small Nuclear/metabolism; Spliceosomes/metabolism; Child; Databases, Genetic; Dwarfism/genetics; Dwarfism/pathology; Fetal Growth Retardation/genetics; Fetal Growth Retardation/pathology; Fibroblasts/cytology; Fibroblasts/metabolism; Genetic Variation; Humans; Microcephaly/genetics; Microcephaly/pathology; Nucleic Acid Conformation; Osteochondrodysplasias/genetics; Osteochondrodysplasias/pathology; RNA Splicing; RNA, Small Nuclear/chemistry; RNA, Small Nuclear/genetics
17.
Trends Microbiol ; 27(3): 268-281, 2019 03.
Article in English | MEDLINE | ID: mdl-30577974

ABSTRACT

Alteration of host cell splicing is a common feature of many viral infections which is underappreciated because of the complexity and technical difficulty of studying alternative splicing (AS) regulation. Recent advances in RNA sequencing technologies revealed that up to several hundreds of host genes can show altered mRNA splicing upon viral infection. The observed changes in AS events can be either a direct consequence of viral manipulation of the host splicing machinery or result indirectly from the virus-induced innate immune response or cellular damage. Analysis at a higher resolution with single-cell RNAseq, and at a higher scale with the integration of multiple omics data sets in a systems biology perspective, will be needed to further comprehend this complex facet of virus-host interactions.


Subjects
Alternative Splicing/genetics; Host Microbial Interactions/genetics; Immunity, Innate; Viruses/genetics; Host Microbial Interactions/immunology; Humans; Viruses/immunology; Viruses/pathogenicity
18.
Sci Rep ; 9(1): 14908, 2019 10 17.
Article in English | MEDLINE | ID: mdl-31624302

ABSTRACT

Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Gene expression levels are generally well captured using these technologies, but caveats remain due to the limited read length and the fact that RNA molecules have to be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer which offers the possibility of sequencing long reads and, most importantly, RNA molecules. Here we generated a full mouse transcriptome from brain and liver using the Oxford Nanopore device. As a comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long- and short-read technologies and tested the TeloPrime preparation kit, dedicated to the enrichment of full-length transcripts. Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further show that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing internal runs of T's. This bias is marked for runs of at least 15 T's, but is already detectable for runs of at least 9 T's and therefore concerns more than 20% of expressed transcripts in mouse brain and liver. Finally, we outline the bioinformatics challenges that remain for quantification at the transcript level, especially when reads are not full-length. Accurate quantification of repeat-associated genes such as processed pseudogenes also remains difficult, and we show that current mapping protocols which map reads to the genome largely over-estimate their expression, at the expense of their parent gene.
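
The two run-length thresholds mentioned in this abstract (at least 9 T's and at least 15 T's) can be illustrated with a minimal sketch that scans a transcript sequence for runs of T's. This is not the authors' pipeline: the function names `t_runs` and `classify_transcript` and the labels "none", "detectable" and "marked" are hypothetical, and any run found anywhere in the sequence is treated as "internal", which is an assumption.

```python
import re

def t_runs(sequence, min_len=9):
    """Return (start, length) for every maximal run of at least min_len T's."""
    # T{n,} is greedy, so each match covers one maximal run of T's.
    pattern = re.compile(r"T{%d,}" % min_len)
    return [(m.start(), m.end() - m.start())
            for m in pattern.finditer(sequence.upper())]

def classify_transcript(sequence):
    """Label a transcript using the thresholds quoted in the abstract.

    The labels are hypothetical names chosen for this illustration.
    """
    longest = max((length for _, length in t_runs(sequence, min_len=1)),
                  default=0)
    if longest >= 15:
        return "marked"      # runs of at least 15 T's
    if longest >= 9:
        return "detectable"  # runs of at least 9 T's
    return "none"

if __name__ == "__main__":
    example = "ACG" + "T" * 10 + "GCA"
    print(t_runs(example))
    print(classify_transcript(example))
```

Applied to an annotated transcript set, such a scan would give a rough count of the transcripts that could be affected, under the assumptions stated above.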


Subjects
High-Throughput Nucleotide Sequencing/methods; Nanopore Sequencing/methods; RNA-Seq/methods; Sequence Analysis, DNA/methods; Transcriptome/genetics; Animals; Brain; DNA, Complementary/genetics; DNA, Complementary/isolation & purification; Datasets as Topic; Gene Library; High-Throughput Nucleotide Sequencing/instrumentation; Liver; Mice; Nanopore Sequencing/instrumentation; RNA/genetics; RNA/isolation & purification; RNA-Seq/instrumentation; Sequence Analysis, DNA/instrumentation
19.
Sci Rep ; 8(1): 4307, 2018 03 09.
Article in English | MEDLINE | ID: mdl-29523794

ABSTRACT

Genome-wide analyses estimate that more than 90% of multi-exonic human genes produce at least two transcripts through alternative splicing (AS). Various bioinformatics methods are available to analyze AS from RNAseq data. Most methods start by mapping the reads to an annotated reference genome, but some start with a de novo assembly of the reads. In this paper, we present a systematic comparison of a mapping-first approach (FARLINE) and an assembly-first approach (KISSPLICE). We applied these methods to two independent RNAseq datasets and found that the predictions of the two pipelines overlapped (70% of exon skipping events were common), but with noticeable differences. The assembly-first approach found more novel variants, including novel unannotated exons and splice sites. It also predicted AS in recently duplicated genes. The mapping-first approach found more lowly expressed splicing variants, and more splice variants overlapping repeats. This work demonstrates that annotating AS with a single approach misses a large number of candidates, many of which are differentially regulated across conditions and can be validated experimentally. We therefore advocate the combined use of both mapping-first and assembly-first approaches for the annotation and differential analysis of AS from RNAseq datasets.


Subjects
Alternative Splicing; Sequence Analysis, RNA/methods; Software; Humans; RNA Splice Sites; Sequence Analysis, RNA/standards
20.
Algorithms Mol Biol ; 12: 2, 2017.
Article in English | MEDLINE | ID: mdl-28250805

ABSTRACT

BACKGROUND: The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though repeated sequences are fewer and shorter in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. RESULTS: The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG and that assemblers can infer erroneous transcripts when they try to traverse these regions, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work when compared to other transcriptome evaluation methods is that we use only the topology of the DBG, and neither read nor coverage information. We show that our simple method gives better results than Rsem-Eval (Li et al. in Genome Biol 15(12):553, 4) and TransRate (Smith-Unna et al. in Genome Res 26(8):1134-1144, 5) on both real and simulated datasets for detecting chimeras, and therefore is able to capture assembly errors missed by these methods.
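
To make the notion of a de Bruijn graph and of a topology-only signal concrete, here is a minimal sketch. It is not the measure proposed in the paper, whose definition is not given in this abstract. It assumes a simple DBG in which nodes are (k-1)-mers and edges are k-mers, and it uses the fraction of a transcript's nodes that branch as a crude, hypothetical stand-in for "traversing a complicated region". All function names are illustrative.

```python
from collections import defaultdict

def build_dbg(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branching_nodes(graph):
    """Return nodes with more than one successor or more than one predecessor."""
    in_degree = defaultdict(int)
    for targets in graph.values():
        for dst in targets:
            in_degree[dst] += 1
    nodes = set(graph) | set(in_degree)
    return {n for n in nodes
            if len(graph.get(n, ())) > 1 or in_degree[n] > 1}

def branching_fraction(transcript, graph, k):
    """Fraction of the transcript's (k-1)-mers that are branching nodes.

    Hypothetical proxy for illustration only; it uses graph topology alone,
    with no read or coverage information.
    """
    nodes = [transcript[i:i + k - 1]
             for i in range(len(transcript) - k + 2)]
    if not nodes:
        return 0.0
    branching = branching_nodes(graph)
    return sum(node in branching for node in nodes) / len(nodes)

if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTTG"]
    dbg = build_dbg(reads, k=4)
    print(branching_nodes(dbg))
    print(branching_fraction("ACGTAC", dbg, k=4))
```

In this toy input the two reads share the prefix ACGT and then diverge, so the node CGT has two successors and is the only branching node.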
