Results 1 - 20 of 32
1.
Nucleic Acids Res ; 51(2): 712-727, 2023 01 25.
Article in English | MEDLINE | ID: mdl-36537210

ABSTRACT

Various genetic diseases associated with microcephaly and developmental defects are due to pathogenic variants in the U4atac small nuclear RNA (snRNA), a component of the minor spliceosome essential for the removal of U12-type introns from eukaryotic mRNAs. While it has been shown that a few RNU4ATAC mutations result in impaired binding of essential protein components, the molecular defects of the vast majority of variants are still unknown. Here, we used lymphoblastoid cells derived from RNU4ATAC compound heterozygous (g.108_126del;g.111G>A) twin patients with MOPD1 phenotypes to analyze the molecular consequences of the mutations on small nuclear ribonucleoproteins (snRNPs) formation and on splicing. We found that the U4atac108_126del mutant is unstable and that the U4atac111G>A mutant as well as the minor di- and tri-snRNPs are present at reduced levels. Our results also reveal the existence of 3'-extended snRNA transcripts in patients' cells. Moreover, we show that the mutant cells have alterations in splicing of INTS7 and INTS10 minor introns, contain lower levels of the INTS7 and INTS10 proteins and display changes in the assembly of Integrator subunits. Altogether, our results show that compound heterozygous g.108_126del;g.111G>A mutations induce splicing defects and affect the homeostasis and function of the Integrator complex.


Subject(s)
Small Nuclear Ribonucleoproteins, Spliceosomes, Spliceosomes/genetics, Spliceosomes/metabolism, Small Nuclear Ribonucleoproteins/genetics, Mutation, Introns/genetics, RNA Splicing/genetics, Small Nuclear RNA/metabolism, Homeostasis/genetics
3.
PLoS One ; 15(7): e0235655, 2020.
Article in English | MEDLINE | ID: mdl-32628740

ABSTRACT

Biallelic variants in RNU4ATAC, a non-coding gene transcribed into the minor spliceosome component U4atac snRNA, are responsible for three rare recessive developmental diseases, namely Taybi-Linder/MOPD1, Roifman and Lowry-Wood syndromes. Next-generation sequencing of clinically heterogeneous cohorts (children with either a suspected genetic disorder or a congenital microcephaly) recently identified mutations in this gene, illustrating how profoundly these technologies are modifying genetic testing and assessment. As RNU4ATAC has a single non-coding exon, the bioinformatic prediction algorithms assessing the effect of sequence variants on splicing or protein function are irrelevant, which makes variant interpretation challenging for molecular diagnostic laboratories. In order to facilitate and improve clinical diagnostic assessment and genetic counseling, we present i) an update of the previously reported RNU4ATAC mutations and an analysis of the genetic variations affecting this gene using the Genome Aggregation Database (gnomAD) resource; ii) the pathogenicity prediction performances of scores computed based on an RNA structure prediction tool and of those produced by the Combined Annotation Dependent Depletion tool for the 285 RNU4ATAC variants identified in patients or in large-scale sequencing projects; iii) a method, based on a cellular assay, that measures the effect of RNU4ATAC variants on the splicing efficiency of a minor (U12-type) reporter intron. Lastly, the concordance of bioinformatic predictions and cellular assay results was investigated.


Subject(s)
Small Nuclear RNA/metabolism, Spliceosomes/metabolism, Child, Genetic Databases, Dwarfism/genetics, Dwarfism/pathology, Fetal Growth Retardation/genetics, Fetal Growth Retardation/pathology, Fibroblasts/cytology, Fibroblasts/metabolism, Genetic Variation, Humans, Microcephaly/genetics, Microcephaly/pathology, Nucleic Acid Conformation, Osteochondrodysplasias/genetics, Osteochondrodysplasias/pathology, RNA Splicing, Small Nuclear RNA/chemistry, Small Nuclear RNA/genetics
4.
NAR Genom Bioinform ; 2(4): lqaa095, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33575639

ABSTRACT

Influenza A viruses (IAVs) use diverse mechanisms to interfere with cellular gene expression. Although many RNA-seq studies have documented IAV-induced changes in host mRNA abundance, few were designed to allow an accurate quantification of changes in host mRNA splicing. Here, we show that IAV infection of human lung cells induces widespread alterations of cellular splicing, with an overall increase in exon inclusion and decrease in intron retention. Over half of the mRNAs that show differential splicing undergo no significant changes in abundance or in their 3' end termination site, suggesting that IAVs can specifically manipulate cellular splicing. Among a randomly selected subset of 21 IAV-sensitive alternative splicing events, most are specific to IAV infection as they are not observed upon infection with VSV, induction of interferon expression or induction of an osmotic stress. Finally, the analysis of splicing changes in RED-depleted cells reveals a limited but significant overlap with the splicing changes in IAV-infected cells. This observation suggests that hijacking of RED by IAVs to promote splicing of the abundant viral NS1 mRNAs could partially divert RED from its target mRNAs. All our RNA-seq datasets and analyses are made accessible for browsing through a user-friendly Shiny interface (http://virhostnet.prabi.fr:3838/shinyapps/flu-splicing or https://github.com/cbenoitp/flu-splicing).

5.
Sci Rep ; 9(1): 14908, 2019 10 17.
Article in English | MEDLINE | ID: mdl-31624302

ABSTRACT

Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Generally gene expression levels are well-captured using these technologies, but there are still remaining caveats due to the limited read length and the fact that RNA molecules had to be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer which offers the possibility of sequencing long reads and most importantly RNA molecules. Here we generated a full mouse transcriptome from brain and liver using the Oxford Nanopore device. As a comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long and short reads technologies and tested the TeloPrime preparation kit, dedicated to the enrichment of full-length transcripts. Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further show that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing internal runs of T's. This bias is marked for runs of at least 15 T's, but is already detectable for runs of at least 9 T's and therefore concerns more than 20% of expressed transcripts in mouse brain and liver. Finally, we outline that bioinformatics challenges remain ahead for quantifying at the transcript level, especially when reads are not full-length. Accurate quantification of repeat-associated genes such as processed pseudogenes also remains difficult, and we show that current mapping protocols which map reads to the genome largely over-estimate their expression, at the expense of their parent gene.


Subject(s)
High-Throughput Nucleotide Sequencing/methods, Nanopore Sequencing/methods, RNA-Seq/methods, DNA Sequence Analysis/methods, Transcriptome/genetics, Animals, Brain, Complementary DNA/genetics, Complementary DNA/isolation & purification, Datasets as Topic, Gene Library, High-Throughput Nucleotide Sequencing/instrumentation, Liver, Mice, Nanopore Sequencing/instrumentation, RNA/genetics, RNA/isolation & purification, RNA-Seq/instrumentation, DNA Sequence Analysis/instrumentation
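The abstract of entry 5 reports read truncation for transcripts containing runs of at least 9 T's, more marked for runs of at least 15. A minimal sketch of how such transcripts might be flagged is given below. The function names `t_runs` and `is_at_risk` are hypothetical, sequences are assumed to be written in the DNA alphabet, and the sketch does not distinguish internal runs from runs at the transcript ends.

```python
import re


def t_runs(transcript, min_run=9):
    """Return (start, length) for every maximal run of at least `min_run` T's."""
    # Hypothetical helper, not taken from the paper.
    pattern = re.compile("T{%d,}" % min_run)
    return [(m.start(), len(m.group())) for m in pattern.finditer(transcript.upper())]


def is_at_risk(transcript, min_run=9):
    """True when the transcript carries at least one qualifying run of T's."""
    return bool(t_runs(transcript, min_run))


# Toy transcript with a single run of 10 T's starting at position 5.
example = "ACGGA" + "T" * 10 + "CCGTA"
print(t_runs(example))          # [(5, 10)]
print(is_at_risk(example, 15))  # False
```

Both thresholds (9 and 15) come from the abstract; everything else is an assumption made for illustration.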
6.
RNA ; 25(9): 1130-1149, 2019 09.
Article in English | MEDLINE | ID: mdl-31175170

ABSTRACT

Minor intron splicing plays a central role in human embryonic development and survival. Indeed, biallelic mutations in RNU4ATAC, transcribed into the minor spliceosomal U4atac snRNA, are responsible for three rare autosomal recessive multimalformation disorders named Taybi-Linder (TALS/MOPD1), Roifman (RFMN), and Lowry-Wood (LWS) syndromes, which associate numerous overlapping signs of varying severity. Although RNA-seq experiments have been conducted on a few RFMN patient cells, none have been performed in TALS, and more generally no in-depth transcriptomic analysis of the ∼700 human genes containing a minor (U12-type) intron had been published as yet. We thus sequenced RNA from cells derived from five skin, three amniotic fluid, and one blood biosamples obtained from seven unrelated TALS cases and from age- and sex-matched controls. This allowed us to describe for the first time the mRNA expression and splicing profile of genes containing U12-type introns, in the context of a functional minor spliceosome. Concerning RNU4ATAC-mutated patients, we show that as expected, they display distinct U12-type intron splicing profiles compared to controls, but that rather unexpectedly mRNA expression levels are mostly unchanged. Furthermore, although U12-type intron missplicing concerns most of the expressed U12 genes, the level of U12-type intron retention is surprisingly low in fibroblasts and amniocytes, and much more pronounced in blood cells. Interestingly, we found several occurrences of introns that can be spliced using either U2, U12, or a combination of both types of splice site consensus sequences, with a shift towards splicing using preferentially U2 sites in TALS patients' cells compared to controls.


Subject(s)
Dwarfism/genetics, Fetal Growth Retardation/genetics, Microcephaly/genetics, Osteochondrodysplasias/genetics, RNA Splicing/genetics, Transcriptome/genetics, Adult, Aged, Base Sequence/genetics, Preschool Child, Consensus Sequence/genetics, Female, Gene Expression Profiling/methods, Humans, Infant, Introns/genetics, Male, Middle Aged, RNA/genetics, Messenger RNA/genetics, Small Nuclear RNA/genetics, Spliceosomes/genetics, Young Adult
7.
Trends Microbiol ; 27(3): 268-281, 2019 03.
Article in English | MEDLINE | ID: mdl-30577974

ABSTRACT

Alteration of host cell splicing is a common feature of many viral infections which is underappreciated because of the complexity and technical difficulty of studying alternative splicing (AS) regulation. Recent advances in RNA sequencing technologies revealed that up to several hundreds of host genes can show altered mRNA splicing upon viral infection. The observed changes in AS events can be either a direct consequence of viral manipulation of the host splicing machinery or result indirectly from the virus-induced innate immune response or cellular damage. Analysis at a higher resolution with single-cell RNAseq, and at a higher scale with the integration of multiple omics data sets in a systems biology perspective, will be needed to further comprehend this complex facet of virus-host interactions.


Subject(s)
Alternative Splicing/genetics, Host Microbial Interactions/genetics, Innate Immunity, Viruses/genetics, Host Microbial Interactions/immunology, Humans, Viruses/immunology, Viruses/pathogenicity
8.
PLoS Genet ; 14(11): e1007758, 2018 11.
Article in English | MEDLINE | ID: mdl-30419019

ABSTRACT

Genome-wide association study (GWAS) methods applied to bacterial genomes have shown promising results for genetic marker discovery or detailed assessment of marker effect. Recently, alignment-free methods based on k-mer composition have proven their ability to explore the accessory genome. However, they lead to redundant descriptions and results which are sometimes hard to interpret. Here we introduce DBGWAS, an extended k-mer-based GWAS method producing interpretable genetic variants associated with distinct phenotypes. Relying on compacted De Bruijn graphs (cDBG), our method gathers cDBG nodes, identified by the association model, into subgraphs defined from their neighbourhood in the initial cDBG. DBGWAS is alignment-free and only requires a set of contigs and phenotypes. In particular, it does not require prior annotation or reference genomes. It produces subgraphs representing phenotype-associated genetic variants such as local polymorphisms and mobile genetic elements (MGE). It offers a graphical framework which helps interpret GWAS results. Importantly, it is also computationally efficient: experiments took one hour and a half on average. We validated our method using antibiotic resistance phenotypes for three bacterial species. DBGWAS recovered known resistance determinants such as mutations in core genes in Mycobacterium tuberculosis, and genes acquired by horizontal transfer in Staphylococcus aureus and Pseudomonas aeruginosa, along with their MGE context. It also enabled us to formulate new hypotheses involving genetic variants not yet described in the antibiotic resistance literature. An open-source tool implementing DBGWAS is available at https://gitlab.com/leoisl/dbgwas.


Subject(s)
Bacterial Genome, Genome-Wide Association Study/methods, Computer Graphics, Bacterial DNA/genetics, Genetic Databases, Bacterial Drug Resistance/genetics, Genetic Variation, Genome-Wide Association Study/statistics & numerical data, Interspersed Repetitive Sequences, Genetic Models, Mycobacterium tuberculosis/drug effects, Mycobacterium tuberculosis/genetics, Phenotype, Pseudomonas aeruginosa/drug effects, Pseudomonas aeruginosa/genetics, DNA Sequence Analysis, Software, Staphylococcus aureus/drug effects, Staphylococcus aureus/genetics
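Entry 8 builds on alignment-free, k-mer-based association. As a rough illustration of that starting point only, the sketch below records which genomes contain each k-mer and then keeps the k-mers whose presence pattern coincides exactly with a binary phenotype. This is a toy stand-in: it is not the DBGWAS association model, it builds no compacted De Bruijn graph, and it ignores reverse complements. All function names and the example data are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Set of all k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_patterns(contigs_by_genome, k):
    """Map each k-mer to the sorted tuple of genomes that contain it."""
    seen = defaultdict(set)
    for genome, contigs in contigs_by_genome.items():
        for contig in contigs:
            for kmer in kmers(contig, k):
                seen[kmer].add(genome)
    return {kmer: tuple(sorted(genomes)) for kmer, genomes in seen.items()}


def kmers_matching_phenotype(patterns, phenotype):
    """k-mers found in exactly the genomes whose phenotype is True (toy criterion)."""
    target = tuple(sorted(g for g, positive in phenotype.items() if positive))
    return sorted(kmer for kmer, pattern in patterns.items() if pattern == target)


# Hypothetical input: three genomes, one of which is resistant.
contigs = {"g1": ["ACGTAC"], "g2": ["ACGTTC"], "g3": ["ACGTAC"]}
resistant = {"g1": False, "g2": True, "g3": False}
print(kmers_matching_phenotype(presence_patterns(contigs, 3), resistant))
# ['GTT', 'TTC']
```

A real analysis would replace the exact-match criterion with a statistical test that accounts for population structure.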
9.
Sci Rep ; 8(1): 4307, 2018 03 09.
Article in English | MEDLINE | ID: mdl-29523794

ABSTRACT

Genome-wide analyses estimate that more than 90% of multi-exonic human genes produce at least two transcripts through alternative splicing (AS). Various bioinformatics methods are available to analyze AS from RNAseq data. Most methods start by mapping the reads to an annotated reference genome, but some start with a de novo assembly of the reads. In this paper, we present a systematic comparison of a mapping-first approach (FARLINE) and an assembly-first approach (KISSPLICE). We applied these methods to two independent RNAseq datasets and found that the predictions of the two pipelines overlapped (70% of exon skipping events were common), but with noticeable differences. The assembly-first approach found more novel variants, including novel unannotated exons and splice sites. It also predicted AS in recently duplicated genes. The mapping-first approach found more lowly expressed splicing variants, and splice variants overlapping repeats. This work demonstrates that annotating AS with a single approach leads to missing out a large number of candidates, many of which are differentially regulated across conditions and can be validated experimentally. We therefore advocate for the combined use of both mapping-first and assembly-first approaches for the annotation and differential analysis of AS from RNAseq datasets.


Subject(s)
Alternative Splicing, RNA Sequence Analysis/methods, Software, Humans, RNA Splice Sites, RNA Sequence Analysis/standards
10.
Algorithms Mol Biol ; 12: 2, 2017.
Article in English | MEDLINE | ID: mdl-28250805

ABSTRACT

BACKGROUND: The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. RESULTS: The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and when assemblers try to traverse these regions, they can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. 
The originality of our work when compared to other transcriptome evaluation methods is that we use only the topology of the DBG, and not read nor coverage information. We show that our simple method gives better results than Rsem-Eval (Li et al. in Genome Biol 15(12):553, 4) and TransRate (Smith-Unna et al. in Genome Res 26(8):1134-1144, 5) on both real and simulated datasets for detecting chimeras, and therefore is able to capture assembly errors missed by these methods.

11.
Sci Rep ; 7: 40618, 2017 01 16.
Article in English | MEDLINE | ID: mdl-28091568

ABSTRACT

Crosses between close species can lead to genomic disorders, often considered to be the cause of hybrid incompatibility, one of the initial steps in the speciation process. How these incompatibilities are established and what are their causes remain unclear. To understand the initiation of hybrid incompatibility, we performed reciprocal crosses between two species of Drosophila (D. mojavensis and D. arizonae) that diverged less than 1 Mya. We performed a genome-wide transcriptomic analysis on ovaries from parental lines and on hybrids from reciprocal crosses. Using an innovative procedure of co-assembling transcriptomes, we show that parental lines differ in the expression of their genes and transposable elements. Reciprocal hybrids presented specific gene categories and few transposable element families misexpressed relative to the parental lines. Because TEs are mainly silenced by piwi-interacting RNAs (piRNAs), we hypothesize that in hybrids the deregulation of specific TE families is due to the absence of such small RNAs. Small RNA sequencing confirmed our hypothesis and we therefore propose that TEs can indeed be major players of genome differentiation and be implicated in the first steps of genomic incompatibilities through small RNA regulation.


Subject(s)
DNA Transposable Elements/genetics, Drosophila/genetics, Gene Expression Regulation, Genetic Hybridization, Animals, Conserved Sequence/genetics, Female, Gene Ontology, Insect Genes, Geography, Inheritance Patterns/genetics, Male, Mexico, Small Interfering RNA/metabolism, Species Specificity, Transcriptome/genetics, United States
12.
Nucleic Acids Res ; 44(19): e148, 2016 Nov 02.
Article in English | MEDLINE | ID: mdl-27458203

ABSTRACT

SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases, transcriptome sequencing can be used as a cheaper alternative which already enables the identification of SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing, if not enough material is available from one individual. Using pooled human RNA-seq data, we clarify the precision and recall of our method and discuss them with respect to other methods which use a reference genome or an assembled transcriptome. We then validate experimentally the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We further make it possible to test for the association of the identified SNPs with a phenotype of interest.


Subject(s)
Base Sequence, Genome, Single Nucleotide Polymorphism, RNA Sequence Analysis, Algorithms, Amino Acid Sequence, Animals, Computational Biology/methods, Genetic Markers, Genomics/methods, Genotype, Humans, Phenotype, Reproducibility of Results, DNA Sequence Analysis/methods, RNA Sequence Analysis/methods, Transcriptome
13.
Nat Commun ; 7: 11067, 2016 Apr 11.
Article in English | MEDLINE | ID: mdl-27063795

ABSTRACT

Myotonic dystrophy (DM) is caused by the expression of mutant RNAs containing expanded CUG repeats that sequester muscleblind-like (MBNL) proteins, leading to alternative splicing changes. Cardiac alterations, characterized by conduction delays and arrhythmia, are the second most common cause of death in DM. Using RNA sequencing, here we identify novel splicing alterations in DM heart samples, including a switch from adult exon 6B towards fetal exon 6A in the cardiac sodium channel, SCN5A. We find that MBNL1 regulates alternative splicing of SCN5A mRNA and that the splicing variant of SCN5A produced in DM presents a reduced excitability compared with the control adult isoform. Importantly, reproducing splicing alteration of Scn5a in mice is sufficient to promote heart arrhythmia and cardiac-conduction delay, two predominant features of myotonic dystrophy. In conclusion, misregulation of the alternative splicing of SCN5A may contribute to a subset of the cardiac dysfunctions observed in myotonic dystrophy.


Subject(s)
Alternative Splicing/genetics, Cardiac Arrhythmias/complications, Cardiac Arrhythmias/genetics, Heart Conduction System/physiopathology, Myotonic Dystrophy/complications, Myotonic Dystrophy/genetics, NAV1.5 Voltage-Gated Sodium Channel/genetics, Adult, Aged, Animals, Base Sequence, Binding Sites, Computer Simulation, Electrophysiological Phenomena, Exons/genetics, Female, HEK293 Cells, Heart Conduction System/pathology, Humans, Male, Middle Aged, Molecular Sequence Data, NAV1.5 Voltage-Gated Sodium Channel/metabolism, Nucleotide Motifs/genetics, RNA-Binding Proteins/metabolism, Sodium Channels/metabolism, Xenopus
14.
Gigascience ; 5: 9, 2016.
Article in English | MEDLINE | ID: mdl-26870323

ABSTRACT

BACKGROUND: With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, needing large amounts of computing resources, and potentially removing or modifying parts of the biological information contained in the data. Our approach proposes to focus directly on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools. FINDINGS: Dedicated to 'whole-genome assembly-free' treatments, the Colib'read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Based on the use of a de Bruijn graph and bloom filter, such analyses can be performed in a few hours, using small amounts of memory. Applications using real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and tools dissemination, we developed Galaxy tools and tool shed repositories. CONCLUSIONS: With the Colib'read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach allows the maximum biological information to be retained in the data, and uses a very low memory footprint.


Subject(s)
Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods, Information Storage and Retrieval/methods, Software, Base Sequence, Cluster Analysis, Genome/genetics, Genomics/methods, Molecular Sequence Data, Reproducibility of Results
15.
Algorithms Mol Biol ; 10: 20, 2015.
Article in English | MEDLINE | ID: mdl-26120359

ABSTRACT

BACKGROUND: The problem of enumerating bubbles with length constraints in directed graphs arises in transcriptomics where the question is to identify all alternative splicing events present in a sample of mRNAs sequenced by RNA-seq. RESULTS: We present a new algorithm for enumerating bubbles with length constraints in weighted directed graphs. This is the first polynomial delay algorithm for this problem and we show that in practice, it is faster than previous approaches. CONCLUSION: This settles one of the main open questions from Sacomoto et al. (BMC Bioinform 13:5, 2012). Moreover, the new algorithm allows us to deal with larger instances and possibly detect longer alternative splicing events.
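To make the object in entry 15 concrete: a bubble is taken here to be a pair of internally vertex-disjoint paths sharing the same source and target, with one path bounded by a short length limit and the other by a long one. The sketch below enumerates such pairs by brute force on a small weighted directed graph. It is deliberately naive and is not the polynomial delay algorithm of the paper. The graph encoding (dict of dicts), the parameter names, and the assumption of non-negative edge weights are all choices made for this illustration.

```python
from itertools import combinations


def simple_paths(graph, source, target, max_weight):
    """Yield (path, weight) for simple paths source -> target with weight <= max_weight.

    Assumes non-negative edge weights, so a partial path over budget can be pruned.
    """
    stack = [(source, [source], 0)]
    while stack:
        node, path, weight = stack.pop()
        for nxt, w in graph.get(node, {}).items():
            total = weight + w
            if total > max_weight or nxt in path:
                continue
            if nxt == target:
                yield path + [nxt], total
            else:
                stack.append((nxt, path + [nxt], total))


def bubbles(graph, source, target, max_short, max_long):
    """Pairs of internally vertex-disjoint source -> target paths within the bounds."""
    paths = sorted(simple_paths(graph, source, target, max_long),
                   key=lambda pw: (pw[1], pw[0]))
    found = []
    # After sorting by weight, the first path of each pair is the shorter one.
    for (p1, w1), (p2, _w2) in combinations(paths, 2):
        if w1 <= max_short and not set(p1[1:-1]) & set(p2[1:-1]):
            found.append((p1, p2))
    return found


# Toy graph: a short branch s-a-t and a longer branch s-b-c-t.
g = {"s": {"a": 1, "b": 1}, "a": {"t": 1}, "b": {"c": 1}, "c": {"t": 1}}
print(bubbles(g, "s", "t", max_short=2, max_long=3))
# [(['s', 'a', 't'], ['s', 'b', 'c', 't'])]
```

The brute-force enumeration can take exponential time in the worst case, which is the kind of cost a polynomial delay algorithm is designed to avoid.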

16.
Nucleic Acids Res ; 43(2): e11, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25404127

ABSTRACT

Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundamental limitations: they can only call SNPs between exactly two datasets, and/or they require a prohibitive amount of computational resources. The method we propose, discoSnp, detects both heterozygous and homozygous isolated SNPs from any number of read datasets, without a reference genome, and with very low memory and time footprints (billions of reads can be analyzed with a standard desktop computer). To facilitate downstream genotyping analyses, discoSnp ranks predictions and outputs quality and coverage per allele. Compared to finding isolated SNPs using a state-of-the-art assembly and mapping approach, discoSnp requires significantly less computational resources, shows similar precision/recall values, and highly ranked predictions are less likely to be false positives. An experimental validation was conducted on an arthropod species (the tick Ixodes ricinus) on which de novo sequencing was performed. Among the predicted SNPs that were tested, 96% were successfully genotyped and truly exhibited polymorphism.


Subject(s)
Genotyping Techniques/methods, Single Nucleotide Polymorphism, Algorithms, Animals, Human Chromosome Pair 1, Escherichia coli/genetics, Genomics/methods, Humans, Ixodes/genetics, Mice, Inbred C57BL Mice, Saccharomyces cerevisiae/genetics
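The abstract of entry 16 does not describe how discoSnp works internally. The sketch below only illustrates a general property that reference-free k-mer approaches can exploit: when two sequences differ by a single isolated substitution, each allele carries exactly k k-mers that the other lacks, provided the flanks are at least k-1 bases long and no k-mer is repeated. The function names and toy sequences are hypothetical.

```python
def kmer_set(seq, k):
    """Set of all k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def allele_specific_kmers(seq_a, seq_b, k):
    """Return the k-mers found only in seq_a and only in seq_b, each sorted."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    return sorted(a - b), sorted(b - a)


# Two toy alleles differing at one position (C versus T), with k = 4.
allele_1 = "ACGGTCAATGC"
allele_2 = "ACGGTTAATGC"
only_1, only_2 = allele_specific_kmers(allele_1, allele_2, 4)
print(only_1)  # ['CAAT', 'GGTC', 'GTCA', 'TCAA']
print(only_2)  # ['GGTT', 'GTTA', 'TAAT', 'TTAA']
```

In a de Bruijn graph these two groups of k-mers would form the two branches of a bubble, which is an assumption about the general technique rather than a statement about the tool.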
17.
Bioinformatics ; 30(1): 61-70, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-24167155

ABSTRACT

MOTIVATION: The increasing availability of metabolomics data makes it possible to better understand the metabolic processes involved in the immediate response of an organism to environmental changes and stress. The data usually come in the form of a list of metabolites whose concentrations significantly changed under some conditions, and are thus not easy to interpret without being able to precisely visualize how such metabolites are interconnected. RESULTS: We present a method that organizes the data from any metabolomics experiment into metabolic stories. Each story corresponds to a possible scenario explaining the flow of matter between the metabolites of interest. These scenarios may then be ranked in different ways depending on which interpretation one wishes to emphasize for the causal link between two affected metabolites: enzyme activation, enzyme inhibition or domino effect on the concentration changes of substrates and products. Equally probable stories under any selected ranking scheme can be further grouped into a single anthology that summarizes, in a unique subnetwork, all equivalently plausible alternative stories. An anthology is simply a union of such stories. We detail an application of the method to the response of yeast to cadmium exposure. We use this system as a proof of concept for our method, and we show that we are able to find a story that reproduces very well the current knowledge about the yeast response to cadmium. We further show that this response is mostly based on enzyme activation. We also provide a framework for exploring the alternative pathways or side effects this local response is expected to have in the rest of the network. We discuss several interpretations for the changes we see, and we suggest hypotheses that could in principle be experimentally tested. Notably, our method requires simple input data and could be used in a wide variety of applications. 
AVAILABILITY AND IMPLEMENTATION: The code for the method presented in this article is available at http://gobbolino.gforge.inria.fr.


Subject(s)
Cadmium/pharmacology, Metabolomics/methods, Saccharomyces cerevisiae/drug effects, Saccharomyces cerevisiae/metabolism, Enzyme Activation, Glutathione/biosynthesis
18.
BMC Genomics ; 14: 309, 2013 May 08.
Article in English | MEDLINE | ID: mdl-23651581

ABSTRACT

BACKGROUND: Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often helps characterize pathogens. There is therefore a strong interest in understanding the variability of this trait and the possible correlations with life-style. Two kinds of events affect genome organization: on one hand translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but they alter the gene neighborhoods by breaking the syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. RESULTS: We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In a first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species allowed us to identify genomes with diverging gene order with respect to their conspecifics. The inter-species analysis indicates that pathogens are more often unstable than non-pathogens. In a second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. CONCLUSION: In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. 
By studying species with multiple sequenced genomes available, we were able to explore genome organization stability at different time-scales and to find significant differences for pathogen and non-pathogen species. The output of our framework also makes it possible to identify the conserved gene clusters and/or partial occurrences thereof, making it possible to explore how gene clusters assembled during evolution.


Subject(s)
Genome, Archaeal/genetics , Genome, Bacterial/genetics , Genomic Instability , Models, Genetic , Species Specificity
19.
Nucleic Acids Res ; 41(Database issue): D142-51, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23143107

ABSTRACT

Chimeric RNAs that comprise two or more different transcripts have been identified in many cancers and among the Expressed Sequence Tags (ESTs) isolated from different organisms; they might represent functional proteins and produce different disease phenotypes. The ChiTaRS database of Chimeric Transcripts and RNA-Sequencing data (http://chitars.bioinfo.cnio.es/) collects more than 16 000 chimeric RNAs from humans, mice and fruit flies, 233 chimeras confirmed by RNA-seq reads and ∼2000 cancer breakpoints. The database indicates the expression and tissue specificity of these chimeras, as confirmed by RNA-seq data, and it includes mass spectrometry results for some human entries at their junctions. Moreover, the database has advanced features to analyze junction consistency and to rank chimeras based on the evidence of repeated junction sites. Finally, 'Junction Search' screens through the RNA-seq reads found at the chimeras' junction sites to identify putative junctions in novel sequences entered by users. Thus, ChiTaRS is an extensive catalog of human, mouse and fruit fly chimeras that will extend our understanding of the evolution of chimeric transcripts in eukaryotes and can be advantageous in the analysis of human cancer breakpoints.


Subject(s)
Databases, Genetic , Mutant Chimeric Proteins/genetics , RNA/chemistry , Animals , Chromosome Breakpoints , Computer Graphics , Drosophila/genetics , Gene Fusion , Humans , Internet , Mice , Mutant Chimeric Proteins/metabolism , Neoplasms/genetics , RNA/metabolism , Sequence Analysis, RNA
20.
BMC Genomics ; 13: 438, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22938206

ABSTRACT

BACKGROUND: A large number of genome-scale metabolic networks are now available for many organisms, mostly bacteria. Previous work on minimal gene sets, when analysing host-dependent bacteria, found small common sets of metabolic genes. When such analyses are restricted to bacteria with similar lifestyles, larger portions of metabolism are expected to be shared and their composition is worth investigating. Here we report a comparative analysis of the small molecule metabolism of symbiotic bacteria, exploring common and variable portions as well as the contribution of different lifestyle groups to the reduction of a common set of metabolic capabilities. RESULTS: We found no reaction shared by all the bacteria analysed. Disregarding those with the smallest genomes, we still do not find a reaction core; however, we did find a core of biochemical capabilities. While obligate intracellular symbionts have no core of reactions within their group, extracellular and cell-associated symbionts do have a small core composed of disconnected fragments. In agreement with previous findings in Escherichia coli, their cores are enriched in biosynthetic processes whereas the variable metabolisms have similar ratios of biosynthetic and degradation reactions. Conversely, the variable metabolism of obligate intracellular symbionts is enriched in anabolism. CONCLUSION: Even when removing the symbionts with the most reduced genomes, there is no core of reactions common to the analysed symbiotic bacteria. The main reason is the very high specialisation of obligate intracellular symbionts; however, host-dependence alone is not an explanation for such absence. The composition of the metabolism of cell-associated and extracellular bacteria shows that while they have similar needs in terms of the building blocks of their cells, they have to adapt to very distinct environments. On the other hand, in obligate intracellular bacteria, catabolism has largely disappeared, whereas synthetic routes appear to have been selected for depending on the nature of the symbiosis. As more genomes are added, we expect, based on our simulations, that the core of cell-associated and extracellular bacteria will continue to diminish, converging to approximately 60 reactions.


Subject(s)
Bacteria/genetics , Bacteria/metabolism , Evolution, Molecular , Genome, Bacterial/genetics , Metabolic Networks and Pathways/genetics , Symbiosis/genetics , Models, Genetic , Species Specificity