Results 1-2 of 2
1.
Nucleic Acids Res; 37(15): e104, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19531739

ABSTRACT

Ultra-high-throughput sequencing is used to analyse the transcriptome or interactome at unprecedented depth on a genome-wide scale. These techniques yield short sequence reads that are then mapped onto a genome sequence to predict putatively transcribed or protein-interacting regions. We argue that factors such as background distribution, sequence errors and read length affect the predictive capacity of sequence census experiments. Here we propose a computational approach to measure these factors and analyse their influence on both transcriptomic and epigenomic assays. This investigation provides new insight into both methodological and biological issues. For instance, by analysing chromatin immunoprecipitation read sets, we estimate that 4.6% of reads are affected by SNPs. We show that, although the nucleotide error probability is low, it increases significantly with position in the read. Choosing a read length above 19 bp practically eliminates the risk of finding irrelevant positions, whereas above 20 bp the number of uniquely mapped reads decreases. With our procedure, we obtain 0.6% false positives among genomic locations. Hence, even rare signatures, if they map to the genome, should identify biologically relevant regions. This indicates that digital transcriptomics may help to characterize the wealth of as-yet undiscovered, low-abundance transcripts.


Subject(s)
Genomics/methods; Chromatin Immunoprecipitation; Chromosome Mapping; Gene Expression Profiling; Genome, Human; Humans; Polymorphism, Single Nucleotide; Sequence Analysis, DNA
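The abstract does not spell out how read length and sequencing errors were measured. As a hedged illustration only, and not the authors' procedure, the sketch below computes two of the quantities it discusses on toy data: the fraction of genome positions whose k-mer occurs exactly once (a rough proxy for how read length affects unambiguous mapping) and a per-position mismatch rate for reads that are assumed to be already aligned without gaps. Both function names are hypothetical; only the forward strand and exact matches are considered.

```python
from collections import Counter


def unique_kmer_fraction(genome: str, k: int) -> float:
    """Fraction of start positions whose k-mer occurs exactly once in `genome`.

    Hypothetical helper: forward strand only, exact matches only.
    """
    if not 1 <= k <= len(genome):
        raise ValueError("k must be between 1 and len(genome)")
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for km in kmers if counts[km] == 1) / len(kmers)


def per_position_mismatch_rate(reads: list[str], refs: list[str]) -> list[float]:
    """Mismatch rate at each read position.

    Hypothetical helper: assumes reads[i] is already aligned, without gaps,
    to the reference segment refs[i] (same start, same orientation).
    """
    length = max(len(r) for r in reads)
    mismatches = [0] * length
    totals = [0] * length
    for read, ref in zip(reads, refs):
        for pos, (base, ref_base) in enumerate(zip(read, ref)):
            totals[pos] += 1
            if base != ref_base:
                mismatches[pos] += 1
    return [m / t if t else 0.0 for m, t in zip(mismatches, totals)]


if __name__ == "__main__":
    toy_genome = "ACGTACGTTT"
    for k in (2, 4, 5):
        print(k, unique_kmer_fraction(toy_genome, k))
    print(per_position_mismatch_rate(["ACGT", "ACGA"], ["ACGT", "ACGT"]))
```

On a real reference, scanning k over a range shows how uniqueness grows with read length, and a rising mismatch curve would reflect the position-dependent error described above; this only approximates, and does not reproduce, the published analysis.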
2.
Nucleic Acids Res; 35(17): e108, 2007.
Article in English | MEDLINE | ID: mdl-17709346

ABSTRACT

Analysis of several million expressed gene signatures (tags) revealed an increasing number of distinct sequences, far exceeding the number of annotated genes in mammalian genomes. Serial analysis of gene expression (SAGE) can reveal new poly(A) RNAs transcribed from previously unrecognized chromosomal regions. However, conventional SAGE tags are too short to unambiguously identify unique sites in large genomes. Here, we design a novel strategy with tags anchored on two different restriction sites of cDNAs. New transcripts are then tentatively defined by the two SAGE tags in tandem and by the intervening genomic sequence between these tagged sites. Having developed a new algorithm to locate these tag-delimited genomic sequences (TDGS), we first validated its capacity to recognize known genes and its ability to reveal new transcripts using two SAGE libraries built in parallel from a single RNA sample. Our algorithm proves fast enough to apply this strategy at a large scale. We then collected and processed the complete sets of human SAGE tags to predict as-yet unknown transcripts. Cross-validation with tiling array data shows that 47% of these TDGS overlap transcriptionally active regions. Our method provides a new, complementary approach for annotating complex transcriptomes.


Subject(s)
Algorithms; Gene Expression Profiling/methods; RNA, Messenger/analysis; Sequence Tagged Sites; Base Sequence; Computational Biology; Gene Library; Genomics/methods; Humans; Oligonucleotide Array Sequence Analysis; Sequence Alignment; Transcription, Genetic
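The abstract names, but does not describe, the algorithm used to locate tag-delimited genomic sequences. As a hedged sketch of the basic idea only (the published TDGS algorithm is likely more elaborate), the code below scans a genome string for an occurrence of a first tag followed downstream by a second tag within a maximum span, and reports the delimited interval. The function names, the `max_span` parameter and the toy tags are assumptions; reverse-strand matches and mismatches are ignored.

```python
import bisect


def find_occurrences(genome: str, tag: str) -> list[int]:
    """All start positions of `tag` in `genome` (overlaps allowed), ascending."""
    positions = []
    start = genome.find(tag)
    while start != -1:
        positions.append(start)
        start = genome.find(tag, start + 1)
    return positions


def find_tag_delimited_segments(
    genome: str, tag_a: str, tag_b: str, max_span: int
) -> list[tuple[int, int]]:
    """Half-open intervals (start, end) in which tag_a starts at `start`,
    tag_b ends at `end`, tag_b begins at or after the end of tag_a, and the
    whole segment is at most `max_span` bases long.

    Hypothetical helper: forward strand, exact matches only.
    """
    b_starts = find_occurrences(genome, tag_b)  # sorted ascending
    segments = []
    for a_start in find_occurrences(genome, tag_a):
        # Index of the first tag_b occurrence starting at or after tag_a's end.
        first = bisect.bisect_left(b_starts, a_start + len(tag_a))
        for b_start in b_starts[first:]:
            end = b_start + len(tag_b)
            if end - a_start > max_span:
                break  # later occurrences are even farther away
            segments.append((a_start, end))
    return segments


if __name__ == "__main__":
    genome = "TTCATGAAAAGGCCTTCATGGGCC"
    for start, end in find_tag_delimited_segments(genome, "CATG", "GGCC", max_span=20):
        print(start, end, genome[start:end])
```

The early `break` relies on the tag positions being sorted, which the left-to-right `str.find` scan guarantees. A genome-scale version would index tag positions once (for example with a suffix array or hash of fixed-length tags) rather than rescanning the string for every tag pair.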