Results 1 - 8 of 8
1.
Genome Res ; 24(2): 212-26, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24265505

ABSTRACT

Chronic lymphocytic leukemia (CLL) has heterogeneous clinical and biological behavior. Whole-genome and -exome sequencing has contributed to the characterization of the mutational spectrum of the disease, but the underlying transcriptional profile is still poorly understood. We have performed deep RNA sequencing in different subpopulations of normal B-lymphocytes and CLL cells from a cohort of 98 patients, and characterized the CLL transcriptional landscape with unprecedented resolution. We detected thousands of transcriptional elements differentially expressed between the CLL and normal B cells, including protein-coding genes, noncoding RNAs, and pseudogenes. Transposable elements are globally derepressed in CLL cells. In addition, two thousand genes (most of which are not differentially expressed) exhibit CLL-specific splicing patterns. Genes involved in metabolic pathways showed higher expression in CLL, while genes related to spliceosome, proteasome, and ribosome were among the most down-regulated in CLL. Clustering of the CLL samples according to RNA-seq derived gene expression levels unveiled two robust molecular subgroups, C1 and C2. C1/C2 subgroups and the mutational status of the immunoglobulin heavy variable (IGHV) region were the only independent variables in predicting time to treatment in a multivariate analysis with main clinico-biological features. This subdivision was validated in an independent cohort of patients monitored through DNA microarrays. Further analysis shows that B-cell receptor (BCR) activation in the microenvironment of the lymph node may be at the origin of the C1/C2 differences.


Subjects
B-Lymphocytes , Gene Expression Regulation, Neoplastic , High-Throughput Nucleotide Sequencing , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Aged , Base Sequence , Female , Gene Expression Profiling , Humans , Immunoglobulin Variable Region , Leukemia, Lymphocytic, Chronic, B-Cell/pathology , Male , Middle Aged , Mutation , Ribosomes/genetics , Spliceosomes/genetics
2.
Genome Res ; 22(9): 1616-25, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955974

ABSTRACT

Splicing remains an incompletely understood process. Recent findings suggest that chromatin structure participates in its regulation. Here, we analyze the RNA from subcellular fractions obtained through RNA-seq in the cell line K562. We show that in the human genome, splicing occurs predominantly during transcription. We introduce the coSI measure, based on RNA-seq reads mapping to exon junctions and borders, to assess the degree of splicing completion around internal exons. We show that, as expected, splicing is almost fully completed in cytosolic polyA+ RNA. In chromatin-associated RNA (which includes the RNA that is being transcribed), for 5.6% of exons, the removal of the surrounding introns is fully completed, compared with 0.3% of exons for which no intron-removal has occurred. The remaining exons exist as a mixture of spliced and fewer unspliced molecules, with a median coSI of 0.75. Thus, most RNAs undergo splicing while being transcribed: "co-transcriptional splicing." Consistent with co-transcriptional spliceosome assembly and splicing, we have found significant enrichment of spliceosomal snRNAs in chromatin-associated RNA compared with other cellular RNA fractions and other nonspliceosomal snRNAs. CoSI scores decrease along the gene, pointing to a "first transcribed, first spliced" rule, yet more downstream exons carry other characteristics, favoring rapid, co-transcriptional intron removal. Exons with low coSI values, that is, in the process of being spliced, are enriched with chromatin marks, consistent with a role for chromatin in splicing during transcription. For alternative exons and long noncoding RNAs, splicing tends to occur later, and the latter might remain unspliced in some cases.
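As a rough illustration of how a completion index of this kind can be computed from read counts, the sketch below takes the share of spliced (exon-exon junction) reads among all reads observed at an internal exon's borders. The function name `cosi_sketch`, its argument names, and the unweighted ratio are assumptions made for illustration; the exact read classes and weights of the coSI measure are defined in the paper itself.

```python
def cosi_sketch(junction_reads, border_reads):
    """Share of reads at an internal exon's borders that are already spliced.

    junction_reads: reads spanning exon-exon junctions around the exon (spliced).
    border_reads:   reads spanning the exon-intron borders of the exon (unspliced).
    Returns NaN when the exon has no informative reads.
    """
    total = junction_reads + border_reads
    if total == 0:
        return float("nan")
    return junction_reads / total


# Hypothetical counts: 75 spliced reads and 25 unspliced reads.
print(cosi_sketch(75, 25))  # 0.75
```

Under this simplified reading, a value near 1 means the surrounding introns have been removed in almost all molecules, and a value near 0 means almost none have.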


Subjects
Genome, Human , RNA Splicing , RNA, Long Noncoding/metabolism , Transcription, Genetic , Chromatin/metabolism , Cluster Analysis , Computational Biology/methods , Exons , High-Throughput Nucleotide Sequencing , Humans , RNA/genetics , RNA/metabolism , Sequence Analysis, RNA , Spliceosomes/genetics , Spliceosomes/metabolism , Subcellular Fractions/chemistry
3.
Genome Res ; 22(9): 1775-89, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955988

ABSTRACT

The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to those of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally expressed at lower levels than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.


Subjects
Databases, Genetic , RNA, Long Noncoding/genetics , Alternative Splicing , Animals , Cell Nucleus/genetics , Cell Nucleus/metabolism , Cluster Analysis , Evolution, Molecular , Exons , Gene Expression Profiling , Gene Expression Regulation , Histones/metabolism , Humans , Molecular Sequence Annotation , Open Reading Frames , Organ Specificity/genetics , Primates/genetics , RNA Processing, Post-Transcriptional , RNA Splice Sites , RNA, Messenger/genetics , Selection, Genetic , Transcription, Genetic
4.
Bioinformatics ; 29(2): 273-4, 2013 Jan 15.
Article in English | MEDLINE | ID: mdl-23172860

ABSTRACT

MOTIVATION: Novel technologies brought in unprecedented amounts of high-throughput sequencing data along with great challenges in their analysis and interpretation. The percent-spliced-in (PSI, Ψ) metric estimates the incidence of single-exon-skipping events and can be computed directly by counting reads that align to known or predicted splice junctions. However, the majority of human splicing events are more complex than single-exon skipping. RESULTS: In this short report, we present a framework that generalizes the Ψ metric to arbitrary classes of splicing events. We change the view from exon centric to intron centric and split the value of Ψ into two indices, ψ5 and ψ3, measuring the rate of splicing at the 5' and 3' end of the intron, respectively. The advantage of having two separate indices is that they deconvolute two distinct elementary acts of the splicing reaction. The completeness of splicing index is decomposed in a similar way. This framework is implemented as bam2ssj, a BAM-file-processing pipeline for strand-specific counting of reads that align to splice junctions or overlap with splice sites. It can be used as a consistent protocol for quantifying splice junctions from RNA-seq data because no such standard procedure currently exists. AVAILABILITY: The C code of bam2ssj is open source and is available at https://github.com/pervouchine/bam2ssj. CONTACT: dp@crg.eu
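The intron-centric view described above can be sketched from a table of splice-junction read counts. The sketch assumes that the 5' index of an intron is the share of reads leaving its donor site that use this intron, and that the 3' index is the corresponding share at its acceptor site. This is one reading of the abstract, written in Python for illustration; it is not the bam2ssj C code, and the names `psi_indices`, `psi5`, and `psi3` are hypothetical.

```python
from collections import defaultdict


def psi_indices(junction_counts):
    """Compute per-intron 5' and 3' splicing indices from junction read counts.

    junction_counts maps (donor, acceptor) -> number of reads supporting
    that intron. Returns a dict mapping each intron to (psi5, psi3).
    """
    by_donor = defaultdict(int)
    by_acceptor = defaultdict(int)
    for (donor, acceptor), n in junction_counts.items():
        by_donor[donor] += n
        by_acceptor[acceptor] += n

    result = {}
    for (donor, acceptor), n in junction_counts.items():
        d_total = by_donor[donor]
        a_total = by_acceptor[acceptor]
        psi5 = n / d_total if d_total else float("nan")
        psi3 = n / a_total if a_total else float("nan")
        result[(donor, acceptor)] = (psi5, psi3)
    return result


# Hypothetical exon-skipping event: an exon lies between positions 200 and 300.
counts = {(100, 200): 30, (300, 400): 30, (100, 400): 10}
for intron, (psi5, psi3) in sorted(psi_indices(counts).items()):
    print(intron, psi5, psi3)
```

In this toy event the skipping intron (100, 400) gets 0.25 at both ends, while each inclusion intron gets 0.75 at the site it shares with the skipping intron and 1.0 at the other site, which shows how the two ends are measured separately.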


Subjects
Alternative Splicing , Introns , Sequence Analysis, RNA , Software , Exons , High-Throughput Nucleotide Sequencing , Humans , RNA Splice Sites
5.
Bioinformatics ; 29(5): 614-21, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23329413

ABSTRACT

MOTIVATION: The avalanche of data arriving since the development of NGS technologies has prompted the need for developing fast, accurate and easily automated bioinformatic tools capable of dealing with massive datasets. Among the most productive applications of NGS technologies is the sequencing of cellular RNA, known as RNA-Seq. Although RNA-Seq provides a dynamic range similar or superior to that of microarrays at similar or lower cost, the lack of standard and user-friendly pipelines is a bottleneck preventing RNA-Seq from becoming the standard for transcriptome analysis. RESULTS: In this work we present a pipeline for processing and analyzing RNA-Seq data, which we have named Grape (Grape RNA-Seq Analysis Pipeline Environment). Grape supports raw sequencing reads produced by a variety of technologies, either in FASTA or FASTQ format, or as prealigned reads in SAM/BAM format. A minimal Grape configuration consists of the file location of the raw sequencing reads, the genome of the species and the corresponding gene and transcript annotation. Grape first runs a set of quality control steps, and then aligns the reads to the genome, a step that is omitted for prealigned read formats. Grape next estimates gene and transcript expression levels, calculates exon inclusion levels and identifies novel transcripts. Grape can be run on a single computer or in parallel on a computer cluster. It is distributed with specific mapping and quantification tools, but given its modular design, any tool supporting popular data interchange formats can be integrated. AVAILABILITY: Grape can be obtained from the Bioinformatics and Genomics website at: http://big.crg.cat/services/grape.
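The order of steps described above, including the skipped alignment for prealigned input, can be sketched as a small planner. Everything in this sketch is hypothetical: the step names, the extension sets, and the `plan_steps` function are illustrative and do not reflect Grape's actual configuration format or command-line interface.

```python
from pathlib import Path

# Assumed file extensions for each input class (illustrative only).
RAW_READS = {".fa", ".fasta", ".fq", ".fastq"}
PREALIGNED = {".sam", ".bam"}


def plan_steps(reads_path):
    """Return the ordered list of pipeline steps for a given reads file."""
    suffix = Path(reads_path).suffix.lower()
    if suffix not in RAW_READS and suffix not in PREALIGNED:
        raise ValueError(f"unsupported reads format: {suffix!r}")

    steps = ["quality_control"]
    if suffix in RAW_READS:
        # Alignment is only needed when the reads are not already aligned.
        steps.append("align_to_genome")
    steps += [
        "estimate_gene_and_transcript_expression",
        "calculate_exon_inclusion",
        "identify_novel_transcripts",
    ]
    return steps


print(plan_steps("sample.fastq"))
print(plan_steps("sample.bam"))
```

Keeping the plan as a plain list of step names mirrors the modular design the abstract describes, since any single step could be swapped for a different tool without touching the rest.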


Subjects
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Software , Chromosome Mapping , Computational Biology , Exons , Genome , High-Throughput Nucleotide Sequencing
6.
Genome Res ; 19(10): 1752-9, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19726446

ABSTRACT

The origin of new genes is extremely important to evolutionary innovation. Most new genes arise from existing genes through duplication or recombination. The origin of new genes from noncoding DNA is extremely rare, and very few eukaryotic examples are known. We present evidence for the de novo origin of at least three human protein-coding genes since the divergence with chimp. Each of these genes has no protein-coding homologs in any other genome, but is supported by evidence from expression and, importantly, proteomics data. The absence of these genes in chimp and macaque cannot be explained by sequencing gaps or annotation error. High-quality sequence data indicate that these loci are noncoding DNA in other primates. Furthermore, chimp, gorilla, gibbon, and macaque share the same disabling sequence difference, supporting the inference that the ancestral sequence was noncoding over the alternative possibility of parallel gene inactivation in multiple primate lineages. The genes are not well characterized, but interestingly, one of them was first identified as an up-regulated gene in chronic lymphocytic leukemia. This is the first evidence for entirely novel human-specific protein-coding genes originating from ancestrally noncoding sequences. We estimate that 0.075% of human genes may have originated through this mechanism leading to a total expectation of 18 such cases in a genome of 24,000 protein-coding genes.
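The closing estimate is a simple proportion, and the short check below reproduces it from the two figures quoted in the abstract. The function name is hypothetical.

```python
def expected_de_novo_genes(percent_of_genes, gene_count):
    """Expected number of genes given a percentage of the gene set."""
    return percent_of_genes / 100 * gene_count


# 0.075% of 24,000 protein-coding genes.
print(round(expected_de_novo_genes(0.075, 24_000)))  # 18
```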


Subjects
Evolution, Molecular , Proteins/genetics , Animals , Base Sequence , DNA, Intergenic/analysis , DNA, Intergenic/genetics , Databases, Genetic , Genes/physiology , Genome, Human , Humans , Models, Biological , Molecular Sequence Data , Mutation/physiology , Pan troglodytes/genetics , Phylogeny , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid
7.
PLoS One ; 7(1): e30377, 2012.
Article in English | MEDLINE | ID: mdl-22276185

ABSTRACT

We present a fast mapping-based algorithm to compute the mappability of each region of a reference genome up to a specified number of mismatches. Knowing the mappability of a genome is crucial for the interpretation of massively parallel sequencing experiments. We investigate the properties of the mappability of eukaryotic DNA/RNA both as a whole and at the level of the gene family, providing for various organisms tracks which allow the mappability information to be visually explored. In addition, we show that mappability varies greatly between species and gene classes. Finally, we suggest several practical applications where mappability can be used to refine the analysis of high-throughput sequencing data (SNP calling, gene expression quantification and paired-end experiments). This work highlights mappability as an important concept which deserves to be taken into full account, in particular when massively parallel sequencing technologies are employed. The GEM mappability program belongs to the GEM (GEnome Multitool) suite of programs, which can be freely downloaded for any use from its website (http://gemlibrary.sourceforge.net).
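To make the concept concrete, the sketch below computes a naive per-position mappability for a toy sequence. It assumes mappability is defined as the inverse of the number of places where the k-mer starting at a position matches the reference with at most a given number of mismatches, and it only considers the forward strand. This brute-force version is for illustration only and is not the GEM algorithm; the function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def mappability(genome, k, max_mismatches=0):
    """Naive per-position mappability of all k-mers in `genome`.

    For each start position, count how many k-mers in the genome lie within
    `max_mismatches` of the k-mer at that position (itself included) and
    return the inverse of that count. A score of 1.0 means the k-mer is unique.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    scores = []
    for query in kmers:
        hits = sum(1 for other in kmers if hamming(query, other) <= max_mismatches)
        scores.append(1.0 / hits)
    return scores


print(mappability("ACGTACGA", 4))     # every 4-mer is unique with exact matching
print(mappability("ACGTACGA", 4, 1))  # ACGT and ACGA collide at one mismatch
```

The quadratic comparison of all k-mer pairs is only workable for toy inputs, which is why a fast mapping-based approach is needed at genome scale.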


Subjects
Algorithms , Computational Biology/methods , Genome, Human/genetics , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Humans
8.
Mol Biol Evol ; 23(8): 1548-57, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16720694

ABSTRACT

We examined the gene structure of a set of 2563 Arabidopsis thaliana paralogous pairs that were duplicated simultaneously 20-60 MYA by tetraploidy. Out of a total of 23,164 introns in these genes, we found that 10,004 pairs have been conserved and 578 introns have been inserted or deleted in the time since the duplication event. This intron insertion/deletion rate of 2.7 × 10⁻³ to 9.1 × 10⁻⁴ per site per million years is high in comparison to previous studies. At least 56 introns were gained and 39 lost based on parsimony analysis of the phylogenetic distribution of these introns. We found weak evidence that genes undergoing intron gain and loss are biased with respect to gene ontology terms. Gene pairs that experienced at least 2 intron insertions or deletions show evidence of enrichment for membrane location and transport and transporter activity function. We do not find any relationship of intron flux to expression level or G + C content of the gene. Detection of a bias in the location of intron gains and losses within a gene depends on the method of measurement: an intragene method indicates that events (specifically intron losses) are biased toward the 3' end of the gene. Despite the relatively recent acquisition of these introns, we found only one case where we could identify the mechanism of intron origin: the TOUCH3 gene has experienced 2 tandem, partial, internal gene duplications that duplicated a preexisting intron and also created a novel, alternatively spliced intron that makes use of a duplicated pair of cryptic splice sites.
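The quoted rate range can be reproduced from the counts in the abstract under one assumption, which the abstract does not state explicitly: that the number of sites is the sum of conserved intron pairs and inserted or deleted introns, and that the two bounds correspond to the 20 and 60 million year age estimates. The function name is hypothetical.

```python
def intron_indel_rate(changed, conserved, million_years):
    """Insertions/deletions per intron site per million years."""
    sites = changed + conserved  # assumed denominator: all intron positions compared
    return changed / sites / million_years


upper = intron_indel_rate(578, 10_004, 20)  # duplication 20 MYA
lower = intron_indel_rate(578, 10_004, 60)  # duplication 60 MYA
print(f"{upper:.1e} to {lower:.1e} per site per million years")
# 2.7e-03 to 9.1e-04 per site per million years
```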


Subjects
Arabidopsis/genetics , DNA, Plant/genetics , Evolution, Molecular , Gene Duplication , Genes, Plant , Introns , Base Sequence , Molecular Sequence Data , Phylogeny , RNA Splice Sites , Sequence Homology, Amino Acid