Results 1 - 20 of 112
1.
Cell ; 178(6): 1465-1477.e17, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31491388

ABSTRACT

Most human protein-coding genes are regulated by multiple, distinct promoters, suggesting that the choice of promoter is as important as its level of transcriptional activity. However, while a global change in transcription is recognized as a defining feature of cancer, the contribution of alternative promoters still remains largely unexplored. Here, we infer active promoters using RNA-seq data from 18,468 cancer and normal samples, demonstrating that alternative promoters are a major contributor to context-specific regulation of transcription. We find that promoters are deregulated across tissues, cancer types, and patients, affecting known cancer genes and novel candidates. For genes with independently regulated promoters, we demonstrate that promoter activity provides a more accurate predictor of patient survival than gene expression. Our study suggests that a dynamic landscape of active promoters shapes the cancer transcriptome, opening new diagnostic avenues and opportunities to further explore the interplay of regulatory mechanisms with transcriptional aberrations in cancer.


Subjects
Computational Biology/methods , Gene Expression Regulation, Neoplastic/genetics , Neoplasms/genetics , Promoter Regions, Genetic/genetics , Transcriptome/genetics , Databases, Genetic , Humans , RNA-Seq/methods
2.
Nature ; 578(7793): 102-111, 2020 02.
Article in English | MEDLINE | ID: mdl-32025015

ABSTRACT

The discovery of drivers of cancer has traditionally focused on protein-coding genes [1-4]. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [5] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers [6,7], raise doubts about others and identify novel candidates, including point mutations in the 5' region of TP53, in the 3' untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.


Subjects
Genome, Human/genetics , Mutation/genetics , Neoplasms/genetics , DNA Breaks , Databases, Genetic , Gene Expression Regulation, Neoplastic , Genome-Wide Association Study , Humans , INDEL Mutation
3.
Nature ; 578(7793): 129-136, 2020 02.
Article in English | MEDLINE | ID: mdl-32025019

ABSTRACT

Transcript alterations often result from somatic changes in cancer genomes [1]. Various forms of RNA alterations have been described in cancer, including overexpression [2], altered splicing [3] and gene fusions [4]; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [5]. Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed 'bridged' fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.


Subjects
Gene Expression Regulation, Neoplastic , Neoplasms/genetics , RNA/genetics , DNA Copy Number Variations , DNA, Neoplasm , Genome, Human , Genomics , Humans , Transcriptome
4.
J Exp Bot ; 75(1): 274-299, 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-37804484

ABSTRACT

Catharanthus roseus leaves produce a range of monoterpenoid indole alkaloids (MIAs) that include low levels of the anticancer drugs vinblastine and vincristine. The MIA pathway displays a complex architecture spanning different subcellular and cell type localizations, and is under complex regulation. As a result, the development of strategies to increase the levels of the anticancer MIAs has remained elusive. The pathway involves mesophyll specialized idioblasts where the late unsolved biosynthetic steps are thought to occur. Here, protoplasts of C. roseus leaf idioblasts were isolated by fluorescence-activated cell sorting, and their differential alkaloid and transcriptomic profiles were characterized. This involved the assembly of an improved C. roseus transcriptome from short- and long-read data, IDIO+. It was observed that C. roseus mesophyll idioblasts possess a distinctive transcriptomic profile associated with protection against biotic and abiotic stresses, and indicative that this cell type is a carbon sink, in contrast to surrounding mesophyll cells. Moreover, it is shown that idioblasts are a hotspot of alkaloid accumulation, suggesting that their transcriptome may hold the key to the in-depth understanding of the MIA pathway and the success of strategies leading to higher levels of the anticancer drugs.


Subjects
Antineoplastic Agents , Catharanthus , Plants, Medicinal , Secologanin Tryptamine Alkaloids , Plants, Medicinal/metabolism , Catharanthus/genetics , Catharanthus/metabolism , Antineoplastic Agents/metabolism , Secologanin Tryptamine Alkaloids/metabolism , Plant Leaves/metabolism , Plant Proteins/genetics , Plant Proteins/metabolism , Gene Expression Regulation, Plant
7.
Nature ; 551(7678): 51-56, 2017 11 02.
Article in English | MEDLINE | ID: mdl-29094699

ABSTRACT

Imaging and chromosome conformation capture studies have revealed several layers of chromosome organization, including segregation into megabase-sized active and inactive compartments, and partitioning into sub-megabase domains (TADs). It remains unclear, however, how these layers of organization form, interact with one another and influence genome function. Here we show that deletion of the cohesin-loading factor Nipbl in mouse liver leads to a marked reorganization of chromosomal folding. TADs and associated Hi-C peaks vanish globally, even in the absence of transcriptional changes. By contrast, compartmental segregation is preserved and even reinforced. Strikingly, the disappearance of TADs unmasks a finer compartment structure that accurately reflects the underlying epigenetic landscape. These observations demonstrate that the three-dimensional organization of the genome results from the interplay of two independent mechanisms: cohesin-independent segregation of the genome into fine-scale compartments, defined by chromatin state; and cohesin-dependent formation of TADs, possibly by loop extrusion, which helps to guide distant enhancers to their target genes.


Subjects
Cell Cycle Proteins/metabolism , Chromatin/metabolism , Chromosomal Proteins, Non-Histone/metabolism , Chromosome Positioning , Animals , Chromatin/chemistry , Chromatin/genetics , Enhancer Elements, Genetic/genetics , Epigenesis, Genetic , Liver/metabolism , Mice , Transcription Factors/deficiency , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription, Genetic , Cohesins
8.
Nucleic Acids Res ; 48(D1): D77-D83, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31665515

ABSTRACT

Expression Atlas is EMBL-EBI's resource for gene and protein expression. It sources and compiles data on the abundance and localisation of RNA and proteins in various biological systems and contexts and provides open access to this data for the research community. With the increased availability of single cell RNA-Seq datasets in the public archives, we have now extended Expression Atlas with a new added-value service to display gene expression in single cells. Single Cell Expression Atlas was launched in 2018 and currently includes 123 single cell RNA-Seq studies from 12 species. The website can be searched by genes within or across species to reveal experiments, tissues and cell types where this gene is expressed or under which conditions it is a marker gene. Within each study, cells can be visualized using a pre-calculated t-SNE plot and can be coloured by different features or by cell clusters based on gene expression. Within each experiment, there are links to downloadable files, such as RNA quantification matrices, clustering results, reports on protocols and associated metadata, such as assigned cell types.


Subjects
Computational Biology/methods , Databases, Nucleic Acid , Gene Expression Profiling , Software , Gene Expression Profiling/methods , Organ Specificity , Single-Cell Analysis/methods , User-Computer Interface
9.
Nucleic Acids Res ; 47(D1): D711-D715, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357387

ABSTRACT

ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data from a variety of technologies assaying functional modalities of a genome, such as gene expression or promoter occupancy. The number of experiments based on sequencing technologies, in particular RNA-seq experiments, has been increasing over the last few years and submissions of sequencing data have overtaken microarray experiments in the last 12 months. Additionally, there is a significant increase in experiments investigating single cells, rather than bulk samples, known as single-cell RNA-seq. To accommodate these trends, we have substantially changed our submission tool Annotare which, along with raw and processed data, collects all metadata necessary to interpret these experiments. Selected datasets are re-processed and loaded into our sister resource, the value-added Expression Atlas (and its component Single Cell Expression Atlas), which not only enables users to interpret the data easily but also serves as a test for data quality. With an increasing number of studies that combine different assay modalities (multi-omics experiments), a new more general archival resource the BioStudies Database has been developed, which will eventually supersede ArrayExpress. Data submissions will continue unchanged; all existing ArrayExpress data will be incorporated into BioStudies and the existing accession numbers and application programming interfaces will be maintained.


Subjects
Oligonucleotide Array Sequence Analysis/methods , Single-Cell Analysis/methods , Software , Databases, Genetic , RNA-Seq/methods
10.
Adv Exp Med Biol ; 1295: 271-299, 2021.
Article in English | MEDLINE | ID: mdl-33543464

ABSTRACT

Multiple studies about tumor biology have revealed the determinant role of the tumor microenvironment in cancer progression, resulting from the dynamic interactions between tumor cells and surrounding stromal cells within the extracellular matrix. This malignant microenvironment highly impacts the efficacy of anticancer nanoparticles by displaying drug resistance mechanisms, as well as intrinsic physical and biochemical barriers, which hamper their intratumoral accumulation and biological activity. Currently, two-dimensional cell cultures are used as the initial screening method in vitro for testing cytotoxic nanocarriers. However, this fails to mimic the tumor heterogeneity, as well as the three-dimensional tumor architecture and pathophysiological barriers, leading to an inaccurate pharmacological evaluation. Biomimetic 3D in vitro tumor models, on the other hand, are emerging as promising tools for more accurately assessing nanoparticle activity, owing to their ability to recapitulate certain features of the tumor microenvironment and thus provide mechanistic insights into nanocarrier intratumoral penetration and diffusion rates. Notwithstanding, in vivo validation of nanomedicines remains irreplaceable at the preclinical stage, and a vast variety of more advanced in vivo tumor models is currently available. Such complex animal models (e.g., genetically engineered mice and patient-derived xenografts) are capable of better predicting nanocarrier clinical efficiency, as they closely resemble the heterogeneity of the human tumor microenvironment. Herein, the development of physiologically more relevant in vitro and in vivo tumor models for the preclinical evaluation of anticancer nanoparticles will be discussed, as well as the current limitations and future challenges in clinical translation.


Subjects
Antineoplastic Agents , Nanoparticles , Animals , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Nanomedicine , Spheroids, Cellular , Tumor Microenvironment
11.
Nucleic Acids Res ; 46(D1): D246-D251, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29165655

ABSTRACT

Expression Atlas (http://www.ebi.ac.uk/gxa) is an added value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 from plants. Data from large-scale RNA sequencing studies including Blueprint, PCAWG, ENCODE, GTEx and HipSci can be visualized next to each other. In Expression Atlas, users can query genes or gene-sets of interest and explore their expression across or within species, tissues, developmental stages in a constitutive or differential context, representing the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as R-data. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include the on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions.


Subjects
Databases, Genetic , Animals , Gene Expression Profiling , Humans , Mammals/genetics , Mammals/metabolism , Oligonucleotide Array Sequence Analysis , Plants/genetics , Plants/metabolism , Proteomics , Sequence Analysis, RNA , Species Specificity , User-Computer Interface
12.
Nucleic Acids Res ; 46(D1): D1181-D1189, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29165610

ABSTRACT

Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversity; and pathway associations. Gramene's Plant Reactome provides a knowledgebase of cellular-level plant pathway networks. Specifically, it uses curated rice reference pathways to derive pathway projections for an additional 66 species based on gene orthology, and facilitates display of gene expression, gene-gene interactions, and user-defined omics data in the context of these pathways. As a community portal, Gramene integrates best-of-class software and infrastructure components including the Ensembl genome browser, Reactome pathway browser, and Expression Atlas widgets, and undergoes periodic data and software upgrades. Via powerful, intuitive search interfaces, users can easily query across various portals and interactively analyze search results by clicking on diverse features such as genomic context, highly augmented gene trees, gene expression anatomograms, associated pathways, and external informatics resources. All data in Gramene are accessible through both visual and programmatic interfaces.


Subjects
Databases, Genetic , Gene Expression Regulation, Plant , Genomics/methods , Knowledge Bases , Plants/genetics , Epigenesis, Genetic , Gene Ontology , Genetic Research , Genetic Variation , Genome, Plant , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Plants/metabolism , Software , User-Computer Interface
13.
N Engl J Med ; 385(15): e51, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34614342
14.
Bioinformatics ; 33(14): 2218-2220, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28369191

ABSTRACT

MOTIVATION: The exponential growth of publicly available RNA-sequencing (RNA-Seq) data poses an increasing challenge to researchers wishing to discover, analyse and store such data, particularly those based in institutions with limited computational resources. EMBL-EBI is in an ideal position to address these challenges and to allow the scientific community easy access to not just raw, but also processed RNA-Seq data. We present a Web service to access the results of a systematically and continually updated standardized alignment as well as gene and exon expression quantification of all public bulk (and in the near future also single-cell) RNA-Seq runs in 264 species in European Nucleotide Archive, using Representational State Transfer. RESULTS: The RNASeq-er API (Application Programming Interface) enables ontology-powered search for and retrieval of CRAM, bigwig and bedGraph files, gene and exon expression quantification matrices (Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million, raw counts) as well as sample attributes annotated with ontology terms. To date over 270 00 RNA-Seq runs in nearly 10 000 studies (1PB of raw FASTQ data) in 264 species in ENA have been processed and made available via the API. AVAILABILITY AND IMPLEMENTATION: The RNASeq-er API can be accessed at http://www.ebi.ac.uk/fg/rnaseq/api . The commands used to analyse the data are available in supplementary materials and at https://github.com/nunofonseca/irap/wiki/iRAP-single-library . CONTACT: rnaseq@ebi.ac.uk ; rpetry@ebi.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Eukaryota/genetics , Sequence Analysis, RNA/methods , Software , Transcriptome , Animals , Databases, Genetic , Gene Expression , Gene Ontology , Humans , Internet
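
The expression units named in the abstract of entry 14 (raw counts, Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million) are related by simple arithmetic. The following is a minimal sketch of that arithmetic only, not code from the RNASeq-er pipeline: the function names and the toy values are hypothetical, and the FPKM denominator is assumed to be the sum of the counts passed in rather than a library-wide mapped-fragment total.

```python
from typing import List, Sequence


def fpkm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Fragments per kilobase of feature per million counted fragments.

    Assumption: the 'million fragments mapped' term is the sum of `counts`.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    # count / (length_bp / 1e3) / (total / 1e6) == count * 1e9 / (length_bp * total)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    if total_rate == 0:
        return [0.0] * len(counts)
    return [r / total_rate * 1e6 for r in rates]


# Toy input (hypothetical): three features with raw counts and lengths in bp.
toy_counts = [100, 200, 700]
toy_lengths = [1000, 2000, 1000]
toy_fpkm = fpkm(toy_counts, toy_lengths)
toy_tpm = tpm(toy_counts, toy_lengths)
```

Under these assumptions TPM values always sum to one million within a sample, whereas FPKM values do not, which is why TPM is often easier to compare across samples.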
15.
Nucleic Acids Res ; 44(D1): D746-52, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26481351

ABSTRACT

Expression Atlas (http://www.ebi.ac.uk/gxa) provides information about gene and protein expression in animal and plant samples of different cell types, organism parts, developmental stages, diseases and other conditions. It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown seven-fold (1572 studies as of August 2015), and incorporates baseline expression profiles of tissues from Human Protein Atlas, GTEx and FANTOM5, and of cancer cell lines from ENCODE, CCLE and Genentech projects. Plant studies constitute a quarter of Atlas data. For genes of interest, the user can view baseline expression in tissues, and differential expression for biologically meaningful pairwise comparisons, estimated using consistent methodology across all of Atlas. Our first proteomics study in human tissues is now displayed alongside transcriptomics data in the same tissues. Novel analyses and visualisations include: 'enrichment' in each differential comparison of GO terms, Reactome, Plant Reactome pathways and InterPro domains; hierarchical clustering (by baseline expression) of most variable genes and experimental conditions; and, for a given gene-condition, distribution of baseline expression across biological replicates.


Subjects
Databases, Genetic , Gene Expression Profiling , Plants/metabolism , Proteins/metabolism , Proteomics , Animals , Cell Line, Tumor , Humans , Plants/genetics , User-Computer Interface
16.
Nucleic Acids Res ; 44(D1): D1133-40, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26553803

ABSTRACT

Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ~200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials.


Subjects
Databases, Genetic , Genome, Plant , Plants/metabolism , Gene Expression , Genetic Variation , Genomics , Internet , Metabolic Networks and Pathways , Molecular Sequence Annotation , Plants/genetics
17.
Genome Res ; 24(11): 1797-807, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25122613

ABSTRACT

The genetic code is an abstraction of how mRNA codons and tRNA anticodons molecularly interact during protein synthesis; the stability and regulation of this interaction remains largely unexplored. Here, we characterized the expression of mRNA and tRNA genes quantitatively at multiple time points in two developing mouse tissues. We discovered that mRNA codon pools are highly stable over development and simply reflect the genomic background; in contrast, precise regulation of tRNA gene families is required to create the corresponding tRNA transcriptomes. The dynamic regulation of tRNA genes during development is controlled in order to generate an anticodon pool that closely corresponds to messenger RNAs. Thus, across development, the pools of mRNA codons and tRNA anticodons are invariant and highly correlated, revealing a stable molecular interaction interlocking transcription and translation.


Subjects
Brain/metabolism , Gene Expression Regulation, Developmental , Liver/metabolism , RNA, Messenger/genetics , RNA, Transfer/genetics , Transcriptome , Animals , Anticodon/genetics , Base Sequence , Brain/embryology , Chromatin Immunoprecipitation/methods , Codon/genetics , Computer Simulation , Embryo, Mammalian/embryology , Embryo, Mammalian/metabolism , Female , High-Throughput Nucleotide Sequencing/methods , Liver/embryology , Male , Mice, Inbred C57BL , Models, Genetic , Open Reading Frames/genetics , Principal Component Analysis , RNA, Messenger/metabolism , RNA, Transfer/metabolism , Time Factors
18.
Nucleic Acids Res ; 42(Database issue): D926-32, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24304889

ABSTRACT

Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of 'baseline' expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful 'contrasts', i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets, as well as genes and a more powerful search interface designed to maximize the biological value provided to the user.


Subjects
Databases, Genetic , Gene Expression Profiling , Genomics , Humans , Internet , Oligonucleotide Array Sequence Analysis , Proteins/genetics , Proteins/metabolism , RNA Isoforms/metabolism , Sequence Analysis, RNA
19.
BMC Genomics ; 16 Suppl 8: S2, 2015.
Article in English | MEDLINE | ID: mdl-26110515

ABSTRACT

BACKGROUND: A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation and a wide range of tools are available to this end. McCarthy et al. recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based. RESULTS: We describe a detailed analysis of the similarities and differences between the gene and transcript annotation in the GENCODE and RefSeq genesets. We demonstrate that the GENCODE Comprehensive set is richer in alternative splicing, novel CDSs, novel exons and has higher genomic coverage than RefSeq, while the GENCODE Basic set is very similar to RefSeq. Using RNAseq data we show that exons and introns unique to one geneset are expressed at a similar level to those common to both. We present evidence that the differences in gene annotation lead to large differences in variant annotation where GENCODE and RefSeq are used as reference transcripts, although this is predominantly confined to non-coding transcripts and UTR sequence, with at most ~30% of LoF variants annotated discordantly. We also describe an investigation of dominant transcript expression, showing that it both supports the utility of the GENCODE Basic set in providing a smaller set of more highly expressed transcripts and provides a useful, biologically-relevant filter for further reducing the complexity of the transcriptome. CONCLUSIONS: The reference transcripts selected for variant functional annotation do have a large effect on the outcome. The GENCODE Comprehensive transcripts contain more exons, have greater genomic coverage and capture many more variants than RefSeq in both genome and exome datasets, while the GENCODE Basic set shows a higher degree of concordance with RefSeq and has fewer unique features. We propose that the GENCODE Comprehensive set has great utility for the discovery of new variants with functional potential, while the GENCODE Basic set is more suitable for applications demanding less complex interpretation of functional variants.


Subjects
Computational Biology , Genome, Human , Molecular Sequence Annotation , Protein Isoforms/metabolism , Software , Alternative Splicing , Databases, Genetic , Humans , Protein Isoforms/genetics , Transcriptome
20.
Genome Res ; 21(12): 2224-41, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21926179

ABSTRACT

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.


Subjects
Genome/physiology , Genomics/methods , Sequence Analysis, DNA/methods