Results 1 - 6 of 6
1.
Nucleic Acids Res ; 46(D1): D1181-D1189, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29165610

ABSTRACT

Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversity; and pathway associations. Gramene's Plant Reactome provides a knowledgebase of cellular-level plant pathway networks. Specifically, it uses curated rice reference pathways to derive pathway projections for an additional 66 species based on gene orthology, and facilitates display of gene expression, gene-gene interactions, and user-defined omics data in the context of these pathways. As a community portal, Gramene integrates best-of-class software and infrastructure components including the Ensembl genome browser, Reactome pathway browser, and Expression Atlas widgets, and undergoes periodic data and software upgrades. Via powerful, intuitive search interfaces, users can easily query across various portals and interactively analyze search results by clicking on diverse features such as genomic context, highly augmented gene trees, gene expression anatomograms, associated pathways, and external informatics resources. All data in Gramene are accessible through both visual and programmatic interfaces.


Subject(s)
Databases, Genetic; Gene Expression Regulation, Plant; Genomics/methods; Knowledge Bases; Plants/genetics; Epigenesis, Genetic; Gene Ontology; Genetic Research; Genetic Variation; Genome, Plant; Metabolic Networks and Pathways/genetics; Molecular Sequence Annotation; Plants/metabolism; Software; User-Computer Interface
2.
Nucleic Acids Res ; 46(D1): D802-D808, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29092050

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly-available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.


Subject(s)
Archaea/genetics; Bacteria/genetics; Databases, Genetic; Databases, Protein; Eukaryota/genetics; Genomics; Amino Acid Sequence; Animals; Base Sequence; Data Mining; Forecasting; Genome; Molecular Sequence Annotation; RNA/genetics; User-Computer Interface
3.
Nucleic Acids Res ; 44(D1): D574-80, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578574

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.


Subject(s)
Databases, Genetic; Genome, Bacterial; Genome, Fungal; Genome, Plant; Invertebrates/genetics; Animals; Diploidy; Eukaryota/genetics; Genetic Variation; Genome; Polyploidy; Sequence Alignment
4.
Genome Res ; 22(9): 1760-74, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955987

ABSTRACT

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.


Subject(s)
Databases, Genetic; Genome, Human; Genomics/methods; Molecular Sequence Annotation; Animals; Computational Biology/methods; DNA, Complementary/chemistry; DNA, Complementary/genetics; Evolution, Molecular; Exons; Genetic Loci; Humans; Internet; Models, Molecular; Open Reading Frames; Pseudogenes; Quality Control; RNA Splice Sites; RNA, Long Noncoding; Reproducibility of Results; Untranslated Regions
5.
NPJ Genom Med ; 4: 31, 2019.
Article in English | MEDLINE | ID: mdl-31814998

ABSTRACT

The developmental and epileptic encephalopathies (DEE) are a group of rare, severe neurodevelopmental disorders, where even the most thorough sequencing studies leave 60-65% of patients without a molecular diagnosis. Here, we explore the incompleteness of transcript models used for exome and genome analysis as one potential explanation for a lack of current diagnoses. Therefore, we have updated the GENCODE gene annotation for 191 epilepsy-associated genes, using human brain-derived transcriptomic libraries and other data to build 3,550 putative transcript models. Our annotations increase the transcriptional 'footprint' of these genes by over 674 kb. Using SCN1A as a case study, due to its close phenotype/genotype correlation with Dravet syndrome, we screened 122 people with Dravet syndrome or a similar phenotype with a panel of exon sequences representing eight established genes and identified two de novo SCN1A variants that now - through improved gene annotation - are ascribed to residing among our exons. These two (from 122 screened people, 1.6%) molecular diagnoses carry significant clinical implications. Furthermore, we identified a previously classified SCN1A intronic Dravet syndrome-associated variant that now lies within a deeply conserved exon. Our findings illustrate the potential gains of thorough gene annotation in improving diagnostic yields for genetic disorders.

6.
Nat Commun ; 7: 12339, 2016 Aug 17.
Article in English | MEDLINE | ID: mdl-27531712

ABSTRACT

Long non-coding RNAs (lncRNAs) constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome. Such characterization requires a comprehensive, high-quality annotation of their gene structure and boundaries, which is currently lacking. Here we describe RACE-Seq, an experimental workflow designed to address this based on RACE (rapid amplification of cDNA ends) and long-read RNA sequencing. We apply RACE-Seq to 398 human lncRNA genes in seven tissues, leading to the discovery of 2,556 on-target, novel transcripts. About 60% of the targeted loci are extended in either 5' or 3', often reaching genomic hallmarks of gene boundaries. Analysis of the novel transcripts suggests that lncRNAs are as long, have as many exons and undergo as much alternative splicing as protein-coding genes, contrary to current assumptions. Overall, we show that RACE-Seq is an effective tool to annotate an organism's deep transcriptome, and compares favourably to other targeted sequencing techniques.


Subject(s)
High-Throughput Nucleotide Sequencing/methods; Polymerase Chain Reaction/methods; RNA, Long Noncoding/genetics; Sequence Analysis, RNA/methods; Exons/genetics; Genetic Loci; Humans; Molecular Sequence Annotation; Organ Specificity/genetics; Proof of Concept Study; Protein Isoforms/genetics; Protein Isoforms/metabolism; RNA Splice Sites/genetics; RNA, Long Noncoding/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Transcriptome/genetics