Results 1 - 20 of 66
1.
Nature ; 622(7981): 41-47, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37794265

ABSTRACT

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Besides the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.


Subject(s)
Genes; Genome, Human; Molecular Sequence Annotation; Protein Isoforms; Humans; Genome, Human/genetics; Molecular Sequence Annotation/standards; Molecular Sequence Annotation/trends; Protein Isoforms/genetics; Human Genome Project; Pseudogenes; RNA/genetics
2.
Nature ; 604(7905): 310-315, 2022 04.
Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.


Subject(s)
Computational Biology; Databases, Genetic; Genomics; Genome; Humans; Information Dissemination; Molecular Sequence Annotation; National Library of Medicine (U.S.); United States
3.
Nature ; 583(7818): 693-698, 2020 07.
Article in English | MEDLINE | ID: mdl-32728248

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression (ref. 1). The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million cCRE annotations have been generated for human and more than 300,000 for mouse, and these have provided a valuable resource for the scientific community.


Subject(s)
Databases, Genetic; Genome/genetics; Genomics; Molecular Sequence Annotation; Animals; Binding Sites; Chromatin/genetics; Chromatin/metabolism; DNA Methylation; Databases, Genetic/standards; Databases, Genetic/trends; Gene Expression Regulation/genetics; Genome, Human/genetics; Genomics/standards; Genomics/trends; Histones/metabolism; Humans; Mice; Molecular Sequence Annotation/standards; Quality Control; Regulatory Sequences, Nucleic Acid/genetics; Transcription Factors/metabolism
4.
Hum Mol Genet ; 32(10): 1753-1763, 2023 05 05.
Article in English | MEDLINE | ID: mdl-36715146

ABSTRACT

Pathogenic variations in the sodium voltage-gated channel alpha subunit 1 (SCN1A) gene are responsible for multiple epilepsy phenotypes, including Dravet syndrome, febrile seizures (FS) and genetic epilepsy with FS plus. Phenotypic heterogeneity is a hallmark of SCN1A-related epilepsies, the causes of which are yet to be clarified. Genetic variation in the non-coding regulatory regions of SCN1A could be one potential causal factor. However, a comprehensive understanding of the SCN1A regulatory landscape is currently lacking. Here, we summarized the current state of knowledge of SCN1A regulation, providing details on its promoter and enhancer regions. We then integrated currently available data on SCN1A promoters by extracting information related to the SCN1A locus from genome-wide repositories and clearly defined the promoter and enhancer regions of SCN1A. Further, we explored the cellular specificity of differential SCN1A promoter usage. We also reviewed and integrated the available human brain-derived enhancer databases and mouse-derived data to provide a comprehensive computationally developed summary of SCN1A brain-active enhancers. By querying genome-wide data repositories, extracting SCN1A-specific data and integrating the different types of independent evidence, we created a comprehensive catalogue that better defines the regulatory landscape of SCN1A, which could be used to explore the role of SCN1A regulatory regions in disease.


Subject(s)
Epilepsies, Myoclonic; Epilepsy; Seizures, Febrile; Humans; Mice; Animals; NAV1.1 Voltage-Gated Sodium Channel/genetics; Epilepsies, Myoclonic/genetics; Epilepsy/genetics; Promoter Regions, Genetic; Phenotype; Seizures, Febrile/genetics; Mutation
5.
Nucleic Acids Res ; 51(D1): D942-D949, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36420896

ABSTRACT

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
Computational Biology; Genome, Human; Humans; Animals; Mice; Molecular Sequence Annotation; Computational Biology/methods; Genome, Human/genetics; Transcriptome/genetics; Gene Expression Profiling; Databases, Genetic
6.
Nat Rev Genet ; 19(9): 535-548, 2018 09.
Article in English | MEDLINE | ID: mdl-29795125

ABSTRACT

Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.


Subject(s)
Chromosome Mapping; Gene Expression Profiling; Genome, Human; RNA, Long Noncoding; Transcriptome/physiology; Genome-Wide Association Study; Humans; RNA, Long Noncoding/biosynthesis; RNA, Long Noncoding/genetics
7.
Nucleic Acids Res ; 50(D1): D765-D770, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34634797

ABSTRACT

The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser, making SARS-CoV-2 the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.


Subject(s)
COVID-19/virology; Databases, Genetic; SARS-CoV-2/genetics; Web Browser; Coronaviridae/genetics; Genetic Variation; Genome, Viral; Humans; Molecular Sequence Annotation
8.
Annu Rev Genomics Hum Genet ; 21: 55-79, 2020 08 31.
Article in English | MEDLINE | ID: mdl-32421357

ABSTRACT

Our understanding of the human genome has continuously expanded since its draft publication in 2001. Over the years, novel assays have allowed us to progressively overlay layers of knowledge above the raw sequence of A's, T's, G's, and C's. The reference human genome sequence is now a complex knowledge base maintained under the shared stewardship of multiple specialist communities. Its complexity stems from the fact that it is simultaneously a template for transcription, a record of evolution, a vehicle for genetics, and a functional molecule. In short, the human genome serves as a frame of reference at the intersection of a diversity of scientific fields. In recent years, the progressive fall in sequencing costs has given increasing importance to the quality of the human reference genome, as hundreds of thousands of individuals are being sequenced yearly, often for clinical applications. Also, novel sequencing-based assays shed light on novel functions of the genome, especially with respect to gene expression regulation. Keeping the human genome annotation up to date and accurate is therefore an ongoing partnership between reference annotation projects and the greater community worldwide.


Subject(s)
Genome, Human; Molecular Sequence Annotation/methods; Molecular Sequence Annotation/standards; Humans
10.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
COVID-19/prevention & control; Computational Biology/methods; Databases, Genetic; Genomics/methods; Molecular Sequence Annotation/methods; SARS-CoV-2/genetics; Animals; COVID-19/epidemiology; COVID-19/virology; Epidemics; Humans; Internet; Mice; Pseudogenes/genetics; RNA, Long Noncoding/genetics; SARS-CoV-2/metabolism; SARS-CoV-2/physiology; Transcription, Genetic/genetics
11.
Genome Res ; 29(12): 2073-2087, 2019 12.
Article in English | MEDLINE | ID: mdl-31537640

ABSTRACT

The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.


Subject(s)
Exons; Genome, Human; Genome-Wide Association Study; High-Throughput Nucleotide Sequencing; Open Reading Frames; Sequence Analysis, DNA; Animals; Humans; Pseudogenes
12.
Neuropathol Appl Neurobiol ; 48(3): e12775, 2022 04.
Article in English | MEDLINE | ID: mdl-34820881

ABSTRACT

Non-coding DNA (ncDNA) refers to the portion of the genome that does not code for proteins and accounts for the greatest physical proportion of the human genome. ncDNA includes sequences that are transcribed into RNA molecules, such as ribosomal RNAs (rRNAs), microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), as well as un-transcribed sequences that have regulatory functions, including gene promoters and enhancers. Variation in non-coding regions of the genome has an established role in human disease, with growing evidence from many areas, including several cancers, Parkinson's disease and autism. Here, we review the features and functions of the regulatory elements that are present in the non-coding genome and the role that these regions have in human disease. We then review the existing research in epilepsy and emphasise the potential value of further exploring non-coding regulatory elements in epilepsy. In addition, we outline the most widely used techniques for recognising regulatory elements throughout the genome, current methodologies for investigating variation and the main challenges associated with research in the field of non-coding DNA.


Subject(s)
Epilepsy; MicroRNAs; RNA, Long Noncoding; Epilepsy/genetics; Genome; Humans; MicroRNAs/genetics; RNA, Long Noncoding/genetics
13.
Acta Neuropathol ; 144(1): 107-127, 2022 07.
Article in English | MEDLINE | ID: mdl-35551471

ABSTRACT

Mesial temporal lobe epilepsy with hippocampal sclerosis and a history of febrile seizures is associated with common variation at rs7587026, located in the promoter region of SCN1A. We sought to explore possible underlying mechanisms. SCN1A expression was analysed in hippocampal biopsy specimens of individuals with mesial temporal lobe epilepsy with hippocampal sclerosis who underwent surgical treatment, and hippocampal neuronal cell loss was quantitatively assessed using immunohistochemistry. In healthy individuals, hippocampal volume was measured using MRI. Analyses were performed stratified by rs7587026 type. To study the functional consequences of increased SCN1A expression, we generated, using transposon-mediated bacterial artificial chromosome transgenesis, a zebrafish line expressing exogenous scn1a, and performed EEG analysis on larval optic tecta at 4 days post-fertilization. Finally, we used an in vitro promoter analysis to study whether the genetic motif containing rs7587026 influences promoter activity. Hippocampal SCN1A expression differed by rs7587026 genotype (Kruskal-Wallis test P = 0.004). Individuals homozygous for the minor allele showed significantly increased expression compared to those homozygous for the major allele (Dunn's test P = 0.003), and to heterozygotes (Dunn's test P = 0.035). No statistically significant differences in hippocampal neuronal cell loss were observed between the three genotypes. Among 597 healthy participants, individuals homozygous for the minor allele at rs7587026 displayed significantly reduced mean hippocampal volume compared to major allele homozygotes (Cohen's d = -0.28, P = 0.02), and to heterozygotes (Cohen's d = -0.36, P = 0.009). Compared to wild type, scn1lab-overexpressing zebrafish larvae exhibited more frequent spontaneous seizures [one-way ANOVA F(4,54) = 6.95 (P < 0.001)]. The number of EEG discharges correlated with the level of scn1lab overexpression [one-way ANOVA F(4,15) = 10.75 (P < 0.001)]. Finally, we showed that a 50 bp promoter motif containing rs7587026 exerts a strong regulatory role on SCN1A expression, though we could not directly link this to rs7587026 itself. Our results develop the mechanistic link between rs7587026 and mesial temporal lobe epilepsy with hippocampal sclerosis and a history of febrile seizures. Furthermore, we propose that quantitative precision may be important when increasing SCN1A expression in current strategies aiming to treat seizures in conditions involving SCN1A haploinsufficiency, such as Dravet syndrome.
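
The effect sizes in this abstract are standardized mean differences between two groups. As a minimal sketch, assuming the common pooled-standard-deviation form of the statistic (the abstract does not say which variant the authors used), it can be computed as below. The function name and the toy numbers are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation.

    Assumption: the classic two-sample form with pooled variance; other
    variants (for example Hedges' g) apply additional corrections.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy, made-up measurements (not study data): the first group has the lower mean,
# so the effect size is negative.
toy_d = cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

A negative value, as in the abstract, simply indicates that the first group's mean is lower than the second group's.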


Subject(s)
Epilepsy, Temporal Lobe; Epilepsy; NAV1.1 Voltage-Gated Sodium Channel/metabolism; Seizures, Febrile; Zebrafish Proteins/metabolism; Animals; Epilepsy/genetics; Epilepsy, Temporal Lobe/genetics; Genomics; Gliosis/pathology; Hippocampus/pathology; Humans; NAV1.1 Voltage-Gated Sodium Channel/genetics; Sclerosis/pathology; Seizures, Febrile/complications; Seizures, Febrile/genetics; Zebrafish
14.
Haematologica ; 106(10): 2613-2623, 2021 10 01.
Article in English | MEDLINE | ID: mdl-32703790

ABSTRACT

Transcriptional profiling of hematopoietic cell subpopulations has helped to characterize the developmental stages of the hematopoietic system and the molecular bases of malignant and non-malignant blood diseases. Previously, only the genes targeted by expression microarrays could be profiled genome-wide. High-throughput RNA sequencing, however, encompasses a broader repertoire of RNA molecules, without restriction to previously annotated genes. We analyzed the BLUEPRINT consortium RNA-sequencing data for mature hematopoietic cell types. The data comprised 90 total RNA-sequencing samples, each composed of one of 27 cell types, and 32 small RNA-sequencing samples, each composed of one of 11 cell types. We estimated gene and isoform expression levels for each cell type using existing annotations from Ensembl. We then used guided transcriptome assembly to discover unannotated transcripts. We identified hundreds of novel non-coding RNA genes and showed that the majority have cell type-dependent expression. We also characterized the expression of circular RNA and found that these are also cell type-specific. These analyses refine the active transcriptional landscape of mature hematopoietic cells, highlight abundant genes and transcriptional isoforms for each blood cell type, and provide a valuable resource for researchers of hematologic development and diseases. Finally, we made the data accessible via a web-based interface: https://blueprint.haem.cam.ac.uk/bloodatlas/.


Subject(s)
RNA, Long Noncoding; Transcriptome; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; RNA, Circular; RNA, Long Noncoding/genetics; Sequence Analysis, RNA
15.
Nucleic Acids Res ; 47(10): 5293-5306, 2019 06 04.
Article in English | MEDLINE | ID: mdl-30916337

ABSTRACT

Nonsense-mediated decay (NMD) is a eukaryotic mRNA surveillance system that selectively degrades transcripts with premature termination codons (PTCs). Many RNA-binding proteins (RBPs) regulate their expression levels by a negative feedback loop, in which the RBP binds its own pre-mRNA and causes alternative splicing to introduce a PTC. We present a bioinformatic analysis integrating three data sources, eCLIP assays for a large RBP panel, shRNA inactivation of the NMD pathway, and shRNA depletion of RBPs followed by RNA-seq, to identify novel autoregulatory feedback loops of this kind. We show that RBPs frequently bind their own pre-mRNAs, that their exons respond prominently to NMD pathway disruption, and that the responding exons are enriched with nearby eCLIP peaks. We confirm previously proposed models of autoregulation in SRSF7 and U2AF1 genes and present two novel models, in which (i) SFPQ binds its mRNA and promotes switching to an alternative distal 3'-UTR that is targeted by NMD, and (ii) RPS3 binding activates a poison 5'-splice site in its pre-mRNA that leads to a frame shift and degradation by NMD. We also suggest specific splicing events that could be implicated in autoregulatory feedback loops in RBM39, HNRNPM, and U2AF2 genes. The results are available through a UCSC Genome Browser track hub.
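
The notion of a premature termination codon can be illustrated with a toy scan for the first in-frame stop codon. This is only a sketch under simplifying assumptions: the function names, coordinates and sequences are hypothetical, it is not the authors' pipeline, and real NMD targeting depends on further context (such as where the stop lies relative to exon junctions) that is not modelled here.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(mrna, cds_start):
    """Return the 0-based index of the first in-frame stop codon, or None.

    `cds_start` is the index of the first base of the start codon, so codons
    are read in steps of three from that position.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA alphabets
    for i in range(cds_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_premature_stop(mrna, cds_start, annotated_stop):
    """True if an in-frame stop occurs before the annotated stop codon index."""
    stop = first_in_frame_stop(mrna, cds_start)
    return stop is not None and stop < annotated_stop


# Made-up sequences: codons AUG GCU UAA GGG UGA versus AUG GCU GGG UGA.
with_ptc = has_premature_stop("AUGGCUUAAGGGUGA", 0, 12)
without_ptc = has_premature_stop("AUGGCUGGGUGA", 0, 9)
```

In the first toy transcript the in-frame UAA precedes the annotated UGA, which is the situation a splicing change such as a poison exon or frame shift can create.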


Subject(s)
Codon, Nonsense; Nonsense Mediated mRNA Decay; RNA Splicing; RNA, Small Interfering/metabolism; Transcriptome; 3' Untranslated Regions; Alternative Splicing; Computational Biology; Exons; Frameshift Mutation; Heterogeneous-Nuclear Ribonucleoprotein Group M/metabolism; Humans; Nuclear Proteins/metabolism; RNA Precursors/metabolism; RNA, Messenger/metabolism; RNA-Binding Proteins/metabolism; Serine-Arginine Splicing Factors/metabolism; Spliceosomes; Splicing Factor U2AF/metabolism
16.
Nucleic Acids Res ; 47(D1): D766-D773, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.


Subject(s)
Databases, Genetic; Genome, Human/genetics; Genomics; Pseudogenes/genetics; Animals; Computational Biology; Humans; Internet; Mice; Molecular Sequence Annotation; Software
17.
BMC Genomics ; 21(1): 196, 2020 Mar 03.
Article in English | MEDLINE | ID: mdl-32126975

ABSTRACT

BACKGROUND: Olfactory receptor (OR) genes are the largest multi-gene family in the mammalian genome, with 874 loci in human and 1483 in mouse (including pseudogenes). The expansion of the OR gene repertoire has occurred through numerous duplication events followed by diversification, resulting in a large number of highly similar paralogous genes. These characteristics have made the annotation of the complete OR gene repertoire a complex task. Most OR genes have been predicted in silico and are typically annotated as intronless coding sequences. RESULTS: Here we have developed an expert curation pipeline to analyse and annotate every OR gene in the human and mouse reference genomes. By combining evidence from structural features, evolutionary conservation and experimental data, we have unified the annotation of these gene families, and have systematically determined the protein-coding potential of each locus. We have defined the non-coding regions of many OR genes, enabling us to generate full-length transcript models. We found that 13 human and 41 mouse OR loci have coding sequences that are split across two exons. These split OR genes are conserved across mammals, and are expressed at the same level as protein-coding OR genes with an intronless coding region. Our findings challenge the long-standing and widespread notion that the coding region of a vertebrate OR gene is contained within a single exon. CONCLUSIONS: This work provides the most comprehensive curation effort of the human and mouse OR gene repertoires to date. The complete annotation has been integrated into the GENCODE reference gene set, for immediate availability to the research community.


Subject(s)
Conserved Sequence; Exons/genetics; Quantitative Trait Loci; Receptors, Odorant/genetics; Animals; Data Curation/methods; Databases, Genetic; Genetic Loci; Genome, Human; Humans; Mice; Pseudogenes
18.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.


Subject(s)
Caenorhabditis elegans/genetics; Drosophila melanogaster/genetics; Gene Expression Profiling; Transcriptome/genetics; Animals; Caenorhabditis elegans/embryology; Caenorhabditis elegans/growth & development; Chromatin/genetics; Cluster Analysis; Drosophila melanogaster/growth & development; Gene Expression Regulation, Developmental/genetics; Histones/metabolism; Humans; Larva/genetics; Larva/growth & development; Models, Genetic; Molecular Sequence Annotation; Promoter Regions, Genetic/genetics; Pupa/genetics; Pupa/growth & development; RNA, Untranslated/genetics; Sequence Analysis, RNA
19.
Nucleic Acids Res ; 46(D1): D221-D228, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29126148

ABSTRACT

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.


Subject(s)
Consensus Sequence; Databases, Genetic; Open Reading Frames; Animals; Data Curation/methods; Data Curation/standards; Databases, Genetic/standards; Guidelines as Topic; Humans; Mice; Molecular Sequence Annotation; National Library of Medicine (U.S.); United States; User-Computer Interface
20.
Nucleic Acids Res ; 46(D1): D754-D761, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29155950

ABSTRACT

The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.


Subject(s)
Databases, Genetic; Datasets as Topic; Genome; Information Dissemination; Animals; Epigenomics; Genome, Human; Genome-Wide Association Study; Genomics; High-Throughput Nucleotide Sequencing; Humans; Molecular Sequence Annotation; Vertebrates/genetics; Web Browser