Results 1 - 11 of 11
1.
Nucleic Acids Res ; 51(D1): D942-D949, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36420896

ABSTRACT

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
Computational Biology; Genome, Human; Humans; Animals; Mice; Molecular Sequence Annotation; Computational Biology/methods; Genome, Human/genetics; Transcriptome/genetics; Gene Expression Profiling; Databases, Genetic
2.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
COVID-19/prevention & control; Computational Biology/methods; Databases, Genetic; Genomics/methods; Molecular Sequence Annotation/methods; SARS-CoV-2/genetics; Animals; COVID-19/epidemiology; COVID-19/virology; Epidemics; Humans; Internet; Mice; Pseudogenes/genetics; RNA, Long Noncoding/genetics; SARS-CoV-2/metabolism; SARS-CoV-2/physiology; Transcription, Genetic/genetics
3.
Nucleic Acids Res ; 47(D1): D766-D773, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.


Subject(s)
Databases, Genetic; Genome, Human/genetics; Genomics; Pseudogenes/genetics; Animals; Computational Biology; Humans; Internet; Mice; Molecular Sequence Annotation; Software
4.
Nucleic Acids Res ; 46(D1): D221-D228, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29126148

ABSTRACT

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.


Subject(s)
Consensus Sequence; Databases, Genetic; Open Reading Frames; Animals; Data Curation/methods; Data Curation/standards; Databases, Genetic/standards; Guidelines as Topic; Humans; Mice; Molecular Sequence Annotation; National Library of Medicine (U.S.); United States; User-Computer Interface
5.
Nucleic Acids Res ; 42(Database issue): D865-72, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24217909

ABSTRACT

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.


Subject(s)
Databases, Genetic; Proteins/genetics; Animals; Exons; Genomics; Humans; Internet; Mice; Molecular Sequence Annotation; Sequence Analysis
6.
NPJ Genom Med ; 4: 31, 2019.
Article in English | MEDLINE | ID: mdl-31814998

ABSTRACT

The developmental and epileptic encephalopathies (DEE) are a group of rare, severe neurodevelopmental disorders, where even the most thorough sequencing studies leave 60-65% of patients without a molecular diagnosis. Here, we explore the incompleteness of transcript models used for exome and genome analysis as one potential explanation for a lack of current diagnoses. Therefore, we have updated the GENCODE gene annotation for 191 epilepsy-associated genes, using human brain-derived transcriptomic libraries and other data to build 3,550 putative transcript models. Our annotations increase the transcriptional 'footprint' of these genes by over 674 kb. Using SCN1A as a case study, due to its close phenotype/genotype correlation with Dravet syndrome, we screened 122 people with Dravet syndrome or a similar phenotype with a panel of exon sequences representing eight established genes and identified two de novo SCN1A variants that now - through improved gene annotation - are ascribed to residing among our exons. These two (from 122 screened people, 1.6%) molecular diagnoses carry significant clinical implications. Furthermore, we identified a previously classified SCN1A intronic Dravet syndrome-associated variant that now lies within a deeply conserved exon. Our findings illustrate the potential gains of thorough gene annotation in improving diagnostic yields for genetic disorders.

7.
Insect Biochem Mol Biol ; 34(5): 451-8, 2004 May.
Article in English | MEDLINE | ID: mdl-15110866

ABSTRACT

Inducible expression systems have proven to be of major interest when analysing the function of specific genes or when expressing cytotoxic proteins. In an effort to develop inducible switches allowing for flexible fine-tuning of gene expression levels in insect cells, we have compared the induction capacities of two Drosophila minimal promoters when linked to four consecutive ecdysone response elements. These minimal promoters, either containing a TATA-box or a downstream promoter element, drove the expression of a luciferase reporter gene. Potent induction capacities were observed with the insect moulting hormone, 20-hydroxyecdysone, and with ponasterone A, a plant ecdysteroid. The developed inducible switches further expand the repertoire of molecular tools for functional expression of proteins of interest in insect cells. In addition, the combination of an ecdysone switch with promoters that possess different structural elements can provide novel insights into ecdysteroid-induced transcription in an insect cell line.


Subject(s)
Drosophila/genetics; Ecdysterone/analogs & derivatives; Ecdysterone/pharmacology; Gene Expression Regulation/genetics; Promoter Regions, Genetic/genetics; Animals; Base Sequence; Cell Line; DNA-Binding Proteins/biosynthesis; DNA-Binding Proteins/genetics; Dose-Response Relationship, Drug; Drosophila/cytology; Drosophila Proteins; Gene Expression Regulation/drug effects; Genes, Reporter; HSP70 Heat-Shock Proteins/genetics; Insect Proteins/biosynthesis; Insect Proteins/genetics; Luciferases/genetics; Luciferases/metabolism; Molecular Sequence Data; Promoter Regions, Genetic/drug effects; Receptors, Steroid/biosynthesis; Receptors, Steroid/genetics; Transcription Factors/biosynthesis; Transcription Factors/genetics; Transfection
8.
Database (Oxford) ; 2012: bas008, 2012.
Article in English | MEDLINE | ID: mdl-22434842

ABSTRACT

The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a 'gold standard' definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines. DATABASE URL: http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi.


Subject(s)
Consensus Sequence; Database Management Systems; Databases, Genetic; Genomics/methods; Molecular Sequence Annotation/methods; Animals; Humans; Mice
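
The abstract above describes identifying CDS annotations that are identical between independently produced annotation builds. The matching idea can be illustrated with a minimal Python sketch. It is not the CCDS pipeline: the data layout (chromosome, strand, ordered coding intervals), the function name `identical_cds`, and every identifier and coordinate in the example are hypothetical and chosen only for illustration. Quality assurance checks and manual curation are out of scope here.

```python
from typing import Dict, List, Tuple

# Assumed representation of one CDS: chromosome, strand, and the ordered
# coding intervals as (start, end) pairs. This layout is hypothetical.
Cds = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def identical_cds(
    set_a: Dict[str, Cds], set_b: Dict[str, Cds]
) -> List[Tuple[str, str, Cds]]:
    """Return (id_a, id_b, structure) for CDS structures found in both sets.

    Two annotations match only if chromosome, strand, and every coding
    interval boundary are exactly equal.
    """
    # Index the second set by structure so each lookup is a dictionary hit.
    index_b: Dict[Cds, List[str]] = {}
    for id_b, cds in set_b.items():
        index_b.setdefault(cds, []).append(id_b)

    matches: List[Tuple[str, str, Cds]] = []
    for id_a, cds in sorted(set_a.items()):
        for id_b in index_b.get(cds, []):
            matches.append((id_a, id_b, cds))
    return matches


# Made-up example data: two annotation sets with hypothetical identifiers.
pipeline_a = {
    "a_tx1": ("chr1", "+", ((100, 200), (300, 450))),
    "a_tx2": ("chr2", "-", ((50, 120),)),
}
pipeline_b = {
    "b_tx1": ("chr1", "+", ((100, 200), (300, 450))),
    "b_tx2": ("chr2", "-", ((50, 125),)),  # one boundary differs, so no match
}

matches = identical_cds(pipeline_a, pipeline_b)
for id_a, id_b, _ in matches:
    print(id_a, id_b)
```

Only the exactly matching pair is reported. In the workflow the abstract describes, such a pair would be a candidate for a stable identifier once it also passes the quality assurance checks.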
9.
Science ; 335(6070): 823-8, 2012 Feb 17.
Article in English | MEDLINE | ID: mdl-22344438

ABSTRACT

Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.


Subject(s)
Genetic Variation; Genome, Human; Proteins/genetics; Disease/genetics; Gene Expression; Gene Frequency; Humans; Phenotype; Polymorphism, Single Nucleotide; Selection, Genetic
10.
Genome Res ; 19(7): 1316-23, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19498102

ABSTRACT

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.


Subject(s)
Consensus Sequence; Genome; Open Reading Frames/genetics; Animals; Humans; Mice; Sequence Alignment
11.
Biochem Biophys Res Commun ; 320(2): 318-24, 2004 Jul 23.
Article in English | MEDLINE | ID: mdl-15219829

ABSTRACT

Activation or inhibition of the cyclic AMP (cAMP)-protein kinase A (PKA) pathway can ultimately regulate the transcription of a variety of genes. In vertebrates, the best characterized nuclear targets of PKA are the 'cAMP response element' (CRE) binding proteins (CREB). Differences in the transcriptional response to this pathway between cells and tissues can be based on the presence of distinct CREB isoforms. In this context, we have now investigated the presence of different dCREB transcripts in a stable, embryonic insect cell line, i.e., Drosophila Schneider 2 (S2) cells. In addition, we have studied the possible effect of cellular cAMP- and Ca2+ increases on the expression of a luciferase reporter in cells transfected with a CRE-containing reporter gene construct. In combination with recent data from the literature, our results indicate that the regulation of CRE-dependent gene expression shows some important differences between insects and vertebrates.


Subject(s)
Cyclic AMP Receptor Protein/metabolism; Drosophila/cytology; Amino Acid Sequence; Animals; Carrier Proteins; Cell Line; Cyclic AMP Receptor Protein/chemistry; Cyclic AMP Receptor Protein/genetics; DNA Primers; DNA, Complementary; Molecular Sequence Data; Polymerase Chain Reaction; Sequence Homology, Amino Acid