Search | VHL Regional Portal

Re-annotation of 191 developmental and epileptic encephalopathy-associated genes unmasks de novo variants in SCN1A.

Steward, Charles A; Roovers, Jolien; Suner, Marie-Marthe; Gonzalez, Jose M; Uszczynska-Ratajczak, Barbara; Pervouchine, Dmitri; Fitzgerald, Stephen; Viola, Margarida; Stamberger, Hannah; Hamdan, Fadi F; Ceulemans, Berten; Leroy, Patricia; Nava, Caroline; Lepine, Anne; Tapanari, Electra; Keiller, Don; Abbs, Stephen; Sanchis-Juan, Alba; Grozeva, Detelina; Rogers, Anthony S; Diekhans, Mark; Guigó, Roderic; Petryszak, Robert; Minassian, Berge A; Cavalleri, Gianpiero; Vitsios, Dimitrios; Petrovski, Slavé; Harrow, Jennifer; Flicek, Paul; Lucy Raymond, F; Lench, Nicholas J; Jonghe, Peter De; Mudge, Jonathan M; Weckhuysen, Sarah; Sisodiya, Sanjay M; Frankish, Adam.

NPJ Genom Med ; 4: 31, 2019.

Article in English | MEDLINE | ID: mdl-31814998

ABSTRACT

The developmental and epileptic encephalopathies (DEE) are a group of rare, severe neurodevelopmental disorders, where even the most thorough sequencing studies leave 60-65% of patients without a molecular diagnosis. Here, we explore the incompleteness of transcript models used for exome and genome analysis as one potential explanation for a lack of current diagnoses. Therefore, we have updated the GENCODE gene annotation for 191 epilepsy-associated genes, using human brain-derived transcriptomic libraries and other data to build 3,550 putative transcript models. Our annotations increase the transcriptional 'footprint' of these genes by over 674 kb. Using SCN1A as a case study, due to its close phenotype/genotype correlation with Dravet syndrome, we screened 122 people with Dravet syndrome or a similar phenotype with a panel of exon sequences representing eight established genes and identified two de novo SCN1A variants that now - through improved gene annotation - are ascribed to residing among our exons. These two (from 122 screened people, 1.6%) molecular diagnoses carry significant clinical implications. Furthermore, we identified a previously classified SCN1A intronic Dravet syndrome-associated variant that now lies within a deeply conserved exon. Our findings illustrate the potential gains of thorough gene annotation in improving diagnostic yields for genetic disorders.

Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species.

Kersey, Paul Julian; Allen, James E; Allot, Alexis; Barba, Matthieu; Boddu, Sanjay; Bolt, Bruce J; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Grabmueller, Christoph; Kumar, Navin; Liu, Zicheng; Maurel, Thomas; Moore, Ben; McDowall, Mark D; Maheswari, Uma; Naamati, Guy; Newman, Victoria; Ong, Chuang Kee; Paulini, Michael; Pedro, Helder; Perry, Emily; Russell, Matthew; Sparrow, Helen; Tapanari, Electra; Taylor, Kieron; Vullo, Alessandro; Williams, Gareth; Zadissia, Amonida; Olson, Andrew; Stein, Joshua; Wei, Sharon; Tello-Ruiz, Marcela; Ware, Doreen; Luciani, Aurelien; Potter, Simon; Finn, Robert D; Urban, Martin; Hammond-Kosack, Kim E; Bolser, Dan M; De Silva, Nishadi; Howe, Kevin L; Langridge, Nicholas; Maslen, Gareth; Staines, Daniel Michael; Yates, Andrew.

Nucleic Acids Res ; 46(D1): D802-D808, 2018 Jan 04.

Article in English | MEDLINE | ID: mdl-29092050

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly-available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.

Subjects

Archaea/genetics , Bacteria/genetics , Databases, Genetic , Databases, Protein , Eukaryota/genetics , Genomics , Amino Acid Sequence , Animals , Base Sequence , Data Mining , Forecasting , Genome , Molecular Sequence Annotation , RNA/genetics , User-Computer Interface

Gramene 2018: unifying comparative genomics and pathway resources for plant research.

Tello-Ruiz, Marcela K; Naithani, Sushma; Stein, Joshua C; Gupta, Parul; Campbell, Michael; Olson, Andrew; Wei, Sharon; Preece, Justin; Geniza, Matthew J; Jiao, Yinping; Lee, Young Koung; Wang, Bo; Mulvaney, Joseph; Chougule, Kapeel; Elser, Justin; Al-Bader, Noor; Kumari, Sunita; Thomason, James; Kumar, Vivek; Bolser, Daniel M; Naamati, Guy; Tapanari, Electra; Fonseca, Nuno; Huerta, Laura; Iqbal, Haider; Keays, Maria; Munoz-Pomer Fuentes, Alfonso; Tang, Amy; Fabregat, Antonio; D'Eustachio, Peter; Weiser, Joel; Stein, Lincoln D; Petryszak, Robert; Papatheodorou, Irene; Kersey, Paul J; Lockhart, Patti; Taylor, Crispin; Jaiswal, Pankaj; Ware, Doreen.

Nucleic Acids Res ; 46(D1): D1181-D1189, 2018 Jan 04.

Article in English | MEDLINE | ID: mdl-29165610

ABSTRACT

Gramene (http://www.gramene.org) is a knowledgebase for comparative functional analysis in major crops and model plant species. The current release, #54, includes over 1.7 million genes from 44 reference genomes, most of which were organized into 62,367 gene families through orthologous and paralogous gene classification, whole-genome alignments, and synteny. Additional gene annotations include ontology-based protein structure and function; genetic, epigenetic, and phenotypic diversity; and pathway associations. Gramene's Plant Reactome provides a knowledgebase of cellular-level plant pathway networks. Specifically, it uses curated rice reference pathways to derive pathway projections for an additional 66 species based on gene orthology, and facilitates display of gene expression, gene-gene interactions, and user-defined omics data in the context of these pathways. As a community portal, Gramene integrates best-of-class software and infrastructure components including the Ensembl genome browser, Reactome pathway browser, and Expression Atlas widgets, and undergoes periodic data and software upgrades. Via powerful, intuitive search interfaces, users can easily query across various portals and interactively analyze search results by clicking on diverse features such as genomic context, highly augmented gene trees, gene expression anatomograms, associated pathways, and external informatics resources. All data in Gramene are accessible through both visual and programmatic interfaces.

Subjects

Databases, Genetic , Gene Expression Regulation, Plant , Genomics/methods , Knowledge Bases , Plants/genetics , Epigenesis, Genetic , Gene Ontology , Genetic Research , Genetic Variation , Genome, Plant , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Plants/metabolism , Software , User-Computer Interface

Extension of human lncRNA transcripts by RACE coupled with long-read high-throughput sequencing (RACE-Seq).

Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Santoyo-Lopez, Javier; Gonzalez, Jose Manuel; Tapanari, Electra; Mudge, Jonathan M; Steward, Charles A; Wilming, Laurens; Tanzer, Andrea; Howald, Cédric; Chrast, Jacqueline; Vela-Boza, Alicia; Rueda, Antonio; Lopez-Domingo, Francisco J; Dopazo, Joaquin; Reymond, Alexandre; Guigó, Roderic; Harrow, Jennifer.

Nat Commun ; 7: 12339, 2016 Aug 17.

Article in English | MEDLINE | ID: mdl-27531712

ABSTRACT

Long non-coding RNAs (lncRNAs) constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome. Such characterization requires a comprehensive, high-quality annotation of their gene structure and boundaries, which is currently lacking. Here we describe RACE-Seq, an experimental workflow designed to address this based on RACE (rapid amplification of cDNA ends) and long-read RNA sequencing. We apply RACE-Seq to 398 human lncRNA genes in seven tissues, leading to the discovery of 2,556 on-target, novel transcripts. About 60% of the targeted loci are extended in either 5' or 3', often reaching genomic hallmarks of gene boundaries. Analysis of the novel transcripts suggests that lncRNAs are as long, have as many exons and undergo as much alternative splicing as protein-coding genes, contrary to current assumptions. Overall, we show that RACE-Seq is an effective tool to annotate an organism's deep transcriptome, and compares favourably to other targeted sequencing techniques.

Subjects

High-Throughput Nucleotide Sequencing/methods , Polymerase Chain Reaction/methods , RNA, Long Noncoding/genetics , Sequence Analysis, RNA/methods , Exons/genetics , Genetic Loci , Humans , Molecular Sequence Annotation , Organ Specificity/genetics , Proof of Concept Study , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA Splice Sites/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Transcriptome/genetics

Ensembl Genomes 2016: more genomes, more complexity.

Kersey, Paul Julian; Allen, James E; Armean, Irina; Boddu, Sanjay; Bolt, Bruce J; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Aranganathan, Naveen K; Langridge, Nicholas; Lowy, Ernesto; McDowall, Mark D; Maheswari, Uma; Nuhn, Michael; Ong, Chuang Kee; Overduin, Bert; Paulini, Michael; Pedro, Helder; Perry, Emily; Spudich, Giulietta; Tapanari, Electra; Walts, Brandon; Williams, Gareth; Tello-Ruiz, Marcela; Stein, Joshua; Wei, Sharon; Ware, Doreen; Bolser, Daniel M; Howe, Kevin L; Kulesha, Eugene; Lawson, Daniel; Maslen, Gareth; Staines, Daniel M.

Nucleic Acids Res ; 44(D1): D574-80, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26578574

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.

Subjects

Databases, Genetic , Genome, Bacterial , Genome, Fungal , Genome, Plant , Invertebrates/genetics , Animals , Diploidy , Eukaryota/genetics , Genetic Variation , Genome , Polyploidy , Sequence Alignment

GENCODE: the reference human genome annotation for The ENCODE Project.

Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J.

Genome Res ; 22(9): 1760-74, 2012 Sep.

Article in English | MEDLINE | ID: mdl-22955987

ABSTRACT

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

Subjects

Databases, Genetic , Genome, Human , Genomics/methods , Molecular Sequence Annotation , Animals , Computational Biology/methods , DNA, Complementary/chemistry , DNA, Complementary/genetics , Evolution, Molecular , Exons , Genetic Loci , Humans , Internet , Models, Molecular , Open Reading Frames , Pseudogenes , Quality Control , RNA Splice Sites , RNA, Long Noncoding , Reproducibility of Results , Untranslated Regions
