Results 1 - 20 of 66
1.
bioRxiv ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38617209

ABSTRACT

Most human transcription factor (TF) genes encode multiple protein isoforms differing in DNA binding domains, effector domains, or other protein regions. The global extent to which this results in functional differences between isoforms remains unknown. Here, we systematically compared 693 isoforms of 246 TF genes, assessing DNA binding, protein binding, transcriptional activation, subcellular localization, and condensate formation. Relative to reference isoforms, two-thirds of alternative TF isoforms exhibit differences in one or more molecular activities, which often could not be predicted from sequence. We observed two primary categories of alternative TF isoforms: "rewirers" and "negative regulators", both of which were associated with differentiation and cancer. Our results support a model wherein the relative expression levels of, and interactions involving, TF isoforms add an understudied layer of complexity to gene regulatory networks, demonstrating the importance of isoform-aware characterization of TF functions and providing a rich resource for further studies.

2.
Nature ; 622(7981): 41-47, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37794265

ABSTRACT

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Besides the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.


Subjects
Genes , Human Genome , Molecular Sequence Annotation , Protein Isoforms , Humans , Human Genome/genetics , Molecular Sequence Annotation/standards , Molecular Sequence Annotation/trends , Protein Isoforms/genetics , Human Genome Project , Pseudogenes , RNA/genetics
3.
ArXiv ; 2023 Mar 24.
Article in English | MEDLINE | ID: mdl-36994150

ABSTRACT

Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has expanded dramatically. The invention of high-throughput RNA sequencing and other technological breakthroughs have led to an explosion in the number of reported non-coding RNA genes, although most of them do not yet have any known function. A combination of recent advances offers a path forward to identifying these functions and towards eventually completing the human gene catalogue. However, much work remains to be done before we have a universal annotation standard that includes all medically significant genes, maintains their relationships with different reference genomes, and describes clinically relevant genetic variants.

4.
Hum Mol Genet ; 32(10): 1753-1763, 2023 05 05.
Article in English | MEDLINE | ID: mdl-36715146

ABSTRACT

Pathogenic variations in the sodium voltage-gated channel alpha subunit 1 (SCN1A) gene are responsible for multiple epilepsy phenotypes, including Dravet syndrome, febrile seizures (FS) and genetic epilepsy with FS plus. Phenotypic heterogeneity is a hallmark of SCN1A-related epilepsies, the causes of which are yet to be clarified. Genetic variation in the non-coding regulatory regions of SCN1A could be one potential causal factor. However, a comprehensive understanding of the SCN1A regulatory landscape is currently lacking. Here, we summarized the current state of knowledge of SCN1A regulation, providing details on its promoter and enhancer regions. We then integrated currently available data on SCN1A promoters by extracting information related to the SCN1A locus from genome-wide repositories and clearly defined the promoter and enhancer regions of SCN1A. Further, we explored the cellular specificity of differential SCN1A promoter usage. We also reviewed and integrated the available human brain-derived enhancer databases and mouse-derived data to provide a comprehensive computationally developed summary of SCN1A brain-active enhancers. By querying genome-wide data repositories, extracting SCN1A-specific data and integrating the different types of independent evidence, we created a comprehensive catalogue that better defines the regulatory landscape of SCN1A, which could be used to explore the role of SCN1A regulatory regions in disease.


Subjects
Myoclonic Epilepsies , Epilepsy , Febrile Seizures , Humans , Mice , Animals , NAV1.1 Voltage-Gated Sodium Channel/genetics , Myoclonic Epilepsies/genetics , Epilepsy/genetics , Genetic Promoter Regions , Phenotype , Febrile Seizures/genetics , Mutation
5.
Nucleic Acids Res ; 51(D1): D942-D949, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36420896

ABSTRACT

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subjects
Computational Biology , Human Genome , Humans , Animals , Mice , Molecular Sequence Annotation , Computational Biology/methods , Human Genome/genetics , Transcriptome/genetics , Gene Expression Profiling , Genetic Databases
7.
Acta Neuropathol ; 144(1): 107-127, 2022 07.
Article in English | MEDLINE | ID: mdl-35551471

ABSTRACT

Mesial temporal lobe epilepsy with hippocampal sclerosis and a history of febrile seizures is associated with common variation at rs7587026, located in the promoter region of SCN1A. We sought to explore possible underlying mechanisms. SCN1A expression was analysed in hippocampal biopsy specimens of individuals with mesial temporal lobe epilepsy with hippocampal sclerosis who underwent surgical treatment, and hippocampal neuronal cell loss was quantitatively assessed using immunohistochemistry. In healthy individuals, hippocampal volume was measured using MRI. Analyses were performed stratified by rs7587026 type. To study the functional consequences of increased SCN1A expression, we generated, using transposon-mediated bacterial artificial chromosome transgenesis, a zebrafish line expressing exogenous scn1a, and performed EEG analysis on larval optic tecta at 4 days post-fertilization. Finally, we used an in vitro promoter analysis to study whether the genetic motif containing rs7587026 influences promoter activity. Hippocampal SCN1A expression differed by rs7587026 genotype (Kruskal-Wallis test P = 0.004). Individuals homozygous for the minor allele showed significantly increased expression compared to those homozygous for the major allele (Dunn's test P = 0.003), and to heterozygotes (Dunn's test P = 0.035). No statistically significant differences in hippocampal neuronal cell loss were observed between the three genotypes. Among 597 healthy participants, individuals homozygous for the minor allele at rs7587026 displayed significantly reduced mean hippocampal volume compared to major allele homozygotes (Cohen's D = -0.28, P = 0.02), and to heterozygotes (Cohen's D = -0.36, P = 0.009). Compared to wild type, scn1lab-overexpressing zebrafish larvae exhibited more frequent spontaneous seizures [one-way ANOVA F(4,54) = 6.95 (P < 0.001)]. The number of EEG discharges correlated with the level of scn1lab overexpression [one-way ANOVA F(4,15) = 10.75 (P < 0.001)]. Finally, we showed that a 50 bp promoter motif containing rs7587026 exerts a strong regulatory role on SCN1A expression, though we could not directly link this to rs7587026 itself. Our results develop the mechanistic link between rs7587026 and mesial temporal lobe epilepsy with hippocampal sclerosis and a history of febrile seizures. Furthermore, we propose that quantitative precision may be important when increasing SCN1A expression in current strategies aiming to treat seizures in conditions involving SCN1A haploinsufficiency, such as Dravet syndrome.


Subjects
Temporal Lobe Epilepsy , Epilepsy , NAV1.1 Voltage-Gated Sodium Channel/metabolism , Febrile Seizures , Zebrafish Proteins/metabolism , Animals , Epilepsy/genetics , Temporal Lobe Epilepsy/genetics , Genomics , Gliosis/pathology , Hippocampus/pathology , Humans , NAV1.1 Voltage-Gated Sodium Channel/genetics , Sclerosis/pathology , Febrile Seizures/complications , Febrile Seizures/genetics , Zebrafish
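
The effect sizes quoted in the abstract above (Cohen's D) can be illustrated with a minimal sketch. It assumes the common pooled-standard-deviation definition of Cohen's d; the abstract does not say which variant the authors used, and the volume values below are hypothetical, not study data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed definition)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical hippocampal volumes (arbitrary units), NOT taken from the study.
minor_homozygotes = [3.9, 4.1, 4.0, 3.8]
major_homozygotes = [4.2, 4.0, 4.3, 4.1]

d = cohens_d(minor_homozygotes, major_homozygotes)
print(f"Cohen's d = {d:.2f}")  # negative when the first group has the smaller mean
```

A negative value, as reported in the abstract, means the first group (minor-allele homozygotes) has the lower mean.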
8.
Nature ; 604(7905): 310-315, 2022 04.
Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.


Subjects
Computational Biology , Genetic Databases , Genomics , Genome , Humans , Information Dissemination , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States
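
The "exact match" criterion described in the MANE abstract above can be sketched as a comparison of two transcript models. This is a simplified illustration, not the MANE pipeline: it compares exon coordinates on a shared assembly rather than sequences, and the `Transcript` class, identifiers and coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Toy transcript model; identifiers and coordinates are hypothetical."""
    transcript_id: str
    chrom: str
    strand: str
    exons: tuple  # ((start, end), ...), 1-based inclusive, same assembly


def same_exon_structure(a: Transcript, b: Transcript) -> bool:
    """True when both models share chromosome, strand and every exon boundary."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and sorted(a.exons) == sorted(b.exons)
    )


ensembl_model = Transcript("ENST_EXAMPLE", "chr2", "-", ((100, 200), (300, 450), (600, 720)))
refseq_model = Transcript("NM_EXAMPLE", "chr2", "-", ((100, 200), (300, 450), (600, 720)))
# Same transcript with one exon boundary shifted by 5 bp: no longer an exact match.
shifted_model = Transcript("NM_SHIFTED", "chr2", "-", ((100, 200), (300, 455), (600, 720)))

print(same_exon_structure(ensembl_model, refseq_model))   # True
print(same_exon_structure(ensembl_model, shifted_model))  # False
```

Requiring identity at every boundary, rather than overlap, is what would let two identifiers be used synonymously in the sense the abstract describes.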
10.
Neuropathol Appl Neurobiol ; 48(3): e12775, 2022 04.
Article in English | MEDLINE | ID: mdl-34820881

ABSTRACT

Non-coding DNA (ncDNA) refers to the portion of the genome that does not code for proteins and accounts for the greatest physical proportion of the human genome. ncDNA includes sequences that are transcribed into RNA molecules, such as ribosomal RNAs (rRNAs), microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), and un-transcribed sequences that have regulatory functions, including gene promoters and enhancers. Variation in non-coding regions of the genome has an established role in human disease, with growing evidence from many areas, including several cancers, Parkinson's disease and autism. Here, we review the features and functions of the regulatory elements that are present in the non-coding genome and the role that these regions have in human disease. We then review the existing research in epilepsy and emphasise the potential value of further exploring non-coding regulatory elements in epilepsy. In addition, we outline the most widely used techniques for recognising regulatory elements throughout the genome, current methodologies for investigating variation and the main challenges associated with research in the field of non-coding DNA.


Subjects
Epilepsy , MicroRNAs , Long Noncoding RNA , Epilepsy/genetics , Genome , Humans , MicroRNAs/genetics , Long Noncoding RNA/genetics
11.
Nucleic Acids Res ; 50(D1): D765-D770, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34634797

ABSTRACT

The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser, making SARS-CoV-2 the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.


Subjects
COVID-19/virology , Genetic Databases , SARS-CoV-2/genetics , Web Browser , Coronaviridae/genetics , Genetic Variation , Viral Genome , Humans , Molecular Sequence Annotation
12.
Mol Genet Genomic Med ; 9(12): e1786, 2021 12.
Article in English | MEDLINE | ID: mdl-34435752

ABSTRACT

BACKGROUND: Variant interpretation is dependent on transcript annotation and remains time consuming and challenging. There are major obstacles for historical data reuse and for interpretation of new variants. First, both RefSeq and Ensembl/GENCODE produce transcript sets in common use, but there is currently no easy way to translate between the two. Second, the resources often used for variant interpretation (e.g. ClinVar, gnomAD, UniProt) do not use the same transcript set, nor default transcript or protein sequence. METHOD: Ensembl ran a survey in 2018 to sample attitudes to choosing one default transcript per locus, and to gather data on reference sequences used by the scientific community. This was publicised on the Ensembl and UCSC genome browsers, by email and on social media. RESULTS: The survey had 788 responses from 32 different countries, the results of which we report here. CONCLUSIONS: We present our roadmap to create an effective default set of transcripts for resources, and for reporting interpretation of clinical variants.


Subjects
Biomarkers , Computational Biology , Genomics , Messenger RNA/genetics , Animals , Computational Biology/methods , Genetic Databases , Genomics/methods , Humans , Software , Web Browser
13.
Nat Commun ; 12(1): 463, 2021 01 19.
Article in English | MEDLINE | ID: mdl-33469025

ABSTRACT

Splicing varies across brain regions, but the single-cell resolution of regional variation is unclear. We present a single-cell investigation of differential isoform expression (DIE) between brain regions using single-cell long-read sequencing in mouse hippocampus and prefrontal cortex in 45 cell types at postnatal day 7 (www.isoformAtlas.com). Isoform tests for DIE show better performance than exon tests. We detect hundreds of DIE events traceable to cell types, often corresponding to functionally distinct protein isoforms. Mostly, one cell type is responsible for brain-region specific DIE. However, for fewer genes, multiple cell types influence DIE. Thus, regional identity can, although rarely, override cell-type specificity. Cell types indigenous to one anatomic structure display distinctive DIE, e.g. the choroid plexus epithelium manifests distinct transcription-start-site usage. Spatial transcriptomics and long-read sequencing yield a spatially resolved splicing map. Our methods quantify isoform expression with cell-type and spatial resolution and contribute to furthering our understanding of how the brain integrates molecular and cellular complexity.


Subjects
Alternative Splicing/physiology , Developmental Gene Expression Regulation/physiology , Hippocampus/metabolism , Prefrontal Cortex/metabolism , Protein Isoforms/metabolism , Animals , Newborn Animals , Computational Biology , Female , Hippocampus/cytology , Hippocampus/growth & development , Mice , Animal Models , Prefrontal Cortex/cytology , Prefrontal Cortex/growth & development , Protein Isoforms/analysis , Protein Isoforms/genetics , Single-Cell Analysis/methods , Spatial Analysis
14.
Haematologica ; 106(10): 2613-2623, 2021 10 01.
Article in English | MEDLINE | ID: mdl-32703790

ABSTRACT

Transcriptional profiling of hematopoietic cell subpopulations has helped to characterize the developmental stages of the hematopoietic system and the molecular bases of malignant and non-malignant blood diseases. Previously, only the genes targeted by expression microarrays could be profiled genome-wide. High-throughput RNA sequencing, however, encompasses a broader repertoire of RNA molecules, without restriction to previously annotated genes. We analyzed the BLUEPRINT consortium RNA-sequencing data for mature hematopoietic cell types. The data comprised 90 total RNA-sequencing samples, each composed of one of 27 cell types, and 32 small RNA-sequencing samples, each composed of one of 11 cell types. We estimated gene and isoform expression levels for each cell type using existing annotations from Ensembl. We then used guided transcriptome assembly to discover unannotated transcripts. We identified hundreds of novel non-coding RNA genes and showed that the majority have cell type-dependent expression. We also characterized the expression of circular RNA and found that these are also cell type-specific. These analyses refine the active transcriptional landscape of mature hematopoietic cells, highlight abundant genes and transcriptional isoforms for each blood cell type, and provide a valuable resource for researchers of hematologic development and diseases. Finally, we made the data accessible via a web-based interface: https://blueprint.haem.cam.ac.uk/bloodatlas/.


Subjects
Long Noncoding RNA , Transcriptome , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Circular RNA , Long Noncoding RNA/genetics , RNA Sequence Analysis
15.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subjects
COVID-19/prevention & control , Computational Biology/methods , Genetic Databases , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , Long Noncoding RNA/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Genetic Transcription/genetics
16.
Nat Commun ; 11(1): 3695, 2020 07 29.
Article in English | MEDLINE | ID: mdl-32728065

ABSTRACT

Pseudogenes are ideal markers of genome remodelling. In turn, the mouse is an ideal platform for studying them, particularly with the recent availability of strain-sequencing and transcriptional data. Here, combining both manual curation and automatic pipelines, we present a genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains (available via the mouse.pseudogene.org resource). We also annotate 165 unitary pseudogenes in mouse and 303 in human. The overall pseudogene repertoire in mouse is similar to that in human in terms of size, biotype distribution, and family composition (e.g. with GAPDH and ribosomal proteins being the largest families). Notable differences arise in the pseudogene age distribution, with multiple retro-transpositional bursts in mouse evolutionary history and only one in human. Furthermore, in each strain about a fifth of all pseudogenes are unique, reflecting strain-specific evolution. Finally, we find that ~15% of the mouse pseudogenes are transcribed, and that highly transcribed parent genes tend to give rise to many processed pseudogenes.


Subjects
Pseudogenes/genetics , Genetic Transcription , Animals , Conserved Sequence/genetics , Molecular Evolution , Gene Ontology , Genome , Humans , Inbred C57BL Mice , Molecular Sequence Annotation , Species Specificity
17.
Nature ; 583(7818): 693-698, 2020 07.
Article in English | MEDLINE | ID: mdl-32728248

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression (ref. 1). The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.


Subjects
Genetic Databases , Genome/genetics , Genomics , Molecular Sequence Annotation , Animals , Binding Sites , Chromatin/genetics , Chromatin/metabolism , DNA Methylation , Genetic Databases/standards , Genetic Databases/trends , Gene Expression Regulation/genetics , Human Genome/genetics , Genomics/standards , Genomics/trends , Histones/metabolism , Humans , Mice , Molecular Sequence Annotation/standards , Quality Control , Nucleic Acid Regulatory Sequences/genetics , Transcription Factors/metabolism
18.
Annu Rev Genomics Hum Genet ; 21: 55-79, 2020 08 31.
Article in English | MEDLINE | ID: mdl-32421357

ABSTRACT

Our understanding of the human genome has continuously expanded since its draft publication in 2001. Over the years, novel assays have allowed us to progressively overlay layers of knowledge above the raw sequence of A's, T's, G's, and C's. The reference human genome sequence is now a complex knowledge base maintained under the shared stewardship of multiple specialist communities. Its complexity stems from the fact that it is simultaneously a template for transcription, a record of evolution, a vehicle for genetics, and a functional molecule. In short, the human genome serves as a frame of reference at the intersection of a diversity of scientific fields. In recent years, the progressive fall in sequencing costs has given increasing importance to the quality of the human reference genome, as hundreds of thousands of individuals are being sequenced yearly, often for clinical applications. Also, novel sequencing-based assays shed light on novel functions of the genome, especially with respect to gene expression regulation. Keeping the human genome annotation up to date and accurate is therefore an ongoing partnership between reference annotation projects and the greater community worldwide.


Subjects
Human Genome , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/standards , Humans
19.
BMC Genomics ; 21(1): 196, 2020 Mar 03.
Article in English | MEDLINE | ID: mdl-32126975

ABSTRACT

BACKGROUND: Olfactory receptor (OR) genes are the largest multi-gene family in the mammalian genome, with 874 loci in human and 1483 in mouse (including pseudogenes). The expansion of the OR gene repertoire has occurred through numerous duplication events followed by diversification, resulting in a large number of highly similar paralogous genes. These characteristics have made the annotation of the complete OR gene repertoire a complex task. Most OR genes have been predicted in silico and are typically annotated as intronless coding sequences. RESULTS: Here we have developed an expert curation pipeline to analyse and annotate every OR gene in the human and mouse reference genomes. By combining evidence from structural features, evolutionary conservation and experimental data, we have unified the annotation of these gene families, and have systematically determined the protein-coding potential of each locus. We have defined the non-coding regions of many OR genes, enabling us to generate full-length transcript models. We found that 13 human and 41 mouse OR loci have coding sequences that are split across two exons. These split OR genes are conserved across mammals, and are expressed at the same level as protein-coding OR genes with an intronless coding region. Our findings challenge the long-standing and widespread notion that the coding region of a vertebrate OR gene is contained within a single exon. CONCLUSIONS: This work provides the most comprehensive curation effort of the human and mouse OR gene repertoires to date. The complete annotation has been integrated into the GENCODE reference gene set, for immediate availability to the research community.


Subjects
Conserved Sequence , Exons/genetics , Quantitative Trait Loci , Odorant Receptors/genetics , Animals , Data Curation/methods , Genetic Databases , Genetic Loci , Human Genome , Humans , Mice , Pseudogenes
20.
NPJ Genom Med ; 4: 31, 2019.
Article in English | MEDLINE | ID: mdl-31814998

ABSTRACT

The developmental and epileptic encephalopathies (DEE) are a group of rare, severe neurodevelopmental disorders, where even the most thorough sequencing studies leave 60-65% of patients without a molecular diagnosis. Here, we explore the incompleteness of transcript models used for exome and genome analysis as one potential explanation for a lack of current diagnoses. Therefore, we have updated the GENCODE gene annotation for 191 epilepsy-associated genes, using human brain-derived transcriptomic libraries and other data to build 3,550 putative transcript models. Our annotations increase the transcriptional 'footprint' of these genes by over 674 kb. Using SCN1A as a case study, due to its close phenotype/genotype correlation with Dravet syndrome, we screened 122 people with Dravet syndrome or a similar phenotype with a panel of exon sequences representing eight established genes and identified two de novo SCN1A variants that now - through improved gene annotation - are recognised as residing within our annotated exons. These two (from 122 screened people, 1.6%) molecular diagnoses carry significant clinical implications. Furthermore, we identified a previously classified SCN1A intronic Dravet syndrome-associated variant that now lies within a deeply conserved exon. Our findings illustrate the potential gains of thorough gene annotation in improving diagnostic yields for genetic disorders.
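
The reannotation effect described in the abstract above, where a variant previously classed as intronic falls inside a newly annotated exon, can be sketched as an interval lookup. All coordinates are hypothetical, and `in_any_exon` is an illustrative helper, not part of any annotation toolkit. The last lines reproduce the reported yield (2 of 122 people).

```python
def in_any_exon(position, exons):
    """True if a 1-based position falls inside any (start, end) inclusive exon."""
    return any(start <= position <= end for start, end in exons)


# Hypothetical exon sets for one gene, before and after annotation was extended.
previous_exons = [(1000, 1200), (5000, 5300)]
updated_exons = previous_exons + [(3000, 3150)]  # hypothetical newly annotated exon

variant_position = 3100  # hypothetical variant, intronic under the old annotation

print(in_any_exon(variant_position, previous_exons))  # False
print(in_any_exon(variant_position, updated_exons))   # True

# Diagnostic yield quoted in the abstract: 2 diagnoses among 122 screened people.
diagnosed, screened = 2, 122
yield_percent = round(100 * diagnosed / screened, 1)
print(f"{yield_percent}%")  # 1.6%
```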
