1.
Bioinformatics ; 40(2), 2024 02 01.
Article in English | MEDLINE | ID: mdl-38269623

ABSTRACT

MOTIVATION: In diploid organisms, phasing is the problem of assigning the alleles at heterozygous variants to one of two haplotypes. Reads from PacBio HiFi sequencing provide long, accurate observations that can be used as the basis for both calling and phasing variants. HiFi reads also excel at calling larger classes of variation, such as structural or tandem repeat variants. However, current phasing tools typically only phase small variants, leaving larger variants unphased. RESULTS: We developed HiPhase, a tool that jointly phases SNVs, indels, structural, and tandem repeat variants. The main benefits of HiPhase are (i) dual mode allele assignment for detecting large variants, (ii) a novel application of the A*-algorithm to phasing, and (iii) logic allowing phase blocks to span breaks caused by alignment issues around reference gaps and homozygous deletions. In our assessment, HiPhase produced an average phase block NG50 of 480 kb with 929 switchflip errors and fully phased 93.8% of genes, improving over the current state of the art. Additionally, HiPhase jointly phases SNVs, indels, structural, and tandem repeat variants and includes innate multi-threading, statistics gathering, and concurrent phased alignment output generation. AVAILABILITY AND IMPLEMENTATION: HiPhase is available as source code and a pre-compiled Linux binary with a user guide at https://github.com/PacificBiosciences/HiPhase.
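The phase block NG50 reported above is a length-weighted median. As a minimal sketch, assuming the common definition (the largest length L such that blocks of length at least L together cover half of a fixed genome size), it could be computed as follows. The function name and the toy block lengths are illustrative and are not taken from HiPhase.

```python
def ng50(block_lengths, genome_size):
    """Return the NG50 of a set of block lengths.

    NG50 is taken here as the length L such that blocks of length >= L
    together cover at least half of genome_size. Returns 0 when the
    blocks cover less than half of the genome.
    """
    half = genome_size / 2
    covered = 0
    # Walk blocks from longest to shortest, accumulating covered bases.
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0


# Toy phase blocks (hypothetical values) on a 2 Mb genome.
blocks = [500_000, 400_000, 300_000, 200_000, 100_000]
print(ng50(blocks, genome_size=2_000_000))  # prints 300000
```

Unlike N50, which normalizes by the total length of the blocks themselves, NG50 normalizes by the genome size, so results from different tools on the same genome remain comparable.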


Subject(s)
Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA; Algorithms; Haplotypes; Tandem Repeat Sequences
2.
Am J Med Genet A ; 188(7): 2071-2081, 2022 07.
Article in English | MEDLINE | ID: mdl-35366058

ABSTRACT

Currently, protein-coding de novo variants and large copy number variants have been identified as important for ~30% of individuals with autism. One approach to identify relevant variation in individuals who lack these types of events is by utilizing newer genomic technologies. In this study, highly accurate PacBio HiFi long-read sequencing was applied to a family with autism, epileptic encephalopathy, cognitive impairment, and mild dysmorphic features (two affected female siblings, unaffected parents, and one unaffected male sibling) with no known clinical variant. From our long-read sequencing data, a de novo missense variant in the KCNC2 gene (encodes Kv3.2) was identified in both affected children. This variant was phased to the paternal chromosome of origin and is likely a germline mosaic. In silico assessment revealed that the variant was absent from controls, highly conserved, and predicted to be damaging. This specific missense variant (Val473Ala) has been shown in both an ortholog and paralog of Kv3.2 to accelerate current decay, shift the voltage dependence of activation, and prevent the channel from entering a long-lasting open state. Seven additional missense variants have been identified in other individuals with neurodevelopmental disorders (p = 1.03 × 10⁻⁵). KCNC2 is most highly expressed in the brain, in particular in the thalamus, and is enriched in GABAergic neurons. Long-read sequencing was useful in discovering the relevant variant in this family with autism that had remained a mystery for several years and will potentially have great benefits in the clinic once it is widely available.


Subject(s)
Autistic Disorder; Epilepsy; Shaw Potassium Channels; Autistic Disorder/genetics; Child; Epilepsy/genetics; Female; Germ Cells; Humans; Male; Mosaicism; Mutation, Missense; Shaw Potassium Channels/genetics
3.
PLoS Comput Biol ; 16(6): e1007933, 2020 06.
Article in English | MEDLINE | ID: mdl-32559231

ABSTRACT

A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
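The abstract reports concordance among curators without defining it. One plausible reading, sketched below purely as an assumption, is the fraction of (curator pair, event) comparisons in which both curators assigned the same label. The curator names, event IDs, and labels are hypothetical.

```python
from itertools import combinations


def pairwise_concordance(labels_by_curator):
    """Fraction of (curator pair, event) comparisons with identical labels.

    labels_by_curator maps a curator name to a dict of event ID -> label.
    Only events labeled by both curators in a pair are compared.
    """
    agree = 0
    total = 0
    for a, b in combinations(labels_by_curator.values(), 2):
        shared = a.keys() & b.keys()
        agree += sum(a[event] == b[event] for event in shared)
        total += len(shared)
    return agree / total if total else float("nan")


# Hypothetical labels for three curators on three putative SVs.
labels = {
    "curator_1": {"sv1": "DEL", "sv2": "INS", "sv3": "DEL"},
    "curator_2": {"sv1": "DEL", "sv2": "INS", "sv3": "INS"},
    "curator_3": {"sv1": "DEL", "sv2": "DEL", "sv3": "DEL"},
}
print(round(pairwise_concordance(labels), 3))  # prints 0.556 (5 of 9 comparisons agree)
```

Events with low agreement under a measure like this would be the ones a curation effort might filter out before releasing high-confidence labels.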


Subject(s)
Genome, Human; Genomic Structural Variation; Heuristics; Humans; INDEL Mutation
4.
Ann Hum Genet ; 84(2): 125-140, 2020 03.
Article in English | MEDLINE | ID: mdl-31711268

ABSTRACT

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.


Subject(s)
Biomarkers/analysis; Genetic Variation; Genome, Human; Haploidy; Hydatidiform Mole/genetics; Sequence Analysis, DNA/methods; Single-Cell Analysis/methods; Female; High-Throughput Nucleotide Sequencing; Humans; Molecular Sequence Annotation; Pregnancy
5.
Nucleic Acids Res ; 46(18): 9299-9308, 2018 10 12.
Article in English | MEDLINE | ID: mdl-30137416

ABSTRACT

Genetic variation in cis-regulatory elements is thought to be a major driving force in morphological and physiological changes. However, identifying transcription factor binding events that code for complex traits remains a challenge, motivating novel means of detecting putatively important binding events. Using a curated set of 1154 high-quality transcription factor motifs, we demonstrate that independently eroded binding sites are enriched for independently lost traits in three distinct pairs of placental mammals. We show that these independently eroded events pinpoint the loss of hindlimbs in dolphin and manatee, degradation of vision in naked mole-rat and star-nosed mole, and the loss of external testes in white rhinoceros and Weddell seal. We additionally show that our method may also be utilized with more than two species. Our study exhibits a novel methodology to detect cis-regulatory mutations which help explain a portion of the molecular mechanism underlying complex trait formation and loss.


Subject(s)
Evolution, Molecular; Nucleotide Motifs/genetics; Regulatory Sequences, Nucleic Acid/genetics; Transcription Factors/genetics; Vision, Ocular/genetics; Animals; Binding Sites/genetics; Dolphins/genetics; Dolphins/physiology; Hindlimb/physiology; Male; Mammals/genetics; Mammals/physiology; Mole Rats/genetics; Mole Rats/physiology; Protein Binding/genetics; Testis/physiology; Trichechus/genetics; Trichechus/physiology; Vision, Ocular/physiology
6.
Genet Med ; 21(2): 464-470, 2019 02.
Article in English | MEDLINE | ID: mdl-29997393

ABSTRACT

PURPOSE: Exome sequencing and diagnosis is beginning to spread across the medical establishment. The most time-consuming part of genome-based diagnosis is the manual step of matching the potentially long list of patient candidate genes to patient phenotypes to identify the causative disease. METHODS: We introduce Phrank (for phenotype ranking), an information theory-inspired method that utilizes a Bayesian network to prioritize candidate diseases or genes, as a stand-alone module that can be run with any underlying knowledgebase and any variant filtering scheme. RESULTS: Phrank outperforms existing methods at ranking the causative disease or gene when applied to 169 real patient exomes with Mendelian diagnoses. Phrank's greatest improvement is in disease space, where across all 169 patients it ranks only 3 diseases on average ahead of the true diagnosis, whereas Phenomizer ranks 32 diseases ahead of the causal one. CONCLUSIONS: Using Phrank to rank all patient candidate genes or diseases, as they start working through a new case, will save the busy clinician much time in deriving a genetic diagnosis.
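To illustrate the general idea of information-theoretic phenotype ranking, here is a deliberately simplified sketch. It is not Phrank's Bayesian network: it assumes only that rarer phenotype terms carry more information (negative log2 of the fraction of genes annotated with the term) and scores each candidate by summing the information content of the terms it shares with the patient. All term names, counts, and candidate diseases are hypothetical.

```python
import math


def information_content(term_gene_counts, total_genes):
    """Map each phenotype term to -log2(fraction of genes annotated with it)."""
    return {
        term: -math.log2(count / total_genes)
        for term, count in term_gene_counts.items()
    }


def rank_candidates(patient_terms, candidate_terms, ic):
    """Rank candidates by summed information content of shared terms."""
    scores = {
        name: sum(ic.get(term, 0.0) for term in patient_terms & terms)
        for name, terms in candidate_terms.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical annotation counts over 1000 genes.
ic = information_content(
    {"seizures": 100, "short_stature": 200, "rare_finding": 10},
    total_genes=1000,
)
patient = {"seizures", "rare_finding"}
candidates = {
    "disease_A": {"seizures", "short_stature"},
    "disease_B": {"rare_finding"},
}
ranked = rank_candidates(patient, candidates, ic)
print(ranked[0][0])  # prints disease_B: the rare shared term outweighs the common one
```

The design choice shown is that a match on one rare phenotype can outrank a match on a common one, which is the intuition behind weighting terms by information content.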


Subject(s)
Diagnosis, Computer-Assisted; Genetic Diseases, Inborn/diagnosis; Genetic Testing; Phenotype; Software; Benchmarking; Computational Biology/methods; Exome; Humans; Knowledge Bases; Pathology, Molecular/methods
7.
Genet Med ; 20(1): 159-163, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28640241

ABSTRACT

PURPOSE: Current clinical genomics assays primarily utilize short-read sequencing (SRS), but SRS has limited ability to evaluate repetitive regions and structural variants. Long-read sequencing (LRS) has complementary strengths, and we aimed to determine whether LRS could offer a means to identify overlooked genetic variation in patients undiagnosed by SRS. METHODS: We performed low-coverage genome LRS to identify structural variants in a patient who presented with multiple neoplasia and cardiac myxomata, in whom the results of targeted clinical testing and genome SRS were negative. RESULTS: This LRS approach yielded 6,971 deletions and 6,821 insertions > 50 bp. Filtering for variants that are absent in an unrelated control and overlap a disease gene coding exon identified three deletions and three insertions. One of these, a heterozygous 2,184 bp deletion, overlaps the first coding exon of PRKAR1A, which is implicated in autosomal dominant Carney complex. RNA sequencing demonstrated decreased PRKAR1A expression. The deletion was classified as pathogenic based on guidelines for interpretation of sequence variants. CONCLUSION: This first successful application of genome LRS to identify a pathogenic variant in a patient suggests that LRS has significant potential for the identification of disease-causing structural variation. Larger studies will ultimately be required to evaluate the potential clinical utility of LRS.
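The filtering step described in the results (keep variants absent in an unrelated control that overlap a disease gene coding exon) can be sketched with simple interval logic. This is an illustration under stated assumptions, not the authors' pipeline: "absent in the control" is approximated as "no overlapping control call", and every coordinate and chromosome name below is hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """True when two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_candidates(patient_svs, control_svs, exons):
    """Keep patient SVs with no overlapping control SV that hit a coding exon."""
    return [
        sv
        for sv in patient_svs
        if not any(overlaps(sv, ctrl) for ctrl in control_svs)
        and any(overlaps(sv, exon) for exon in exons)
    ]


# Hypothetical calls: only the first patient SV passes both filters.
patient_svs = [
    Interval("chrA", 1_000, 3_184),    # novel, overlaps an exon
    Interval("chrA", 50_000, 50_200),  # also seen in the control
    Interval("chrB", 500, 900),        # novel, but hits no exon
]
control_svs = [Interval("chrA", 50_050, 50_150)]
exons = [Interval("chrA", 2_000, 2_100), Interval("chrA", 50_100, 50_300)]

print(filter_candidates(patient_svs, control_svs, exons))
```

A real pipeline would likely match SVs by type and reciprocal overlap instead of any overlap, but the two-stage shape of the filter is the same.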


Subject(s)
Genetic Association Studies; Genetic Diseases, Inborn/diagnosis; Genetic Diseases, Inborn/genetics; Genetic Predisposition to Disease; Genetic Variation; Genome, Human; Genomics; Sequence Analysis, DNA; Child; Cyclic AMP-Dependent Protein Kinase RIalpha Subunit/genetics; Echocardiography; Genomics/methods; Humans; Male; Phenotype; Sequence Analysis, DNA/methods; Sequence Deletion
8.
Am J Med Genet A ; 176(4): 1030-1036, 2018 04.
Article in English | MEDLINE | ID: mdl-29575631

ABSTRACT

Robinow syndrome (RS) is a well-recognized Mendelian disorder known to demonstrate both autosomal dominant and autosomal recessive inheritance. Typical manifestations include short stature, characteristic facies, and skeletal anomalies. Recessive inheritance has been associated with mutations in ROR2 while dominant inheritance has been observed for mutations in WNT5A, DVL1, and DVL3. Through trio whole genome sequencing, we identified a homozygous frameshifting single nucleotide deletion in WNT5A in a previously reported, deceased infant with a unique constellation of features comprising a 46,XY disorder of sex development with multiple congenital malformations including congenital diaphragmatic hernia, ambiguous genitalia, dysmorphic facies, shortened long bones, adactyly, and ventricular septal defect. The parents, who are both heterozygous for the deletion, appear clinically unaffected. In conjunction with published observations of Wnt5a double knockout mice, we provide evidence for the possibility of autosomal recessive inheritance in association with WNT5A loss-of-function mutations in RS.


Subject(s)
Alleles; Craniofacial Abnormalities/diagnosis; Craniofacial Abnormalities/genetics; Dwarfism/diagnosis; Dwarfism/genetics; Limb Deformities, Congenital/diagnosis; Limb Deformities, Congenital/genetics; Loss of Function Mutation; Phenotype; Urogenital Abnormalities/diagnosis; Urogenital Abnormalities/genetics; Wnt-5a Protein/genetics; Animals; Disease Models, Animal; Female; Frameshift Mutation; Gene Frequency; Genetic Association Studies; Homozygote; Humans; Infant; Mice; Mice, Knockout; Point Mutation; Severity of Illness Index; Symptom Assessment; Ultrasonography; Whole Genome Sequencing
9.
Brain ; 140(10): 2610-2622, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28969385

ABSTRACT

Mutations of genes within the phosphatidylinositol-3-kinase (PI3K)-AKT-MTOR pathway are well known causes of brain overgrowth (megalencephaly) as well as segmental cortical dysplasia (such as hemimegalencephaly, focal cortical dysplasia and polymicrogyria). Mutations of the AKT3 gene have been reported in a few individuals with brain malformations, to date. Therefore, our understanding regarding the clinical and molecular spectrum associated with mutations of this critical gene is limited, with no clear genotype-phenotype correlations. We sought to further delineate this spectrum, study levels of mosaicism and identify genotype-phenotype correlations of AKT3-related disorders. We performed targeted sequencing of AKT3 on individuals with these phenotypes by molecular inversion probes and/or Sanger sequencing to determine the type and level of mosaicism of mutations. We analysed all clinical and brain imaging data of mutation-positive individuals including neuropathological analysis in one instance. We performed ex vivo kinase assays on AKT3 engineered with the patient mutations and examined the phospholipid binding profile of pleckstrin homology domain localizing mutations. We identified 14 new individuals with AKT3 mutations with several phenotypes dependent on the type of mutation and level of mosaicism. Our comprehensive clinical characterization, and review of all previously published patients, broadly segregates individuals with AKT3 mutations into two groups: patients with highly asymmetric cortical dysplasia caused by the common p.E17K mutation, and patients with constitutional AKT3 mutations exhibiting more variable phenotypes including bilateral cortical malformations, polymicrogyria, periventricular nodular heterotopia and diffuse megalencephaly without cortical dysplasia. All mutations increased kinase activity, and pleckstrin homology domain mutants exhibited enhanced phospholipid binding. 
Overall, our study shows that activating mutations of the critical AKT3 gene are associated with a wide spectrum of brain involvement ranging from focal or segmental brain malformations (such as hemimegalencephaly and polymicrogyria) predominantly due to mosaic AKT3 mutations, to diffuse bilateral cortical malformations, megalencephaly and heterotopia due to constitutional AKT3 mutations. We also provide the first detailed neuropathological examination of a child with extreme megalencephaly due to a constitutional AKT3 mutation. This child has one of the largest documented paediatric brain sizes, to our knowledge. Finally, our data show that constitutional AKT3 mutations are associated with megalencephaly, with or without autism, similar to PTEN-related disorders. Recognition of this broad clinical and molecular spectrum of AKT3 mutations is important for providing early diagnosis and appropriate management of affected individuals, and will facilitate targeted design of future human clinical trials using PI3K-AKT pathway inhibitors.


Subject(s)
Developmental Disabilities/genetics; Megalencephaly/genetics; Mutation/genetics; Proto-Oncogene Proteins c-akt/genetics; Brain/diagnostic imaging; Child; Developmental Disabilities/diagnostic imaging; Developmental Disabilities/pathology; Female; Genetic Association Studies; HEK293 Cells; Humans; Immunoprecipitation; Magnetic Resonance Imaging; Male; Megalencephaly/diagnostic imaging; Megalencephaly/pathology; Mutagenesis, Site-Directed/methods; Phosphatidylinositols/metabolism; Transfection
10.
Genome Res ; 24(9): 1504-16, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24963153

ABSTRACT

Microbiota regulate intestinal physiology by modifying host gene expression along the length of the intestine, but the underlying regulatory mechanisms remain unresolved. Transcriptional specificity occurs through interactions between transcription factors (TFs) and cis-regulatory regions (CRRs) characterized by nucleosome-depleted accessible chromatin. We profiled transcriptome and accessible chromatin landscapes in intestinal epithelial cells (IECs) from mice reared in the presence or absence of microbiota. We show that regional differences in gene transcription along the intestinal tract were accompanied by major alterations in chromatin accessibility. Surprisingly, we discovered that microbiota modify host gene transcription in IECs without significantly impacting the accessible chromatin landscape. Instead, microbiota regulation of host gene transcription might be achieved by differential expression of specific TFs and enrichment of their binding sites in nucleosome-depleted CRRs near target genes. Our results suggest that the chromatin landscape in IECs is preprogrammed by the host in a region-specific manner to permit responses to microbiota through binding of open CRRs by specific TFs.


Subject(s)
Chromatin Assembly and Disassembly; Intestinal Mucosa/metabolism; Microbiota; Transcription, Genetic; Animals; Intestinal Mucosa/microbiology; Mice; Mice, Inbred C57BL; Organ Specificity; Promoter Regions, Genetic; Transcription Factors/genetics; Transcription Factors/metabolism; Transcriptome
11.
Genet Med ; 19(2): 209-214, 2017 02.
Article in English | MEDLINE | ID: mdl-27441994

ABSTRACT

PURPOSE: Clinical exome sequencing is nondiagnostic for about 75% of patients evaluated for a possible Mendelian disorder. We examined the ability of systematic reevaluation of exome data to establish additional diagnoses. METHODS: The exome and phenotypic data of 40 individuals with previously nondiagnostic clinical exomes were reanalyzed with current software and literature. RESULTS: A definitive diagnosis was identified for 4 of 40 participants (10%). In these cases the causative variant is de novo and in a relevant autosomal-dominant disease gene. The literature to tie the causative genes to the participants' phenotypes was weak, nonexistent, or not readily located at the time of the initial clinical exome reports. At the time of diagnosis by reanalysis, the supporting literature was 1 to 3 years old. CONCLUSION: Approximately 250 gene-disease and 9,200 variant-disease associations are reported annually. This increase in information necessitates regular reevaluation of nondiagnostic exomes. To be practical, systematic reanalysis requires further automation and more up-to-date variant databases. To maximize the diagnostic yield of exome sequencing, providers should periodically request reanalysis of nondiagnostic exomes. Accordingly, policies regarding reanalysis should be weighed in combination with factors such as cost and turnaround time when selecting a clinical exome laboratory.


Subject(s)
Exome Sequencing/standards; Genetic Diseases, Inborn/diagnosis; Genetic Diseases, Inborn/genetics; Genetics, Medical/standards; Child; Child, Preschool; Exome/genetics; Female; Genetic Diseases, Inborn/pathology; Humans; Infant; Male; Mutation; Pedigree; Sequence Analysis, DNA
12.
Nature ; 471(7337): 216-9, 2011 Mar 10.
Article in English | MEDLINE | ID: mdl-21390129

ABSTRACT

Humans differ from other animals in many aspects of anatomy, physiology, and behaviour; however, the genotypic basis of most human-specific traits remains unknown. Recent whole-genome comparisons have made it possible to identify genes with elevated rates of amino acid change or divergent expression in humans, and non-coding sequences with accelerated base pair changes. Regulatory alterations may be particularly likely to produce phenotypic effects while preserving viability, and are known to underlie interesting evolutionary differences in other species. Here we identify molecular events particularly likely to produce significant regulatory changes in humans: complete deletion of sequences otherwise highly conserved between chimpanzees and other mammals. We confirm 510 such deletions in humans, which fall almost exclusively in non-coding regions and are enriched near genes involved in steroid hormone signalling and neural function. One deletion removes a sensory vibrissae and penile spine enhancer from the human androgen receptor (AR) gene, a molecular change correlated with anatomical loss of androgen-dependent sensory vibrissae and penile spines in the human lineage. Another deletion removes a forebrain subventricular zone enhancer near the tumour suppressor gene growth arrest and DNA-damage-inducible, gamma (GADD45G), a loss correlated with expansion of specific brain regions in humans. Deletions of tissue-specific enhancers may thus accompany both loss and gain traits in the human lineage, and provide specific examples of the kinds of regulatory alterations and inactivation events long proposed to have an important role in human evolutionary divergence.


Subject(s)
Biological Evolution; DNA/genetics; Genome, Human/genetics; Human Characteristics; Regulatory Sequences, Nucleic Acid/genetics; Sequence Deletion/genetics; Animals; Brain/anatomy & histology; Brain/metabolism; Chromosomes, Mammalian/genetics; Conserved Sequence/genetics; DNA, Intergenic/genetics; Enhancer Elements, Genetic/genetics; Evolution, Molecular; Genes, Tumor Suppressor; Humans; Male; Mice; Organ Specificity; Pan troglodytes/genetics; Penis/anatomy & histology; Penis/metabolism; Species Specificity; Transgenes/genetics
13.
Genome Res ; 23(5): 889-904, 2013 May.
Article in English | MEDLINE | ID: mdl-23382538

ABSTRACT

The human genome encodes 1500-2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.
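The abstract describes searching for enrichment of a motif next to groups of genes sharing a function, but does not spell out the statistic. As a generic stand-in (explicitly not the PRISM framework or its "harsh null"), the sketch below uses a one-sided hypergeometric test: given how many genes carry the function and how many lie near the motif, how surprising is the observed overlap? The parameter names and numbers are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_function, genes_near_motif, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    X is the number of genes with the function among genes_near_motif
    genes drawn without replacement from total_genes.
    """
    upper = min(genes_with_function, genes_near_motif)
    tail = sum(
        comb(genes_with_function, k)
        * comb(total_genes - genes_with_function, genes_near_motif - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, genes_near_motif)


# Toy case: 10 genes, 3 with the function, 3 near the motif, all 3 overlap.
p = hypergeom_enrichment_p(10, 3, 3, 3)
print(p)  # 1/120, about 0.0083
```

With thousands of motif and function combinations tested, any such p-value would need multiple-testing control, which is consistent with the abstract reporting a false discovery rate.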


Subject(s)
Binding Sites/genetics; Computational Biology; Software; Transcription Factors/genetics; Algorithms; Animals; Base Sequence; DNA-Binding Proteins/genetics; Genome; Humans; Mice; Protein Binding/genetics; Regulatory Sequences, Nucleic Acid
14.
PLoS Genet ; 9(8): e1003728, 2013 Aug.
Article in English | MEDLINE | ID: mdl-24009522

ABSTRACT

Genetic studies have identified a core set of transcription factors and target genes that control the development of the neocortex, the region of the human brain responsible for higher cognition. The specific regulatory interactions between these factors, many key upstream and downstream genes, and the enhancers that mediate all these interactions remain mostly uncharacterized. We perform p300 ChIP-seq to identify over 6,600 candidate enhancers active in the dorsal cerebral wall of embryonic day 14.5 (E14.5) mice. Over 95% of the peaks we measure are conserved to human. Eight of ten (80%) candidates tested using mouse transgenesis drive activity in restricted laminar patterns within the neocortex. GREAT-based computational analysis reveals highly significant correlation with genes expressed at E14.5 in key areas for neocortex development, and allows the grouping of enhancers by known biological functions and pathways for further studies. We find that multiple genes are flanked by dozens of candidate enhancers each, including well-known key neocortical genes as well as suspected and novel genes. Nearly a quarter of our candidate enhancers are conserved well beyond mammals. Human and zebrafish regions orthologous to our candidate enhancers are shown to most often function in other aspects of central nervous system development. Finally, we find strong evidence that specific interspersed repeat families have contributed potentially key developmental enhancers via co-option. Our analysis expands the methodologies available for extracting the richness of information found in genome-wide functional maps.


Subject(s)
Enhancer Elements, Genetic; Evolution, Molecular; Neocortex/growth & development; Regulatory Sequences, Nucleic Acid/genetics; Animals; Base Sequence; Conserved Sequence/genetics; Gene Expression Regulation, Developmental; Humans; Mice; Neocortex/metabolism; Oligonucleotide Array Sequence Analysis; Promoter Regions, Genetic; Transcription Factors/genetics; Zebrafish/genetics; Zebrafish/growth & development
15.
Genome Res ; 22(6): 1059-68, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22442009

ABSTRACT

Enhancers are essential gene regulatory elements whose alteration can lead to morphological differences between species, developmental abnormalities, and human disease. Current strategies to identify enhancers focus primarily on noncoding sequences and tend to exclude protein coding sequences. Here, we analyzed 25 available ChIP-seq data sets that identify enhancers in an unbiased manner (H3K4me1, H3K27ac, and EP300) for peaks that overlap exons. We find that, on average, 7% of all ChIP-seq peaks overlap coding exons (after excluding for peaks that overlap with first exons). By using mouse and zebrafish enhancer assays, we demonstrate that several of these exonic enhancer (eExons) candidates can function as enhancers of their neighboring genes and that the exonic sequence is necessary for enhancer activity. Using ChIP, 3C, and DNA FISH, we further show that one of these exonic limb enhancers, Dync1i1 exon 15, has active enhancer marks and physically interacts with Dlx5/6 promoter regions 900 kb away. In addition, its removal by chromosomal abnormalities in humans could cause split hand and foot malformation 1 (SHFM1), a disorder associated with DLX5/6. These results demonstrate that DNA sequences can have a dual function, operating as coding exons in one tissue and enhancers of nearby gene(s) in another tissue, suggesting that phenotypes resulting from coding mutations could be caused not only by protein alteration but also by disrupting the regulation of another gene.


Subject(s)
Enhancer Elements, Genetic; Exons; Gene Expression Regulation; Animals; Chromatin Immunoprecipitation; Chromosome Aberrations; Cytoplasmic Dyneins/genetics; Extremities/embryology; Extremities/physiology; Female; Homeodomain Proteins/genetics; Humans; In Situ Hybridization, Fluorescence; Limb Deformities, Congenital/genetics; Male; Mice; Mice, Transgenic; Promoter Regions, Genetic; Zebrafish/genetics
16.
PLoS Comput Biol ; 10(1): e1003449, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24499934

ABSTRACT

Identifying enhancers regulating gene expression remains an important and challenging task. While recent sequencing-based methods provide epigenomic characteristics that correlate well with enhancer activity, it remains onerous to comprehensively identify all enhancers across development. Here we introduce a computational framework to identify tissue-specific enhancers evolving under purifying selection. First, we incorporate high-confidence binding site predictions with target gene functional enrichment analysis to identify transcription factors (TFs) likely functioning in a particular context. We then search the genome for clusters of binding sites for these TFs, overcoming previous constraints associated with biased manual curation of TFs or enhancers. Applying our method to the placenta, we find 33 known and implicate 17 novel TFs in placental function, and discover 2,216 putative placenta enhancers. Using luciferase reporter assays, 31/36 (86%) tested candidates drive activity in placental cells. Our predictions agree well with recent epigenomic data in human and mouse, yet over half our loci, including 7/8 (87%) tested regions, are novel. Finally, we establish that our method is generalizable by applying it to 5 additional tissues: heart, pancreas, blood vessel, bone marrow, and liver.


Subject(s)
Enhancer Elements, Genetic; Transcription Factors/metabolism; Algorithms; Amino Acid Motifs; Animals; Automation; Binding Sites; Cluster Analysis; Computational Biology; Computer Simulation; Epigenomics; Female; Gene Expression Profiling; Gene Expression Regulation; Humans; Mice; Placenta/physiology; Pregnancy; Trophoblasts/cytology
17.
Nucleic Acids Res ; 41(15): e151, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23814184

ABSTRACT

Many important model organisms for biomedical and evolutionary research have sequenced genomes, but occupy a phylogenetically isolated position, evolutionarily distant from other sequenced genomes. This phylogenetic isolation is exemplified for zebrafish, a vertebrate model for cis-regulation, development and human disease, whose evolutionary distance to all other currently sequenced fish exceeds the distance between human and chicken. Such large distances make it difficult to align genomes and use them for comparative analysis beyond gene-focused questions. In particular, detecting conserved non-genic elements (CNEs) as promising cis-regulatory elements with biological importance is challenging. Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. Our approach integrates highly sensitive and quality-controlled local alignments and uses alignment transitivity and ancestral reconstruction to bridge large evolutionary distances. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse. Our zebrafish CNEs (http://zebrafish.stanford.edu) are highly enriched in known enhancers and extend existing experimental (ChIP-Seq) sets. The same framework can now be applied to the isolated genomes of frog, amphioxus, Caenorhabditis elegans and many others.


Subject(s)
Computational Biology/methods , Conserved Sequence , Phylogeny , Sequence Analysis, DNA/methods , Zebrafish/genetics , Animals , Base Sequence , Evolution, Molecular , Genomics/methods , Internet , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid , Sensitivity and Specificity , Sequence Alignment , Synteny , Zebrafish/classification
18.
PLoS Genet ; 8(8): e1002852, 2012.
Article in English | MEDLINE | ID: mdl-22876195

ABSTRACT

The identification of homologies, whether morphological, molecular, or genetic, is fundamental to our understanding of common biological principles. Homologies bridging the great divide between deuterostomes and protostomes have served as the basis for current models of animal evolution and development. It is now appreciated that these two clades share a common developmental toolkit consisting of conserved transcription factors and signaling pathways. These patterning genes sometimes show common expression patterns and genetic interactions, suggesting the existence of similar or even conserved regulatory apparatus. However, previous studies have found no regulatory sequence conserved between deuterostomes and protostomes. Here we describe the first such enhancers, which we call bilaterian conserved regulatory elements (Bicores). Bicores show conservation of sequence and gene synteny. Sequence conservation of Bicores reflects conserved patterns of transcription factor binding sites. We predict that Bicores act as response elements to signaling pathways, and we show that Bicores are developmental enhancers that drive expression of transcriptional repressors in the vertebrate central nervous system. Although the small number of identified Bicores suggests extensive rewiring of cis-regulation between the protostome and deuterostome clades, additional Bicores may be revealed as our understanding of cis-regulatory logic and sample of bilaterian genomes continue to grow.


Subject(s)
Enhancer Elements, Genetic , Genome , Invertebrates/genetics , Transcription Factors/genetics , Vertebrates/genetics , Amino Acid Sequence , Animals , Binding Sites , Biological Evolution , Central Nervous System/embryology , Central Nervous System/metabolism , Conserved Sequence , Gene Expression Regulation, Developmental , Humans , Invertebrates/embryology , Invertebrates/metabolism , Molecular Sequence Data , Protein Binding , Sequence Alignment , Signal Transduction , Synteny , Transcription Factors/metabolism , Vertebrates/embryology , Vertebrates/metabolism
19.
Nat Biotechnol ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38168995

ABSTRACT

Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.

20.
Nat Biotechnol ; 41(2): 232-238, 2023 02.
Article in English | MEDLINE | ID: mdl-36050551

ABSTRACT

Circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10-25 kilobases), accurate 'HiFi' reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation, pbccs, uses a hidden Markov model. We introduce DeepConsensus, which uses an alignment-based loss to train a gap-aware transformer-encoder for sequence correction. Compared to pbccs, DeepConsensus reduces read errors by 42%. This increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27% and at Q40 by 90%. With two SMRT Cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 megabases (Mb) to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (Q43 to Q45) and reduce variant-calling errors by 24%. DeepConsensus models could be trained to the general problem of analyzing the alignment of other types of sequences, such as unique molecular identifiers or genome assemblies.


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA