Search | BVS Regional Portal

1.

Whole-genome Comparisons Identify Repeated Regulatory Changes Underlying Convergent Appendage Evolution in Diverse Fish Lineages.

Chen, Heidi I; Turakhia, Yatish; Bejerano, Gill; Kingsley, David M.

Mol Biol Evol ; 40(9)2023 09 01.

Article in English | MEDLINE | ID: mdl-37739926

ABSTRACT

Fins are major functional appendages of fish that have been repeatedly modified in different lineages. To search for genomic changes underlying natural fin diversity, we compared the genomes of 36 percomorph fish species that span over 100 million years of evolution and either have complete or reduced pelvic and caudal fins. We identify 1,614 genomic regions that are well-conserved in fin-complete species but missing from multiple fin-reduced lineages. Recurrent deletions of conserved sequences in wild fin-reduced species are enriched for functions related to appendage development, suggesting that convergent fin reduction at the organismal level is associated with repeated genomic deletions near fin-appendage development genes. We used sequencing and functional enhancer assays to confirm that PelA, a Pitx1 enhancer previously linked to recurrent pelvic loss in sticklebacks, has also been independently deleted and may have contributed to the fin morphology in distantly related pelvic-reduced species. We also identify a novel enhancer that is conserved in the majority of percomorphs, drives caudal fin expression in transgenic stickleback, is missing in tetraodontiform, syngnathid, and synbranchid species with caudal fin reduction, and alters caudal fin development when targeted by genome editing. Our study illustrates a broadly applicable strategy for mapping phenotypes to genotypes across a tree of vertebrate species and highlights notable new examples of regulatory genomic hotspots that have been used to evolve recurrent phenotypes across 100 million years of fish evolution.

Subject(s)

Fishes; Smegmamorpha; Animals; Fishes/genetics; Genomics; Genotype; Smegmamorpha/genetics; Animal Fins
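The selection rule in this abstract (regions well conserved in fin-complete species but missing from several fin-reduced lineages) can be illustrated as a presence/absence filter. This is a minimal Python sketch under stated assumptions, not the authors' pipeline: the input layout, the function name `find_recurrent_deletions`, the species labels, and the `min_lineages` threshold are hypothetical. A real analysis would also need whole-genome alignments and a way to tell true deletions from assembly gaps.

```python
from typing import Dict, List, Set

def find_recurrent_deletions(
    presence: Dict[str, Dict[str, bool]],
    fin_complete: Set[str],
    reduced_lineage: Dict[str, str],
    min_lineages: int = 2,
) -> List[str]:
    """Return IDs of regions present in every fin-complete species but absent
    from fin-reduced species in at least `min_lineages` distinct lineages."""
    hits = []
    for region, by_species in presence.items():
        # Conservation filter: the region must be present in all fin-complete species.
        if not all(by_species.get(sp, False) for sp in fin_complete):
            continue
        # Simplification (assumption): a missing entry counts as a deletion,
        # so assembly gaps would also be counted as losses.
        lost_in = {
            lineage
            for sp, lineage in reduced_lineage.items()
            if not by_species.get(sp, False)
        }
        if len(lost_in) >= min_lineages:
            hits.append(region)
    return hits


# Hypothetical toy data: two fin-complete species, two fin-reduced lineages.
presence = {
    "region_A": {"complete_1": True, "complete_2": True, "reduced_1": False, "reduced_2": False},
    "region_B": {"complete_1": True, "complete_2": True, "reduced_1": False, "reduced_2": True},
    "region_C": {"complete_1": False, "complete_2": True, "reduced_1": False, "reduced_2": False},
}
fin_complete = {"complete_1", "complete_2"}
reduced_lineage = {"reduced_1": "Tetraodontiformes", "reduced_2": "Syngnathidae"}

print(find_recurrent_deletions(presence, fin_complete, reduced_lineage))  # ['region_A']
```

Only `region_A` passes: `region_B` is lost in a single lineage, and `region_C` is not conserved across the fin-complete species.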

2.

Analysis of structural variation among inbred mouse strains.

Arslan, Ahmed; Fang, Zhuoqing; Wang, Meiyue; Tan, Yalun; Cheng, Zhuanfen; Chen, Xinyu; Guan, Yuan; J Pisani, Laura; Yoo, Boyoung; Bejerano, Gill; Peltz, Gary.

BMC Genomics ; 24(1): 97, 2023 Mar 02.

Article in English | MEDLINE | ID: mdl-36864393

ABSTRACT

BACKGROUND: 'Long read' sequencing methods have been used to identify previously uncharacterized structural variants that cause human genetic diseases. We therefore investigated whether long read sequencing could facilitate genetic analysis of murine models for human diseases. RESULTS: The genomes of six inbred strains (BTBR T+ Itpr3tf/J, 129Sv1/J, C57BL/6/J, Balb/c/J, A/J, SJL/J) were analyzed using long read sequencing. Our results revealed that (i) structural variants are very abundant within the genomes of inbred strains (4.8 per gene) and (ii) we cannot accurately infer whether structural variants are present from conventional short read genomic sequence data, even when nearby SNP alleles are known. The advantage of having a more complete map was demonstrated by analyzing the genomic sequence of BTBR mice. Based upon this analysis, knockin mice were generated and used to characterize a BTBR-unique 8-bp deletion within Draxin that contributes to the BTBR neuroanatomic abnormalities, which resemble human autism spectrum disorder. CONCLUSION: A more complete map of the pattern of genetic variation among inbred strains, produced by long read sequencing of additional inbred strains, could facilitate genetic discovery when murine models of human diseases are analyzed.

Subject(s)

Autism Spectrum Disorder; Humans; Mice; Animals; Mice, Inbred C57BL; Mice, Inbred Strains; Chromosome Mapping; Alleles; Intercellular Signaling Peptides and Proteins

3.

Whole-genome comparisons identify repeated regulatory changes underlying convergent appendage evolution in diverse fish lineages.

Chen, Heidi I; Turakhia, Yatish; Bejerano, Gill; Kingsley, David M.

bioRxiv ; 2023 Jan 31.

Article in English | MEDLINE | ID: mdl-36778215

ABSTRACT

Fins are major functional appendages of fish that have been repeatedly modified in different lineages. To search for genomic changes underlying natural fin diversity, we compared the genomes of 36 wild fish species that either have complete or reduced pelvic and caudal fins. We identify 1,614 genomic regions that are well-conserved in fin-complete species but missing from multiple fin-reduced lineages. Recurrent deletions of conserved sequences (CONDELs) in wild fin-reduced species are enriched for functions related to appendage development, suggesting that convergent fin reduction at the organismal level is associated with repeated genomic deletions near fin-appendage development genes. We used sequencing and functional enhancer assays to confirm that PelA, a Pitx1 enhancer previously linked to recurrent pelvic loss in sticklebacks, has also been independently deleted and may have contributed to the fin morphology in distantly related pelvic-reduced species. We also identify a novel enhancer that is conserved in the majority of percomorphs, drives caudal fin expression in transgenic stickleback, is missing in tetraodontiform, syngnathid, and synbranchid species with caudal fin reduction, and alters caudal fin development when targeted by genome editing. Our study illustrates a general strategy for mapping phenotypes to genotypes across a tree of vertebrate species, and highlights notable new examples of regulatory genomic hotspots that have been used to evolve recurrent phenotypes during 100 million years of fish evolution.

4.

WhichTF is functionally important in your open chromatin data?

Tanigawa, Yosuke; Dyer, Ethan S; Bejerano, Gill.

PLoS Comput Biol ; 18(8): e1010378, 2022 08.

Article in English | MEDLINE | ID: mdl-36040971

ABSTRACT

We present WhichTF, a computational method to identify functionally important transcription factors (TFs) from chromatin accessibility measurements. To rank TFs, WhichTF applies an ontology-guided functional approach to compute novel enrichment by integrating accessibility measurements, high-confidence pre-computed conservation-aware TF binding sites, and putative gene-regulatory models. Comparison with prior sheer abundance-based methods reveals the unique ability of WhichTF to identify context-specific TFs with functional relevance, including NF-κB family members in lymphocytes and GATA factors in cardiac cells. To distinguish the transcriptional regulatory landscape in closely related samples, we apply differential analysis and demonstrate its utility in lymphocyte, mesoderm developmental, and disease cells. We find suggestive, under-characterized TFs, such as RUNX3 in mesoderm development and GLI1 in systemic lupus erythematosus. We also find TFs known for stress response, suggesting routine experimental caveats that warrant careful consideration. WhichTF yields biological insight into known and novel molecular mechanisms of TF-mediated transcriptional regulation in diverse contexts, including human and mouse cell types, cell fate trajectories, and disease-associated cells.

Subject(s)

Chromatin; Transcription Factors; Animals; Binding Sites; Chromatin/genetics; Gene Expression Regulation; Humans; Mice; Protein Binding; Transcription Factors/metabolism

5.

Discovering monogenic patients with a confirmed molecular diagnosis in millions of clinical notes with MonoMiner.

Wu, David Wei; Bernstein, Jonathan A; Bejerano, Gill.

Genet Med ; 24(10): 2091-2102, 2022 10.

Article in English | MEDLINE | ID: mdl-35976265

ABSTRACT

PURPOSE: Cohort building is a powerful foundation for improving clinical care, performing biomedical research, recruiting for clinical trials, and many other applications. We set out to build a cohort of all monogenic patients with a definitive causal gene diagnosis in a 3-million patient hospital system. METHODS: We define a subset (4461) of OMIM diseases that have at least 1 known monogenic causal gene. We then introduce MonoMiner, a natural language processing framework to identify molecularly confirmed monogenic patients from free-text clinical notes. RESULTS: We show that ICD-10-CM codes cover only a fraction of monogenic diseases and that even where available, ICD-10-CM code-based patient retrieval offers 0.14 precision. Searching by causal gene symbol offers great recall but has an even worse 0.07 precision. MonoMiner achieves 6 to 11 times higher precision (0.80), with 0.87 precision on disease diagnosis alone, tagging 4259 patients with 560 monogenic diseases and 534 causal genes, at 0.48 recall. CONCLUSION: MonoMiner enables the discovery of a large, high-precision cohort of patients with monogenic diseases with an established molecular diagnosis, empowering numerous downstream uses. Because it relies solely on clinical notes, MonoMiner is highly portable, and its approach is adaptable to other domains and languages.

Subject(s)

Electronic Health Records; Natural Language Processing; Cohort Studies; Humans
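The "6 to 11 times higher precision" claim follows from the precision values quoted in the abstract (0.80 versus 0.14 and 0.07). The short check below only reuses those numbers; the helper `fold_improvement` is illustrative and does not reproduce how MonoMiner or the baselines were evaluated.

```python
# Precision values quoted in the MonoMiner abstract.
PRECISION = {
    "ICD-10-CM code-based retrieval": 0.14,
    "causal gene symbol search": 0.07,
    "MonoMiner": 0.80,
}

def fold_improvement(new: float, baseline: float) -> float:
    """How many times higher the new precision is than the baseline precision."""
    return new / baseline

for baseline in ("ICD-10-CM code-based retrieval", "causal gene symbol search"):
    fold = fold_improvement(PRECISION["MonoMiner"], PRECISION[baseline])
    print(f"MonoMiner vs {baseline}: {fold:.1f}x")
# MonoMiner vs ICD-10-CM code-based retrieval: 5.7x
# MonoMiner vs causal gene symbol search: 11.4x
```

Rounded, 5.7 and 11.4 give the "6 to 11 times" range stated in the results.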

6.

X-CAP improves pathogenicity prediction of stopgain variants.

Rastogi, Ruchir; Stenson, Peter D; Cooper, David N; Bejerano, Gill.

Genome Med ; 14(1): 81, 2022 07 29.

Article in English | MEDLINE | ID: mdl-35906703

ABSTRACT

Stopgain substitutions are the third-largest class of monogenic human disease mutations and are often examined first in patient exomes. Existing computational stopgain pathogenicity predictors, however, exhibit poor performance at the high sensitivity required for clinical use. Here, we introduce a new classifier, termed X-CAP, which uses a novel training methodology and unique feature set to improve the AUROC by 18% and decrease the false-positive rate 4-fold on large variant databases. In patient exomes, X-CAP prioritizes causal stopgains better than existing methods do, further illustrating its clinical utility. X-CAP is available at https://github.com/bejerano-lab/X-CAP.

Subject(s)

Exome; Software; Computational Biology/methods; Humans; Mutation; Mutation, Missense; Virulence
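The X-CAP abstract reports gains in AUROC and in false-positive rate at the high sensitivity needed clinically. As a hedged illustration of what those two metrics measure, the sketch below computes both from plain score lists. The function names and toy scores are hypothetical, and this is the generic metric definition rather than X-CAP's own evaluation code.

```python
import math
from typing import Sequence

def auroc(pathogenic: Sequence[float], benign: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen pathogenic variant outscores a randomly chosen benign one
    (ties count as half)."""
    if not pathogenic or not benign:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for p in pathogenic:
        for b in benign:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic) * len(benign))

def fpr_at_sensitivity(pathogenic: Sequence[float], benign: Sequence[float],
                       sensitivity: float) -> float:
    """False-positive rate at the highest score threshold that still calls
    at least `sensitivity` of the pathogenic variants."""
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    ranked = sorted(pathogenic, reverse=True)
    k = math.ceil(sensitivity * len(ranked))  # pathogenic variants that must be called
    threshold = ranked[k - 1]
    return sum(1 for b in benign if b >= threshold) / len(benign)

# Hypothetical scores for three pathogenic and three benign stopgains.
path_scores = [0.9, 0.8, 0.4]
benign_scores = [0.7, 0.3, 0.2]
print(auroc(path_scores, benign_scores))
print(fpr_at_sensitivity(path_scores, benign_scores, 1.0))
```

Demanding full sensitivity forces the threshold down to the weakest pathogenic score (0.4), which lets one of three benign variants through; this is why a classifier's behavior at high sensitivity can differ from what its AUROC alone suggests.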

7.

Champagne: Automated Whole-Genome Phylogenomic Character Matrix Method Using Large Genomic Indels for Homoplasy-Free Inference.

Schull, James K; Turakhia, Yatish; Hemker, James A; Dally, William J; Bejerano, Gill.

Genome Biol Evol ; 14(3)2022 03 02.

Article in English | MEDLINE | ID: mdl-35171243

ABSTRACT

We present Champagne, a whole-genome method for generating character matrices for phylogenomic analysis using large genomic indel events. By rigorously picking orthologous genes and locating large insertion and deletion events, Champagne delivers a character matrix that considerably reduces homoplasy compared with morphological and nucleotide-based matrices, on both established phylogenies and difficult-to-resolve nodes in the mammalian tree. Champagne provides ample evidence in the form of genomic structural variation to support incomplete lineage sorting and possible introgression in Paenungulata and human-chimp-gorilla which were previously inferred primarily through matrices composed of aligned single-nucleotide characters. Champagne also offers further evidence for Myomorpha as sister to Sciuridae and Hystricomorpha in the rodent tree. Champagne harbors distinct theoretical advantages as an automated method that produces nearly homoplasy-free character matrices on the whole-genome scale.

Subject(s)

Genome; Genomics; Animals; INDEL Mutation; Mammals; Nucleotides; Phylogeny
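Champagne's goal of homoplasy-free indel characters can be made concrete with the classic four-gamete compatibility test: two binary characters that show all four state combinations across taxa cannot both be mapped onto a single tree without homoplasy. The sketch below applies that generic test to a hypothetical indel presence/absence matrix. It is not Champagne's own procedure, and the taxa values are invented for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def incompatible_pairs(matrix: Dict[str, List[int]]) -> List[Tuple[int, int]]:
    """Index pairs of binary characters that fail the four-gamete test.

    `matrix` maps each taxon to a 0/1 vector (1 = derived indel state).
    Two binary characters can both fit one unrooted tree without homoplasy
    only if the taxa do not show all four state combinations.
    """
    n_chars = len(next(iter(matrix.values())))
    bad = []
    for i, j in combinations(range(n_chars), 2):
        observed = {(row[i], row[j]) for row in matrix.values()}
        if len(observed) == 4:  # (0,0), (0,1), (1,0) and (1,1) all present
            bad.append((i, j))
    return bad

# Hypothetical indel characters for four taxa.
matrix = {
    "human":     [1, 1, 0],
    "chimp":     [1, 1, 1],
    "gorilla":   [0, 1, 1],
    "orangutan": [0, 0, 0],
}
print(incompatible_pairs(matrix))  # [(0, 2)]
```

Character 0 groups human with chimp while character 2 groups chimp with gorilla, so the pair conflicts. In real data, conflicts of this kind at the human-chimp-gorilla node are the sort of signal the abstract interprets as support for incomplete lineage sorting or introgression.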

8.

The Effect of Population Structure on Murine Genome-Wide Association Studies.

Wang, Meiyue; Fang, Zhuoqing; Yoo, Boyoung; Bejerano, Gill; Peltz, Gary.

Front Genet ; 12: 745361, 2021.

Article in English | MEDLINE | ID: mdl-34589118

ABSTRACT

The ability to use genome-wide association studies (GWAS) for genetic discovery depends upon our ability to distinguish true causative from false positive association signals. Population structure (PS) has been shown to cause false positive signals in GWAS. PS correction is routinely used for analysis of human GWAS results, and it has been assumed that it should also be used for murine GWAS using inbred strains. Nevertheless, there are fundamental differences between murine and human GWAS, and the impact of PS on murine GWAS results has not been carefully investigated. To assess the impact of PS on murine GWAS, we examined 8223 datasets that characterized biomedical responses in panels of inbred mouse strains. Rather than treat PS as a confounding variable, we examined it as a response variable. Surprisingly, we found that PS had a minimal impact on datasets measuring responses in ≤20 strains, and had surprisingly little impact on most datasets characterizing 21-40 inbred strains. Moreover, we show that true positive association signals arising from haplotype blocks, SNPs or indels, which were experimentally demonstrated to be causative for trait differences, would be rejected if PS correction were applied to them. Our results indicate that, because of the special conditions created by murine GWAS (the use of inbred strains and small sample sizes), PS assessment results should be carefully evaluated in conjunction with other criteria when murine GWAS results are evaluated.

9.

InpherNet accelerates monogenic disease diagnosis using patients' candidate genes' neighbors.

Yoo, Boyoung; Birgmeier, Johannes; Bernstein, Jonathan A; Bejerano, Gill.

Genet Med ; 23(10): 1984-1992, 2021 10.

Article in English | MEDLINE | ID: mdl-34230641

ABSTRACT

PURPOSE: Roughly 70% of suspected Mendelian disease patients remain undiagnosed after genome sequencing, partly because knowledge about pathogenic genes is incomplete and constantly growing. Generating a novel pathogenic gene hypothesis from patient data can be time-consuming, especially where cohort-based analysis is not available. METHODS: Each patient genome contains dozens to hundreds of candidate variants. Many sources of indirect evidence about each candidate may be considered. We introduce InpherNet, a network-based machine learning approach leveraging Monarch Initiative data to accelerate this process. RESULTS: InpherNet ranks candidate genes based on orthologs, paralogs, functional pathway members, and colocalized interaction partner gene neighbors. It can propose novel pathogenic genes and reveal known pathogenic genes whose diagnosed patient-based annotation is missing or partial. InpherNet is applied to patient cases where the causative gene is incorrectly ranked low by clinical gene-ranking methods that use only patient-derived evidence. InpherNet ranks the causative gene first, or within the top 5, in roughly twice as many cases as seven comparable tools, including in cases where no clinical evidence for the diagnostic gene is in our knowledgebase. CONCLUSION: InpherNet improves the state of the art in considering candidate gene neighbors to accelerate monogenic diagnosis.

Subject(s)

Genetic Diseases, Inborn/diagnosis; Knowledge Bases; Machine Learning; Cohort Studies; Humans

10.

Avoiding genetic racial profiling in criminal DNA profile databases.

Blindenbach, Jacob A; Jagadeesh, Karthik A; Bejerano, Gill; Wu, David J.

Nat Comput Sci ; 1(4): 272-279, 2021 Apr.

Article in English | MEDLINE | ID: mdl-38217177

ABSTRACT

DNA profiling has become an essential tool for crime solving and prevention, and CODIS (Combined DNA Index System) criminal investigation databases have flourished at the national, state and even local level. However, reports suggest that the DNA profiles of all suspects searched in these databases are often retained, which could result in racial profiling. Here, we devise an approach to both enable broad DNA profile searches and preserve exonerated citizens' privacy through a real-time privacy-preserving procedure to query CODIS databases. Using our approach, an agent can privately and efficiently query a suspect's DNA profile from a device in the field, learning only whether the profile matches any database profile. More importantly, the central database learns nothing about the queried profile, and thus cannot retain it. Our approach paves the way to implement privacy-preserving DNA profile searching in CODIS databases and any CODIS-like system.

11.

Morphogenesis is transcriptionally coupled to neurogenesis during peripheral olfactory organ development.

Aguillon, Raphaël; Madelaine, Romain; Aguirrebengoa, Marion; Guturu, Harendra; Link, Sandra; Dufourcq, Pascale; Lecaudey, Virginie; Bejerano, Gill; Blader, Patrick; Batut, Julie.

Development ; 147(24)2020 12 21.

Article in English | MEDLINE | ID: mdl-33144399

ABSTRACT

Sense organs acquire their distinctive shapes concomitantly with the differentiation of sensory cells and neurons necessary for their function. Although our understanding of the mechanisms controlling morphogenesis and neurogenesis in these structures has grown, how these processes are coordinated remains largely unexplored. Neurogenesis in the zebrafish olfactory epithelium requires the bHLH proneural transcription factor Neurogenin 1 (Neurog1). To address whether Neurog1 also controls morphogenesis, we analysed the migratory behaviour of early olfactory neural progenitors in neurog1 mutant embryos. Our results indicate that the oriented movements of these progenitors are disrupted in this context. Morphogenesis is similarly affected by mutations in the chemokine receptor gene, cxcr4b, suggesting it is a potential Neurog1 target gene. We find that Neurog1 directly regulates cxcr4b through an E-box cluster located just upstream of the cxcr4b transcription start site. Our results suggest that proneural transcription factors, such as Neurog1, directly couple distinct aspects of nervous system development.

Subject(s)

Basic Helix-Loop-Helix Transcription Factors/genetics; Morphogenesis/genetics; Nerve Tissue Proteins/genetics; Neurogenesis/genetics; Olfactory Mucosa/growth & development; Receptors, CXCR4/genetics; Zebrafish Proteins/genetics; Animals; E-Box Elements/genetics; Embryo, Nonmammalian; Embryonic Development/genetics; Gene Expression Regulation, Developmental/genetics; Mutation/genetics; Neurons/metabolism; Transcription Initiation Site; Zebrafish/genetics; Zebrafish/growth & development

12.

Transcription factor expression defines subclasses of developing projection neurons highly similar to single-cell RNA-seq subtypes.

Heavner, Whitney E; Ji, Shaoyi; Notwell, James H; Dyer, Ethan S; Tseng, Alex M; Birgmeier, Johannes; Yoo, Boyoung; Bejerano, Gill; McConnell, Susan K.

Proc Natl Acad Sci U S A ; 117(40): 25074-25084, 2020 10 06.

Article in English | MEDLINE | ID: mdl-32948690

ABSTRACT

We are only just beginning to catalog the vast diversity of cell types in the cerebral cortex. Such categorization is a first step toward understanding how diversification relates to function. All cortical projection neurons arise from a uniform pool of progenitor cells that lines the ventricles of the forebrain. It is still unclear how these progenitor cells generate the more than 50 unique types of mature cortical projection neurons defined by their distinct gene-expression profiles. Moreover, exactly how and when neurons diversify their function during development is unknown. Here we relate gene expression and chromatin accessibility of two subclasses of projection neurons with divergent morphological and functional features as they develop in the mouse brain between embryonic day 13 and postnatal day 5 in order to identify transcriptional networks that diversify neuron cell fate. We compare these gene-expression profiles with published profiles of single cells isolated from similar populations and establish that layer-defined cell classes encompass cell subtypes and developmental trajectories identified using single-cell sequencing. Given the depth of our sequencing, we identify groups of transcription factors with particularly dense subclass-specific regulation and subclass-enriched transcription factor binding motifs. We also describe transcription factor-adjacent long noncoding RNAs that define each subclass and validate the function of Myt1l in balancing the ratio of the two subclasses in vitro. Our multidimensional approach supports an evolving model of progressive restriction of cell fate competence through inherited transcriptional identities.

Subject(s)

Nerve Tissue Proteins/genetics; Neurons/metabolism; Single-Cell Analysis; Transcription Factors/genetics; Animals; Cell Differentiation/genetics; Cerebral Cortex/metabolism; Gene Expression Regulation, Developmental/genetics; Mice; RNA-Seq/methods

13.

A fully-automated method discovers loss of mouse-lethal and human-monogenic disease genes in 58 mammals.

Turakhia, Yatish; Chen, Heidi I; Marcovitz, Amir; Bejerano, Gill.

Nucleic Acids Res ; 48(16): e91, 2020 09 18.

Article in English | MEDLINE | ID: mdl-32614390

ABSTRACT

Gene losses provide an insightful route for studying the morphological and physiological adaptations of species, but their discovery is challenging. Existing genome annotation tools focus on annotating intact genes and do not attempt to distinguish nonfunctional genes from genes missing annotation due to sequencing and assembly artifacts. Previous attempts to annotate gene losses have required significant manual curation, which hampers their scalability for the ever-increasing deluge of newly sequenced genomes. Using extreme sequence erosion (amino acid deletions and substitutions) and sister species support as an unambiguous signature of loss, we developed an automated approach for detecting high-confidence gene loss events across a species tree. Our approach relies solely on gene annotation in a single reference genome, raw assemblies for the remaining species to analyze, and the associated phylogenetic tree for all organisms involved. Using human as reference, we discovered over 400 unique human ortholog erosion events across 58 mammals. This includes dozens of clade-specific losses of genes that result in early mouse lethality or are associated with severe human congenital diseases. Our discoveries yield intriguing potential for translational medical genetics and evolutionary biology, and our approach is readily applicable to large-scale genome sequencing efforts across the tree of life.

Subject(s)

Genetic Diseases, Inborn/genetics; Genomics/methods; Phylogeny; Algorithms; Animals; Automation; Chromosome Mapping/methods; Genes, Lethal; Humans; Mammals/genetics; Mice; Molecular Sequence Annotation
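The loss signature described in this abstract (extreme erosion of the reference protein, backed by the sister species) can be sketched as two checks. This is a minimal illustration under explicit assumptions: orthologs are already aligned column-by-column to a gap-free reference, erosion is scored as the fraction of deleted or substituted residues, and the 0.6 cutoff, function names, and sequences are hypothetical rather than the published criteria.

```python
from typing import Dict, Set

def eroded_fraction(ref_protein: str, aligned: str) -> float:
    """Fraction of reference residues that are deleted ('-') or substituted in
    `aligned`, an ortholog already aligned column-by-column to the gap-free
    reference protein."""
    if len(ref_protein) != len(aligned) or not ref_protein:
        raise ValueError("expected equal-length, non-empty aligned sequences")
    eroded = sum(1 for r, a in zip(ref_protein, aligned) if a != r)
    return eroded / len(ref_protein)

def supported_losses(erosion: Dict[str, float], sister: Dict[str, str],
                     threshold: float = 0.6) -> Set[str]:
    """Species whose ortholog is eroded beyond `threshold` AND whose sister
    species is also eroded beyond it (hypothetical threshold)."""
    return {
        sp for sp, frac in erosion.items()
        if frac >= threshold and erosion.get(sister.get(sp, ""), 0.0) >= threshold
    }

REF = "MKTAYIAK"
orthologs = {
    "species_A": "M--A-I--",
    "species_B": "M-QA--P-",
    "species_C": "MKTAYIAR",
    "species_D": "MKTAYIAK",
}
sisters = {"species_A": "species_B", "species_B": "species_A",
           "species_C": "species_D", "species_D": "species_C"}
erosion = {sp: eroded_fraction(REF, seq) for sp, seq in orthologs.items()}
print(erosion)
print(sorted(supported_losses(erosion, sisters)))  # ['species_A', 'species_B']
```

Requiring the sister species to agree is what guards against a single badly assembled genome being mistaken for a real loss.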

14.

AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature.

Birgmeier, Johannes; Haeussler, Maximilian; Deisseroth, Cole A; Steinberg, Ethan H; Jagadeesh, Karthik A; Ratner, Alexander J; Guturu, Harendra; Wenger, Aaron M; Diekhans, Mark E; Stenson, Peter D; Cooper, David N; Ré, Christopher; Beggs, Alan H; Bernstein, Jonathan A; Bejerano, Gill.

Sci Transl Med ; 12(544)2020 05 20.

Article in English | MEDLINE | ID: mdl-32434849

ABSTRACT

The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient's disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient's given set of phenotypes. Diagnosis of singleton patients (without relatives' exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database-based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children's Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu.

Subject(s)

Exome; Child; Genotype; Humans; Phenotype; Probability; Retrospective Studies
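AMELIE's headline results are top-k rates: how often the causative gene is ranked first, or within the top 11 candidates. A small sketch of that metric, assuming each patient has a ranked candidate list and one known causal gene; the function name, patients, and gene names are hypothetical.

```python
from typing import Dict, List

def top_k_rate(rankings: Dict[str, List[str]], causal: Dict[str, str], k: int) -> float:
    """Fraction of patients whose causal gene is among their top-k ranked candidates."""
    hits = sum(1 for pid, ranked in rankings.items() if causal[pid] in ranked[:k])
    return hits / len(rankings)

# Hypothetical patients, each with a ranked candidate-gene list and a known causal gene.
rankings = {
    "patient_1": ["GENE1", "GENE2", "GENE3"],
    "patient_2": ["GENE4", "GENE5", "GENE6"],
    "patient_3": ["GENE7", "GENE8"],
}
causal = {"patient_1": "GENE1", "patient_2": "GENE6", "patient_3": "GENE8"}
for k in (1, 2, 3):
    print(k, round(top_k_rate(rankings, causal, k), 3))
```

The same function covers both reported figures: `k=1` for "ranked at the very top" and `k=11` for the rapid-evaluation scenario.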

15.

AVADA: toward automated pathogenic variant evidence retrieval directly from the full-text literature.

Birgmeier, Johannes; Deisseroth, Cole A; Hayward, Laura E; Galhardo, Luisa M T; Tierno, Andrew P; Jagadeesh, Karthik A; Stenson, Peter D; Cooper, David N; Bernstein, Jonathan A; Haeussler, Maximilian; Bejerano, Gill.

Genet Med ; 22(2): 362-370, 2020 02.

Article in English | MEDLINE | ID: mdl-31467448

ABSTRACT

PURPOSE: Both monogenic pathogenic variant cataloging and clinical patient diagnosis start with variant-level evidence retrieval followed by expert evidence integration in search of diagnostic variants and genes. Here, we try to accelerate pathogenic variant evidence retrieval with an automatic approach. METHODS: Automatic VAriant evidence DAtabase (AVADA) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic genetic variant evidence in full-text primary literature about monogenic disease and convert it to genomic coordinates. RESULTS: AVADA automatically retrieved almost 60% of likely disease-causing variants deposited in the Human Gene Mutation Database (HGMD), a 4.4-fold improvement over the current best open source automated variant extractor. AVADA contains over 60,000 likely disease-causing variants that are in HGMD but not in ClinVar. AVADA also highlights the challenges of automated variant mapping and pathogenicity curation. However, when combined with manual validation, AVADA provides valuable evidence for an additional 18 diagnostic variants in 245 diagnosed patients, on top of ClinVar's 21, versus only 2 using the best current automated approach. CONCLUSION: AVADA advances automated retrieval of pathogenic monogenic variant evidence from full-text literature. AVADA is far from perfect but much faster than PubMed/Google Scholar search, and careful curation of AVADA-retrieved evidence can aid both database curation and patient diagnosis.

Subject(s)

Automatic Data Processing/methods; Genomics/methods; Information Storage and Retrieval/methods; Data Management/methods; Databases, Factual; Databases, Genetic; Humans; Natural Language Processing; PubMed; Publications

16.

A functional enrichment test for molecular convergent evolution finds a clear protein-coding signal in echolocating bats and whales.

Marcovitz, Amir; Turakhia, Yatish; Chen, Heidi I; Gloudemans, Michael; Braun, Benjamin A; Wang, Haoqing; Bejerano, Gill.

Proc Natl Acad Sci U S A ; 116(42): 21094-21103, 2019 10 15.

Article in English | MEDLINE | ID: mdl-31570615

ABSTRACT

Distantly related species entering similar biological niches often adapt by evolving similar morphological and physiological characters. How much genomic molecular convergence (particularly of highly constrained coding sequence) contributes to convergent phenotypic evolution, such as echolocation in bats and whales, is a long-standing fundamental question. Like others, we find that convergent amino acid substitutions are not more abundant in echolocating mammals compared to their outgroups. However, we also ask a more informative question about the genomic distribution of convergent substitutions by devising a test to determine which, if any, of more than 4,000 tissue-affecting gene sets is most statistically enriched with convergent substitutions. We find that the gene set most overrepresented (q-value = 2.2e-3) with convergent substitutions in echolocators, affecting 18 genes, regulates development of the cochlear ganglion, a structure with empirically supported relevance to echolocation. Conversely, when comparing to nonecholocating outgroups, no significant gene set enrichment exists. For aquatic and high-altitude mammals, our analysis highlights 15 and 16 genes from the gene sets most affected by molecular convergence which regulate skin and lung physiology, respectively. Importantly, our test requires that the most convergence-enriched set cannot also be enriched for divergent substitutions, such as in the pattern produced by inactivated vision genes in subterranean mammals. Showing a clear role for adaptive protein-coding molecular convergence, we discover nearly 2,600 convergent positions, highlight 77 of them in 3 organs, and provide code to investigate other clades across the tree of life.

Subject(s)

Chiroptera/genetics; Chiroptera/physiology; Echolocation/physiology; Proteins/genetics; Whales/genetics; Whales/physiology; Adaptation, Physiological/genetics; Adaptation, Physiological/physiology; Amino Acid Substitution/genetics; Animals; Evolution, Molecular; Genome/genetics; Genomics/methods; Hearing/genetics; Hearing/physiology; Phylogeny; Selection, Genetic/genetics
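The central step in this abstract is asking which of thousands of gene sets is most enriched with convergent substitutions, reported with a q-value. The exact statistic is the paper's own; as a swapped-in, generic illustration, the sketch below uses a hypergeometric over-representation test for one gene set and Benjamini-Hochberg adjustment across sets. It does not model the paper's additional requirement that the top set not also be enriched for divergent substitutions, and all counts are hypothetical.

```python
from math import comb
from typing import List

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: N genes in total, K of them in the
    gene set, n genes carrying convergent substitutions."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def bh_qvalues(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical counts: 20 genes, a 5-gene set, 4 genes with convergent
# substitutions, 3 of which fall in the set.
p = hypergeom_sf(3, N=20, K=5, n=4)
print(round(p, 4))
print(bh_qvalues([p, 0.2, 0.5]))
```

With thousands of tissue-affecting gene sets tested, some multiple-testing adjustment of this kind is needed before a single top set can be called significant.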

17.

Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology.

Tanigawa, Yosuke; Li, Jiehan; Justesen, Johanne M; Horn, Heiko; Aguirre, Matthew; DeBoever, Christopher; Chang, Chris; Narasimhan, Balasubramanian; Lage, Kasper; Hastie, Trevor; Park, Chong Y; Bejerano, Gill; Ingelsson, Erik; Rivas, Manuel A.

Nat Commun ; 10(1): 4064, 2019 09 06.

Article in English | MEDLINE | ID: mdl-31492854

ABSTRACT

Population-based biobanks with genomic and dense phenotype data provide opportunities for generating effective therapeutic hypotheses and understanding the genomic role in disease predisposition. To characterize latent components of genetic associations, we apply truncated singular value decomposition (DeGAs) to matrices of summary statistics derived from genome-wide association analyses across 2,138 phenotypes measured in 337,199 White British individuals in the UK Biobank study. We systematically identify key components of genetic associations and the contributions of variants, genes, and phenotypes to each component. As an illustration of the utility of the approach to inform downstream experiments, we report putative loss of function variants, rs114285050 (GPR151) and rs150090666 (PDE3B), that substantially contribute to obesity-related traits and experimentally demonstrate the role of these genes in adipocyte biology. Our approach to dissect components of genetic associations across the human phenome will accelerate biomedical hypothesis generation by providing insights on previously unexplored latent structures.

Subject(s)

Adipocytes/metabolism; Biological Specimen Banks; Genetic Association Studies/methods; Genome-Wide Association Study/methods; 3T3-L1 Cells; Adipocytes/cytology; Animals; Cells, Cultured; Cyclic Nucleotide Phosphodiesterases, Type 3/genetics; Genetic Predisposition to Disease/genetics; Humans; Mice; Obesity/genetics; Phenotype; Polymorphism, Single Nucleotide; United Kingdom
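DeGAs applies truncated singular value decomposition to a phenotype-by-variant matrix of association summary statistics. A minimal NumPy sketch on random stand-in data, assuming a small z-score-like matrix; the helper names are hypothetical, and the squared-loading "contribution score" shown is one simple way to express how much each phenotype or variant drives a component, which may differ in detail from the published definition.

```python
import numpy as np

def truncated_svd(W: np.ndarray, k: int):
    """Keep the top-k singular triplets of W (phenotypes x variants)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def contribution_scores(loadings: np.ndarray) -> np.ndarray:
    """Squared loadings. Columns of U (and rows of Vt) have unit norm, so for
    each component the squared entries sum to 1 and read as fractional
    contributions."""
    return loadings ** 2

# Hypothetical stand-in for a matrix of association z-scores:
# 6 phenotypes (rows) x 10 variants (columns).
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 10))

U, s, Vt = truncated_svd(W, k=2)
phenotype_contrib = contribution_scores(U)    # shape (6, 2)
variant_contrib = contribution_scores(Vt.T)   # shape (10, 2)
print(phenotype_contrib.sum(axis=0))          # each component sums to 1
print(int(phenotype_contrib[:, 0].argmax()))  # phenotype contributing most to component 1
```

Truncation keeps only the strongest shared structure: the rank-k reconstruction `U @ np.diag(s) @ Vt` is the best rank-k approximation of `W` in the Frobenius norm.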

18.

Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts.

Frésard, Laure; Smail, Craig; Ferraro, Nicole M; Teran, Nicole A; Li, Xin; Smith, Kevin S; Bonner, Devon; Kernohan, Kristin D; Marwaha, Shruti; Zappala, Zachary; Balliu, Brunilda; Davis, Joe R; Liu, Boxiang; Prybol, Cameron J; Kohler, Jennefer N; Zastrow, Diane B; Reuter, Chloe M; Fisk, Dianna G; Grove, Megan E; Davidson, Jean M; Hartley, Taila; Joshi, Ruchi; Strober, Benjamin J; Utiramerur, Sowmithri; Lind, Lars; Ingelsson, Erik; Battle, Alexis; Bejerano, Gill; Bernstein, Jonathan A; Ashley, Euan A; Boycott, Kym M; Merker, Jason D; Wheeler, Matthew T; Montgomery, Stephen B.

Nat Med ; 25(6): 911-919, 2019 06.

Article in English | MEDLINE | ID: mdl-31160820

ABSTRACT

It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene [1]. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches [2-5]. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases [6-8]. This includes muscle biopsies from patients with undiagnosed rare muscle disorders [6,9], and cultured fibroblasts from patients with mitochondrial disorders [7]. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.

Subject(s)

Rare Diseases/genetics , Acid Ceramidase/genetics , Case-Control Studies , Child , Child, Preschool , Cohort Studies , Female , Genetic Variation , Humans , Male , Models, Genetic , Mutation , Oxidoreductases Acting on CH-CH Group Donors/genetics , Potassium Channels/genetics , RNA/blood , RNA/genetics , RNA Splicing/genetics , Rare Diseases/blood , Sequence Analysis, RNA , Exome Sequencing
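
The abstract above does not spell out how each patient's profile is compared with the control panel. As a hedged illustration only, one simple first pass is to z-score each gene in a patient against the control distribution and flag extreme values. The sketch below assumes already-normalized expression values, a |z| cutoff of 3 and hypothetical gene names; `expression_outliers` is a hypothetical helper, not the study's pipeline, and it covers expression outliers only, not the splicing or variant filters the study also evaluates.

```python
import statistics


def expression_outliers(patient, controls, z_cutoff=3.0):
    """Flag genes whose patient expression is extreme relative to controls.

    patient:  dict gene -> normalized expression value for one individual
    controls: dict gene -> list of normalized expression values from controls
    Returns a dict gene -> z-score for genes with |z| >= z_cutoff.
    """
    hits = {}
    for gene, value in patient.items():
        ctrl = controls.get(gene)
        if not ctrl or len(ctrl) < 2:
            continue  # need at least two controls to estimate a spread
        mu = statistics.fmean(ctrl)
        sd = statistics.stdev(ctrl)
        if sd == 0.0:
            continue  # constant across controls: z-score undefined
        z = (value - mu) / sd
        if abs(z) >= z_cutoff:
            hits[gene] = z
    return hits
```

A real analysis would typically also correct for technical confounders before computing z-scores; that step is omitted here.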

19.

CRISPR/Cas9 Genome Engineering in Engraftable Human Brain-Derived Neural Stem Cells.

Dever, Daniel P; Scharenberg, Samantha G; Camarena, Joab; Kildebeck, Eric J; Clark, Joseph T; Martin, Renata M; Bak, Rasmus O; Tang, Yuming; Dohse, Monika; Birgmeier, Johannes A; Jagadeesh, Karthik A; Bejerano, Gill; Tsukamoto, Ann; Gomez-Ospina, Natalia; Uchida, Nobuko; Porteus, Matthew H.

iScience ; 15: 524-535, 2019 May 31.

Article in English | MEDLINE | ID: mdl-31132746

ABSTRACT

Human neural stem cells (NSCs) offer therapeutic potential for neurodegenerative diseases, such as inherited monogenic nervous system disorders, and neural injuries. Gene editing in NSCs (GE-NSCs) could enhance their therapeutic potential. We show that NSCs are amenable to gene targeting at multiple loci using Cas9 mRNA with synthetic chemically modified guide RNAs along with DNA donor templates. Transplantation of GE-NSC into oligodendrocyte mutant shiverer-immunodeficient mice showed that GE-NSCs migrate and differentiate into astrocytes, neurons, and myelin-producing oligodendrocytes, highlighting the fact that GE-NSCs retain their NSC characteristics of self-renewal and site-specific global migration and differentiation. To show the therapeutic potential of GE-NSCs, we generated GALC lysosomal enzyme overexpressing GE-NSCs that are able to cross-correct GALC enzyme activity through the mannose-6-phosphate receptor pathway. These GE-NSCs have the potential to be an investigational cell and gene therapy for a range of neurodegenerative disorders and injuries of the central nervous system, including lysosomal storage disorders.

20.

S-CAP extends pathogenicity prediction to genetic variants that affect RNA splicing.

Jagadeesh, Karthik A; Paggi, Joseph M; Ye, James S; Stenson, Peter D; Cooper, David N; Bernstein, Jonathan A; Bejerano, Gill.

Nat Genet ; 51(4): 755-763, 2019 04.

Article in English | MEDLINE | ID: mdl-30804562

ABSTRACT

Exome analysis of patients with a likely monogenic disease does not identify a causal variant in over half of cases. Splice-disrupting mutations make up the second largest class of known disease-causing mutations. Each individual (singleton) exome harbors over 500 rare variants of unknown significance (VUS) in the splicing region. The existing relevant pathogenicity prediction tools tackle all non-coding variants as one amorphic class and/or are not calibrated for the high sensitivity required for clinical use. Here we calibrate seven such tools and devise a novel tool called Splicing Clinically Applicable Pathogenicity prediction (S-CAP) that is over twice as powerful as all previous tools, removing 41% of patient VUS at 95% sensitivity. We show that S-CAP does this by using its own features and not via meta-prediction over previous tools, and that splicing pathogenicity prediction is distinct from predicting molecular splicing changes. S-CAP is an important step on the path to deriving non-coding causal diagnoses.

Subject(s)

Genetic Variation/genetics , RNA Splicing/genetics , Exome/genetics , Humans , Mutation/genetics
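
Reporting that a tool removes a given share of patient VUS "at 95% sensitivity" implies calibrating a score cutoff on known pathogenic variants and then measuring how many VUS fall below it. A minimal sketch of that kind of operating-point calibration, assuming higher scores mean more likely pathogenic; the function names and toy scores are hypothetical, and this is not S-CAP's model, features or training data.

```python
import math


def threshold_at_sensitivity(pathogenic_scores, sensitivity=0.95):
    """Highest cutoff that still keeps >= `sensitivity` of known pathogenic variants.

    Assumes higher scores mean "more likely pathogenic".
    """
    ranked = sorted(pathogenic_scores, reverse=True)
    # How many pathogenic variants must score at or above the cutoff; the small
    # epsilon guards against floating-point rounding in sensitivity * n.
    n_keep = math.ceil(sensitivity * len(ranked) - 1e-9)
    return ranked[max(n_keep, 1) - 1]


def fraction_vus_removed(vus_scores, cutoff):
    """Share of variants of unknown significance scoring below the cutoff (set aside)."""
    if not vus_scores:
        return 0.0
    return sum(1 for s in vus_scores if s < cutoff) / len(vus_scores)
```

With pathogenic scores 1 to 20, the cutoff is 2 (19 of 20, i.e. 95%, score at or above it), and a VUS list `[0, 1, 1.5, 3, 5]` would have 60% of its entries set aside.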
