1.

LRRC37B is a human modifier of voltage-gated sodium channels and axon excitability in cortical neurons.

Libé-Philippot, Baptiste; Lejeune, Amélie; Wierda, Keimpe; Louros, Nikolaos; Erkol, Emir; Vlaeminck, Ine; Beckers, Sofie; Gaspariunaite, Vaiva; Bilheu, Angéline; Konstantoulea, Katerina; Nyitrai, Hajnalka; De Vleeschouwer, Matthias; Vennekens, Kristel M; Vidal, Niels; Bird, Thomas W; Soto, Daniela C; Jaspers, Tom; Dewilde, Maarten; Dennis, Megan Y; Rousseau, Frederic; Comoletti, Davide; Schymkowitz, Joost; Theys, Tom; de Wit, Joris; Vanderhaeghen, Pierre.

Cell ; 186(26): 5766-5783.e25, 2023 12 21.

Article in English | MEDLINE | ID: mdl-38134874

ABSTRACT

The enhanced cognitive abilities characterizing the human species result from specialized features of neurons and circuits. Here, we report that the hominid-specific gene LRRC37B encodes a receptor expressed in human cortical pyramidal neurons (CPNs) and selectively localized to the axon initial segment (AIS), the subcellular compartment triggering action potentials. Ectopic expression of LRRC37B in mouse CPNs in vivo leads to reduced intrinsic excitability, a distinctive feature of some classes of human CPNs. Molecularly, LRRC37B binds to the secreted ligand FGF13A and to the voltage-gated sodium channel (Nav) β-subunit SCN1B. LRRC37B concentrates inhibitory effects of FGF13A on Nav channel function, thereby reducing excitability, specifically at the AIS level. Electrophysiological recordings in adult human cortical slices reveal lower neuronal excitability in human CPNs expressing LRRC37B. LRRC37B thus acts as a species-specific modifier of human neuron excitability, linking human genome and cell evolution, with important implications for human brain function and diseases.

Subject(s)

Neurons , Pyramidal Cells , Voltage-Gated Sodium Channels , Animals , Humans , Mice , Action Potentials/physiology , Axons/metabolism , Neurons/metabolism , Voltage-Gated Sodium Channels/genetics , Voltage-Gated Sodium Channels/metabolism

2.

Telomere-to-telomere assembly of a complete human X chromosome.

Miga, Karen H; Koren, Sergey; Rhie, Arang; Vollger, Mitchell R; Gershman, Ariel; Bzikadze, Andrey; Brooks, Shelise; Howe, Edmund; Porubsky, David; Logsdon, Glennis A; Schneider, Valerie A; Potapova, Tamara; Wood, Jonathan; Chow, William; Armstrong, Joel; Fredrickson, Jeanne; Pak, Evgenia; Tigyi, Kristof; Kremitzki, Milinn; Markovic, Christopher; Maduro, Valerie; Dutra, Amalia; Bouffard, Gerard G; Chang, Alexander M; Hansen, Nancy F; Wilfert, Amy B; Thibaud-Nissen, Françoise; Schmitt, Anthony D; Belton, Jon-Matthew; Selvaraj, Siddarth; Dennis, Megan Y; Soto, Daniela C; Sahasrabudhe, Ruta; Kaya, Gulhan; Quick, Josh; Loman, Nicholas J; Holmes, Nadine; Loose, Matthew; Surti, Urvashi; Risques, Rosa Ana; Graves Lindsay, Tina A; Fulton, Robert; Hall, Ira; Paten, Benedict; Howe, Kerstin; Timp, Winston; Young, Alice; Mullikin, James C; Pevzner, Pavel A; Gerton, Jennifer L.

Nature ; 585(7823): 79-84, 2020 09.

Article in English | MEDLINE | ID: mdl-32663838

ABSTRACT

After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist. Here we present a human genome assembly that surpasses the continuity of GRCh38, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.

Subject(s)

Chromosomes, Human, X/genetics , Genome, Human/genetics , Telomere/genetics , Centromere/genetics , CpG Islands/genetics , DNA Methylation , DNA, Satellite/genetics , Female , Humans , Hydatidiform Mole/genetics , Male , Pregnancy , Reproducibility of Results , Testis/metabolism

3.

Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies.

Mc Cartney, Ann M; Shafin, Kishwar; Alonge, Michael; Bzikadze, Andrey V; Formenti, Giulio; Fungtammasan, Arkarachai; Howe, Kerstin; Jain, Chirag; Koren, Sergey; Logsdon, Glennis A; Miga, Karen H; Mikheenko, Alla; Paten, Benedict; Shumate, Alaina; Soto, Daniela C; Sovic, Ivan; Wood, Jonathan M D; Zook, Justin M; Phillippy, Adam M; Rhie, Arang.

Nat Methods ; 19(6): 687-695, 2022 06.

Article in English | MEDLINE | ID: mdl-35361931

ABSTRACT

Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
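The quality values quoted in this abstract (70.2 before polishing, 73.9 after) can be read on the usual Phred scale, where QV = -10 log10(p) and p is the estimated per-base error probability. The sketch below only illustrates that conversion; it assumes the standard Phred definition and does not reproduce the k-mer-based estimation procedure the authors used. The function names are illustrative and not taken from any tool.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled quality value."""
    return -10 * math.log10(p)


# Quality values reported in the abstract (before and after polishing).
qv_before, qv_after = 70.2, 73.9

p_before = qv_to_error_rate(qv_before)
p_after = qv_to_error_rate(qv_after)

# Expected number of bases per error is the reciprocal of the error probability.
print(f"before polishing: ~1 error per {1 / p_before:,.0f} bases")
print(f"after polishing:  ~1 error per {1 / p_after:,.0f} bases")
```

Because the scale is logarithmic, a gain of a few QV points corresponds to a large relative drop in the estimated error rate. The QV is a k-mer-based estimate, so it need not match a direct count of corrected errors exactly.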

Subject(s)

High-Throughput Nucleotide Sequencing , Nanopores , Female , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Humans , Pregnancy , Sequence Analysis, DNA/methods , Telomere/genetics

4.

Plant ecological genomics at the limits of life in the Atacama Desert.

Eshel, Gil; Araus, Viviana; Undurraga, Soledad; Soto, Daniela C; Moraga, Carol; Montecinos, Alejandro; Moyano, Tomás; Maldonado, Jonathan; Díaz, Francisca P; Varala, Kranthi; Nelson, Chase W; Contreras-López, Orlando; Pal-Gabor, Henrietta; Kraiser, Tatiana; Carrasco-Puga, Gabriela; Nilo-Poyanco, Ricardo; Zegar, Charles M; Orellana, Ariel; Montecino, Martín; Maass, Alejandro; Allende, Miguel L; DeSalle, Robert; Stevenson, Dennis W; González, Mauricio; Latorre, Claudio; Coruzzi, Gloria M; Gutiérrez, Rodrigo A.

Proc Natl Acad Sci U S A ; 118(46)2021 11 16.

Article in English | MEDLINE | ID: mdl-34725254

ABSTRACT

The Atacama Desert in Chile, hyperarid and with high ultraviolet irradiance levels, is one of the harshest environments on Earth. Yet, dozens of species grow there, including Atacama-endemic plants. Herein, we establish the Talabre-Lejía transect (TLT) in the Atacama as an unparalleled natural laboratory to study plant adaptation to extreme environmental conditions. We characterized climate, soil, plant, and soil-microbe diversity at 22 sites (every 100 m of altitude) along the TLT over a 10-y period. We quantified drought, nutrient deficiencies, large diurnal temperature oscillations, and pH gradients that define three distinct vegetational belts along the altitudinal cline. We deep-sequenced transcriptomes of 32 dominant plant species spanning the major plant clades, and assessed soil microbes by metabarcoding sequencing. The top-expressed genes in the 32 Atacama species are enriched in stress responses, metabolism, and energy production. Moreover, their root-associated soils are enriched in growth-promoting bacteria, including nitrogen fixers. To identify genes associated with plant adaptation to harsh environments, we compared 32 Atacama species with the 32 closest sequenced species, comprising 70 taxa and 1,686,950 proteins. To perform phylogenomic reconstruction, we concatenated 15,972 ortholog groups into a supermatrix of 8,599,764 amino acids. Using two codon-based methods, we identified 265 candidate positively selected genes (PSGs) in the Atacama plants, 64% of which are located in Pfam domains, supporting their functional relevance. For 59/184 PSGs with an Arabidopsis ortholog, we uncovered functional evidence linking them to plant resilience. As some Atacama plants are closely related to staple crops, these candidate PSGs are a "genetic goldmine" to engineer crop resilience to face climate change.

Subject(s)

Plants/genetics , Altitude , Chile , Climate Change , Desert Climate , Ecosystem , Genomics/methods , Phylogeny , Soil , Soil Microbiology

5.

Diverse Molecular Mechanisms Contribute to Differential Expression of Human Duplicated Genes.

Shew, Colin J; Carmona-Mora, Paulina; Soto, Daniela C; Mastoras, Mira; Roberts, Elizabeth; Rosas, Joseph; Jagannathan, Dhriti; Kaya, Gulhan; O'Geen, Henriette; Dennis, Megan Y.

Mol Biol Evol ; 38(8): 3060-3077, 2021 07 29.

Article in English | MEDLINE | ID: mdl-34009325

ABSTRACT

Emerging evidence links genes within human-specific segmental duplications (HSDs) to traits and diseases unique to our species. Strikingly, despite being nearly identical by sequence (>98.5%), paralogous HSD genes are differentially expressed across human cell and tissue types, though the underlying mechanisms have not been examined. We compared cross-tissue mRNA levels of 75 HSD genes from 30 families between humans and chimpanzees and found expression patterns consistent with relaxed selection on or neofunctionalization of derived paralogs. In general, ancestral paralogs exhibited greatest expression conservation with chimpanzee orthologs, though exceptions suggest certain derived paralogs may retain or supplant ancestral functions. Concordantly, analysis of long-read isoform sequencing data sets from diverse human tissues and cell lines found that about half of derived paralogs exhibited globally lower expression. To understand mechanisms underlying these differences, we leveraged data from human lymphoblastoid cell lines (LCLs) and found no relationship between paralogous expression divergence and post-transcriptional regulation, sequence divergence, or copy-number variation. Considering cis-regulation, we reanalyzed ENCODE data and recovered hundreds of previously unidentified candidate CREs in HSDs. We also generated large-insert ChIP-sequencing data for active chromatin features in an LCL to better distinguish paralogous regions. Some duplicated CREs were sufficient to drive differential reporter activity, suggesting they may contribute to divergent cis-regulation of paralogous genes. This work provides evidence that cis-regulatory divergence contributes to novel expression patterns of recent gene duplicates in humans.

Subject(s)

Gene Duplication , Gene Expression Regulation , Genome, Human , Segmental Duplications, Genomic , Animals , Cell Line , DNA Copy Number Variations , Humans , Pan troglodytes , Promoter Regions, Genetic

6.

Multiscale climate change impacts on plant diversity in the Atacama Desert.

Díaz, Francisca P; Latorre, Claudio; Carrasco-Puga, Gabriela; Wood, Jamie R; Wilmshurst, Janet M; Soto, Daniela C; Cole, Theresa L; Gutiérrez, Rodrigo A.

Glob Chang Biol ; 25(5): 1733-1745, 2019 05.

Article in English | MEDLINE | ID: mdl-30706600

ABSTRACT

Comprehending ecological dynamics requires not only knowledge of modern communities but also detailed reconstructions of ecosystem history. Ancient DNA (aDNA) metabarcoding allows biodiversity responses to major climatic change to be explored at different spatial and temporal scales. We extracted aDNA preserved in fossil rodent middens to reconstruct late Quaternary vegetation dynamics in the hyperarid Atacama Desert. By comparing our paleo-informed millennial record with contemporary observations of interannual variations in diversity, we show local plant communities behave differentially at different timescales. In the interannual (years to decades) time frame, only annual herbaceous plants expand and contract their distributional ranges (emerging from persistent seed banks) in response to precipitation, whereas the distribution of perennials appears to be extraordinarily resilient. In contrast, at longer timescales (thousands of years) many perennial species were displaced up to 1,000 m downslope during pluvial events. Given ongoing and future natural and anthropogenically induced climate change, our results not only provide baselines for vegetation in the Atacama Desert, but also help to inform how these and other high mountain plant communities may respond to fluctuations of climate in the future.

Subject(s)

Biodiversity , Climate Change , Desert Climate , Plants , Chile , DNA, Ancient/analysis , Ecosystem , Fossils , Plant Dispersal , Plants/classification , Plants/genetics , Population Dynamics

7.

Complementation testing identifies genes mediating effects at quantitative trait loci underlying fear-related behavior.

Chen, Patrick B; Chen, Rachel; LaPierre, Nathan; Chen, Zeyuan; Mefford, Joel; Marcus, Emilie; Heffel, Matthew G; Soto, Daniela C; Ernst, Jason; Luo, Chongyuan; Flint, Jonathan.

Cell Genom ; 4(5): 100545, 2024 May 08.

Article in English | MEDLINE | ID: mdl-38697120

ABSTRACT

Knowing the genes involved in quantitative traits provides an entry point to understanding the biological bases of behavior, but there are very few examples where the pathway from genetic locus to behavioral change is known. To explore the role of specific genes in fear behavior, we mapped three fear-related traits, tested fourteen genes at six quantitative trait loci (QTLs) by quantitative complementation, and identified six genes. Four genes, Lsamp, Ptprd, Nptx2, and Sh3gl, have known roles in synapse function; the fifth, Psip1, was not previously implicated in behavior; and the sixth is a long non-coding RNA, 4933413L06Rik, of unknown function. Variation in transcriptome and epigenetic modalities occurred preferentially in excitatory neurons, suggesting that genetic variation is more permissible in excitatory than inhibitory neuronal circuits. Our results relieve a bottleneck in using genetic mapping of QTLs to uncover biology underlying behavior and prompt a reconsideration of expected relationships between genetic and functional variation.

Subject(s)

Fear , Quantitative Trait Loci , Animals , Female , Male , Mice , Behavior, Animal/physiology , Chromosome Mapping , Fear/physiology , Mice, Inbred C57BL , Genetic Complementation Test

8.

Complementation testing identifies causal genes at quantitative trait loci underlying fear related behavior.

Chen, Patrick B; Chen, Rachel; LaPierre, Nathan; Chen, Zeyuan; Mefford, Joel; Marcus, Emilie; Heffel, Matthew G; Soto, Daniela C; Ernst, Jason; Luo, Chongyuan; Flint, Jonathan.

bioRxiv ; 2024 Jan 04.

Article in English | MEDLINE | ID: mdl-38260483

ABSTRACT

Knowing the genes involved in quantitative traits provides a critical entry point to understanding the biological bases of behavior, but there are very few examples where the pathway from genetic locus to behavioral change is known. Here we address a key step towards that goal by deploying a test that directly queries whether a gene mediates the effect of a quantitative trait locus (QTL). To explore the role of specific genes in fear behavior, we mapped three fear-related traits, tested fourteen genes at six QTLs, and identified six genes. Four genes, Lsamp, Ptprd, Nptx2 and Sh3gl, have known roles in synapse function; the fifth gene, Psip1, is a transcriptional co-activator not previously implicated in behavior; the sixth is a long non-coding RNA 4933413L06Rik with no known function. Single nucleus transcriptomic and epigenetic analyses implicated excitatory neurons as likely mediating the genetic effects. Surprisingly, variation in transcriptome and epigenetic modalities between inbred strains occurred preferentially in excitatory neurons, suggesting that genetic variation is more permissible in excitatory than inhibitory neuronal circuits. Our results open a bottleneck in using genetic mapping of QTLs to find novel biology underlying behavior and prompt a reconsideration of expected relationships between genetic and functional variation.

9.

Integration of CTCF Loops, Methylome, and Transcriptome in Differentiating LUHMES as a Model for Imprinting Dynamics of the 15q11-q13 Locus in Human Neurons.

Fugon, Orangel J Gutierrez; Sharifi, Osman; Heath, Nicholas G; Soto, Daniela C; Gomez, J Antonio; Yasui, Dag H; Mendiola, Aron Judd P; O'Geen, Henriette; Beitnere, Ulrika; Tomkova, Marketa; Haghani, Viktoria; Dillon, Greg; Segal, David J; LaSalle, Janine.

bioRxiv ; 2024 Mar 29.

Article in English | MEDLINE | ID: mdl-38586056

ABSTRACT

Human cell line models, including the neuronal precursor line LUHMES, are important for investigating developmental transcriptional dynamics within imprinted regions, particularly the 15q11-q13 Angelman (AS) and Prader-Willi (PWS) syndrome locus. AS results from loss of maternal UBE3A in neurons, where the paternal allele is silenced by a convergent antisense transcript UBE3A-ATS, a lncRNA that normally terminates at PWAR1 in non-neurons. qRTPCR analysis confirmed the exclusive and progressive increase in UBE3A-ATS in differentiating LUHMES neurons, validating their use for studying UBE3A silencing. Genome-wide transcriptome analyses revealed changes to 11,834 genes during neuronal differentiation, including the upregulation of most genes within the 15q11-q13 locus. To identify dynamic changes in chromatin loops linked to transcriptional activity, we performed a HiChIP validated by 4C, which identified two neuron-specific CTCF loops between MAGEL2-SNRPN and PWAR1-UBE3A. To determine if allele-specific differentially methylated regions (DMR) may be associated with CTCF loop anchors, whole genome long-read nanopore sequencing was performed. We identified a paternally hypomethylated DMR near the SNRPN upstream loop anchor exclusive to neurons and a paternally hypermethylated DMR near the PWAR1 CTCF anchor exclusive to undifferentiated cells, consistent with increases in neuronal transcription. Additionally, DMRs near CTCF loop anchors were observed in both cell types, indicative of allele-specific differences in chromatin loops regulating imprinted transcription. These results provide an integrated view of the 15q11-q13 epigenetic landscape during LUHMES neuronal differentiation, underscoring the complex interplay of transcription, chromatin looping, and DNA methylation. They also provide insights for future therapeutic approaches for AS and PWS.

10.

Genomic structural variation: A complex but important driver of human evolution.

Soto, Daniela C; Uribe-Salazar, José M; Shew, Colin J; Sekar, Aarthi; McGinty, Sean P; Dennis, Megan Y.

Am J Biol Anthropol ; 181 Suppl 76: 118-144, 2023 08.

Article in English | MEDLINE | ID: mdl-36794631

ABSTRACT

Structural variants (SVs), including duplications, deletions, and inversions of DNA, can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single-nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well-documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single-nucleotide variants, with many identified SVs exhibiting population and species specificity. In this review, we highlight the importance of SVs in human evolution by discussing (1) how they have shaped great ape genomes resulting in sensitized regions associated with traits and diseases, (2) their impact on gene functions and regulation, which subsequently has played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations in integrating existing data and biospecimens with the ever-expanding SV compendium propelled by biotechnology advancements.

Subject(s)

Genomic Structural Variation , Hominidae , Animals , Humans , Genome , Genomics , Hominidae/genetics , Primates/genetics , Nucleotides

11.

FixItFelix: improving genomic analysis by fixing reference errors.

Behera, Sairam; LeFaive, Jonathon; Orchard, Peter; Mahmoud, Medhat; Paulin, Luis F; Farek, Jesse; Soto, Daniela C; Parker, Stephen C J; Smith, Albert V; Dennis, Megan Y; Zook, Justin M; Sedlazeck, Fritz J.

Genome Biol ; 24(1): 31, 2023 02 21.

Article in English | MEDLINE | ID: mdl-36810122

ABSTRACT

The current version of the human reference genome, GRCh38, contains a number of errors including 1.2 Mbp of falsely duplicated and 8.04 Mbp of collapsed regions. These errors impact the variant calling of 33 protein-coding genes, including 12 with medical relevance. Here, we present FixItFelix, an efficient remapping approach, together with a modified version of the GRCh38 reference genome that improves the subsequent analysis across these genes within minutes for an existing alignment file while maintaining the same coordinates. We showcase these improvements over multi-ethnic control samples, demonstrating improvements for population variant calling as well as eQTL studies.

Subject(s)

Genome, Human , Genomics , Humans , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA

12.

Placental methylome reveals a 22q13.33 brain regulatory gene locus associated with autism.

Zhu, Yihui; Gomez, J Antonio; Laufer, Benjamin I; Mordaunt, Charles E; Mouat, Julia S; Soto, Daniela C; Dennis, Megan Y; Benke, Kelly S; Bakulski, Kelly M; Dou, John; Marathe, Ria; Jianu, Julia M; Williams, Logan A; Gutierrez Fugón, Orangel J; Walker, Cheryl K; Ozonoff, Sally; Daniels, Jason; Grosvenor, Luke P; Volk, Heather E; Feinberg, Jason I; Fallin, M Daniele; Hertz-Picciotto, Irva; Schmidt, Rebecca J; Yasui, Dag H; LaSalle, Janine M.

Genome Biol ; 23(1): 46, 2022 02 16.

Article in English | MEDLINE | ID: mdl-35168652

ABSTRACT

BACKGROUND: Autism spectrum disorder (ASD) involves complex genetics interacting with the perinatal environment, complicating the discovery of common genetic risk. The epigenetic layer of DNA methylation shows dynamic developmental changes and molecular memory of in utero experiences, particularly in placenta, a fetal tissue discarded at birth. However, current array-based methods to identify novel ASD risk genes lack coverage of the most structurally and epigenetically variable regions of the human genome. RESULTS: We use whole genome bisulfite sequencing in placenta samples from prospective ASD studies to discover a previously uncharacterized ASD risk gene, LOC105373085, renamed NHIP. Out of 134 differentially methylated regions associated with ASD in placental samples, a cluster at 22q13.33 corresponds to a 118-kb hypomethylated block that replicates in two additional cohorts. Within this locus, NHIP is functionally characterized as a nuclear peptide-encoding transcript with high expression in brain, and increased expression following neuronal differentiation or hypoxia, but decreased expression in ASD placenta and brain. NHIP overexpression increases cellular proliferation and alters expression of genes regulating synapses and neurogenesis, overlapping significantly with known ASD risk genes and NHIP-associated genes in ASD brain. A common structural variant disrupting the proximity of NHIP to a fetal brain enhancer is associated with NHIP expression and methylation levels and ASD risk, demonstrating a common genetic influence. CONCLUSIONS: Together, these results identify and initially characterize a novel environmentally responsive ASD risk gene relevant to brain development in a hitherto under-characterized region of the human genome.

Subject(s)

Autism Spectrum Disorder , Autistic Disorder , Autism Spectrum Disorder/genetics , Autistic Disorder/complications , Autistic Disorder/genetics , Autistic Disorder/metabolism , Brain/metabolism , DNA Methylation , Epigenesis, Genetic , Epigenome , Female , Genes, Regulator , Humans , Infant, Newborn , Placenta/metabolism , Pregnancy , Prospective Studies

13.

A complete reference genome improves analysis of human genetic variation.

Aganezov, Sergey; Yan, Stephanie M; Soto, Daniela C; Kirsche, Melanie; Zarate, Samantha; Avdeyev, Pavel; Taylor, Dylan J; Shafin, Kishwar; Shumate, Alaina; Xiao, Chunlin; Wagner, Justin; McDaniel, Jennifer; Olson, Nathan D; Sauria, Michael E G; Vollger, Mitchell R; Rhie, Arang; Meredith, Melissa; Martin, Skylar; Lee, Joyce; Koren, Sergey; Rosenfeld, Jeffrey A; Paten, Benedict; Layer, Ryan; Chin, Chen-Shan; Sedlazeck, Fritz J; Hansen, Nancy F; Miller, Danny E; Phillippy, Adam M; Miga, Karen H; McCoy, Rajiv C; Dennis, Megan Y; Zook, Justin M; Schatz, Michael C.

Science ; 376(6588): eabl3533, 2022 04.

Article in English | MEDLINE | ID: mdl-35357935

ABSTRACT

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.

Subject(s)

Genetic Variation , Genome, Human , Genomics/standards , Sequence Analysis, DNA/standards , Humans , Reference Standards

14.

The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms.

Walker, Kimberly; Kalra, Divya; Lowdon, Rebecca; Chen, Guangyi; Molik, David; Soto, Daniela C; Dabbaghie, Fawaz; Khleifat, Ahmad Al; Mahmoud, Medhat; Paulin, Luis F; Raza, Muhammad Sohail; Pfeifer, Susanne P; Agustinho, Daniel Paiva; Aliyev, Elbay; Avdeyev, Pavel; Barrozo, Enrico R; Behera, Sairam; Billingsley, Kimberley; Chong, Li Chuin; Choubey, Deepak; De Coster, Wouter; Fu, Yilei; Gener, Alejandro R; Hefferon, Timothy; Henke, David Morgan; Höps, Wolfram; Illarionova, Anastasia; Jochum, Michael D; Jose, Maria; Kesharwani, Rupesh K; Kolora, Sree Rohit Raj; Kubica, Jedrzej; Lakra, Priya; Lattimer, Damaris; Liew, Chia-Sin; Lo, Bai-Wei; Lo, Chunhsuan; Lötter, Anneri; Majidian, Sina; Mendem, Suresh Kumar; Mondal, Rajarshi; Ohmiya, Hiroko; Parvin, Nasrin; Peralta, Carolina; Poon, Chi-Lam; Prabhakaran, Ramanandan; Saitou, Marie; Sammi, Aditi; Sanio, Philippe; Sapoval, Nicolae.

F1000Res ; 11: 530, 2022.

Article in English | MEDLINE | ID: mdl-36262335

ABSTRACT

In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNANexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation, including SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics.

Subject(s)

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Genomics , Software

15.

Identification of Structural Variation in Chimpanzees Using Optical Mapping and Nanopore Sequencing.

Soto, Daniela C; Shew, Colin; Mastoras, Mira; Schmidt, Joshua M; Sahasrabudhe, Ruta; Kaya, Gulhan; Andrés, Aida M; Dennis, Megan Y.

Genes (Basel) ; 11(3)2020 03 04.

Article in English | MEDLINE | ID: mdl-32143403

ABSTRACT

Recent efforts to comprehensively characterize great ape genetic diversity using short-read sequencing and single-nucleotide variants have led to important discoveries related to selection within species, demographic history, and lineage-specific traits. Structural variants (SVs), including deletions and inversions, comprise a larger proportion of genetic differences between and within species, making them an important yet understudied source of trait divergence. Here, we used a combination of long-read and -range sequencing approaches to characterize the structural variant landscape of two additional Pan troglodytes verus individuals, one of whom carries 13% admixture from Pan troglodytes troglodytes. We performed optical mapping of both individuals followed by nanopore sequencing of one individual. Filtering for larger variants (>10 kbp) and combined with genotyping of SVs using short-read data from the Great Ape Genome Project, we identified 425 deletions and 59 inversions, of which 88 and 36, respectively, were novel. Compared with gene expression in humans, we found a significant enrichment of chimpanzee genes with differential expression in lymphoblastoid cell lines and induced pluripotent stem cells, both within deletions and near inversion breakpoints. We examined chromatin-conformation maps from human and chimpanzee using these same cell types and observed alterations in genomic interactions at SV breakpoints. Finally, we focused on 56 genes impacted by SVs in >90% of chimpanzees and absent in humans and gorillas, which may contribute to chimpanzee-specific features. Sequencing a greater set of individuals from diverse subspecies will be critical to establish the complete landscape of genetic variation in chimpanzees.

Subject(s)

Genome/genetics , Genomic Structural Variation/genetics , Hominidae/genetics , Pan troglodytes/genetics , Animals , Chromosome Inversion/genetics , Genomics , Gorilla gorilla/genetics , Humans , Nanopore Sequencing , Restriction Mapping , Sequence Analysis, DNA

16.

Whole Genome Sequence, Variant Discovery and Annotation in Mapuche-Huilliche Native South Americans.

Vidal, Elena A; Moyano, Tomás C; Bustos, Bernabé I; Pérez-Palma, Eduardo; Moraga, Carol; Riveras, Eleodoro; Montecinos, Alejandro; Azócar, Lorena; Soto, Daniela C; Vidal, Mabel; Di Genova, Alex; Puschel, Klaus; Nürnberg, Peter; Buch, Stephan; Hampe, Jochen; Allende, Miguel L; Cambiazo, Verónica; González, Mauricio; Hodar, Christian; Montecino, Martín; Muñoz-Espinoza, Claudia; Orellana, Ariel; Reyes-Jara, Angélica; Travisany, Dante; Vizoso, Paula; Moraga, Mauricio; Eyheramendy, Susana; Maass, Alejandro; De Ferrari, Giancarlo V; Miquel, Juan Francisco; Gutiérrez, Rodrigo A.

Sci Rep ; 9(1): 2132, 2019 02 14.

Article in English | MEDLINE | ID: mdl-30765821

ABSTRACT

Whole human genome sequencing initiatives help us understand population history and the basis of genetic diseases. Current data mostly focuses on Old World populations, and information on the genomic structure of Native Americans, especially those from the Southern Cone, is scant. Here we present annotation and variant discovery from high-quality complete genome sequences of a cohort of 11 Mapuche-Huilliche individuals (HUI) from Southern Chile. We found approximately 3.1 × 10^6 single nucleotide variants (SNVs) per individual and identified 403,383 (6.9%) novel SNV events. Analyses of large-scale genomic events detected 680 copy number variants (CNVs) and 4,514 structural variants (SVs), including 398 and 1,910 novel events, respectively. Global ancestry composition of HUI genomes revealed that the cohort represents a sample from a marginally admixed population from the Southern Cone, whose main genetic component derives from Native American ancestors. Additionally, we found that HUI genomes contain variants in genes associated with 5 of the 6 leading causes of noncommunicable diseases in Chile, which may have an impact on the risk of prevalent diseases in Chilean and Amerindian populations. Our data represents a useful resource that can contribute to population-based studies and to the design of early diagnostics or prevention tools for Native and admixed Latin American populations.

Subject(s)

Ethnicity/genetics , Genetic Markers , Genetics, Population , Genome, Human , Genomics/methods , Polymorphism, Single Nucleotide , Whole Genome Sequencing/methods , Adult , Aged , Aged, 80 and over , Chile , Cohort Studies , DNA Copy Number Variations , Female , Haplotypes , Humans , Male , Middle Aged , Young Adult

17.

Step-by-Step Construction of Gene Co-expression Networks from High-Throughput Arabidopsis RNA Sequencing Data.

Contreras-López, Orlando; Moyano, Tomás C; Soto, Daniela C; Gutiérrez, Rodrigo A.

Methods Mol Biol ; 1761: 275-301, 2018.

Article in English | MEDLINE | ID: mdl-29525965

ABSTRACT

The rapid increase in the availability of transcriptomics data generated by RNA sequencing represents both a challenge and an opportunity for biologists without bioinformatics training. The challenge is handling, integrating, and interpreting these data sets. The opportunity is to use this information to generate testable hypotheses to understand molecular mechanisms controlling gene expression and biological processes (Fig. 1). A successful strategy to generate tractable hypotheses from transcriptomics data has been to build undirected network graphs based on patterns of gene co-expression. Many examples of new hypotheses derived from network analyses can be found in the literature, spanning different organisms including plants and specific fields such as root developmental biology. In order to make the process of constructing a gene co-expression network more accessible to biologists, here we provide step-by-step instructions using published RNA-seq experimental data obtained from a public database. Similar strategies have been used in previous studies to advance root developmental biology. This guide includes basic instructions for the operation of widely used open source platforms such as Bio-Linux, R, and Cytoscape. Even though the data we used in this example was obtained from Arabidopsis thaliana, the workflow developed in this guide can be easily adapted to work with RNA-seq data from any organism.

Subject(s)

Arabidopsis/genetics , Gene Expression Profiling , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing , Sequence Analysis, RNA , Transcriptome , Computational Biology/methods , Databases, Nucleic Acid , Gene Expression Profiling/methods , Software , Systems Biology/methods

