1.
Cell ; 187(6): 1547-1562.e13, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38428424

ABSTRACT

We sequenced and assembled using multiple long-read sequencing technologies the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp or ∼27% of the genome has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (e.g., CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), becoming targets of rapid chromosomal diversification and positive selection (e.g., RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.


Subject(s)
Genome , Primates , Animals , Humans , Base Sequence , Primates/classification , Primates/genetics , Biological Evolution , Sequence Analysis, DNA , Genomic Structural Variation
2.
Cell ; 171(3): 710-722.e12, 2017 Oct 19.
Article in English | MEDLINE | ID: mdl-28965761

ABSTRACT

To further our understanding of the genetic etiology of autism, we generated and analyzed genome sequence data from 516 idiopathic autism families (2,064 individuals). This resource includes >59 million single-nucleotide variants (SNVs) and 9,212 private copy number variants (CNVs), of which 133,992 and 88 are de novo mutations (DNMs), respectively. We estimate a mutation rate of ∼1.5 × 10⁻⁸ SNVs per site per generation with a significantly higher mutation rate in repetitive DNA. Comparing probands and unaffected siblings, we observe several DNM trends. Probands carry more gene-disruptive CNVs and SNVs, resulting in severe missense mutations and mapping to predicted fetal brain promoters and embryonic stem cell enhancers. These differences become more pronounced for autism genes (p = 1.8 × 10⁻³, OR = 2.2). Patients are more likely to carry multiple coding and noncoding DNMs in different genes, which are enriched for expression in striatal neurons (p = 3 × 10⁻³), suggesting a path forward for genetically characterizing more complex cases of autism.


Subject(s)
Autistic Disorder/genetics , DNA Copy Number Variations , Polymorphism, Single Nucleotide , Animals , DNA Mutational Analysis , Female , Genome-Wide Association Study , Humans , INDEL Mutation , Male , Mice
3.
Nature ; 629(8010): 136-145, 2024 May.
Article in English | MEDLINE | ID: mdl-38570684

ABSTRACT

Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.


Subject(s)
Centromere , Evolution, Molecular , Genetic Variation , Animals , Humans , Centromere/genetics , Centromere/metabolism , Centromere Protein A/metabolism , DNA Methylation/genetics , DNA, Satellite/genetics , Kinetochores/metabolism , Macaca/genetics , Pan troglodytes/genetics , Polymorphism, Single Nucleotide/genetics , Pongo/genetics , Male , Female , Reference Standards , Chromatin Immunoprecipitation , Haplotypes , Mutation , Gene Amplification , Sequence Alignment , Chromatin/genetics , Chromatin/metabolism , Species Specificity
4.
Nature ; 617(7960): 325-334, 2023 05.
Article in English | MEDLINE | ID: mdl-37165237

ABSTRACT

Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences.


Subject(s)
Gene Conversion , Mutation , Segmental Duplications, Genomic , Humans , Gene Conversion/genetics , Genome, Human/genetics , Polymorphism, Single Nucleotide/genetics , Haplotypes/genetics , Exons/genetics , Cytosine/chemistry , Guanine/chemistry , CpG Islands/genetics
5.
Nature ; 621(7978): 355-364, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612510

ABSTRACT

The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.


Subject(s)
Chromosomes, Human, Y , Evolution, Molecular , Humans , Male , Chromosomes, Human, Y/genetics , Genome, Human/genetics , Genomics , Mutation Rate , Phenotype , Euchromatin/genetics , Pseudogenes , Genetic Variation/genetics , Chromosomes, Human, X/genetics , Pseudoautosomal Regions/genetics
6.
Nature ; 593(7857): 101-107, 2021 05.
Article in English | MEDLINE | ID: mdl-33828295

ABSTRACT

The complete assembly of each human chromosome is essential for understanding human biology and evolution. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.


Subject(s)
Chromosomes, Human, Pair 8/chemistry , Chromosomes, Human, Pair 8/genetics , Evolution, Molecular , Animals , Cell Line , Centromere/chemistry , Centromere/genetics , Centromere/metabolism , Chromosomes, Human, Pair 8/physiology , DNA Methylation , DNA, Satellite/genetics , Epigenesis, Genetic , Female , Humans , Macaca mulatta/genetics , Male , Minisatellite Repeats/genetics , Pan troglodytes/genetics , Phylogeny , Pongo abelii/genetics , Telomere/chemistry , Telomere/genetics , Telomere/metabolism
7.
Nature ; 594(7861): 77-81, 2021 06.
Article in English | MEDLINE | ID: mdl-33953399

ABSTRACT

The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.


Subject(s)
Evolution, Molecular , Genome/genetics , Genomics , Pan paniscus/genetics , Phylogeny , Animals , Eukaryotic Initiation Factor-4A/genetics , Female , Genes , Gorilla gorilla/genetics , Molecular Sequence Annotation/standards , Pan troglodytes/genetics , Pongo/genetics , Segmental Duplications, Genomic , Sequence Analysis, DNA
8.
Genome Res ; 33(12): 2029-2040, 2023 12 27.
Article in English | MEDLINE | ID: mdl-38190646

ABSTRACT

Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
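The F1 score used in the abstract above is the harmonic mean of precision and recall. A minimal Python sketch of that standard definition follows; the function name and the example counts are illustrative assumptions and are not taken from the study.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, computed from raw call counts."""
    if true_pos == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # fraction of calls that are real
    recall = true_pos / (true_pos + false_neg)     # fraction of real variants called
    return 2 * precision * recall / (precision + recall)


# Hypothetical call set: 80 correct calls, 20 false calls, 20 missed variants.
example_f1 = f1_score(true_pos=80, false_pos=20, false_neg=20)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a call set cannot reach a high F1 on precision or recall alone.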


Subject(s)
Genomics , Nanopores , INDEL Mutation , Whole Genome Sequencing
9.
Am J Hum Genet ; 109(4): 631-646, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35290762

ABSTRACT

Studies of de novo mutation (DNM) have typically excluded some of the most repetitive and complex regions of the genome because these regions cannot be unambiguously mapped with short-read sequencing data. To better understand the genome-wide pattern of DNM, we generated long-read sequence data from an autism parent-child quad with an affected female where no pathogenic variant had been discovered in short-read Illumina sequence data. We deeply sequenced all four individuals by using three sequencing platforms (Illumina, Oxford Nanopore, and Pacific Biosciences) and three complementary technologies (Strand-seq, optical mapping, and 10X Genomics). Using long-read sequencing, we initially discovered and validated 171 DNMs across two children, a 20% increase in the number of de novo single-nucleotide variants (SNVs) and indels when compared to short-read callsets. The number of DNMs further increased by 5% when considering a more complete human reference (T2T-CHM13) because of the recovery of events in regions absent from GRCh38 (e.g., three DNMs in heterochromatic satellites). In total, we validated 195 de novo germline mutations and 23 potential post-zygotic mosaic mutations across both children; the overall true substitution rate based on this integrated callset is at least 1.41 × 10⁻⁸ substitutions per nucleotide per generation. We also identified six de novo insertions and deletions in tandem repeats, two of which represent structural variants. We demonstrate that long-read sequencing and assembly, especially when combined with a more complete reference genome, increases the number of DNMs by >25% compared to previous studies, providing a more complete catalog of DNM compared to short-read data alone.


Subject(s)
Genomics , High-Throughput Nucleotide Sequencing , Female , Humans , Mutation/genetics , Nucleotides , Sequence Analysis, DNA , Software
10.
Am J Hum Genet ; 108(8): 1436-1449, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34216551

ABSTRACT

Despite widespread clinical genetic testing, many individuals with suspected genetic conditions lack a precise diagnosis, limiting their opportunity to take advantage of state-of-the-art treatments. In some cases, testing reveals difficult-to-evaluate structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. We performed targeted long-read sequencing (T-LRS) using adaptive sampling on the Oxford Nanopore platform on 40 individuals, 10 of whom lacked a complete molecular diagnosis. We computationally targeted up to 151 Mbp of sequence per individual and searched for pathogenic substitutions, structural variants, and methylation differences using a single data source. We detected all genomic aberrations (including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences) identified by prior clinical testing. In 8/8 individuals with complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, leading to changes in clinical management in one case. In ten individuals with suspected Mendelian conditions lacking a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in six and variants of uncertain significance in two others. T-LRS accurately identifies pathogenic structural variants, resolves complex rearrangements, and identifies Mendelian variants not detected by other technologies. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority genes and regions or complex clinical testing results.


Subject(s)
Chromosome Aberrations , Cytogenetic Analysis/methods , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genome, Human , Mutation , DNA Copy Number Variations , Female , Genetic Testing , High-Throughput Nucleotide Sequencing , Humans , Karyotyping , Male , Sequence Analysis, DNA
11.
Mol Psychiatry ; 28(2): 822-833, 2023 02.
Article in English | MEDLINE | ID: mdl-36266569

ABSTRACT

Autism Spectrum Disorder (ASD) diagnosis remains behavior-based and the median age of diagnosis is ~52 months, nearly 5 years after its first-trimester origin. Accurate and clinically-translatable early-age diagnostics do not exist due to ASD genetic and clinical heterogeneity. Here we collected clinical, diagnostic, and leukocyte RNA data from 240 ASD and typically developing (TD) toddlers (175 toddlers for training and 65 for test). To identify gene expression ASD diagnostic classifiers, we developed 42,840 models composed of 3570 gene expression feature selection sets and 12 classification methods. We found that 742 models had AUC-ROC ≥ 0.8 on both Training and Test sets. Weighted Bayesian model averaging of these 742 models yielded an ensemble classifier model with accurate performance in Training and Test gene expression datasets with ASD diagnostic classification AUC-ROC scores of 85-89% and AUC-PR scores of 84-92%. ASD toddlers with ensemble scores above and below the overall ASD ensemble mean of 0.723 (on a scale of 0 to 1) had similar diagnostic and psychometric scores, but those below this ASD ensemble mean had more prenatal risk events than TD toddlers. Ensemble model feature genes were involved in cell cycle, inflammation/immune response, transcriptional gene regulation, cytokine response, and PI3K-AKT, RAS and Wnt signaling pathways. We additionally collected targeted DNA sequencing smMIPs data on a subset of ASD risk genes from 217 of the 240 ASD and TD toddlers. This DNA sequencing found about the same percentage of SFARI Level 1 and 2 ASD risk gene mutations in TD (12 of 105) as in ASD (13 of 112) toddlers, and classification based only on the presence of mutation in these risk genes performed at a chance level of 49%. By contrast, the leukocyte ensemble gene expression classifier correctly diagnostically classified 88% of TD and ASD toddlers with ASD risk gene mutations. 
Our ensemble ASD gene expression classifier is diagnostically predictive and replicable across different toddler ages, races, and ethnicities; out-performs a risk gene mutation classifier; and has potential for clinical translation.
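The counts quoted in the abstract above can be cross-checked with simple arithmetic. The Python sketch below only recomputes figures stated in the abstract; the variable names are illustrative assumptions.

```python
# Model grid: every feature-selection set paired with every classification method.
feature_sets = 3570
classification_methods = 12
total_models = feature_sets * classification_methods  # abstract reports 42,840

# Toddlers with targeted DNA sequencing, split by diagnostic group.
td_carriers, td_total = 12, 105     # typically developing (TD)
asd_carriers, asd_total = 13, 112   # autism spectrum disorder (ASD)
sequenced_total = td_total + asd_total  # abstract reports 217

# Fraction of each group carrying a SFARI Level 1 or 2 risk-gene mutation.
td_rate = td_carriers / td_total
asd_rate = asd_carriers / asd_total
```

The two carrier fractions come out at roughly 11.4% and 11.6%, which matches the abstract's statement that the percentages are about the same in both groups.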


Subject(s)
Autism Spectrum Disorder , Humans , Child, Preschool , Infant , Autism Spectrum Disorder/diagnosis , Autism Spectrum Disorder/genetics , Bayes Theorem , Phosphatidylinositol 3-Kinases , Immunity , Gene Expression
12.
Am J Hum Genet ; 107(5): 963-976, 2020 11 05.
Article in English | MEDLINE | ID: mdl-33157009

ABSTRACT

NCKAP1/NAP1 regulates neuronal cytoskeletal dynamics and is essential for neuronal differentiation in the developing brain. Deleterious variants in NCKAP1 have been identified in individuals with autism spectrum disorder (ASD) and intellectual disability; however, its clinical significance remains unclear. To determine its significance, we assemble genotype and phenotype data for 21 affected individuals from 20 unrelated families with predicted deleterious variants in NCKAP1. This includes 16 individuals with de novo (n = 8), transmitted (n = 6), or inheritance unknown (n = 2) truncating variants, two individuals with structural variants, and three with potentially disruptive de novo missense variants. We report a de novo and ultra-rare deleterious variant burden of NCKAP1 in individuals with neurodevelopmental disorders which needs further replication. ASD or autistic features, language and motor delay, and variable expression of intellectual or learning disability are common clinical features. Among inherited cases, there is evidence of deleterious variants segregating with neuropsychiatric disorders. Based on available human brain transcriptomic data, we show that NCKAP1 is broadly and highly expressed in both prenatal and postnatal periods and demonstrate enriched expression in excitatory neurons and radial glia but depleted expression in inhibitory neurons. Mouse in utero electroporation experiments reveal that Nckap1 loss of function promotes neuronal migration during early cortical development. Combined, these data support a role for disruptive NCKAP1 variants in neurodevelopmental delay/autism, possibly by interfering with neuronal migration early in cortical development.


Subject(s)
Adaptor Proteins, Signal Transducing/genetics , Autism Spectrum Disorder/genetics , Intellectual Disability/genetics , Learning Disabilities/genetics , Mutation , Adaptor Proteins, Signal Transducing/deficiency , Adolescent , Animals , Autism Spectrum Disorder/diagnosis , Autism Spectrum Disorder/pathology , Cerebral Cortex/metabolism , Cerebral Cortex/pathology , Child , Female , Gene Expression , Genotype , HEK293 Cells , Humans , Intellectual Disability/diagnosis , Intellectual Disability/pathology , Learning Disabilities/diagnosis , Learning Disabilities/pathology , Male , Mice , Mice, Knockout , Neuroglia/metabolism , Neuroglia/pathology , Neurons/metabolism , Neurons/pathology , Pedigree , Phenotype , Pregnancy , Protein Isoforms/antagonists & inhibitors , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Transcriptome , Young Adult
13.
Mol Biol Evol ; 38(12): 5576-5587, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34464971

ABSTRACT

Human centromeres are mainly composed of alpha satellite DNA hierarchically organized as higher-order repeats (HORs). Alpha satellite dynamics is shown by sequence homogenization in centromeric arrays and by its transfer to other centromeric locations, for example, during the maturation of new centromeres. We identified during prenatal aneuploidy diagnosis by fluorescent in situ hybridization a de novo insertion of alpha satellite DNA from the centromere of chromosome 18 (D18Z1) into cytoband 15q26. Although bound by CENP-B, this locus did not acquire centromeric functionality as demonstrated by the lack of constriction and the absence of CENP-A binding. The insertion was associated with a 2.8-kbp deletion and likely occurred in the paternal germline. The site was enriched in long terminal repeats and located ∼10 Mbp from the location where a centromere was ancestrally seeded and became inactive in the common ancestor of humans and apes 20-25 million years ago. Long-read mapping to the T2T-CHM13 human genome assembly revealed that the insertion derives from a specific region of chromosome 18 centromeric 12-mer HOR array in which the monomer size follows a regular pattern. The rearrangement did not directly disrupt any gene or predicted regulatory element and did not alter the methylation status of the surrounding region, consistent with the absence of phenotypic consequences in the carrier. This case demonstrates a likely rare but new class of structural variation that we name "alpha satellite insertion." It also expands our knowledge on alphoid DNA dynamics and conveys the possibility that alphoid arrays can relocate near vestigial centromeric sites.


Subject(s)
Centromere , Chromosomal Proteins, Non-Histone , Centromere/genetics , Centromere/metabolism , Centromere Protein B/genetics , Centromere Protein B/metabolism , Chromosomal Proteins, Non-Histone/genetics , DNA, Satellite/genetics , Humans , In Situ Hybridization, Fluorescence
14.
Am J Hum Genet ; 105(5): 947-958, 2019 11 07.
Article in English | MEDLINE | ID: mdl-31668704

ABSTRACT

Human-specific duplications at chromosome 16p11.2 mediate recurrent pathogenic 600 kbp BP4-BP5 copy-number variations, which are among the most common genetic causes of autism. These copy-number polymorphic duplications are under positive selection and include three to eight copies of BOLA2, a gene involved in the maturation of cytosolic iron-sulfur proteins. To investigate the potential advantage provided by the rapid expansion of BOLA2, we assessed hematological traits and anemia prevalence in 379,385 controls and individuals who have lost or gained copies of BOLA2: 89 chromosome 16p11.2 BP4-BP5 deletion carriers and 56 reciprocal duplication carriers in the UK Biobank. We found that the 16p11.2 deletion is associated with anemia (18/89 carriers, 20%, p = 4e-7, OR = 5), particularly iron-deficiency anemia. We observed similar enrichments in two clinical 16p11.2 deletion cohorts, which included 6/63 (10%) and 7/20 (35%) unrelated individuals with anemia, microcytosis, low serum iron, or low blood hemoglobin. Upon stratification by BOLA2 copy number, our data showed an association between low BOLA2 dosage and the above phenotypes (8/15 individuals with three copies, 53%, p = 1e-4). In parallel, we analyzed hematological traits in mice carrying the 16p11.2 orthologous deletion or duplication, as well as Bola2+/- and Bola2-/- animals. The Bola2-deficient mice and the mice carrying the deletion showed early evidence of iron deficiency, including a mild decrease in hemoglobin, lower plasma iron, microcytosis, and an increased red blood cell zinc-protoporphyrin-to-heme ratio. Our results indicate that BOLA2 participates in iron homeostasis in vivo, and its expansion has a potential adaptive role in protecting against iron deficiency.


Subject(s)
Anemia/genetics , Autistic Disorder/genetics , Chromosome Duplication/genetics , Chromosomes, Human, Pair 16/genetics , Homeostasis/genetics , Proteins/genetics , Animals , Chromosome Deletion , Chromosome Disorders/genetics , DNA Copy Number Variations/genetics , Female , Genotype , Heterozygote , Humans , Iron , Male , Phenotype
15.
Syst Biol ; 71(1): 78-92, 2021 12 16.
Article in English | MEDLINE | ID: mdl-34097063

ABSTRACT

The Neotropics harbor the most species-rich freshwater fish fauna on the planet, but the timing of that exceptional diversification remains unclear. Did the Neotropics accumulate species steadily throughout their long history, or attain their remarkable diversity recently? Biologists have long debated the relative support for these museum and cradle hypotheses, but few phylogenies of megadiverse tropical clades have included sufficient taxa to distinguish between them. We used 1288 ultraconserved element loci spanning 293 species, 211 genera, and 21 families of characoid fishes to reconstruct a new, fossil-calibrated phylogeny and infer the most likely diversification scenario for a clade that includes a third of Neotropical fish diversity. This phylogeny implies paraphyly of the traditional delimitation of Characiformes because it resolves the largely Neotropical Characoidei as the sister lineage of Siluriformes (catfishes), rather than the African Citharinodei. Time-calibrated phylogenies indicate an ancient origin of major characoid lineages and reveal a much more recent emergence of most characoid species. Diversification rate analyses infer increased speciation and decreased extinction rates during the Oligocene at around 30 Ma during a period of mega-wetland formation in the proto-Orinoco-Amazonas. Three species-rich and ecomorphologically diverse lineages (Anostomidae, Serrasalmidae, and Characidae) that originated more than 60 Ma in the Paleocene experienced particularly notable bursts of Oligocene diversification and now account collectively for 68% of the approximately 2150 species of Characoidei. In addition to paleogeographic changes, we discuss potential accelerants of diversification in these three lineages. While the Neotropics accumulated a museum of ecomorphologically diverse characoid lineages long ago, this geologically dynamic region also cradled a much more recent birth of remarkable species-level diversity. 
[Biodiversity; Characiformes; macroevolution; Neotropics; phylogenomics; ultraconserved elements.].


Subject(s)
Catfishes , Characiformes , Animals , Biodiversity , Fossils , Phylogeny
16.
Genome Res ; 27(5): 677-685, 2017 05.
Article in English | MEDLINE | ID: mdl-27895111

ABSTRACT

In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.


Subject(s)
Contig Mapping/methods , Genome, Human , Genomic Structural Variation , Haploidy , Sequence Analysis, DNA/methods , Contig Mapping/standards , Human Genome Project , Humans , Sequence Analysis, DNA/standards
17.
Am J Hum Genet ; 98(1): 58-74, 2016 Jan 07.
Article in English | MEDLINE | ID: mdl-26749308

ABSTRACT

We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.


Subject(s)
Autistic Disorder/genetics , DNA/genetics , Genome, Human , Exome , Female , Humans , Male , Pedigree , Polymorphism, Single Nucleotide
18.
Am J Hum Genet ; 98(3): 541-552, 2016 Mar 03.
Article in English | MEDLINE | ID: mdl-26942287

ABSTRACT

Intellectual disability (ID) and autism spectrum disorders (ASD) are genetically heterogeneous, and a significant number of genes have been associated with both conditions. A few mutations in POGZ have been reported in recent exome studies; however, these studies do not provide detailed clinical information. We collected the clinical and molecular data of 25 individuals with disruptive mutations in POGZ by diagnostic whole-exome, whole-genome, or targeted sequencing of 5,223 individuals with neurodevelopmental disorders (ID primarily) or by targeted resequencing of this locus in 12,041 individuals with ASD and/or ID. The rarity of disruptive mutations among unaffected individuals (2/49,401) highlights the significance (p = 4.19 × 10⁻¹³; odds ratio = 35.8) and penetrance (65.9%) of this genetic subtype with respect to ASD and ID. By studying the entire cohort, we defined common phenotypic features of POGZ individuals, including variable levels of developmental delay (DD) and more severe speech and language delay in comparison to the severity of motor delay and coordination issues. We also identified significant associations with vision problems, microcephaly, hyperactivity, a tendency to obesity, and feeding difficulties. Some features might be explained by the high expression of POGZ, particularly in the cerebellum and pituitary, early in fetal brain development. We conducted parallel studies in Drosophila by inducing conditional knockdown of the POGZ ortholog row, further confirming that dosage of POGZ, specifically in neurons, is essential for normal learning in a habituation paradigm. Combined, the data underscore the pathogenicity of loss-of-function mutations in POGZ and define a POGZ-related phenotype enriched in specific features.
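The odds ratio quoted in this abstract can be checked from the counts it reports. The sketch below is illustrative only: it assumes the two sequenced cohorts (5,223 and 12,041 individuals) are pooled into one case group containing all 25 carriers, which the abstract does not state explicitly, and the function name `odds_ratio` is hypothetical.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio from a 2x2 table, given carrier counts and group sizes."""
    case_noncarriers = case_total - case_carriers
    control_noncarriers = control_total - control_carriers
    # Cross-product ratio: (a * d) / (b * c)
    return (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )


# Counts taken from the abstract; pooling the two cohorts is an assumption.
cases_total = 5223 + 12041  # individuals with neurodevelopmental disorders, ASD and/or ID
pogz_or = odds_ratio(25, cases_total, 2, 49401)
print(round(pogz_or, 1))  # prints 35.8
```

Under that pooling assumption the cross-product ratio reproduces the reported value of 35.8. The p-value is not recomputed here because the abstract does not name the test used.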


Subject(s)
Autism Spectrum Disorder/genetics , Intellectual Disability/genetics , Transposases/genetics , Adolescent , Adult , Animals , Autism Spectrum Disorder/diagnosis , Child , Child, Preschool , Cohort Studies , Down-Regulation , Drosophila/genetics , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Exome , Female , Gene Knockdown Techniques , Genome-Wide Association Study , Humans , Infant , Intellectual Disability/diagnosis , Language Development Disorders/diagnosis , Language Development Disorders/genetics , Linear Models , Male , Microcephaly/diagnosis , Microcephaly/genetics , Mutation , Phenotype , Transcription Factors/genetics , Transcription Factors/metabolism
19.
Genet Med ; 21(7): 1611-1620, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30504930

ABSTRACT

PURPOSE: To maximize the discovery of potentially pathogenic variants to better understand the diagnostic utility of genome sequencing (GS) and to assess how the presence of multiple risk events might affect the phenotypic severity in autism spectrum disorders (ASD). METHODS: GS was applied to 180 simplex and multiplex ASD families (578 individuals, 213 patients) with exome sequencing and array comparative genomic hybridization further applied to a subset for validation and cross-platform comparisons. RESULTS: We found that 40.8% of patients carried variants with evidence of disease risk, including a de novo frameshift variant in NR4A2 and two de novo missense variants in SYNCRIP, while 21.1% carried clinically relevant pathogenic or likely pathogenic variants. Patients with more than one risk variant (9.9%) were more severely affected with respect to cognitive ability compared with patients with a single or no-risk variant. We observed no instance among the 27 multiplex families where a pathogenic or likely pathogenic variant was transmitted to all affected members in the family. CONCLUSION: The study demonstrates the diagnostic utility of GS, especially for multiple risk variants that contribute to the phenotypic severity, shows the genetic heterogeneity in multiplex families, and provides evidence for new genes for follow up.


Subject(s)
Autistic Disorder/genetics , Exome Sequencing , Child , Comparative Genomic Hybridization , DNA Copy Number Variations , DNA Mutational Analysis , Female , Humans , Male , Phenotype
20.
Nucleic Acids Res ; 45(D1): D804-D811, 2017 Jan 04.
Article in English | MEDLINE | ID: mdl-27907889

ABSTRACT

Whole-exome and whole-genome sequencing have facilitated the large-scale discovery of de novo variants in human disease. To date, most de novo discovery through next-generation sequencing focused on congenital heart disease and neurodevelopmental disorders (NDDs). Currently, de novo variants are one of the most significant risk factors for NDDs with a substantial overlap of genes involved in more than one NDD. To facilitate better usage of published data, provide standardization of annotation, and improve accessibility, we created denovo-db (http://denovo-db.gs.washington.edu), a database for human de novo variants. As of July 2016, denovo-db contained 40 different studies and 32,991 de novo variants from 23,098 trios. Database features include basic variant information (chromosome location, change, type); detailed annotation at the transcript and protein levels; severity scores; frequency; validation status; and, most importantly, the phenotype of the individual with the variant. We included a feature on our browsable website to download any query result, including a downloadable file of the full database with additional variant details. denovo-db provides necessary information for researchers to compare their data to other individuals with the same phenotype and also to controls allowing for a better understanding of the biology of de novo variants and their contribution to disease.


Subject(s)
Computational Biology/methods , Databases, Nucleic Acid , Genetic Variation , Germ-Line Mutation , Polymorphism, Single Nucleotide , Genetic Association Studies , Humans , Molecular Sequence Annotation , Web Browser