1.
Elife ; 13, 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38240312

ABSTRACT

Out of the several hundred copies of rRNA genes arranged in the nucleolar organizing regions (NOR) of the five human acrocentric chromosomes, ~50% remain transcriptionally inactive. NOR-associated sequences and epigenetic modifications contribute to the differential expression of rRNAs. However, the mechanism(s) controlling the dosage of active versus inactive rRNA genes within each NOR in mammals is yet to be determined. We have discovered a family of ncRNAs, SNULs (Single NUcleolus Localized RNA), which form constrained sub-nucleolar territories on individual NORs and influence rRNA expression. Individual members of the SNULs monoallelically associate with specific NOR-containing chromosomes. SNULs share sequence similarity to pre-rRNA and localize in the sub-nucleolar compartment with pre-rRNA. Finally, SNULs control rRNA expression by influencing pre-rRNA sorting to the DFC compartment and pre-rRNA processing. Our study discovered a novel class of ncRNAs influencing rRNA expression by forming constrained nucleolar territories on individual NORs.


Subject(s)
Nucleolus Organizer Region , RNA Precursors , Humans , Animals , Nucleolus Organizer Region/genetics , Nucleolus Organizer Region/metabolism , RNA Precursors/genetics , RNA Precursors/metabolism , Cell Nucleolus/genetics , Cell Nucleolus/metabolism , RNA, Ribosomal/genetics , RNA, Ribosomal/metabolism , Chromosomes, Human/metabolism , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Mammals/genetics
2.
bioRxiv ; 2023 Sep 27.
Article in English | MEDLINE | ID: mdl-37808736

ABSTRACT

Resolving the molecular basis of a Mendelian condition (MC) remains challenging owing to the diverse mechanisms by which genetic variants cause disease. To address this, we developed a synchronized long-read genome, methylome, epigenome, and transcriptome sequencing approach, which enables accurate single-nucleotide, insertion-deletion, and structural variant calling and diploid de novo genome assembly, and permits the simultaneous elucidation of haplotype-resolved CpG methylation, chromatin accessibility, and full-length transcript information in a single long-read sequencing run. Application of this approach to an Undiagnosed Diseases Network (UDN) participant with a chromosome X;13 balanced translocation of uncertain significance revealed that this translocation disrupted the functioning of four separate genes (NBEA, PDK3, MAB21L1, and RB1) previously associated with single-gene MCs. Notably, the function of each gene was disrupted via a distinct mechanism that required integration of the four 'omes' to resolve. These included nonsense-mediated decay, fusion transcript formation, enhancer adoption, transcriptional readthrough silencing, and inappropriate X chromosome inactivation of autosomal genes. Overall, this highlights the utility of synchronized long-read multi-omic profiling for mechanistically resolving complex phenotypes.

3.
Microb Genom ; 9(5), 2023 05.
Article in English | MEDLINE | ID: mdl-37194944

ABSTRACT

The National Collection of Type Cultures (NCTC) was founded on 1 January 1920 in order to fulfil a recognized need for a centralized repository for bacterial and fungal strains within the UK. It is among the longest-established collections of its kind anywhere in the world and today holds approximately 6000 type and reference bacterial strains - many of medical, scientific and veterinary importance - available to academic, health, food and veterinary institutions worldwide. Recently, a collaboration between NCTC, Pacific Biosciences and the Wellcome Sanger Institute established the NCTC3000 project to long-read sequence and assemble the genomes of up to 3000 NCTC strains. Here, at the beginning of the collection's second century, we introduce the resulting NCTC3000 sequence read datasets, genome assemblies and annotations as a unique, historically and scientifically relevant resource for the benefit of the international bacterial research community.


Subject(s)
Genome, Bacterial , Genomics , Sequence Analysis, DNA/methods , Genome, Bacterial/genetics , Bacteria/genetics
4.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subject(s)
Chromosome Mapping , Diploidy , Genome, Human , Genomics , Humans , Chromosome Mapping/standards , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Reference Standards , Genomics/methods , Genomics/standards , Chromosomes, Human/genetics , Genetic Variation/genetics
5.
Proc Natl Acad Sci U S A ; 119(40): e2209139119, 2022 10 04.
Article in English | MEDLINE | ID: mdl-36161960

ABSTRACT

Decrypting the rearrangements that drive mammalian chromosome evolution is critical to understanding the molecular bases of speciation, adaptation, and disease susceptibility. Using 8 scaffolded and 26 chromosome-scale genome assemblies representing 23/26 mammal orders, we computationally reconstructed ancestral karyotypes and syntenic relationships at 16 nodes along the mammalian phylogeny. Three different reference genomes (human, sloth, and cattle) representing phylogenetically distinct mammalian superorders were used to assess reference bias in the reconstructed ancestral karyotypes and to expand the number of clades with reconstructed genomes. The mammalian ancestor likely had 19 pairs of autosomes, with nine of the smallest chromosomes shared with the common ancestor of all amniotes (three still conserved in extant mammals), demonstrating a striking conservation of synteny for ∼320 My of vertebrate evolution. The numbers and types of chromosome rearrangements were classified for transitions between the ancestral mammalian karyotype, descendent ancestors, and extant species. For example, 94 inversions, 16 fissions, and 14 fusions that occurred over 53 My differentiated the therian from the descendent eutherian ancestor. The highest breakpoint rate was observed between the mammalian and therian ancestors (3.9 breakpoints/My). Reconstructed mammalian ancestor chromosomes were found to have distinct evolutionary histories reflected in their rates and types of rearrangements. The distributions of genes, repetitive elements, topologically associating domains, and actively transcribed regions in multispecies homologous synteny blocks and evolutionary breakpoint regions indicate that purifying selection acted over millions of years of vertebrate evolution to maintain syntenic relationships of developmentally important genes and regulatory landscapes of gene-dense chromosomes.


Subject(s)
Evolution, Molecular , Karyotype , Mammals , Synteny , Animals , Cattle/genetics , Chromosomes, Mammalian/genetics , Eutheria/genetics , Humans , Mammals/genetics , Phylogeny , Sloths/genetics , Synteny/genetics
6.
Hum Mutat ; 43(11): 1557-1566, 2022 11.
Article in English | MEDLINE | ID: mdl-36057977

ABSTRACT

To determine the phase of NUDT15 sequence variants for more comprehensive star (*) allele diplotyping, we developed a novel long-read single-molecule real-time HiFi amplicon sequencing method. A 10.5 kb NUDT15 amplicon assay was validated using reference material positive controls and additional samples for specimen type and blinded accuracy assessment. Triplicate NUDT15 HiFi sequencing of two reference material samples had nonreference genotype concordances of >99.9%, indicating that the assay is robust. Notably, short-read genome sequencing of a subset of samples was unable to determine the phase of star (*) allele-defining NUDT15 variants, resulting in ambiguous diplotype results. In contrast, long-read HiFi sequencing phased all variants across the NUDT15 amplicons, including a *2/*9 diplotype that previously was characterized as *1/*2 in the 1000 Genomes Project v3 data set. Assay throughput was also tested using 8.5 kb amplicons from 100 Ashkenazi Jewish individuals, which identified a novel NUDT15 *1 suballele (c.-121G>A) and a rare likely deleterious coding variant (p.Pro129Arg). Both novel alleles were Sanger confirmed and assigned as *1.007 and *20, respectively, by the PharmVar Consortium. Taken together, NUDT15 HiFi amplicon sequencing is an innovative method for phased full-gene characterization and novel allele discovery, which could improve NUDT15 pharmacogenomic testing and subsequent phenotype prediction.


Subject(s)
Pharmacogenetics , Alleles , Genotype , Haplotypes , Humans , Sequence Analysis, DNA/methods
7.
Am J Med Genet A ; 188(7): 2071-2081, 2022 07.
Article in English | MEDLINE | ID: mdl-35366058

ABSTRACT

Currently, protein-coding de novo variants and large copy number variants have been identified as important for ~30% of individuals with autism. One approach to identify relevant variation in individuals who lack these types of events is by utilizing newer genomic technologies. In this study, highly accurate PacBio HiFi long-read sequencing was applied to a family with autism, epileptic encephalopathy, cognitive impairment, and mild dysmorphic features (two affected female siblings, unaffected parents, and one unaffected male sibling) with no known clinical variant. From our long-read sequencing data, a de novo missense variant in the KCNC2 gene (which encodes Kv3.2) was identified in both affected children. This variant was phased to the paternal chromosome of origin and is likely a germline mosaic. In silico assessment revealed the variant was absent from controls, highly conserved, and predicted to be damaging. This specific missense variant (Val473Ala) has been shown in both an ortholog and paralog of Kv3.2 to accelerate current decay, shift the voltage dependence of activation, and prevent the channel from entering a long-lasting open state. Seven additional missense variants have been identified in other individuals with neurodevelopmental disorders (p = 1.03 × 10^-5). KCNC2 is most highly expressed in the brain, in particular in the thalamus, and is enriched in GABAergic neurons. Long-read sequencing was useful in discovering the relevant variant in this family with autism that had remained a mystery for several years and will potentially have great benefits in the clinic once it is widely available.


Subject(s)
Autistic Disorder , Epilepsy , Shaw Potassium Channels , Autistic Disorder/genetics , Child , Epilepsy/genetics , Female , Germ Cells , Humans , Male , Mosaicism , Mutation, Missense , Shaw Potassium Channels/genetics
9.
Science ; 376(6588): 44-53, 2022 04.
Article in English | MEDLINE | ID: mdl-35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.


Subject(s)
Genome, Human , Human Genome Project , Sequence Analysis, DNA/standards , Cell Line , Chromosomes, Artificial, Bacterial/genetics , Chromosomes, Human/genetics , Humans , Reference Values
10.
Genet Med ; 24(6): 1336-1348, 2022 06.
Article in English | MEDLINE | ID: mdl-35305867

ABSTRACT

PURPOSE: This study aimed to provide comprehensive diagnostic and candidate analyses in a pediatric rare disease cohort through the Genomic Answers for Kids program. METHODS: Extensive analyses of 960 families with suspected genetic disorders included short-read exome sequencing and short-read genome sequencing (srGS); PacBio HiFi long-read genome sequencing (HiFi-GS); variant calling for single nucleotide variants (SNVs), structural variants (SVs), and repeat variants; and machine-learning variant prioritization. Structured phenotypes, prioritized variants, and pedigrees were stored in the PhenoTips database, with data sharing through the controlled-access database of Genotypes and Phenotypes. RESULTS: Diagnostic rates ranged from 11% in patients with prior negative genetic testing to 34.5% in naive patients. Incorporating SVs from genome sequencing added up to 13% of new diagnoses in previously unsolved cases. HiFi-GS yielded an increased discovery rate with >4-fold more rare coding SVs compared with srGS. Variants and genes of unknown significance remain the most common finding (58% of nondiagnostic cases). CONCLUSION: Computational prioritization is efficient for diagnostic SNVs. Thorough identification of non-SNVs remains challenging and is partly mitigated using HiFi-GS sequencing. Importantly, community research is supported by sharing real-time data to accelerate gene validation and by providing HiFi variant (SNV/SV) resources from >1000 human alleles to facilitate implementation of new sequencing platforms for rare disease diagnoses.


Subject(s)
Genomics , Rare Diseases , Child , Genome , High-Throughput Nucleotide Sequencing , Humans , Pedigree , Rare Diseases/diagnosis , Rare Diseases/genetics , Sequence Analysis, DNA
11.
PLoS One ; 16(7): e0253267, 2021.
Article in English | MEDLINE | ID: mdl-34228724

ABSTRACT

We report a new subgroup of Type III Restriction-Modification systems that use m4C methylation for host protection. Recognition specificities for six such systems, each recognizing a novel motif, have been determined using single molecule real-time DNA sequencing. In contrast to all previously characterized Type III systems which modify adenine to m6A, protective methylation of the host genome in these new systems is achieved by the N4-methylation of a cytosine base in one strand of an asymmetric 4 to 6 base pair recognition motif. Type III systems are heterotrimeric enzyme complexes containing a single copy of an ATP-dependent restriction endonuclease-helicase (Res) and a dimeric DNA methyltransferase (Mod). The Type III Mods are beta-class amino-methyltransferases, examples of which form either N6-methyl adenine or N4-methyl cytosine in Type II RM systems. The Type III m4C Mod and Res proteins are diverged, suggesting ancient origin or that m4C modification has arisen from m6A MTases multiple times in diverged lineages. Two of the systems, from thermophilic organisms, required expression of both Mod and Res to efficiently methylate an E. coli host, unlike previous findings that Mod alone is proficient at modification, suggesting that the division of labor between protective methylation and restriction activities is atypical in these systems. Two of the characterized systems, and many homologous putative systems, appear to include a third protein: a conserved putative helicase/ATPase subunit of unknown function located 5' of the mod gene. The function of this additional ATPase is not yet known, but close homologs co-localize with the typical Mod and Res genes in hundreds of putative Type III systems. Our findings demonstrate a rich diversity within Type III RM systems.


Subject(s)
Cytosine , DNA Methylation , DNA Restriction-Modification Enzymes/genetics , DNA/metabolism , Cytosine/metabolism , DNA Modification Methylases/chemistry , DNA Modification Methylases/genetics , DNA Modification Methylases/metabolism , DNA Restriction Enzymes/chemistry , DNA Restriction Enzymes/genetics , DNA Restriction Enzymes/metabolism , DNA Restriction-Modification Enzymes/chemistry , DNA Restriction-Modification Enzymes/metabolism , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Gas Chromatography-Mass Spectrometry , Sequence Alignment , Sequence Analysis, DNA
12.
Genome Biol ; 22(1): 120, 2021 04 29.
Article in English | MEDLINE | ID: mdl-33910595

ABSTRACT

BACKGROUND: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly. RESULTS: As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100-300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization. CONCLUSIONS: Our results indicate that even in the "simple" case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.


Subject(s)
Gene Duplication , Genome, Mitochondrial , Genomics , Repetitive Sequences, Nucleic Acid , Vertebrates/genetics , Animals , Computational Biology/methods , Computational Biology/standards , Evolution, Molecular , Genomics/methods , High-Throughput Nucleotide Sequencing
13.
Nature ; 592(7856): 756-762, 2021 04.
Article in English | MEDLINE | ID: mdl-33408411

ABSTRACT

Egg-laying mammals (monotremes) are the only extant mammalian outgroup to therians (marsupial and eutherian animals) and provide key insights into mammalian evolution [1,2]. Here we generate and analyse reference genomes of the platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus), which represent the only two extant monotreme lineages. The nearly complete platypus genome assembly has anchored almost the entire genome onto chromosomes, markedly improving the genome continuity and gene annotation. Together with our echidna sequence, the genomes of the two species allow us to detect the ancestral and lineage-specific genomic changes that shape both monotreme and mammalian evolution. We provide evidence that the monotreme sex chromosome complex originated from an ancestral chromosome ring configuration. The formation of such a unique chromosome complex may have been facilitated by the unusually extensive interactions between the multi-X and multi-Y chromosomes that are shared by the autosomal homologues in humans. Further comparative genomic analyses unravel marked differences between monotremes and therians in haptoglobin genes, lactation genes and chemosensory receptor genes for smell and taste that underlie the ecological adaptation of monotremes.


Subject(s)
Biological Evolution , Genome , Platypus/genetics , Tachyglossidae/genetics , Animals , Female , Male , Mammals/genetics , Phylogeny , Sex Chromosomes/genetics
14.
Cell Genom ; 1(1): 100002, 2021 Oct 13.
Article in English | MEDLINE | ID: mdl-36777713

ABSTRACT

The kakapo is a flightless parrot endemic to New Zealand. Once common in the archipelago, only 201 individuals remain today, most of them descending from an isolated island population. We report the first genome-wide analyses of the species, including a high-quality genome assembly for kakapo, one of the first chromosome-level reference genomes sequenced by the Vertebrate Genomes Project (VGP). We also sequenced and analyzed 35 modern genomes from the sole surviving island population and 14 genomes from the extinct mainland population. While theory suggests that such a small population is likely to have accumulated deleterious mutations through genetic drift, our analyses on the impact of the long-term small population size in kakapo indicate that present-day island kakapo have a reduced number of harmful mutations compared to mainland individuals. We hypothesize that this reduced mutational load is due to the island population having been subjected to a combination of genetic drift and purging of deleterious mutations, through increased inbreeding and purifying selection, since its isolation from the mainland ∼10,000 years ago. Our results provide evidence that small populations can survive even when isolated for hundreds of generations. This work provides key insights into kakapo breeding and recovery and more generally into the application of genetic tools in conservation efforts for endangered species.

15.
Transl Psychiatry ; 10(1): 369, 2020 11 02.
Article in English | MEDLINE | ID: mdl-33139705

ABSTRACT

The human genome harbors numerous structural variants (SVs) which, due to their repetitive nature, are currently underexplored in short-read whole-genome sequencing approaches. Using single-molecule, real-time (SMRT) long-read sequencing technology in combination with FALCON-Unzip, we generated a de novo assembly of the diploid genome of a 115-year-old Dutch cognitively healthy woman. We combined this assembly with two previously published haploid assemblies (CHM1 and CHM13) and the GRCh38 reference genome to create a compendium of SVs that occur across five independent human haplotypes using the graph-based multi-genome aligner REVEAL. Across these five haplotypes, we detected 31,680 euchromatic SVs (>50 bp). Of these, ~62% were comprised of repetitive sequences with 'variable number tandem repeats' (VNTRs), ~10% were mobile elements (Alu, L1, and SVA), while the remaining variants were inversions and indels. We observed that VNTRs with GC-content >60% and repeat patterns longer than 15 bp were 21-fold enriched in the subtelomeric regions (within 5 Mb of the ends of chromosome arms). VNTR lengths can expand to exceed a critical length which is associated with impaired gene transcription. The genes that contained most VNTRs, of which PTPRN2 and DLGAP2 are the most prominent examples, were found to be predominantly expressed in the brain and associated with a wide variety of neurological disorders. Repeat-induced variation represents a sizeable fraction of the genetic variation in human genomes and should be included in investigations of genetic factors associated with phenotypic traits, specifically those associated with neurological disorders. We make available the long and short-read sequence data of the supercentenarian genome, and a compendium of SVs as identified across 5 human haplotypes.


Subject(s)
Genome, Human , Minisatellite Repeats , Aged, 80 and over , Brain , Female , Haplotypes , Humans , Minisatellite Repeats/genetics , Sequence Analysis, DNA
16.
Microbiol Resour Announc ; 9(24), 2020 Jun 11.
Article in English | MEDLINE | ID: mdl-32527783

ABSTRACT

Lymphatic filariasis affects ∼120 million people and can result in elephantiasis and hydrocele. Here, we report the nearly complete genome sequence of the best-studied causative agent of lymphatic filariasis, Brugia malayi. The assembly contains four autosomes, an X chromosome, and only eight gaps but lacks a contiguous sequence for the known Y chromosome.

17.
Nat Commun ; 11(1): 1964, 2020 04 23.
Article in English | MEDLINE | ID: mdl-32327641

ABSTRACT

Sex determination mechanisms often differ even between related species yet the evolution of sex chromosomes remains poorly understood in all but a few model organisms. Some nematodes such as Caenorhabditis elegans have an XO sex determination system while others, such as the filarial parasite Brugia malayi, have an XY mechanism. We present a complete B. malayi genome assembly and define Nigon elements shared with C. elegans, which we then map to the genomes of other filarial species and more distantly related nematodes. We find a remarkable plasticity in sex chromosome evolution with several distinct cases of neo-X and neo-Y formation, X-added regions, and conversion of autosomes to sex chromosomes from which we propose a model of chromosome evolution across different nematode clades. The phylum Nematoda offers a new and innovative system for gaining a deeper understanding of sex chromosome evolution.


Subject(s)
Evolution, Molecular , Nematoda/genetics , Nematode Infections/parasitology , Sex Chromosomes/genetics , Animals , Brugia malayi/genetics , Caenorhabditis elegans/genetics , Chromosome Mapping , Female , Gene Expression Regulation , Genome, Helminth/genetics , Humans , Male , Nematoda/classification , Repetitive Sequences, Nucleic Acid/genetics , Sex Determination Processes/genetics
18.
Gigascience ; 8(10), 2019 10 01.
Article in English | MEDLINE | ID: mdl-31609423

ABSTRACT

BACKGROUND: A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. RESULTS: The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ∼20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ∼36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. CONCLUSIONS: We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
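The coverage and contiguity figures quoted in this abstract follow from simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and the contig lengths passed to `n50` are made-up toy values. It checks that 82 Gb of unique reads over a 2.3 Gb assembly gives roughly 36× coverage, and shows how a contig N50 is conventionally computed.

```python
def fold_coverage(unique_bases_gb: float, genome_size_gb: float) -> float:
    """Average depth: sequenced bases divided by genome size (same units)."""
    return unique_bases_gb / genome_size_gb


def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum always reaches half the total")


# Figures from the abstract: 82 Gb of unique reads, 2.3 Gb assembly.
print(f"coverage ~ {fold_coverage(82, 2.3):.1f}x")

# Toy contig lengths (hypothetical, for illustration only).
print("toy N50 =", n50([10, 8, 6, 4, 2]))
```

With the abstract's numbers the ratio rounds to the reported ∼36×; the N50 call only demonstrates the definition and says nothing about the real assembly beyond the 1.5 Mb value the authors report.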


Subject(s)
Diptera/genetics , Genome, Insect , Genomics/methods , Animals , Female , Gene Library , Introduced Species , Sequence Analysis, DNA
19.
Nat Med ; 25(6): 1012-1021, 2019 06.
Article in English | MEDLINE | ID: mdl-31142849

ABSTRACT

The incidence of preterm birth exceeds 10% worldwide. There are significant disparities in the frequency of preterm birth among populations within countries, and women of African ancestry disproportionately bear the burden of risk in the United States. In the present study, we report a community resource that includes 'omics' data from approximately 12,000 samples as part of the integrative Human Microbiome Project. Longitudinal analyses of 16S ribosomal RNA, metagenomic, metatranscriptomic and cytokine profiles from 45 preterm and 90 term birth controls identified harbingers of preterm birth in this cohort of women predominantly of African ancestry. Women who delivered preterm exhibited significantly lower vaginal levels of Lactobacillus crispatus and higher levels of BVAB1, Sneathia amnii, TM7-H1, a group of Prevotella species and nine additional taxa. The first representative genomes of BVAB1 and TM7-H1 are described. Preterm-birth-associated taxa were correlated with proinflammatory cytokines in vaginal fluid. These findings highlight new opportunities for assessment of the risk of preterm birth.


Subject(s)
Microbiota , Premature Birth/microbiology , Vagina/microbiology , Adult , Black or African American , Biodiversity , Cohort Studies , Cytokines/metabolism , Female , Host Microbial Interactions/immunology , Humans , Infant, Newborn , Inflammation Mediators/metabolism , Longitudinal Studies , Metagenomics , Microbiota/genetics , Microbiota/immunology , Premature Birth/etiology , Premature Birth/immunology , Risk Factors , United States , Vagina/immunology , Young Adult
20.
Nat Commun ; 10(1): 1784, 2019 04 16.
Article in English | MEDLINE | ID: mdl-30992455

ABSTRACT

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome, and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three- to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community, allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.


Subject(s)
Genome, Human/genetics , Genomic Structural Variation , Genomics/methods , Haplotypes/genetics , Algorithms , Chromosome Mapping/methods , Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Whole Genome Sequencing/methods