ABSTRACT
Robertsonian chromosomes are a type of variant chromosome found commonly in nature. Present in one in 800 humans, these chromosomes can underlie infertility, trisomies, and increased cancer incidence. Recognized cytogenetically for more than a century, their origins have remained mysterious. Recent advances in genomics allowed us to assemble three human Robertsonian chromosomes completely. We identify a common breakpoint and epigenetic changes in centromeres that provide insight into the formation and propagation of common Robertsonian translocations. Further investigation of the assembled genomes of chimpanzee and bonobo highlights the structural features of the human genome that uniquely enable the specific crossover event that creates these chromosomes. Resolving the structure and epigenetic features of human Robertsonian chromosomes at a molecular level paves the way to understanding how chromosomal structural variation occurs more generally, and how chromosomes evolve.
ABSTRACT
The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with other cetaceans and artiodactyls, including vaquita (Phocoena sinus), the world's smallest cetacean, to investigate blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.
Subjects
Balaenoptera, Neoplasms, Animals, Balaenoptera/genetics, Genomic Segmental Duplications, Genome, Demography, Neoplasms/genetics
ABSTRACT
Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
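As an illustration of the quality values quoted above, a Phred-scaled QV corresponds to a per-base error probability of 10^(-QV/10). The following is a minimal sketch of that standard conversion only; it is not the k-mer-based estimation procedure used in the study, and the function names are hypothetical.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled quality value."""
    if error_rate <= 0:
        raise ValueError("error_rate must be positive")
    return -10 * math.log10(error_rate)


# The two assembly QVs reported in the abstract, expressed as error rates.
before = qv_to_error_rate(70.2)
after = qv_to_error_rate(73.9)
```

Because the scale is logarithmic, a gain of a few QV points corresponds to a substantial relative drop in the estimated error rate.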
Subjects
High-Throughput Nucleotide Sequencing, Nanopores, Female, Human Genome, High-Throughput Nucleotide Sequencing/methods, Humans, Pregnancy, DNA Sequence Analysis/methods, Telomere/genetics
ABSTRACT
Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide estimate from 5.4 to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previously unresolved T2T-CHM13 SD sequence (68.3 Mbp) better represents human copy number variation. Comparing long-read assemblies from human (n = 12) and nonhuman primate (n = 5) genomes, we systematically reconstruct the evolution and structural haplotype diversity of biomedically relevant and duplicated genes. This analysis reveals patterns of structural heterozygosity and evolutionary differences in SD organization between humans and other primates.
Subjects
DNA Copy Number Variations, Gene Duplication, Human Genome, Genomic Segmental Duplications, Molecular Evolution, GTPase-Activating Proteins/genetics, Humans, Single Nucleotide Polymorphism, Proto-Oncogene Proteins/genetics
ABSTRACT
Chondrichthyes (cartilaginous fishes) are fundamental for understanding vertebrate evolution, yet their genomes are understudied. We report long-read sequencing of the whale shark genome to generate the best gapless chondrichthyan genome assembly yet with higher contig contiguity than all other cartilaginous fish genomes, and studied vertebrate genomic evolution of ancestral gene families, immunity, and gigantism. We found a major increase in gene families at the origin of gnathostomes (jawed vertebrates) independent of their genome duplication. We studied vertebrate pathogen recognition receptors (PRRs), which are key in initiating innate immune defense, and found diverse patterns of gene family evolution, demonstrating that adaptive immunity in gnathostomes did not fully displace germline-encoded PRR innovation. We also discovered a new toll-like receptor (TLR29) and three NOD1 copies in the whale shark. We found chondrichthyan and giant vertebrate genomes had decreased substitution rates compared to other vertebrates, but gene family expansion rates varied among vertebrate giants, suggesting substitution and expansion rates of gene families are decoupled in vertebrate genomes. Finally, we found gene families that shifted in expansion rate in vertebrate giants were enriched for human cancer-related genes, consistent with gigantism requiring adaptations to suppress cancer.
Subjects
Molecular Evolution, Fish Proteins/genetics, Genome, Sharks/genetics, Transcriptome, Animals, Tumor Biomarkers/genetics, Neoplastic Cell Transformation/genetics, Neoplastic Cell Transformation/pathology, Gene Duplication, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Humans, Innate Immunity/genetics, Neoplasms/genetics, Neoplasms/pathology, Phylogeny, Immunologic Receptors/genetics, Sharks/immunology, Whole Genome Sequencing
ABSTRACT
Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
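Homopolymer compression, one of the techniques named above, collapses each run of identical bases into a single base, so that two reads differing only in the length of a homopolymer run compare as identical. The following is a minimal sketch of the idea under that assumption, not HiCanu's actual implementation; the function name is hypothetical.

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical characters to a single character."""
    # groupby yields one (base, run) pair per maximal run of equal bases.
    return "".join(base for base, _run in groupby(seq))


# Two reads that differ only in homopolymer run lengths.
read_a = "ACCCGTTTA"
read_b = "ACCGTTTTTA"
same_after_compression = homopolymer_compress(read_a) == homopolymer_compress(read_b)
```

Comparing reads in compressed space trades away run-length information in exchange for robustness to run-length errors, so the original lengths have to be recovered in a later consensus step.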
Subjects
Genetic Variation, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Alleles, Animals, Cell Line, Chromosome Duplication, Neoplasm DNA, Satellite DNA, Drosophila/genetics, Human Genome, Haplotypes, Humans, Reproducibility of Results, Software
ABSTRACT
The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.
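The containment idea can be sketched as follows: take a fixed-size sketch of the genome (its smallest k-mer hashes), stream through the read set once, and report the fraction of sketch hashes that were observed. This is a simplified illustration under stated assumptions, not the published tool's implementation: the function names, the BLAKE2 hash, the tiny k, and the omission of canonical (strand-independent) k-mers are all choices made for this example.

```python
import hashlib
from typing import Iterable, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (assumed hash choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq: str, k: int, size: int) -> set:
    """Keep the `size` smallest distinct k-mer hashes of seq."""
    hashes = {hash_kmer(km) for km in kmers(seq, k)}
    return set(sorted(hashes)[:size])


def containment(genome: str, reads: Iterable[str], k: int = 5, size: int = 50) -> float:
    """Fraction of the genome's sketch hashes seen while streaming the reads."""
    sketch = bottom_sketch(genome, k, size)
    if not sketch:
        return 0.0
    seen = set()
    for read in reads:  # single pass; reads need not be assembled
        for km in kmers(read, k):
            h = hash_kmer(km)
            if h in sketch:
                seen.add(h)
    return len(seen) / len(sketch)
```

Unlike a resemblance (Jaccard) estimate, this score is not diluted by the size of the read set, since only the genome's own sketch appears in the denominator.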
Subjects
DNA Contamination, High-Throughput Screening Assays, Metagenomics/methods, Algorithms, Humans, Polyomavirus/isolation & purification, Proteome
ABSTRACT
Y chromosomes control essential male functions in many species, including sex determination and fertility. However, because of obstacles posed by repeat-rich heterochromatin, knowledge of Y chromosome sequences is limited to a handful of model organisms, constraining our understanding of Y biology across the tree of life. Here, we leverage long single-molecule sequencing to determine the content and structure of the nonrecombining Y chromosome of the primary African malaria mosquito, Anopheles gambiae. We find that the An. gambiae Y consists almost entirely of a few massively amplified, tandemly arrayed repeats, some of which can recombine with similar repeats on the X chromosome. Sex-specific genome resequencing in a recent species radiation, the An. gambiae complex, revealed rapid sequence turnover within An. gambiae and among species. Exploiting 52 sex-specific An. gambiae RNA-Seq datasets representing all developmental stages, we identified a small repertoire of Y-linked genes that lack X gametologs and are not Y-linked in any other species except An. gambiae, with the notable exception of YG2, a candidate male-determining gene. YG2 is the only gene conserved and exclusive to the Y in all species examined, yet sequence similarity to YG2 is not detectable in the genome of a more distant mosquito relative, suggesting rapid evolution of Y chromosome genes in this highly dynamic genus of malaria vectors. The extensive characterization of the An. gambiae Y provides a long-awaited foundation for studying male mosquito biology, and will inform novel mosquito control strategies based on the manipulation of Y chromosomes.
Subjects
Anopheles/genetics, Insect Chromosomes/genetics, Insect Vectors/genetics, Y Chromosome/genetics, Animals, Female, Malaria, Male, Phylogeny, DNA Sequence Analysis, X Chromosome/genetics
ABSTRACT
Long-read, single-molecule real-time (SMRT) sequencing is routinely used to finish microbial genomes, but available assembly methods have not scaled well to larger genomes. We introduce the MinHash Alignment Process (MHAP) for overlapping noisy, long reads using probabilistic, locality-sensitive hashing. Integrating MHAP with the Celera Assembler enabled reference-grade de novo assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and a human hydatidiform mole cell line (CHM1) from SMRT sequencing. The resulting assemblies are highly continuous, include fully resolved chromosome arms and close persistent gaps in these reference genomes. Our assembly of D. melanogaster revealed previously unknown heterochromatic and telomeric transition sequences, and we assembled low-complexity sequences from CHM1 that fill gaps in the human GRCh38 reference. Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.
Subjects
Fungal Genome, Human Genome, Insect Genome, Plant Genome, DNA Sequence Analysis, Animals, Arabidopsis/genetics, Base Sequence, Chromosomes/genetics, Drosophila melanogaster/genetics, Heterochromatin, High-Throughput Nucleotide Sequencing/methods, Humans, Saccharomyces cerevisiae/genetics, Sequence Alignment
ABSTRACT
The relationship between tunicates and the uncultivated cyanobacterium Prochloron didemni has long provided a model symbiosis. P. didemni is required for survival of animals such as Lissoclinum patella and also makes secondary metabolites of pharmaceutical interest. Here, we present the metagenomes, chemistry, and microbiomes of four related L. patella tunicate samples from a wide geographical range of the tropical Pacific. The remarkably similar P. didemni genomes are the most complex so far assembled from uncultivated organisms. Although P. didemni has not been stably cultivated and comprises a single strain in each sample, a complete set of metabolic genes indicates that the bacteria are likely capable of reproducing outside the host. The sequences reveal notable peculiarities of the photosynthetic apparatus and explain the basis of nutrient exchange underlying the symbiosis. P. didemni likely profoundly influences the lipid composition of the animals by synthesizing sterols and an unusual lipid with biofuel potential. In addition, L. patella also harbors a great variety of other bacterial groups that contribute nutritional and secondary metabolic products to the symbiosis. These bacteria possess an enormous genetic potential to synthesize new secondary metabolites. For example, an antitumor candidate molecule, patellazole, is not encoded in the genome of Prochloron and was linked to other bacteria from the microbiome. This study unveils the complex L. patella microbiome and its impact on primary and secondary metabolism, revealing a remarkable versatility in creating and exchanging small molecules.
Subjects
Metagenome/physiology, Prochloron/metabolism, Animals, Genome, Genomics, Metagenomics, Biological Models, Genetic Models, Molecular Sequence Data, Photosynthesis, Phylogeny, 16S Ribosomal RNA/metabolism, DNA Sequence Analysis, Symbiosis, Urochordata
ABSTRACT
Salmonella enterica serovars Enteritidis and Typhimurium are the leading causative agents of salmonellosis in the United States. S. Enteritidis is predominantly associated with contamination of shell eggs and egg products, whereas S. Typhimurium is frequently linked to tainted poultry meats, fresh produce, and recently, peanut-based products. Chlorine is an oxidative disinfectant commonly used in the food industry to sanitize the surfaces of foods and food processing facilities (e.g., shell eggs and poultry meats). However, chlorine disinfection is not always effective, as some S. enterica strains may resist and survive the disinfection process. To date, little is known about the underlying mechanisms of how S. enterica responds to chlorine-based oxidative stress. In this study, we designed a custom bigenome microarray that consists of 385,000 60-mer oligonucleotide probes and targets 4,793 unique gene features in the genomes of S. Enteritidis strain PT4 and S. Typhimurium strain LT2. We explored the transcriptomic responses of both strains to two different chlorine treatments (130 ppm of chlorine for 30 min and 390 ppm of chlorine for 10 min) in brain heart infusion broth. We identified 209 S. enterica core genes associated with Fe-S cluster assembly, cysteine biosynthesis, stress response, ribosome formation, biofilm formation, and energy metabolism that were differentially expressed (>1.5-fold; P < 0.05). In addition, we found that serovars Enteritidis and Typhimurium differed in the responses of 33 stress-related genes and 19 virulence-associated genes to the chlorine stress. Findings from this study suggest that the oxidative-stress response may render S. enterica resistant or susceptible to certain types of environmental stresses, which in turn promotes the development of more effective hurdle interventions to reduce the risk of S. enterica contamination in the food supply.