Search | BVS (Virtual Health Library) Search Portal

1.

The variation and evolution of complete human centromeres.

Logsdon, Glennis A; Rozanski, Allison N; Ryabov, Fedor; Potapova, Tamara; Shepelev, Valery A; Catacchio, Claudia R; Porubsky, David; Mao, Yafei; Yoo, DongAhn; Rautiainen, Mikko; Koren, Sergey; Nurk, Sergey; Lucas, Julian K; Hoekzema, Kendra; Munson, Katherine M; Gerton, Jennifer L; Phillippy, Adam M; Ventura, Mario; Alexandrov, Ivan A; Eichler, Evan E.

Nature ; 629(8010): 136-145, 2024 May.

Article in English | MEDLINE | ID: mdl-38570684

ABSTRACT

Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size [1]. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions [2,3]. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome [4,5]. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.

Subjects

Centromere, Molecular Evolution, Genetic Variation, Animals, Humans, Centromere/genetics, Centromere/metabolism, Centromere Protein A/metabolism, DNA Methylation/genetics, Satellite DNA/genetics, Kinetochores/metabolism, Macaca/genetics, Pan troglodytes/genetics, Single Nucleotide Polymorphism/genetics, Pongo/genetics, Male, Female, Reference Standards, Chromatin Immunoprecipitation, Haplotypes, Mutation, Gene Amplification, Sequence Alignment, Chromatin/genetics, Chromatin/metabolism, Species Specificity

2.

Recombination between heterologous human acrocentric chromosomes.

Guarracino, Andrea; Buonaiuto, Silvia; de Lima, Leonardo Gomes; Potapova, Tamara; Rhie, Arang; Koren, Sergey; Rubinstein, Boris; Fischer, Christian; Gerton, Jennifer L; Phillippy, Adam M; Colonna, Vincenza; Garrison, Erik.

Nature ; 617(7960): 335-343, 2023 05.

Article in English | MEDLINE | ID: mdl-37165241

ABSTRACT

The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications [1,2]. Although the resolution of these regions in the first complete assembly of a human genome, the Telomere-to-Telomere Consortium's CHM13 assembly (T2T-CHM13), provided a model of their homology [3], it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain pseudo-homologous regions (PHRs) indicative of recombination between non-homologous sequences. Utilizing an all-to-all comparison of the human pangenome from the Human Pangenome Reference Consortium [4] (HPRC), we find that contigs from all of the SAACs form a community. A variation graph [5] constructed from centromere-spanning acrocentric contigs indicates the presence of regions in which most contigs appear nearly identical between heterologous acrocentric chromosomes in T2T-CHM13. Except on chromosome 15, we observe faster decay of linkage disequilibrium in the pseudo-homologous regions than in the corresponding short and long arms, indicating higher rates of recombination [6,7]. The pseudo-homologous regions include sequences that have previously been shown to lie at the breakpoint of Robertsonian translocations [8], and their arrangement is compatible with crossover in inverted duplications on chromosomes 13, 14 and 21. The ubiquity of signals of recombination between heterologous acrocentric chromosomes seen in the HPRC draft pangenome suggests that these shared sequences form the basis for recurrent Robertsonian translocations, providing sequence and population-based confirmation of hypotheses first developed from cytogenetic studies 50 years ago [9].

Subjects

Centromere, Human Chromosomes, Genetic Recombination, Humans, Centromere/genetics, Human Chromosomes/genetics, Ribosomal DNA/genetics, Genetic Recombination/genetics, Genetic Translocation/genetics, Cytogenetics, Telomere/genetics

3.

The complete sequence of a human Y chromosome.

Rhie, Arang; Nurk, Sergey; Cechova, Monika; Hoyt, Savannah J; Taylor, Dylan J; Altemose, Nicolas; Hook, Paul W; Koren, Sergey; Rautiainen, Mikko; Alexandrov, Ivan A; Allen, Jamie; Asri, Mobin; Bzikadze, Andrey V; Chen, Nae-Chyun; Chin, Chen-Shan; Diekhans, Mark; Flicek, Paul; Formenti, Giulio; Fungtammasan, Arkarachai; Garcia Giron, Carlos; Garrison, Erik; Gershman, Ariel; Gerton, Jennifer L; Grady, Patrick G S; Guarracino, Andrea; Haggerty, Leanne; Halabian, Reza; Hansen, Nancy F; Harris, Robert; Hartley, Gabrielle A; Harvey, William T; Haukness, Marina; Heinz, Jakob; Hourlier, Thibaut; Hubley, Robert M; Hunt, Sarah E; Hwang, Stephen; Jain, Miten; Kesharwani, Rupesh K; Lewis, Alexandra P; Li, Heng; Logsdon, Glennis A; Lucas, Julian K; Makalowski, Wojciech; Markovic, Christopher; Martin, Fergal J; Mc Cartney, Ann M; McCoy, Rajiv C; McDaniel, Jennifer; McNulty, Brandy M.

Nature ; 621(7978): 344-354, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

Subjects

Human Y Chromosomes, Genomics, DNA Sequence Analysis, Humans, Base Sequence, Human Y Chromosomes/genetics, Satellite DNA/genetics, Genetic Variation/genetics, Population Genetics, Genomics/methods, Genomics/standards, Heterochromatin/genetics, Multigene Family/genetics, Reference Standards, Genomic Segmental Duplications/genetics, DNA Sequence Analysis/standards, Tandem Repeat Sequences/genetics, Telomere/genetics

4.

Semi-automated assembly of high-quality diploid human reference genomes.

Jarvis, Erich D; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R; Porubsky, David; Cheng, Haoyu; Asri, Mobin; Logsdon, Glennis A; Carnevali, Paolo; Chaisson, Mark J P; Chin, Chen-Shan; Cody, Sarah; Collins, Joanna; Ebert, Peter; Escalona, Merly; Fedrigo, Olivier; Fulton, Robert S; Fulton, Lucinda L; Garg, Shilpa; Gerton, Jennifer L; Ghurye, Jay; Granat, Anastasiya; Green, Richard E; Harvey, William; Hasenfeld, Patrick; Hastie, Alex; Haukness, Marina; Jaeger, Erich B; Jain, Miten; Kirsche, Melanie; Kolmogorov, Mikhail; Korbel, Jan O; Koren, Sergey; Korlach, Jonas; Lee, Joyce; Li, Daofeng; Lindsay, Tina; Lucas, Julian; Luo, Feng; Marschall, Tobias; Mitchell, Matthew W; McDaniel, Jennifer; Nie, Fan; Olsen, Hugh E; Olson, Nathan D.

Nature ; 611(7936): 519-531, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Subjects

Chromosome Mapping, Diploidy, Human Genome, Genomics, Humans, Chromosome Mapping/standards, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards, Reference Standards, Genomics/methods, Genomics/standards, Human Chromosomes/genetics, Genetic Variation/genetics

5.

Phased nanopore assembly with Shasta and modular graph phasing with GFAse.

Lorig-Roach, Ryan; Meredith, Melissa; Monlong, Jean; Jain, Miten; Olsen, Hugh E; McNulty, Brandy; Porubsky, David; Montague, Tessa G; Lucas, Julian K; Condon, Chris; Eizenga, Jordan M; Juul, Sissel; McKenzie, Sean K; Simmonds, Sara E; Park, Jimin; Asri, Mobin; Koren, Sergey; Eichler, Evan E; Axel, Richard; Martin, Bruce; Carnevali, Paolo; Miga, Karen H; Paten, Benedict.

Genome Res ; 34(3): 454-468, 2024 Apr 25.

Article in English | MEDLINE | ID: mdl-38627094

ABSTRACT

Reference-free genome phasing is vital for understanding allele inheritance and the impact of single-molecule DNA variation on phenotypes. To achieve thorough phasing across homozygous or repetitive regions of the genome, long-read sequencing technologies are often used to perform phased de novo assembly. As a step toward reducing the cost and complexity of this type of analysis, we describe new methods for accurately phasing Oxford Nanopore Technologies (ONT) sequence data with the Shasta genome assembler and a modular tool for extending phasing to the chromosome scale called GFAse. We test using new variants of ONT PromethION sequencing, including those using proximity ligation, and show that newer, higher accuracy ONT reads substantially improve assembly quality.

Subjects

Nanopores, Humans, DNA Sequence Analysis/methods, Nanopore Sequencing/methods, High-Throughput Nucleotide Sequencing/methods, Software, Genomics/methods

6.

The genome of the colonial hydroid Hydractinia reveals that their stem cells use a toolkit of evolutionarily shared genes with all animals.

Schnitzler, Christine E; Chang, E Sally; Waletich, Justin; Quiroga-Artigas, Gonzalo; Wong, Wai Yee; Nguyen, Anh-Dao; Barreira, Sofia N; Doonan, Liam B; Gonzalez, Paul; Koren, Sergey; Gahan, James M; Sanders, Steven M; Bradshaw, Brian; DuBuc, Timothy Q; de Jong, Danielle; Nawrocki, Eric P; Larson, Alexandra; Klasfeld, Samantha; Gornik, Sebastian G; Moreland, R Travis; Wolfsberg, Tyra G; Phillippy, Adam M; Mullikin, James C; Simakov, Oleg; Cartwright, Paulyn; Nicotra, Matthew; Frank, Uri; Baxevanis, Andreas D.

Genome Res ; 34(3): 498-513, 2024 Apr 25.

Article in English | MEDLINE | ID: mdl-38508693

ABSTRACT

Hydractinia is a colonial marine hydroid that shows remarkable biological properties, including the capacity to regenerate its entire body throughout its lifetime, a process made possible by its adult migratory stem cells, known as i-cells. Here, we provide an in-depth characterization of the genomic structure and gene content of two Hydractinia species, Hydractinia symbiolongicarpus and Hydractinia echinata, placing them in a comparative evolutionary framework with other cnidarian genomes. We also generated and annotated a single-cell transcriptomic atlas for adult male H. symbiolongicarpus and identified cell-type markers for all major cell types, including key i-cell markers. Orthology analyses based on the markers revealed that Hydractinia's i-cells are highly enriched in genes that are widely shared amongst animals, a striking finding given that Hydractinia has a higher proportion of phylum-specific genes than any of the other 41 animals in our orthology analysis. These results indicate that Hydractinia's stem cells and early progenitor cells may use a toolkit shared with all animals, making it a promising model organism for future exploration of stem cell biology and regenerative medicine. The genomic and transcriptomic resources for Hydractinia presented here will enable further studies of their regenerative capacity, colonial morphology, and ability to distinguish self from nonself.

Subjects

Genome, Hydrozoa, Animals, Hydrozoa/genetics, Molecular Evolution, Transcriptome, Stem Cells/metabolism, Male, Phylogeny, Single-Cell Analysis/methods

7.

Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph.

Cheng, Haoyu; Asri, Mobin; Lucas, Julian; Koren, Sergey; Li, Heng.

Nat Methods ; 21(6): 967-970, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38730258

ABSTRACT

Despite advances in long-read sequencing technologies, constructing a near telomere-to-telomere assembly is still computationally demanding. Here we present hifiasm (UL), an efficient de novo assembly algorithm combining multiple sequencing technologies to scale up population-wide near telomere-to-telomere assemblies. Applied to 22 human and two plant genomes, our algorithm produces better diploid assemblies at a cost an order of magnitude lower than existing methods, and it also works with polyploid genomes.

Subjects

Algorithms, Diploidy, Polyploidy, Telomere, Humans, Telomere/genetics, Plant Genome, Human Genome, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods

8.

Improved sequence mapping using a complete reference genome and lift-over.

Chen, Nae-Chyun; Paulin, Luis F; Sedlazeck, Fritz J; Koren, Sergey; Phillippy, Adam M; Langmead, Ben.

Nat Methods ; 21(1): 41-49, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38036856

ABSTRACT

Complete, telomere-to-telomere (T2T) genome assemblies promise improved analyses and the discovery of new variants, but many essential genomic resources remain associated with older reference genomes. Thus, there is a need to translate genomic features and read alignments between references. Here we describe a method called levioSAM2 that performs fast and accurate lift-over between assemblies using a whole-genome map. In addition to enabling the use of several references, we demonstrate that aligning reads to a high-quality reference (for example, T2T-CHM13) and lifting to an older reference (for example, Genome Reference Consortium (GRC)h38) improves the accuracy of the resulting variant calls on the old reference. By leveraging the quality improvements of T2T-CHM13, levioSAM2 reduces small and structural variant calling errors compared with GRC-based mapping using real short- and long-read datasets. Performance is especially improved for a set of complex medically relevant genes, where the GRC references are lower quality.

Subjects

Genome, Genomics, DNA Sequence Analysis/methods, Genomics/methods, Chromosome Mapping, High-Throughput Nucleotide Sequencing
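
Illustrative note on entry 8: the abstract describes lift-over as translating coordinates between assemblies through a whole-genome map. The sketch below shows that general idea only, assuming the map is a sorted list of ungapped aligned blocks. The `Block` type, the `lift` function and the block values are hypothetical and are not taken from levioSAM2, which works on full read alignments rather than single positions.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import NamedTuple, Optional


class Block(NamedTuple):
    """One ungapped aligned block between two assemblies (hypothetical layout)."""
    src_start: int  # 0-based start on the source assembly
    dst_start: int  # 0-based start on the target assembly
    length: int     # number of aligned bases


def lift(pos: int, blocks: list[Block]) -> Optional[int]:
    """Map a source position to the target, or None if it is not covered.

    `blocks` must be sorted by `src_start` and must not overlap.
    """
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    offset = pos - block.src_start
    if offset >= block.length:
        return None  # position falls in a gap between blocks
    return block.dst_start + offset


# Made-up map: a 5-base insertion in the target after source position 99,
# then a 10-base stretch of the source with no counterpart in the target.
BLOCKS = [Block(0, 0, 100), Block(100, 105, 50), Block(160, 155, 40)]
```

Positions inside a block shift by that block's offset; positions between blocks return None, which is where a real tool has to decide whether to drop, clip or realign the read.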

9.

The structure, function and evolution of a complete human chromosome 8.

Logsdon, Glennis A; Vollger, Mitchell R; Hsieh, PingHsun; Mao, Yafei; Liskovykh, Mikhail A; Koren, Sergey; Nurk, Sergey; Mercuri, Ludovica; Dishuck, Philip C; Rhie, Arang; de Lima, Leonardo G; Dvorkina, Tatiana; Porubsky, David; Harvey, William T; Mikheenko, Alla; Bzikadze, Andrey V; Kremitzki, Milinn; Graves-Lindsay, Tina A; Jain, Chirag; Hoekzema, Kendra; Murali, Shwetha C; Munson, Katherine M; Baker, Carl; Sorensen, Melanie; Lewis, Alexandra M; Surti, Urvashi; Gerton, Jennifer L; Larionov, Vladimir; Ventura, Mario; Miga, Karen H; Phillippy, Adam M; Eichler, Evan E.

Nature ; 593(7857): 101-107, 2021 05.

Article in English | MEDLINE | ID: mdl-33828295

ABSTRACT

The complete assembly of each human chromosome is essential for understanding human biology and evolution [1,2]. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.

Subjects

Human Chromosome Pair 8/chemistry, Human Chromosome Pair 8/genetics, Molecular Evolution, Animals, Cell Line, Centromere/chemistry, Centromere/genetics, Centromere/metabolism, Human Chromosome Pair 8/physiology, DNA Methylation, Satellite DNA/genetics, Genetic Epigenesis, Female, Humans, Macaca mulatta/genetics, Male, Minisatellite Repeats/genetics, Pan troglodytes/genetics, Phylogeny, Pongo abelii/genetics, Telomere/chemistry, Telomere/genetics, Telomere/metabolism

10.

Evolutionary and biomedical insights from a marmoset diploid genome assembly.

Yang, Chentao; Zhou, Yang; Marcus, Stephanie; Formenti, Giulio; Bergeron, Lucie A; Song, Zhenzhen; Bi, Xupeng; Bergman, Juraj; Rousselle, Marjolaine Marie C; Zhou, Chengran; Zhou, Long; Deng, Yuan; Fang, Miaoquan; Xie, Duo; Zhu, Yuanzhen; Tan, Shangjin; Mountcastle, Jacquelyn; Haase, Bettina; Balacco, Jennifer; Wood, Jonathan; Chow, William; Rhie, Arang; Pippel, Martin; Fabiszak, Margaret M; Koren, Sergey; Fedrigo, Olivier; Freiwald, Winrich A; Howe, Kerstin; Yang, Huanming; Phillippy, Adam M; Schierup, Mikkel Heide; Jarvis, Erich D; Zhang, Guojie.

Nature ; 594(7862): 227-233, 2021 06.

Article in English | MEDLINE | ID: mdl-33910227

ABSTRACT

The accurate and complete assembly of both haplotype sequences of a diploid organism is essential to understanding the role of variation in genome functions, phenotypes and diseases [1]. Here, using a trio-binning approach, we present a high-quality, diploid reference genome, with both haplotypes assembled independently at the chromosome level, for the common marmoset (Callithrix jacchus), a primate model system that is widely used in biomedical research [2,3]. The full spectrum of heterozygosity between the two haplotypes involves 1.36% of the genome, much higher than the 0.13% indicated by the standard estimation based on single-nucleotide heterozygosity alone. The de novo mutation rate is 0.43 × 10^-8 per site per generation, and the paternally inherited genome acquired twice as many mutations as the maternal. Our diploid assembly enabled us to discover a recent expansion of the sex-differentiation region and unique evolutionary changes in the marmoset Y chromosome. In addition, we identified many genes with signatures of positive selection that might have contributed to the evolution of Callithrix biological features. Brain-related genes were highly conserved between marmosets and humans, although several genes experienced lineage-specific copy number variations or diversifying selection, with implications for the use of marmosets as a model system.

Subjects

Callithrix/genetics, Diploidy, Molecular Evolution, Genome/genetics, Genomics/standards, Animals, Biomedical Research, DNA Copy Number Variations, Female, Germ-Line Mutation/genetics, Haplotypes/genetics, Heterozygote, Humans, INDEL Mutation/genetics, Male, Reference Standards, Genetic Selection, Sex Differentiation/genetics, Y Chromosome/genetics

11.

Towards complete and error-free genome assemblies of all vertebrate species.

Rhie, Arang; McCarthy, Shane A; Fedrigo, Olivier; Damas, Joana; Formenti, Giulio; Koren, Sergey; Uliano-Silva, Marcela; Chow, William; Fungtammasan, Arkarachai; Kim, Juwan; Lee, Chul; Ko, Byung June; Chaisson, Mark; Gedman, Gregory L; Cantin, Lindsey J; Thibaud-Nissen, Francoise; Haggerty, Leanne; Bista, Iliana; Smith, Michelle; Haase, Bettina; Mountcastle, Jacquelyn; Winkler, Sylke; Paez, Sadye; Howard, Jason; Vernes, Sonja C; Lama, Tanya M; Grutzner, Frank; Warren, Wesley C; Balakrishnan, Christopher N; Burt, Dave; George, Julia M; Biegler, Matthew T; Iorns, David; Digby, Andrew; Eason, Daryl; Robertson, Bruce; Edwards, Taylor; Wilkinson, Mark; Turner, George; Meyer, Axel; Kautt, Andreas F; Franchini, Paolo; Detrich, H William; Svardal, Hannes; Wagner, Maximilian; Naylor, Gavin J P; Pippel, Martin; Malinsky, Milan; Mooney, Mark; Simbirsky, Maria.

Nature ; 592(7856): 737-746, 2021 04.

Article in English | MEDLINE | ID: mdl-33911273

ABSTRACT

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1-4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

Subjects

Genome, Genomics/methods, Vertebrates/genetics, Animals, Birds, Gene Library, Genome Size, Mitochondrial Genome, Haplotypes, High-Throughput Nucleotide Sequencing, Molecular Sequence Annotation, Sequence Alignment, DNA Sequence Analysis, Sex Chromosomes/genetics

12.

Telomere-to-telomere assembly of a complete human X chromosome.

Miga, Karen H; Koren, Sergey; Rhie, Arang; Vollger, Mitchell R; Gershman, Ariel; Bzikadze, Andrey; Brooks, Shelise; Howe, Edmund; Porubsky, David; Logsdon, Glennis A; Schneider, Valerie A; Potapova, Tamara; Wood, Jonathan; Chow, William; Armstrong, Joel; Fredrickson, Jeanne; Pak, Evgenia; Tigyi, Kristof; Kremitzki, Milinn; Markovic, Christopher; Maduro, Valerie; Dutra, Amalia; Bouffard, Gerard G; Chang, Alexander M; Hansen, Nancy F; Wilfert, Amy B; Thibaud-Nissen, Françoise; Schmitt, Anthony D; Belton, Jon-Matthew; Selvaraj, Siddarth; Dennis, Megan Y; Soto, Daniela C; Sahasrabudhe, Ruta; Kaya, Gulhan; Quick, Josh; Loman, Nicholas J; Holmes, Nadine; Loose, Matthew; Surti, Urvashi; Risques, Rosa Ana; Graves Lindsay, Tina A; Fulton, Robert; Hall, Ira; Paten, Benedict; Howe, Kerstin; Timp, Winston; Young, Alice; Mullikin, James C; Pevzner, Pavel A; Gerton, Jennifer L.

Nature ; 585(7823): 79-84, 2020 09.

Article in English | MEDLINE | ID: mdl-32663838

ABSTRACT

After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist [1,2]. Here we present a human genome assembly that surpasses the continuity of GRCh38 [2], along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome [3], we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.

Subjects

Human X Chromosomes/genetics, Human Genome/genetics, Telomere/genetics, Centromere/genetics, CpG Islands/genetics, DNA Methylation, Satellite DNA/genetics, Female, Humans, Hydatidiform Mole/genetics, Male, Pregnancy, Reproducibility of Results, Testis/metabolism

13.

Long-read mapping to repetitive reference sequences using Winnowmap2.

Jain, Chirag; Rhie, Arang; Hansen, Nancy F; Koren, Sergey; Phillippy, Adam M.

Nat Methods ; 19(6): 705-710, 2022 06.

Article in English | MEDLINE | ID: mdl-35365778

ABSTRACT

Approximately 5-10% of the human genome remains inaccessible due to the presence of repetitive sequences such as segmental duplications and tandem repeat arrays. We show that existing long-read mappers often yield incorrect alignments and variant calls within long, near-identical repeats, as they remain vulnerable to allelic bias. In the presence of a nonreference allele within a repeat, a read sampled from that region could be mapped to an incorrect repeat copy. To address this limitation, we developed a new long-read mapping method, Winnowmap2, by using minimal confidently alignable substrings. Winnowmap2 computes each read mapping through a collection of confident subalignments. This approach is more tolerant of structural variation and more sensitive to paralog-specific variants within repeats. Our experiments highlight that Winnowmap2 successfully addresses the issue of allelic bias, enabling more accurate downstream variant calls in repetitive sequences.

Subjects

Human Genome, Nucleic Acid Repetitive Sequences, Alleles, Humans, Nucleic Acid Repetitive Sequences/genetics, Genomic Segmental Duplications, DNA Sequence Analysis, Tandem Repeat Sequences

14.

Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation.

Formenti, Giulio; Rhie, Arang; Walenz, Brian P; Thibaud-Nissen, Françoise; Shafin, Kishwar; Koren, Sergey; Myers, Eugene W; Jarvis, Erich D; Phillippy, Adam M.

Nat Methods ; 19(6): 696-704, 2022 06.

Article in English | MEDLINE | ID: mdl-35361932

ABSTRACT

Variant calling has been widely used for genotyping and for improving the consensus accuracy of long-read assemblies. Variant calls are commonly hard-filtered with user-defined cutoffs. However, it is impossible to define a single set of optimal cutoffs, as the calls heavily depend on the quality of the reads, the variant caller of choice and the quality of the unpolished assembly. Here, we introduce Merfin, a k-mer based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller's internal score. Merfin increased the precision of genotyped calls in several benchmarks, improved consensus accuracy and reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads, including the first complete human genome. Moreover, we introduce assembly quality and completeness metrics that account for the expected genomic copy numbers.

Subjects

High-Throughput Nucleotide Sequencing, Nanopores, Genome, Genomics, Humans, DNA Sequence Analysis
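
Illustrative note on entry 14: the abstract says each variant is judged by the k-mers it creates, independently of alignment quality or the caller's score. The toy sketch below captures only a simplified version of that idea: a candidate edit is accepted when it reduces the number of k-mers around the site that never occur in the reads. All function names and the example sequences are hypothetical, and Merfin's actual scoring, which compares observed with expected k-mer multiplicity, is not reproduced here.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every k-mer occurring in a collection of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def spanning_kmers(seq, pos, span, k):
    """Return the k-mers of `seq` that overlap the interval [pos, pos + span)."""
    lo = max(0, pos - k + 1)
    hi = min(len(seq) - k, pos + span - 1)
    return [seq[i:i + k] for i in range(lo, hi + 1)]


def missing(seq, pos, span, k, read_kmers):
    """Number of spanning k-mers that are absent from the reads."""
    return sum(1 for km in spanning_kmers(seq, pos, span, k) if read_kmers[km] == 0)


def accept_variant(assembly, pos, ref, alt, k, read_kmers):
    """Accept the edit ref -> alt at `pos` only if it leaves fewer unsupported k-mers."""
    if assembly[pos:pos + len(ref)] != ref:
        raise ValueError("ref allele does not match the assembly at pos")
    edited = assembly[:pos] + alt + assembly[pos + len(ref):]
    before = missing(assembly, pos, len(ref), k, read_kmers)
    after = missing(edited, pos, len(alt), k, read_kmers)
    return after < before


# Made-up data: reads sampled from TRUTH, and a draft with one wrong base at index 7.
TRUTH = "ACGTTGCATGCCAGT"
DRAFT = "ACGTTGCGTGCCAGT"
READ_KMERS = count_kmers([TRUTH, TRUTH, TRUTH], k=4)
```

With this data, `accept_variant(DRAFT, 7, "G", "A", 4, READ_KMERS)` accepts the correcting edit, while an edit that would damage an already correct sequence is rejected because it introduces unsupported k-mers.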

15.

Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies.

Mc Cartney, Ann M; Shafin, Kishwar; Alonge, Michael; Bzikadze, Andrey V; Formenti, Giulio; Fungtammasan, Arkarachai; Howe, Kerstin; Jain, Chirag; Koren, Sergey; Logsdon, Glennis A; Miga, Karen H; Mikheenko, Alla; Paten, Benedict; Shumate, Alaina; Soto, Daniela C; Sovic, Ivan; Wood, Jonathan M D; Zook, Justin M; Phillippy, Adam M; Rhie, Arang.

Nat Methods ; 19(6): 687-695, 2022 06.

Article in English | MEDLINE | ID: mdl-35361931

ABSTRACT

Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.

Subjects

High-Throughput Nucleotide Sequencing , Nanopores , Female , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Humans , Pregnancy , Sequence Analysis, DNA/methods , Telomere/genetics

16.

A family of unusual immunoglobulin superfamily genes in an invertebrate histocompatibility complex.

Huene, Aidan L; Sanders, Steven M; Ma, Zhiwei; Nguyen, Anh-Dao; Koren, Sergey; Michaca, Manuel H; Mullikin, James C; Phillippy, Adam M; Schnitzler, Christine E; Baxevanis, Andreas D; Nicotra, Matthew L.

Proc Natl Acad Sci U S A ; 119(40): e2207374119, 2022 10 04.

Article in English | MEDLINE | ID: mdl-36161920

ABSTRACT

Most colonial marine invertebrates are capable of allorecognition, the ability to distinguish between themselves and conspecifics. One long-standing question is whether invertebrate allorecognition genes are homologous to vertebrate histocompatibility genes. In the cnidarian Hydractinia symbiolongicarpus, allorecognition is controlled by at least two genes, Allorecognition 1 (Alr1) and Allorecognition 2 (Alr2), which encode highly polymorphic cell-surface proteins that serve as markers of self. Here, we show that Alr1 and Alr2 are part of a family of 41 Alr genes, all of which reside in a single genomic interval called the Allorecognition Complex (ARC). Using sensitive homology searches and highly accurate structural predictions, we demonstrate that the Alr proteins are members of the immunoglobulin superfamily (IgSF) with V-set and I-set Ig domains unlike any previously identified in animals. Specifically, their primary amino acid sequences lack many of the motifs considered diagnostic for V-set and I-set domains, yet they adopt secondary and tertiary structures nearly identical to canonical Ig domains. Thus, the V-set domain, which played a central role in the evolution of vertebrate adaptive immunity, was present in the last common ancestor of cnidarians and bilaterians. Unexpectedly, several Alr proteins also have immunoreceptor tyrosine-based activation motifs and immunoreceptor tyrosine-based inhibitory motifs in their cytoplasmic tails, suggesting they could participate in pathways homologous to those that regulate immunity in humans and flies. This work expands our definition of the IgSF with the addition of a family of unusual members, several of which play a role in invertebrate histocompatibility.

Subjects

Hydrozoa , Immunoglobulins , Major Histocompatibility Complex , Animals , Hydrozoa/genetics , Hydrozoa/immunology , Immunoglobulins/chemistry , Immunoglobulins/genetics , Major Histocompatibility Complex/genetics , Membrane Proteins/chemistry , Membrane Proteins/genetics , Protein Domains , Tyrosine/chemistry , Tyrosine/genetics

17.

Improved reference genome of Aedes aegypti informs arbovirus vector control.

Matthews, Benjamin J; Dudchenko, Olga; Kingan, Sarah B; Koren, Sergey; Antoshechkin, Igor; Crawford, Jacob E; Glassford, William J; Herre, Margaret; Redmond, Seth N; Rose, Noah H; Weedall, Gareth D; Wu, Yang; Batra, Sanjit S; Brito-Sierra, Carlos A; Buckingham, Steven D; Campbell, Corey L; Chan, Saki; Cox, Eric; Evans, Benjamin R; Fansiri, Thanyalak; Filipovic, Igor; Fontaine, Albin; Gloria-Soria, Andrea; Hall, Richard; Joardar, Vinita S; Jones, Andrew K; Kay, Raissa G G; Kodali, Vamsi K; Lee, Joyce; Lycett, Gareth J; Mitchell, Sara N; Muehling, Jill; Murphy, Michael R; Omer, Arina D; Partridge, Frederick A; Peluso, Paul; Aiden, Aviva Presser; Ramasamy, Vidya; Rasic, Gordana; Roy, Sourav; Saavedra-Rodriguez, Karla; Sharan, Shruti; Sharma, Atashi; Smith, Melissa Laird; Turner, Joe; Weakley, Allison M; Zhao, Zhilei; Akbari, Omar S; Black, William C; Cao, Han.

Nature ; 563(7732): 501-507, 2018 11.

Article in English | MEDLINE | ID: mdl-30429615

ABSTRACT

Female Aedes aegypti mosquitoes infect more than 400 million people each year with dangerous viral pathogens including dengue, yellow fever, Zika and chikungunya. Progress in understanding the biology of mosquitoes and developing the tools to fight them has been slowed by the lack of a high-quality genome assembly. Here we combine diverse technologies to produce the markedly improved, fully re-annotated AaegL5 genome assembly, and demonstrate how it accelerates mosquito science. We anchored physical and cytogenetic maps, doubled the number of known chemosensory ionotropic receptors that guide mosquitoes to human hosts and egg-laying sites, provided further insight into the size and composition of the sex-determining M locus, and revealed copy-number variation among glutathione S-transferase genes that are important for insecticide resistance. Using high-resolution quantitative trait locus and population genomic analyses, we mapped new candidates for dengue vector competence and insecticide resistance. AaegL5 will catalyse new biological insights and intervention strategies to fight this deadly disease vector.

Subjects

Aedes/genetics , Arbovirus Infections/virology , Arboviruses , Genome, Insect/genetics , Genomics/standards , Insect Control , Mosquito Vectors/genetics , Mosquito Vectors/virology , Aedes/virology , Animals , Arbovirus Infections/transmission , Arboviruses/isolation & purification , DNA Copy Number Variations/genetics , Dengue Virus/isolation & purification , Female , Genetic Variation/genetics , Genetics, Population , Glutathione Transferase/genetics , Insecticide Resistance/drug effects , Male , Molecular Sequence Annotation , Multigene Family/genetics , Pyrethrins/pharmacology , Reference Standards , Sex Determination Processes/genetics

18.

Reference genomes of channel catfish and blue catfish reveal multiple pericentric chromosome inversions.

Waldbieser, Geoffrey C; Liu, Shikai; Yuan, Zihao; Older, Caitlin E; Gao, Dongya; Shi, Chenyu; Bosworth, Brian G; Li, Ning; Bao, Lisui; Kirby, Mona A; Jin, Yulin; Wood, Monica L; Scheffler, Brian; Simpson, Sheron; Youngblood, Ramey C; Duke, Mary V; Ballard, Linda; Phillippy, Adam; Koren, Sergey; Liu, Zhanjiang.

BMC Biol ; 21(1): 67, 2023 04 03.

Article in English | MEDLINE | ID: mdl-37013528

ABSTRACT

BACKGROUND: Channel catfish and blue catfish are the most important aquacultured species in the USA. The species do not readily intermate naturally but F1 hybrids can be produced through artificial spawning. F1 hybrids produced by mating channel catfish female with blue catfish male exhibit heterosis and provide an ideal system to study reproductive isolation and hybrid vigor. The purpose of the study was to generate high-quality chromosome level reference genome sequences and to determine their genomic similarities and differences. RESULTS: We present high-quality reference genome sequences for both channel catfish and blue catfish, containing only 67 and 139 total gaps, respectively. We also report three pericentric chromosome inversions between the two genomes, as evidenced by long reads across the inversion junctions from distinct individuals, genetic linkage mapping, and PCR amplicons across the inversion junctions. Recombination rates within the inversional segments, detected as double crossovers, are extremely low among backcross progenies (progenies of channel catfish female × F1 hybrid male), suggesting that the pericentric inversions interrupt postzygotic recombination or survival of recombinants. Identification of channel catfish- and blue catfish-specific genes, along with expansions of immunoglobulin genes and centromeric Xba elements, provides insights into genomic hallmarks of these species. CONCLUSIONS: We generated high-quality reference genome sequences for both blue catfish and channel catfish and identified major chromosomal inversions on chromosomes 6, 11, and 24. These pericentric inversions were validated by additional sequencing analysis, genetic linkage mapping, and PCR analysis across the inversion junctions. The reference genome sequences, as well as the contrasted chromosomal architecture should provide guidance for the interspecific breeding programs.

Subjects

Ictaluridae , Humans , Animals , Male , Female , Ictaluridae/genetics , Chromosome Inversion , Genetic Linkage , Genome , Chromosome Mapping

19.

HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

Nurk, Sergey; Walenz, Brian P; Rhie, Arang; Vollger, Mitchell R; Logsdon, Glennis A; Grothe, Robert; Miga, Karen H; Eichler, Evan E; Phillippy, Adam M; Koren, Sergey.

Genome Res ; 30(9): 1291-1305, 2020 09.

Article in English | MEDLINE | ID: mdl-32801147

ABSTRACT

Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
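Homopolymer compression, one of the techniques named in the abstract above, collapses each run of identical bases to a single base so that run-length errors no longer create differences between reads. A minimal sketch of the transformation itself (not HiCanu's code; the function name is illustrative):

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical bases into a single base."""
    return "".join(base for base, _run in groupby(seq))


# Two reads that differ only in homopolymer run lengths
# become identical after compression.
read_a = "ACGGGTTTA"
read_b = "ACGGTTTTA"
print(homopolymer_compress(read_a) == homopolymer_compress(read_b))
```

Reads compared in compressed space agree wherever they differ only in homopolymer length, which is a common error mode in long-read data.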

Subjects

Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Alleles , Animals , Cell Line , Chromosome Duplication , DNA, Neoplasm , DNA, Satellite , Drosophila/genetics , Genome, Human , Haplotypes , Humans , Reproducibility of Results , Software

20.

Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes.

Olson, Nathan D; Treangen, Todd J; Hill, Christopher M; Cepeda-Espinoza, Victoria; Ghurye, Jay; Koren, Sergey; Pop, Mihai.

Brief Bioinform ; 20(4): 1140-1150, 2019 07 19.

Article in English | MEDLINE | ID: mdl-28968737

ABSTRACT

Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential for impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation.

Subjects

Metagenome , Metagenomics/methods , Microbiota/genetics , Algorithms , Computational Biology , Databases, Genetic/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Metagenomics/statistics & numerical data , Metagenomics/trends , Software
