Search | BVS Enfermería Search Portal

1.

The variation and evolution of complete human centromeres.

Logsdon, Glennis A; Rozanski, Allison N; Ryabov, Fedor; Potapova, Tamara; Shepelev, Valery A; Catacchio, Claudia R; Porubsky, David; Mao, Yafei; Yoo, DongAhn; Rautiainen, Mikko; Koren, Sergey; Nurk, Sergey; Lucas, Julian K; Hoekzema, Kendra; Munson, Katherine M; Gerton, Jennifer L; Phillippy, Adam M; Ventura, Mario; Alexandrov, Ivan A; Eichler, Evan E.

Nature ; 629(8010): 136-145, 2024 May.

Article in English | MEDLINE | ID: mdl-38570684

ABSTRACT

Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size [1]. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions [2,3]. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome [4,5]. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.

Subject(s)

Centromere, Molecular Evolution, Genetic Variation, Animals, Humans, Centromere/genetics, Centromere/metabolism, Centromere Protein A/metabolism, DNA Methylation/genetics, Satellite DNA/genetics, Kinetochores/metabolism, Macaca/genetics, Pan troglodytes/genetics, Single Nucleotide Polymorphism/genetics, Pongo/genetics, Male, Female, Reference Standards, Chromatin Immunoprecipitation, Haplotypes, Mutation, Gene Amplification, Sequence Alignment, Chromatin/genetics, Chromatin/metabolism, Species Specificity

2.

Recombination between heterologous human acrocentric chromosomes.

Guarracino, Andrea; Buonaiuto, Silvia; de Lima, Leonardo Gomes; Potapova, Tamara; Rhie, Arang; Koren, Sergey; Rubinstein, Boris; Fischer, Christian; Gerton, Jennifer L; Phillippy, Adam M; Colonna, Vincenza; Garrison, Erik.

Nature ; 617(7960): 335-343, 2023 05.

Article in English | MEDLINE | ID: mdl-37165241

ABSTRACT

The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications [1,2]. Although the resolution of these regions in the first complete assembly of a human genome, the Telomere-to-Telomere Consortium's CHM13 assembly (T2T-CHM13), provided a model of their homology [3], it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain pseudo-homologous regions (PHRs) indicative of recombination between non-homologous sequences. Utilizing an all-to-all comparison of the human pangenome from the Human Pangenome Reference Consortium [4] (HPRC), we find that contigs from all of the SAACs form a community. A variation graph [5] constructed from centromere-spanning acrocentric contigs indicates the presence of regions in which most contigs appear nearly identical between heterologous acrocentric chromosomes in T2T-CHM13. Except on chromosome 15, we observe faster decay of linkage disequilibrium in the pseudo-homologous regions than in the corresponding short and long arms, indicating higher rates of recombination [6,7]. The pseudo-homologous regions include sequences that have previously been shown to lie at the breakpoint of Robertsonian translocations [8], and their arrangement is compatible with crossover in inverted duplications on chromosomes 13, 14 and 21. The ubiquity of signals of recombination between heterologous acrocentric chromosomes seen in the HPRC draft pangenome suggests that these shared sequences form the basis for recurrent Robertsonian translocations, providing sequence and population-based confirmation of hypotheses first developed from cytogenetic studies 50 years ago [9].

Subject(s)

Centromere, Human Chromosomes, Genetic Recombination, Humans, Centromere/genetics, Human Chromosomes/genetics, Ribosomal DNA/genetics, Genetic Recombination/genetics, Genetic Translocation/genetics, Cytogenetics, Telomere/genetics

3.

The Human Pangenome Project: a global resource to map genomic diversity.

Wang, Ting; Antonacci-Fulton, Lucinda; Howe, Kerstin; Lawson, Heather A; Lucas, Julian K; Phillippy, Adam M; Popejoy, Alice B; Asri, Mobin; Carson, Caryn; Chaisson, Mark J P; Chang, Xian; Cook-Deegan, Robert; Felsenfeld, Adam L; Fulton, Robert S; Garrison, Erik P; Garrison, Nanibaa' A; Graves-Lindsay, Tina A; Ji, Hanlee; Kenny, Eimear E; Koenig, Barbara A; Li, Daofeng; Marschall, Tobias; McMichael, Joshua F; Novak, Adam M; Purushotham, Deepak; Schneider, Valerie A; Schultz, Baergen I; Smith, Michael W; Sofia, Heidi J; Weissman, Tsachy; Flicek, Paul; Li, Heng; Miga, Karen H; Paten, Benedict; Jarvis, Erich D; Hall, Ira M; Eichler, Evan E; Haussler, David.

Nature ; 604(7906): 437-446, 2022 04.

Article in English | MEDLINE | ID: mdl-35444317

ABSTRACT

The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.

Subject(s)

Human Genome, Genomics, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing, Humans, DNA Sequence Analysis

4.

The genome of the colonial hydroid Hydractinia reveals that their stem cells use a toolkit of evolutionarily shared genes with all animals.

Schnitzler, Christine E; Chang, E Sally; Waletich, Justin; Quiroga-Artigas, Gonzalo; Wong, Wai Yee; Nguyen, Anh-Dao; Barreira, Sofia N; Doonan, Liam B; Gonzalez, Paul; Koren, Sergey; Gahan, James M; Sanders, Steven M; Bradshaw, Brian; DuBuc, Timothy Q; de Jong, Danielle; Nawrocki, Eric P; Larson, Alexandra; Klasfeld, Samantha; Gornik, Sebastian G; Moreland, R Travis; Wolfsberg, Tyra G; Phillippy, Adam M; Mullikin, James C; Simakov, Oleg; Cartwright, Paulyn; Nicotra, Matthew; Frank, Uri; Baxevanis, Andreas D.

Genome Res ; 34(3): 498-513, 2024 04 25.

Article in English | MEDLINE | ID: mdl-38508693

ABSTRACT

Hydractinia is a colonial marine hydroid that shows remarkable biological properties, including the capacity to regenerate its entire body throughout its lifetime, a process made possible by its adult migratory stem cells, known as i-cells. Here, we provide an in-depth characterization of the genomic structure and gene content of two Hydractinia species, Hydractinia symbiolongicarpus and Hydractinia echinata, placing them in a comparative evolutionary framework with other cnidarian genomes. We also generated and annotated a single-cell transcriptomic atlas for adult male H. symbiolongicarpus and identified cell-type markers for all major cell types, including key i-cell markers. Orthology analyses based on the markers revealed that Hydractinia's i-cells are highly enriched in genes that are widely shared amongst animals, a striking finding given that Hydractinia has a higher proportion of phylum-specific genes than any of the other 41 animals in our orthology analysis. These results indicate that Hydractinia's stem cells and early progenitor cells may use a toolkit shared with all animals, making it a promising model organism for future exploration of stem cell biology and regenerative medicine. The genomic and transcriptomic resources for Hydractinia presented here will enable further studies of their regenerative capacity, colonial morphology, and ability to distinguish self from nonself.

Subject(s)

Genome, Hydrozoa, Animals, Hydrozoa/genetics, Molecular Evolution, Transcriptome, Stem Cells/metabolism, Male, Phylogeny, Single-Cell Analysis/methods

5.

Improved sequence mapping using a complete reference genome and lift-over.

Chen, Nae-Chyun; Paulin, Luis F; Sedlazeck, Fritz J; Koren, Sergey; Phillippy, Adam M; Langmead, Ben.

Nat Methods ; 21(1): 41-49, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38036856

ABSTRACT

Complete, telomere-to-telomere (T2T) genome assemblies promise improved analyses and the discovery of new variants, but many essential genomic resources remain associated with older reference genomes. Thus, there is a need to translate genomic features and read alignments between references. Here we describe a method called levioSAM2 that performs fast and accurate lift-over between assemblies using a whole-genome map. In addition to enabling the use of several references, we demonstrate that aligning reads to a high-quality reference (for example, T2T-CHM13) and lifting to an older reference (for example, GRCh38 from the Genome Reference Consortium (GRC)) improves the accuracy of the resulting variant calls on the old reference. By leveraging the quality improvements of T2T-CHM13, levioSAM2 reduces small and structural variant calling errors compared with GRC-based mapping using real short- and long-read datasets. Performance is especially improved for a set of complex medically relevant genes, where the GRC references are lower quality.

Subject(s)

Genome, Genomics, DNA Sequence Analysis/methods, Genomics/methods, Chromosome Mapping, High-Throughput Nucleotide Sequencing

6.

The structure, function and evolution of a complete human chromosome 8.

Logsdon, Glennis A; Vollger, Mitchell R; Hsieh, PingHsun; Mao, Yafei; Liskovykh, Mikhail A; Koren, Sergey; Nurk, Sergey; Mercuri, Ludovica; Dishuck, Philip C; Rhie, Arang; de Lima, Leonardo G; Dvorkina, Tatiana; Porubsky, David; Harvey, William T; Mikheenko, Alla; Bzikadze, Andrey V; Kremitzki, Milinn; Graves-Lindsay, Tina A; Jain, Chirag; Hoekzema, Kendra; Murali, Shwetha C; Munson, Katherine M; Baker, Carl; Sorensen, Melanie; Lewis, Alexandra M; Surti, Urvashi; Gerton, Jennifer L; Larionov, Vladimir; Ventura, Mario; Miga, Karen H; Phillippy, Adam M; Eichler, Evan E.

Nature ; 593(7857): 101-107, 2021 05.

Article in English | MEDLINE | ID: mdl-33828295

ABSTRACT

The complete assembly of each human chromosome is essential for understanding human biology and evolution [1,2]. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.

Subject(s)

Human Chromosome Pair 8/chemistry, Human Chromosome Pair 8/genetics, Molecular Evolution, Animals, Cell Line, Centromere/chemistry, Centromere/genetics, Centromere/metabolism, Human Chromosome Pair 8/physiology, DNA Methylation, Satellite DNA/genetics, Genetic Epigenesis, Female, Humans, Macaca mulatta/genetics, Male, Minisatellite Repeats/genetics, Pan troglodytes/genetics, Phylogeny, Pongo abelii/genetics, Telomere/chemistry, Telomere/genetics, Telomere/metabolism

7.

Evolutionary and biomedical insights from a marmoset diploid genome assembly.

Yang, Chentao; Zhou, Yang; Marcus, Stephanie; Formenti, Giulio; Bergeron, Lucie A; Song, Zhenzhen; Bi, Xupeng; Bergman, Juraj; Rousselle, Marjolaine Marie C; Zhou, Chengran; Zhou, Long; Deng, Yuan; Fang, Miaoquan; Xie, Duo; Zhu, Yuanzhen; Tan, Shangjin; Mountcastle, Jacquelyn; Haase, Bettina; Balacco, Jennifer; Wood, Jonathan; Chow, William; Rhie, Arang; Pippel, Martin; Fabiszak, Margaret M; Koren, Sergey; Fedrigo, Olivier; Freiwald, Winrich A; Howe, Kerstin; Yang, Huanming; Phillippy, Adam M; Schierup, Mikkel Heide; Jarvis, Erich D; Zhang, Guojie.

Nature ; 594(7862): 227-233, 2021 06.

Article in English | MEDLINE | ID: mdl-33910227

ABSTRACT

The accurate and complete assembly of both haplotype sequences of a diploid organism is essential to understanding the role of variation in genome functions, phenotypes and diseases [1]. Here, using a trio-binning approach, we present a high-quality, diploid reference genome, with both haplotypes assembled independently at the chromosome level, for the common marmoset (Callithrix jacchus), a primate model system that is widely used in biomedical research [2,3]. The full spectrum of heterozygosity between the two haplotypes involves 1.36% of the genome, much higher than the 0.13% indicated by the standard estimation based on single-nucleotide heterozygosity alone. The de novo mutation rate is 0.43 × 10⁻⁸ per site per generation, and the paternally inherited genome acquired twice as many mutations as the maternal. Our diploid assembly enabled us to discover a recent expansion of the sex-differentiation region and unique evolutionary changes in the marmoset Y chromosome. In addition, we identified many genes with signatures of positive selection that might have contributed to the evolution of Callithrix biological features. Brain-related genes were highly conserved between marmosets and humans, although several genes experienced lineage-specific copy number variations or diversifying selection, with implications for the use of marmosets as a model system.

Subject(s)

Callithrix/genetics, Diploidy, Molecular Evolution, Genome/genetics, Genomics/standards, Animals, Biomedical Research, DNA Copy Number Variations, Female, Germ-Line Mutation/genetics, Haplotypes/genetics, Heterozygote, Humans, INDEL Mutation/genetics, Male, Reference Standards, Genetic Selection, Sex Differentiation/genetics, Y Chromosome/genetics

8.

Platypus and echidna genomes reveal mammalian biology and evolution.

Zhou, Yang; Shearwin-Whyatt, Linda; Li, Jing; Song, Zhenzhen; Hayakawa, Takashi; Stevens, David; Fenelon, Jane C; Peel, Emma; Cheng, Yuanyuan; Pajpach, Filip; Bradley, Natasha; Suzuki, Hikoyu; Nikaido, Masato; Damas, Joana; Daish, Tasman; Perry, Tahlia; Zhu, Zexian; Geng, Yuncong; Rhie, Arang; Sims, Ying; Wood, Jonathan; Haase, Bettina; Mountcastle, Jacquelyn; Fedrigo, Olivier; Li, Qiye; Yang, Huanming; Wang, Jian; Johnston, Stephen D; Phillippy, Adam M; Howe, Kerstin; Jarvis, Erich D; Ryder, Oliver A; Kaessmann, Henrik; Donnelly, Peter; Korlach, Jonas; Lewin, Harris A; Graves, Jennifer; Belov, Katherine; Renfree, Marilyn B; Grutzner, Frank; Zhou, Qi; Zhang, Guojie.

Nature ; 592(7856): 756-762, 2021 04.

Article in English | MEDLINE | ID: mdl-33408411

ABSTRACT

Egg-laying mammals (monotremes) are the only extant mammalian outgroup to therians (marsupial and eutherian animals) and provide key insights into mammalian evolution [1,2]. Here we generate and analyse reference genomes of the platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus), which represent the only two extant monotreme lineages. The nearly complete platypus genome assembly has anchored almost the entire genome onto chromosomes, markedly improving the genome continuity and gene annotation. Together with our echidna sequence, the genomes of the two species allow us to detect the ancestral and lineage-specific genomic changes that shape both monotreme and mammalian evolution. We provide evidence that the monotreme sex chromosome complex originated from an ancestral chromosome ring configuration. The formation of such a unique chromosome complex may have been facilitated by the unusually extensive interactions between the multi-X and multi-Y chromosomes that are shared by the autosomal homologues in humans. Further comparative genomic analyses unravel marked differences between monotremes and therians in haptoglobin genes, lactation genes and chemosensory receptor genes for smell and taste that underlie the ecological adaptation of monotremes.

Subject(s)

Biological Evolution, Genome, Platypus/genetics, Tachyglossidae/genetics, Animals, Female, Male, Mammals/genetics, Phylogeny, Sex Chromosomes/genetics

9.

Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation.

Kolmogorov, Mikhail; Billingsley, Kimberley J; Mastoras, Mira; Meredith, Melissa; Monlong, Jean; Lorig-Roach, Ryan; Asri, Mobin; Alvarez Jerez, Pilar; Malik, Laksh; Dewan, Ramita; Reed, Xylena; Genner, Rylee M; Daida, Kensuke; Behera, Sairam; Shafin, Kishwar; Pesout, Trevor; Prabakaran, Jeshuwin; Carnevali, Paolo; Yang, Jianzhi; Rhie, Arang; Scholz, Sonja W; Traynor, Bryan J; Miga, Karen H; Jain, Miten; Timp, Winston; Phillippy, Adam M; Chaisson, Mark; Sedlazeck, Fritz J; Blauwendraat, Cornelis; Paten, Benedict.

Nat Methods ; 20(10): 1483-1492, 2023 10.

Article in English | MEDLINE | ID: mdl-37710018

ABSTRACT

Long-read sequencing technologies substantially overcome the limitations of short-reads but have not been considered as a feasible replacement for population-scale projects, being a combination of too expensive, not scalable enough or too error-prone. Here we develop an efficient and scalable wet lab and computational protocol, Napu, for Oxford Nanopore Technologies long-read sequencing that seeks to address those limitations. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the National Institutes of Health Center for Alzheimer's and Related Dementias. Using a single PromethION flow cell, we can detect single nucleotide polymorphisms with F1-score comparable to Illumina short-read sequencing. Small indel calling remains difficult within homopolymers and tandem repeats, but achieves good concordance to Illumina indel calls elsewhere. Further, we can discover structural variants with F1-score on par with state-of-the-art de novo assembly methods. Our protocol phases small and structural variants at megabase scales and produces highly accurate, haplotype-specific methylation calls.

Subject(s)

Human Genome, Nanopore Sequencing, Humans, DNA Sequence Analysis/methods, Haplotypes, Methylation, Pilot Projects, High-Throughput Nucleotide Sequencing/methods

10.

Strategic vision for improving human health at The Forefront of Genomics.

Green, Eric D; Gunter, Chris; Biesecker, Leslie G; Di Francesco, Valentina; Easter, Carla L; Feingold, Elise A; Felsenfeld, Adam L; Kaufman, David J; Ostrander, Elaine A; Pavan, William J; Phillippy, Adam M; Wise, Anastasia L; Dayal, Jyoti Gupta; Kish, Britny J; Mandich, Allison; Wellington, Christopher R; Wetterstrand, Kris A; Bates, Sarah A; Leja, Darryl; Vasquez, Susan; Gahl, William A; Graham, Bettie J; Kastner, Daniel L; Liu, Paul; Rodriguez, Laura Lyman; Solomon, Benjamin D; Bonham, Vence L; Brody, Lawrence C; Hutter, Carolyn M; Manolio, Teri A.

Nature ; 586(7831): 683-692, 2020 10.

Article in English | MEDLINE | ID: mdl-33116284

ABSTRACT

Starting with the launch of the Human Genome Project three decades ago, and continuing after its completion in 2003, genomics has progressively come to have a central and catalytic role in basic and translational research. In addition, studies increasingly demonstrate how genomic information can be effectively used in clinical care. In the future, the anticipated advances in technology development, biological insights, and clinical applications (among others) will lead to more widespread integration of genomics into almost all areas of biomedical research, the adoption of genomics into mainstream medical and public-health practices, and an increasing relevance of genomics for everyday life. On behalf of the research community, the National Human Genome Research Institute recently completed a multi-year process of strategic engagement to identify future research priorities and opportunities in human genomics, with an emphasis on health applications. Here we describe the highest-priority elements envisioned for the cutting-edge of human genomics going forward, that is, at 'The Forefront of Genomics'.

Subject(s)

Biomedical Research/trends, Human Genome/genetics, Genomics/trends, Public Health/standards, Translational Biomedical Research/trends, Biomedical Research/economics, COVID-19/genetics, Genomics/economics, Humans, National Human Genome Research Institute (U.S.)/economics, Social Change, Translational Biomedical Research/economics, United States

11.

A High-Quality Blue Whale Genome, Segmental Duplications, and Historical Demography.

Bukhman, Yury V; Morin, Phillip A; Meyer, Susanne; Chu, Li-Fang; Jacobsen, Jeff K; Antosiewicz-Bourget, Jessica; Mamott, Daniel; Gonzales, Maylie; Argus, Cara; Bolin, Jennifer; Berres, Mark E; Fedrigo, Olivier; Steill, John; Swanson, Scott A; Jiang, Peng; Rhie, Arang; Formenti, Giulio; Phillippy, Adam M; Harris, Robert S; Wood, Jonathan M D; Howe, Kerstin; Kirilenko, Bogdan M; Munegowda, Chetan; Hiller, Michael; Jain, Aashish; Kihara, Daisuke; Johnston, J Spencer; Ionkov, Alexander; Raja, Kalpana; Toh, Huishi; Lang, Aimee; Wolf, Magnus; Jarvis, Erich D; Thomson, James A; Chaisson, Mark J P; Stewart, Ron.

Mol Biol Evol ; 41(3), 2024 Mar 01.

Article in English | MEDLINE | ID: mdl-38376487

ABSTRACT

The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with other cetaceans and artiodactyls, including vaquita (Phocoena sinus), the world's smallest cetacean, to investigate blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.

Subject(s)

Balaenoptera, Neoplasms, Animals, Balaenoptera/genetics, Genomic Segmental Duplications, Genome, Demography, Neoplasms/genetics

12.

Long-read mapping to repetitive reference sequences using Winnowmap2.

Jain, Chirag; Rhie, Arang; Hansen, Nancy F; Koren, Sergey; Phillippy, Adam M.

Nat Methods ; 19(6): 705-710, 2022 06.

Article in English | MEDLINE | ID: mdl-35365778

ABSTRACT

Approximately 5-10% of the human genome remains inaccessible due to the presence of repetitive sequences such as segmental duplications and tandem repeat arrays. We show that existing long-read mappers often yield incorrect alignments and variant calls within long, near-identical repeats, as they remain vulnerable to allelic bias. In the presence of a nonreference allele within a repeat, a read sampled from that region could be mapped to an incorrect repeat copy. To address this limitation, we developed a new long-read mapping method, Winnowmap2, by using minimal confidently alignable substrings. Winnowmap2 computes each read mapping through a collection of confident subalignments. This approach is more tolerant of structural variation and more sensitive to paralog-specific variants within repeats. Our experiments highlight that Winnowmap2 successfully addresses the issue of allelic bias, enabling more accurate downstream variant calls in repetitive sequences.

Subject(s)

Human Genome, Repetitive Nucleic Acid Sequences, Alleles, Humans, Repetitive Nucleic Acid Sequences/genetics, Genomic Segmental Duplications, DNA Sequence Analysis, Tandem Repeat Sequences

13.

Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation.

Formenti, Giulio; Rhie, Arang; Walenz, Brian P; Thibaud-Nissen, Françoise; Shafin, Kishwar; Koren, Sergey; Myers, Eugene W; Jarvis, Erich D; Phillippy, Adam M.

Nat Methods ; 19(6): 696-704, 2022 06.

Article in English | MEDLINE | ID: mdl-35361932

ABSTRACT

Variant calling has been widely used for genotyping and for improving the consensus accuracy of long-read assemblies. Variant calls are commonly hard-filtered with user-defined cutoffs. However, it is impossible to define a single set of optimal cutoffs, as the calls heavily depend on the quality of the reads, the variant caller of choice and the quality of the unpolished assembly. Here, we introduce Merfin, a k-mer based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller's internal score. Merfin increased the precision of genotyped calls in several benchmarks, improved consensus accuracy and reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads, including the first complete human genome. Moreover, we introduce assembly quality and completeness metrics that account for the expected genomic copy numbers.
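The k-mer validation idea in this abstract can be illustrated with a deliberately simplified sketch. It is not Merfin's algorithm: Merfin scores variants against the expected k-mer multiplicity, whereas this toy version (with the hypothetical helpers `count_kmers`, `missing_kmers` and `keep_variant`) only asks whether applying a variant reduces the number of assembly k-mers that never occur in the reads.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer observed in a collection of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def missing_kmers(seq, start, end, k, read_counts):
    """Count k-mers overlapping seq[start:end] that are absent from the reads."""
    lo = max(0, start - k + 1)
    hi = min(len(seq) - k, end - 1)
    return sum(1 for i in range(lo, hi + 1) if read_counts[seq[i:i + k]] == 0)


def keep_variant(assembly, pos, ref_allele, alt_allele, k, read_counts):
    """Keep a variant only if it lowers the number of read-unsupported k-mers.

    Simplifying assumption: presence/absence in the reads stands in for the
    multiplicity-based score described in the abstract.
    """
    if assembly[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("ref_allele does not match the assembly at pos")
    edited = assembly[:pos] + alt_allele + assembly[pos + len(ref_allele):]
    before = missing_kmers(assembly, pos, pos + len(ref_allele), k, read_counts)
    after = missing_kmers(edited, pos, pos + len(alt_allele), k, read_counts)
    return after < before


reads = ["ACGTTGCA"]
counts = count_kmers(reads, k=3)
# The assembly carries an A at position 4 where the reads support a T.
print(keep_variant("ACGTAGCA", 4, "A", "T", 3, counts))
```

Because the decision depends only on k-mers drawn from the reads, it does not use alignment quality or a variant caller's internal score, which is the property the abstract emphasizes.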

Subject(s)

High-Throughput Nucleotide Sequencing, Nanopores, Genome, Genomics, Humans, DNA Sequence Analysis

14.

Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies.

Mc Cartney, Ann M; Shafin, Kishwar; Alonge, Michael; Bzikadze, Andrey V; Formenti, Giulio; Fungtammasan, Arkarachai; Howe, Kerstin; Jain, Chirag; Koren, Sergey; Logsdon, Glennis A; Miga, Karen H; Mikheenko, Alla; Paten, Benedict; Shumate, Alaina; Soto, Daniela C; Sovic, Ivan; Wood, Jonathan M D; Zook, Justin M; Phillippy, Adam M; Rhie, Arang.

Nat Methods ; 19(6): 687-695, 2022 06.

Article in English | MEDLINE | ID: mdl-35361931

ABSTRACT

Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
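The quality values quoted in this abstract can be read as per-base error rates. The sketch below assumes the usual Phred-scaled definition, QV = -10 · log10(error rate); the function names are illustrative and are not taken from the paper or its tools.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


for qv in (70.2, 73.9):
    bases_per_error = 1 / qv_to_error_rate(qv)
    print(f"QV {qv}: about one error per {bases_per_error:.3g} bases")
```

Under this assumption, both values correspond to roughly one error per tens of millions of bases, so a gain of a few QV units is a substantial relative reduction in an already very low error rate.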

Subject(s)

High-Throughput Nucleotide Sequencing, Nanopores, Female, Human Genome, High-Throughput Nucleotide Sequencing/methods, Humans, Pregnancy, DNA Sequence Analysis/methods, Telomere/genetics

15.

ModDotPlot-rapid and interactive visualization of tandem repeats.

Sweeten, Alexander P; Schatz, Michael C; Phillippy, Adam M.

Bioinformatics ; 40(8), 2024 08 02.

Article in English | MEDLINE | ID: mdl-39110522

ABSTRACT

MOTIVATION: A common method for analyzing genomic repeats is to produce a sequence similarity matrix visualized via a dot plot. Innovative approaches such as StainedGlass have improved upon this classic visualization by rendering dot plots as a heatmap of sequence identity, enabling researchers to better visualize multi-megabase tandem repeat arrays within centromeres and other heterochromatic regions of the genome. However, computing the similarity estimates for heatmaps requires high computational overhead and can suffer from decreasing accuracy. RESULTS: In this work, we introduce ModDotPlot, an interactive and alignment-free dot plot viewer. By approximating average nucleotide identity via a k-mer-based containment index, ModDotPlot produces accurate plots orders of magnitude faster than StainedGlass. We accomplish this through the use of a hierarchical modimizer scheme that can visualize the full 128 Mb genome of Arabidopsis thaliana in under 5 min on a laptop. ModDotPlot is bundled with a graphical user interface supporting real-time interactive navigation of entire chromosomes. AVAILABILITY AND IMPLEMENTATION: ModDotPlot is available at https://github.com/marbl/ModDotPlot.
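As a rough illustration of the alignment-free estimate described here, the sketch below subsamples k-mers with a modulo rule ("modimizers"), computes a containment index, and converts it to an identity estimate. The conversion identity ≈ C^(1/k) is a commonly used approximation and is an assumption here, as are the helper names and the choice of hash; the sketch also ignores reverse complements and the hierarchical scheme mentioned in the abstract, so it should not be read as ModDotPlot's implementation.

```python
import hashlib


def kmer_set(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (Python's hash() is salted per run)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def modimizers(seq, k, s):
    """Keep only k-mer hashes divisible by s, i.e. roughly 1/s of all k-mers."""
    return {h for h in map(stable_hash, kmer_set(seq, k)) if h % s == 0}


def containment_identity(a, b, k, s=1):
    """Estimate identity of a against b from the containment of a's k-mers in b."""
    sketch_a = modimizers(a, k, s)
    sketch_b = modimizers(b, k, s)
    if not sketch_a:
        return 0.0
    containment = len(sketch_a & sketch_b) / len(sketch_a)
    return containment ** (1.0 / k)


print(containment_identity("ACGTACGTAC", "ACGTACGTAC", k=4))
```

Larger values of `s` shrink the sketches and speed up the comparison at the cost of a noisier estimate. A dot plot in this style would apply the same comparison to every pair of sequence windows.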

Subject(s)

Arabidopsis , Software , Tandem Repeat Sequences , Arabidopsis/genetics , Tandem Repeat Sequences/genetics , Plant Genome , User-Computer Interface , Genomics/methods

16.

Parsnp 2.0: scalable core-genome alignment for massive microbial datasets.

Kille, Bryce; Nute, Michael G; Huang, Victor; Kim, Eddie; Phillippy, Adam M; Treangen, Todd J.

Bioinformatics ; 40(5), 2024 05 02.

Article in English | MEDLINE | ID: mdl-38724243

ABSTRACT

MOTIVATION: Since 2016, the number of microbial species with available reference genomes in NCBI has more than tripled. Multiple genome alignment, the process of identifying nucleotides across multiple genomes which share a common ancestor, is used as the input to numerous downstream comparative analysis methods. Parsnp is one of the few multiple genome alignment methods able to scale to the current era of genomic data; however, there has been no major release since its initial release in 2014. RESULTS: To address this gap, we developed Parsnp v2, which significantly improves on its original release. Parsnp v2 provides users with more control over executions of the program, allowing Parsnp to be better tailored for different use-cases. We introduce a partitioning option to Parsnp, which allows the input to be broken up into multiple parallel alignment processes which are then combined into a final alignment. The partitioning option can reduce memory usage by over 4× and reduce runtime by over 2×, all while maintaining a precise core-genome alignment. The partitioning workflow is also less susceptible to complications caused by assembly artifacts and minor variation, as alignment anchors only need to be conserved within their partition and not across the entire input set. We highlight the performance on datasets involving thousands of bacterial and viral genomes. AVAILABILITY AND IMPLEMENTATION: Parsnp v2 is available at https://github.com/marbl/parsnp.

Subject(s)

Bacterial Genome , Sequence Alignment , Software , Sequence Alignment/methods , Genomics/methods , Algorithms

17.

Balancing openness with Indigenous data sovereignty: An opportunity to leave no one behind in the journey to sequence all of life.

Mc Cartney, Ann M; Anderson, Jane; Liggins, Libby; Hudson, Maui L; Anderson, Matthew Z; TeAika, Ben; Geary, Janis; Cook-Deegan, Robert; Patel, Hardip R; Phillippy, Adam M.

Proc Natl Acad Sci U S A ; 119(4), 2022 01 25.

Article in English | MEDLINE | ID: mdl-35042810

ABSTRACT

The field of genomics has benefited greatly from its "openness" approach to data sharing. However, with the increasing volume of sequence information being created and stored and the growing number of international genomics efforts, the equity of openness is under question. The United Nations Convention on Biological Diversity aims to develop and adopt a standard policy on access and benefit-sharing for sequence information across signatory parties. This standardization will have profound implications on genomics research, requiring a new definition of open data sharing. The redefinition of openness is not unwarranted, as its limitations have unintentionally introduced barriers of engagement to some, including Indigenous Peoples. This commentary provides an insight into the key challenges of openness faced by the researchers who aspire to protect and conserve global biodiversity, including Indigenous flora and fauna, and presents immediate, practical solutions that, if implemented, will equip the genomics community with both the diversity and inclusivity required to respectfully protect global biodiversity.

Subject(s)

Indigenous Peoples/genetics , Information Dissemination/ethics , Biodiversity , Genomics/methods , Humans , Indigenous Peoples/psychology , Indigenous Peoples/statistics & numerical data , Information Dissemination/methods , Population Groups/genetics

18.

A family of unusual immunoglobulin superfamily genes in an invertebrate histocompatibility complex.

Huene, Aidan L; Sanders, Steven M; Ma, Zhiwei; Nguyen, Anh-Dao; Koren, Sergey; Michaca, Manuel H; Mullikin, James C; Phillippy, Adam M; Schnitzler, Christine E; Baxevanis, Andreas D; Nicotra, Matthew L.

Proc Natl Acad Sci U S A ; 119(40): e2207374119, 2022 10 04.

Article in English | MEDLINE | ID: mdl-36161920

ABSTRACT

Most colonial marine invertebrates are capable of allorecognition, the ability to distinguish between themselves and conspecifics. One long-standing question is whether invertebrate allorecognition genes are homologous to vertebrate histocompatibility genes. In the cnidarian Hydractinia symbiolongicarpus, allorecognition is controlled by at least two genes, Allorecognition 1 (Alr1) and Allorecognition 2 (Alr2), which encode highly polymorphic cell-surface proteins that serve as markers of self. Here, we show that Alr1 and Alr2 are part of a family of 41 Alr genes, all of which reside in a single genomic interval called the Allorecognition Complex (ARC). Using sensitive homology searches and highly accurate structural predictions, we demonstrate that the Alr proteins are members of the immunoglobulin superfamily (IgSF) with V-set and I-set Ig domains unlike any previously identified in animals. Specifically, their primary amino acid sequences lack many of the motifs considered diagnostic for V-set and I-set domains, yet they adopt secondary and tertiary structures nearly identical to canonical Ig domains. Thus, the V-set domain, which played a central role in the evolution of vertebrate adaptive immunity, was present in the last common ancestor of cnidarians and bilaterians. Unexpectedly, several Alr proteins also have immunoreceptor tyrosine-based activation motifs and immunoreceptor tyrosine-based inhibitory motifs in their cytoplasmic tails, suggesting they could participate in pathways homologous to those that regulate immunity in humans and flies. This work expands our definition of the IgSF with the addition of a family of unusual members, several of which play a role in invertebrate histocompatibility.

Subject(s)

Hydrozoa , Immunoglobulins , Major Histocompatibility Complex , Animals , Hydrozoa/genetics , Hydrozoa/immunology , Immunoglobulins/chemistry , Immunoglobulins/genetics , Major Histocompatibility Complex/genetics , Membrane Proteins/chemistry , Membrane Proteins/genetics , Protein Domains , Tyrosine/chemistry , Tyrosine/genetics

19.

Standards recommendations for the Earth BioGenome Project.

Lawniczak, Mara K N; Durbin, Richard; Flicek, Paul; Lindblad-Toh, Kerstin; Wei, Xiaofeng; Archibald, John M; Baker, William J; Belov, Katherine; Blaxter, Mark L; Marques Bonet, Tomas; Childers, Anna K; Coddington, Jonathan A; Crandall, Keith A; Crawford, Andrew J; Davey, Robert P; Di Palma, Federica; Fang, Qi; Haerty, Wilfried; Hall, Neil; Hoff, Katharina J; Howe, Kerstin; Jarvis, Erich D; Johnson, Warren E; Johnson, Rebecca N; Kersey, Paul J; Liu, Xin; Lopez, Jose Victor; Myers, Eugene W; Pettersson, Olga Vinnere; Phillippy, Adam M; Poelchau, Monica F; Pruitt, Kim D; Rhie, Arang; Castilla-Rubio, Juan Carlos; Sahu, Sunil Kumar; Salmon, Nicholas A; Soltis, Pamela S; Swarbreck, David; Thibaud-Nissen, Françoise; Wang, Sibo; Wegrzyn, Jill L; Zhang, Guojie; Zhang, He; Lewin, Harris A; Richards, Stephen.

Proc Natl Acad Sci U S A ; 119(4), 2022 01 25.

Article in English | MEDLINE | ID: mdl-35042802

ABSTRACT

A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals. Here, we describe some highlights from the proposed standards, and areas where additional challenges will need to be met.

Subject(s)

Base Sequence/genetics , Eukaryota/genetics , Genomics/standards , Animals , Biodiversity , Genomics/methods , Humans , Reference Standards , Reference Values , DNA Sequence Analysis/methods , DNA Sequence Analysis/standards

20.

Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation.

Kille, Bryce; Garrison, Erik; Treangen, Todd J; Phillippy, Adam M.

Bioinformatics ; 39(9), 2023 09 02.

Article in English | MEDLINE | ID: mdl-37603771

ABSTRACT

MOTIVATION: The Jaccard similarity on k-mer sets has been shown to be a convenient proxy for sequence identity. By avoiding expensive base-level alignments and comparing reduced sequence representations, tools such as MashMap can scale to massive numbers of pairwise comparisons while still providing useful similarity estimates. However, due to their reliance on minimizer winnowing, previous versions of MashMap were shown to be biased and inconsistent estimators of Jaccard similarity. This directly impacts downstream tools that rely on the accuracy of these estimates. RESULTS: To address this, we propose the minmer winnowing scheme, which generalizes the minimizer scheme by use of a rolling minhash with multiple sampled k-mers per window. We show both theoretically and empirically that minmers yield an unbiased estimator of local Jaccard similarity, and we implement this scheme in an updated version of MashMap. The minmer-based implementation is over 10 times faster than the minimizer-based version under the default ANI threshold, making it well-suited for large-scale comparative genomics applications. AVAILABILITY AND IMPLEMENTATION: MashMap3 is available at https://github.com/marbl/MashMap.
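The relationship between minimizers and minmers described here can be sketched naively: in every window of `w` consecutive k-mers, keep the `s` k-mers with the smallest hashes, so that `s = 1` gives ordinary minimizer winnowing. This is a quadratic-time illustration under assumed choices (CRC32 hashing, ties broken by position, forward strand only), not the rolling scheme implemented in MashMap, and all names are hypothetical.

```python
import zlib


def kmer_hashes(seq: str, k: int) -> list:
    """Return (hash, position) pairs for every k-mer in seq."""
    return [(zlib.crc32(seq[i:i + k].encode()), i)
            for i in range(len(seq) - k + 1)]


def minmers(seq: str, k: int, w: int, s: int) -> set:
    """Collect the s smallest-hash k-mers from each window of w consecutive k-mers."""
    hashed = kmer_hashes(seq, k)
    selected = set()
    # A sequence with fewer than w k-mers is treated as a single short window.
    n_windows = max(1, len(hashed) - w + 1)
    for start in range(n_windows):
        window = hashed[start:start + w]
        for _, pos in sorted(window)[:s]:  # ties on hash are broken by position
            selected.add(seq[pos:pos + k])
    return selected


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    x = "ACGTTGCAAGGCTTAACCGGTA"
    y = "ACGTTGCAAGGATTAACCGGTA"  # one substitution relative to x
    for s in (1, 2, 4):
        print(s, jaccard(minmers(x, 5, 4, s), minmers(y, 5, 4, s)))
```

With `s = w` every k-mer is sampled and the result is the exact k-mer Jaccard similarity; smaller `s` values give sparser sketches.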

Subject(s)

Computational Biology , Genomics
