Search | BVS Regional Portal

1.

The variation and evolution of complete human centromeres.

Logsdon, Glennis A; Rozanski, Allison N; Ryabov, Fedor; Potapova, Tamara; Shepelev, Valery A; Catacchio, Claudia R; Porubsky, David; Mao, Yafei; Yoo, DongAhn; Rautiainen, Mikko; Koren, Sergey; Nurk, Sergey; Lucas, Julian K; Hoekzema, Kendra; Munson, Katherine M; Gerton, Jennifer L; Phillippy, Adam M; Ventura, Mario; Alexandrov, Ivan A; Eichler, Evan E.

Nature ; 629(8010): 136-145, 2024 May.

Article in English | MEDLINE | ID: mdl-38570684

ABSTRACT

Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size [1]. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions [2,3]. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome [4,5]. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.

Subject(s)

Centromere , Evolution, Molecular , Genetic Variation , Animals , Humans , Centromere/genetics , Centromere/metabolism , Centromere Protein A/metabolism , DNA Methylation/genetics , DNA, Satellite/genetics , Kinetochores/metabolism , Macaca/genetics , Pan troglodytes/genetics , Polymorphism, Single Nucleotide/genetics , Pongo/genetics , Male , Female , Reference Standards , Chromatin Immunoprecipitation , Haplotypes , Mutation , Gene Amplification , Sequence Alignment , Chromatin/genetics , Chromatin/metabolism , Species Specificity

2.

The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes.

Makova, Kateryna D; Pickett, Brandon D; Harris, Robert S; Hartley, Gabrielle A; Cechova, Monika; Pal, Karol; Nurk, Sergey; Yoo, DongAhn; Li, Qiuhui; Hebbar, Prajna; McGrath, Barbara C; Antonacci, Francesca; Aubel, Margaux; Biddanda, Arjun; Borchers, Matthew; Bomberg, Erich; Bouffard, Gerard G; Brooks, Shelise Y; Carbone, Lucia; Carrel, Laura; Carroll, Andrew; Chang, Pi-Chuan; Chin, Chen-Shan; Cook, Daniel E; Craig, Sarah J C; de Gennaro, Luciana; Diekhans, Mark; Dutra, Amalia; Garcia, Gage H; Grady, Patrick G S; Green, Richard E; Haddad, Diana; Hallast, Pille; Harvey, William T; Hickey, Glenn; Hillis, David A; Hoyt, Savannah J; Jeong, Hyeonsoo; Kamali, Kaivan; Kosakovsky Pond, Sergei L; LaPolice, Troy M; Lee, Charles; Lewis, Alexandra P; Loh, Yong-Hwee E; Masterson, Patrick; McCoy, Rajiv C; Medvedev, Paul; Miga, Karen H; Munson, Katherine M; Pak, Evgenia.

bioRxiv ; 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-38077089

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution.
These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.

3.

The complete sequence of a human Y chromosome.

Rhie, Arang; Nurk, Sergey; Cechova, Monika; Hoyt, Savannah J; Taylor, Dylan J; Altemose, Nicolas; Hook, Paul W; Koren, Sergey; Rautiainen, Mikko; Alexandrov, Ivan A; Allen, Jamie; Asri, Mobin; Bzikadze, Andrey V; Chen, Nae-Chyun; Chin, Chen-Shan; Diekhans, Mark; Flicek, Paul; Formenti, Giulio; Fungtammasan, Arkarachai; Garcia Giron, Carlos; Garrison, Erik; Gershman, Ariel; Gerton, Jennifer L; Grady, Patrick G S; Guarracino, Andrea; Haggerty, Leanne; Halabian, Reza; Hansen, Nancy F; Harris, Robert; Hartley, Gabrielle A; Harvey, William T; Haukness, Marina; Heinz, Jakob; Hourlier, Thibaut; Hubley, Robert M; Hunt, Sarah E; Hwang, Stephen; Jain, Miten; Kesharwani, Rupesh K; Lewis, Alexandra P; Li, Heng; Logsdon, Glennis A; Lucas, Julian K; Makalowski, Wojciech; Markovic, Christopher; Martin, Fergal J; Mc Cartney, Ann M; McCoy, Rajiv C; McDaniel, Jennifer; McNulty, Brandy M.

Nature ; 621(7978): 344-354, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

Subject(s)

Chromosomes, Human, Y , Genomics , Sequence Analysis, DNA , Humans , Base Sequence , Chromosomes, Human, Y/genetics , DNA, Satellite/genetics , Genetic Variation/genetics , Genetics, Population , Genomics/methods , Genomics/standards , Heterochromatin/genetics , Multigene Family/genetics , Reference Standards , Segmental Duplications, Genomic/genetics , Sequence Analysis, DNA/standards , Tandem Repeat Sequences/genetics , Telomere/genetics

4.

The variation and evolution of complete human centromeres.

Logsdon, Glennis A; Rozanski, Allison N; Ryabov, Fedor; Potapova, Tamara; Shepelev, Valery A; Mao, Yafei; Rautiainen, Mikko; Koren, Sergey; Nurk, Sergey; Porubsky, David; Lucas, Julian K; Hoekzema, Kendra; Munson, Katherine M; Gerton, Jennifer L; Phillippy, Adam M; Alexandrov, Ivan A; Eichler, Evan E.

bioRxiv ; 2023 May 30.

Article in English | MEDLINE | ID: mdl-37398417

ABSTRACT

We completely sequenced and assembled all centromeres from a second human genome and used two reference sets to benchmark genetic, epigenetic, and evolutionary variation within centromeres from a diversity panel of humans and apes. We find that centromere single-nucleotide variation can increase by up to 4.1-fold relative to other genomic regions, with the caveat that up to 45.8% of centromeric sequence, on average, cannot be reliably aligned with current methods due to the emergence of new α-satellite higher-order repeat (HOR) structures and two- to threefold differences in the length of the centromeres. The extent to which this occurs differs depending on the chromosome and haplotype. Comparing the two sets of complete human centromeres, we find that eight harbor distinctly different α-satellite HOR array structures and four contain novel α-satellite HOR variants in high abundance. DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by at least 500 kbp, a property not readily associated with novel α-satellite HORs. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan, and macaque genomes. Comparative analyses reveal nearly complete turnover of α-satellite HORs, but with idiosyncratic changes in structure characteristic of each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the p- and q-arms of human chromosomes and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.

5.

Telomere-to-telomere assembly of diploid chromosomes with Verkko.

Rautiainen, Mikko; Nurk, Sergey; Walenz, Brian P; Logsdon, Glennis A; Porubsky, David; Rhie, Arang; Eichler, Evan E; Phillippy, Adam M; Koren, Sergey.

Nat Biotechnol ; 41(10): 1474-1482, 2023 Oct.

Article in English | MEDLINE | ID: mdl-36797493

ABSTRACT

The Telomere-to-Telomere consortium recently assembled the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on manual integration of ultra-long Oxford Nanopore sequencing reads with a high-resolution assembly graph built from long, accurate PacBio high-fidelity reads. We have improved and automated this strategy in Verkko, an iterative, graph-based pipeline for assembling complete, diploid genomes. Verkko begins with a multiplex de Bruijn graph built from long, accurate reads and progressively simplifies this graph by integrating ultra-long reads and haplotype-specific markers. The result is a phased, diploid assembly of both haplotypes, with many chromosomes automatically assembled from telomere to telomere. Running Verkko on the HG002 human genome resulted in 20 of 46 diploid chromosomes assembled without gaps at 99.9997% accuracy. The complete assembly of diploid genomes is a critical step towards the construction of comprehensive pangenome databases and chromosome-scale comparative genomics.
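Verkko's multiplex de Bruijn graph, built from long accurate reads and then simplified with ultra-long reads and haplotype-specific markers, is far more involved than anything shown here. As a minimal sketch of the underlying data structure only, the Python below builds a plain single-k de Bruijn graph (nodes are (k-1)-mers, each observed k-mer is an edge) and spells out a non-branching path. The function names `de_bruijn_graph` and `walk_path` are hypothetical and are not part of Verkko.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Single-k de Bruijn graph: each k-mer in a read adds an edge
    from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_path(graph, start):
    """Follow successors while the current node has exactly one outgoing
    edge; stop at a branch, a dead end, or a previously visited node.
    (A full unitig definition would also check in-degree.)"""
    path, seen, node = [start], {start}, start
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        path.append(nxt)
        seen.add(nxt)
        node = nxt
    # Spell the sequence: first node, then the last base of each later node.
    return path[0] + "".join(n[-1] for n in path[1:])


if __name__ == "__main__":
    g = de_bruijn_graph(["ATGGCGT", "GCGTGCA"], k=4)
    print(walk_path(g, "ATG"))  # ATGGCGTGCA
```

Two reads that share a (k-1)-mer merge into one path without any explicit overlap computation. Repeats longer than k instead produce branching nodes where the walk stops; per the abstract, Verkko simplifies such a graph by integrating ultra-long reads and haplotype markers, which this toy does not attempt.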

Subject(s)

Diploidy , Genomics , Humans , Sequence Analysis, DNA/methods , Genomics/methods , Genome, Human/genetics , Telomere/genetics , High-Throughput Nucleotide Sequencing/methods

6.

Complete genomic and epigenetic maps of human centromeres.

Altemose, Nicolas; Logsdon, Glennis A; Bzikadze, Andrey V; Sidhwani, Pragya; Langley, Sasha A; Caldas, Gina V; Hoyt, Savannah J; Uralsky, Lev; Ryabov, Fedor D; Shew, Colin J; Sauria, Michael E G; Borchers, Matthew; Gershman, Ariel; Mikheenko, Alla; Shepelev, Valery A; Dvorkina, Tatiana; Kunyavskaya, Olga; Vollger, Mitchell R; Rhie, Arang; McCartney, Ann M; Asri, Mobin; Lorig-Roach, Ryan; Shafin, Kishwar; Lucas, Julian K; Aganezov, Sergey; Olson, Daniel; de Lima, Leonardo Gomes; Potapova, Tamara; Hartley, Gabrielle A; Haukness, Marina; Kerpedjiev, Peter; Gusev, Fedor; Tigyi, Kristof; Brooks, Shelise; Young, Alice; Nurk, Sergey; Koren, Sergey; Salama, Sofie R; Paten, Benedict; Rogaev, Evgeny I; Streets, Aaron; Karpen, Gary H; Dernburg, Abby F; Sullivan, Beth A; Straight, Aaron F; Wheeler, Travis J; Gerton, Jennifer L; Eichler, Evan E; Phillippy, Adam M; Timp, Winston.

Science ; 376(6588): eabl4178, 2022 04.

Article in English | MEDLINE | ID: mdl-35357911

ABSTRACT

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

Subject(s)

Centromere/genetics , Chromosome Mapping , Epigenesis, Genetic , Genome, Human , Evolution, Molecular , Genomics , Humans , Repetitive Sequences, Nucleic Acid

7.

Segmental duplications and their variation in a complete human genome.

Vollger, Mitchell R; Guitart, Xavi; Dishuck, Philip C; Mercuri, Ludovica; Harvey, William T; Gershman, Ariel; Diekhans, Mark; Sulovari, Arvis; Munson, Katherine M; Lewis, Alexandra P; Hoekzema, Kendra; Porubsky, David; Li, Ruiyang; Nurk, Sergey; Koren, Sergey; Miga, Karen H; Phillippy, Adam M; Timp, Winston; Ventura, Mario; Eichler, Evan E.

Science ; 376(6588): eabj6965, 2022 04.

Article in English | MEDLINE | ID: mdl-35357917

ABSTRACT

Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide estimate from 5.4 to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previously unresolved T2T-CHM13 SD sequence (68.3 Mbp) better represents human copy number variation. Comparing long-read assemblies from human (n = 12) and nonhuman primate (n = 5) genomes, we systematically reconstruct the evolution and structural haplotype diversity of biomedically relevant and duplicated genes. This analysis reveals patterns of structural heterozygosity and evolutionary differences in SD organization between humans and other primates.

Subject(s)

DNA Copy Number Variations , Gene Duplication , Genome, Human , Segmental Duplications, Genomic , Evolution, Molecular , GTPase-Activating Proteins/genetics , Humans , Polymorphism, Single Nucleotide , Proto-Oncogene Proteins/genetics

8.

The complete sequence of a human genome.

Nurk, Sergey; Koren, Sergey; Rhie, Arang; Rautiainen, Mikko; Bzikadze, Andrey V; Mikheenko, Alla; Vollger, Mitchell R; Altemose, Nicolas; Uralsky, Lev; Gershman, Ariel; Aganezov, Sergey; Hoyt, Savannah J; Diekhans, Mark; Logsdon, Glennis A; Alonge, Michael; Antonarakis, Stylianos E; Borchers, Matthew; Bouffard, Gerard G; Brooks, Shelise Y; Caldas, Gina V; Chen, Nae-Chyun; Cheng, Haoyu; Chin, Chen-Shan; Chow, William; de Lima, Leonardo G; Dishuck, Philip C; Durbin, Richard; Dvorkina, Tatiana; Fiddes, Ian T; Formenti, Giulio; Fulton, Robert S; Fungtammasan, Arkarachai; Garrison, Erik; Grady, Patrick G S; Graves-Lindsay, Tina A; Hall, Ira M; Hansen, Nancy F; Hartley, Gabrielle A; Haukness, Marina; Howe, Kerstin; Hunkapiller, Michael W; Jain, Chirag; Jain, Miten; Jarvis, Erich D; Kerpedjiev, Peter; Kirsche, Melanie; Kolmogorov, Mikhail; Korlach, Jonas; Kremitzki, Milinn; Li, Heng.

Science ; 376(6588): 44-53, 2022 04.

Article in English | MEDLINE | ID: mdl-35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.

Subject(s)

Genome, Human , Human Genome Project , Sequence Analysis, DNA/standards , Cell Line , Chromosomes, Artificial, Bacterial/genetics , Chromosomes, Human/genetics , Humans , Reference Values

9.

STRONG: metagenomics strain resolution on assembly graphs.

Quince, Christopher; Nurk, Sergey; Raguideau, Sebastien; James, Robert; Soyer, Orkun S; Summers, J Kimberly; Limasset, Antoine; Eren, A Murat; Chikhi, Rayan; Darling, Aaron E.

Genome Biol ; 22(1): 214, 2021 07 26.

Article in English | MEDLINE | ID: mdl-34311761

ABSTRACT

We introduce STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo from multiple metagenome samples. STRONG performs coassembly and binning into metagenome-assembled genomes (MAGs), and stores the coassembly graph prior to variant simplification. This enables extraction of the subgraphs, together with their per-sample unitig coverages, for individual single-copy core genes (SCGs) in each MAG. A Bayesian algorithm, BayesPaths, determines the number of strains present, their haplotypes or sequences on the SCGs, and their abundances. STRONG is validated on synthetic communities, and on a real anaerobic digester time series it generates haplotypes that match those observed in long Nanopore reads.

Subject(s)

Algorithms , Genome, Bacterial , Metagenome , Microbial Consortia/genetics , Software , Bayes Theorem , Contig Mapping , Haplotypes , Metagenomics/methods , Sequence Analysis, DNA

10.

The structure, function and evolution of a complete human chromosome 8.

Logsdon, Glennis A; Vollger, Mitchell R; Hsieh, PingHsun; Mao, Yafei; Liskovykh, Mikhail A; Koren, Sergey; Nurk, Sergey; Mercuri, Ludovica; Dishuck, Philip C; Rhie, Arang; de Lima, Leonardo G; Dvorkina, Tatiana; Porubsky, David; Harvey, William T; Mikheenko, Alla; Bzikadze, Andrey V; Kremitzki, Milinn; Graves-Lindsay, Tina A; Jain, Chirag; Hoekzema, Kendra; Murali, Shwetha C; Munson, Katherine M; Baker, Carl; Sorensen, Melanie; Lewis, Alexandra M; Surti, Urvashi; Gerton, Jennifer L; Larionov, Vladimir; Ventura, Mario; Miga, Karen H; Phillippy, Adam M; Eichler, Evan E.

Nature ; 593(7857): 101-107, 2021 05.

Article in English | MEDLINE | ID: mdl-33828295

ABSTRACT

The complete assembly of each human chromosome is essential for understanding human biology and evolution [1,2]. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.

Subject(s)

Chromosomes, Human, Pair 8/chemistry , Chromosomes, Human, Pair 8/genetics , Evolution, Molecular , Animals , Cell Line , Centromere/chemistry , Centromere/genetics , Centromere/metabolism , Chromosomes, Human, Pair 8/physiology , DNA Methylation , DNA, Satellite/genetics , Epigenesis, Genetic , Female , Humans , Macaca mulatta/genetics , Male , Minisatellite Repeats/genetics , Pan troglodytes/genetics , Phylogeny , Pongo abelii/genetics , Telomere/chemistry , Telomere/genetics , Telomere/metabolism

11.

A Multi-Omics Characterization of the Natural Product Potential of Tropical Filamentous Marine Cyanobacteria.

Leão, Tiago; Wang, Mingxun; Moss, Nathan; da Silva, Ricardo; Sanders, Jon; Nurk, Sergey; Gurevich, Alexey; Humphrey, Gregory; Reher, Raphael; Zhu, Qiyun; Belda-Ferre, Pedro; Glukhov, Evgenia; Whitner, Syrena; Alexander, Kelsey L; Rex, Robert; Pevzner, Pavel; Dorrestein, Pieter C; Knight, Rob; Bandeira, Nuno; Gerwick, William H; Gerwick, Lena.

Mar Drugs ; 19(1)2021 Jan 06.

Article in English | MEDLINE | ID: mdl-33418911

ABSTRACT

Microbial natural products are important for the understanding of microbial interactions, chemical defense and communication, and have also served as an inspirational source for numerous pharmaceutical drugs. Tropical marine cyanobacteria have been highlighted as a great source of new natural products; however, few reports have appeared wherein a multi-omics approach has been used to study their natural product potential (i.e., reports are often focused on an individual natural product and its biosynthesis). This study focuses on describing the natural product genetic potential as well as the expressed natural product molecules in benthic tropical cyanobacteria. We collected samples from several sites around the world and sequenced the genomes of 24 tropical filamentous marine cyanobacteria. The informatics program antiSMASH was used to annotate the major classes of gene clusters. BiG-SCAPE phylum-wide analysis revealed the most promising strains for natural product discovery among these cyanobacteria. LC-MS/MS-based metabolomics highlighted the most abundant molecules and molecular classes among 10 of these marine cyanobacterial samples. We observed that despite many genes encoding peptidic natural products, peptides were not as abundant as lipids and lipopeptides in the chemical extracts. Our results highlight a number of highly interesting biosynthetic gene clusters for genome mining among these cyanobacterial samples.

Subject(s)

Biological Products/pharmacology , Cyanobacteria/chemistry , Chromatography, High Pressure Liquid , Cyanobacteria/genetics , Genome, Bacterial , Genomics , Marine Biology , Mass Spectrometry , Metabolomics , Multigene Family , Phylogeny , Tropical Climate

12.

HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

Nurk, Sergey; Walenz, Brian P; Rhie, Arang; Vollger, Mitchell R; Logsdon, Glennis A; Grothe, Robert; Miga, Karen H; Eichler, Evan E; Phillippy, Adam M; Koren, Sergey.

Genome Res ; 30(9): 1291-1305, 2020 09.

Article in English | MEDLINE | ID: mdl-32801147

ABSTRACT

Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
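Three quantities in this abstract can be made concrete with a short Python sketch: homopolymer compression (collapsing each run of a repeated base, listed in the abstract among HiCanu's techniques), the Phred-scaled consensus quality QV = -10 log10(error rate), under which 99.999% accuracy corresponds to QV50, and NG50 (the length of the contig at which contigs sorted longest-first first cover half of the genome size). The helper names are hypothetical; this illustrates the definitions only and is not HiCanu's implementation.

```python
import math


def homopolymer_compress(seq):
    """Collapse each run of identical bases to a single base ('AAACC' -> 'AC')."""
    out = []
    for base in seq:
        if not out or out[-1] != base:
            out.append(base)
    return "".join(out)


def phred_qv(accuracy):
    """Phred-scaled quality value: QV = -10 * log10(error rate)."""
    return -10 * math.log10(1 - accuracy)


def ng50(contig_lengths, genome_size):
    """Length of the contig at which contigs sorted longest-first reach at
    least half of the genome size; 0 if the assembly never gets there."""
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= genome_size / 2:
            return length
    return 0


if __name__ == "__main__":
    print(homopolymer_compress("GGAAATTTC"))        # GATC
    print(round(phred_qv(0.99999)))                 # 50
    print(ng50([40, 30, 20, 10], genome_size=100))  # 30
```

After compression, reads that differ only in the length of a homopolymer run, a common HiFi error type, become identical, which is one way to see why compressing before overlapping can help. NG50 divides by the genome size rather than the assembly size, so an assembly cannot raise its NG50 by adding spurious sequence.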

Subject(s)

Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Alleles , Animals , Cell Line , Chromosome Duplication , DNA, Neoplasm , DNA, Satellite , Drosophila/genetics , Genome, Human , Haplotypes , Humans , Reproducibility of Results , Software

13.

SPAligner: alignment of long diverged molecular sequences to assembly graphs.

Dvorkina, Tatiana; Antipov, Dmitry; Korobeynikov, Anton; Nurk, Sergey.

BMC Bioinformatics ; 21(Suppl 12): 306, 2020 Jul 24.

Article in English | MEDLINE | ID: mdl-32703258

ABSTRACT

BACKGROUND: Graph-based representations of genome assemblies have recently been used in different contexts, from improved reconstruction of plasmid sequences and refined analysis of metagenomic data to read error correction and reference-free haplotype reconstruction. While many of these applications rely heavily on aligning long nucleotide sequences to assembly graphs, the first general-purpose software tools for finding such alignments were released only recently, and their deficiencies and limitations are yet to be discovered. Moreover, existing tools cannot align amino acid sequences, which could prove useful in various contexts, in particular the analysis of metagenomic sequencing data. RESULTS: In this work we present SPAligner (Saint-Petersburg Aligner), a novel tool for aligning long diverged nucleotide and amino acid sequences to assembly graphs. We demonstrate that SPAligner is an efficient solution for mapping third-generation sequencing reads onto assembly graphs of various complexity and also show how it can facilitate the identification of known genes in complex metagenomic datasets. CONCLUSIONS: Our work will help accelerate the development of graph-based approaches to the problem of aligning sequences to genome assemblies. SPAligner is implemented as part of the SPAdes tools library and is available on GitHub.

Subject(s)

Algorithms , Genetic Variation , Sequence Alignment , Base Sequence , Haplotypes/genetics , Humans , Software , Statistics as Topic , beta-Lactamases/chemistry

14.

Optimizing sequencing protocols for leaderboard metagenomics by combining long and short reads.

Sanders, Jon G; Nurk, Sergey; Salido, Rodolfo A; Minich, Jeremiah; Xu, Zhenjiang Z; Zhu, Qiyun; Martino, Cameron; Fedarko, Marcus; Arthur, Timothy D; Chen, Feng; Boland, Brigid S; Humphrey, Greg C; Brennan, Caitriona; Sanders, Karenina; Gaffney, James; Jepsen, Kristen; Khosroheidari, Mahdieh; Green, Cliff; Liyanage, Marlon; Dang, Jason W; Phelan, Vanessa V; Quinn, Robert A; Bankevich, Anton; Chang, John T; Rana, Tariq M; Conrad, Douglas J; Sandborn, William J; Smarr, Larry; Dorrestein, Pieter C; Pevzner, Pavel A; Knight, Rob.

Genome Biol ; 20(1): 226, 2019 10 31.

Article in English | MEDLINE | ID: mdl-31672156

ABSTRACT

As metagenomic studies move to increasing numbers of samples, communities like the human gut may benefit more from the assembly of abundant microbes in many samples, rather than the exhaustive assembly of fewer samples. We term this approach leaderboard metagenome sequencing. To explore protocol optimization for leaderboard metagenomics in real samples, we introduce a benchmark of library prep and sequencing using internal references generated by synthetic long-read technology, allowing us to evaluate high-throughput library preparation methods against gold-standard reference genomes derived from the samples themselves. We introduce a low-cost protocol for high-throughput library preparation and sequencing.

Subject(s)

Gene Library , High-Throughput Nucleotide Sequencing , Metagenomics/methods , Animals , Benchmarking , Gastrointestinal Microbiome , Humans , Mice

15.

Metagenomics-Based, Strain-Level Analysis of Escherichia coli From a Time-Series of Microbiome Samples From a Crohn's Disease Patient.

Fang, Xin; Monk, Jonathan M; Nurk, Sergey; Akseshina, Margarita; Zhu, Qiyun; Gemmell, Christopher; Gianetto-Hill, Connor; Leung, Nelly; Szubin, Richard; Sanders, Jon; Beck, Paul L; Li, Weizhong; Sandborn, William J; Gray-Owen, Scott D; Knight, Rob; Allen-Vercoe, Emma; Palsson, Bernhard O; Smarr, Larry.

Front Microbiol ; 9: 2559, 2018.

Article in English | MEDLINE | ID: mdl-30425690

ABSTRACT

Dysbiosis of the gut microbiome, including elevated abundance of putative leading bacterial triggers such as E. coli in inflammatory bowel disease (IBD) patients, is of great interest. To date, most E. coli studies in IBD patients have focused on clinical isolates, overlooking their relative abundances and turnover over time. Metagenomics-based studies, on the other hand, are less focused on strain-level investigations. Here, using recently developed bioinformatic tools, we analyzed the abundance and properties of specific E. coli strains in a Crohn's disease (CD) patient longitudinally, while also considering the composition of the entire community over time. In this report, we conducted a pilot study on metagenomics-based, strain-level analysis of a time series of E. coli strains in a left-sided CD patient, who exhibited sustained levels of E. coli more than 100 times those of healthy controls. We: (1) mapped out the composition of the gut microbiome over time, particularly the presence of E. coli strains, and found that the abundance and dominance of specific E. coli strains in the community varied over time; (2) performed strain-level de novo assemblies of seven dominant E. coli strains, and illustrated disparity between these strains in both phylogenetic origin and genomic content; (3) observed that strain ST1 (recovered during peak inflammation) is highly similar to known pathogenic AIEC strains NC101 and LF82 in both virulence factors and metabolic functions, while other strains (ST2-ST7) that were collected during more stable states displayed diverse characteristics; (4) isolated, sequenced, and experimentally characterized ST1, and confirmed the accuracy of the de novo assembly; and (5) assessed growth capability of ST1 with a newly reconstructed genome-scale metabolic model of the strain, and showed its potential to use substrates found abundantly in the human gut to outcompete other microbes.
In conclusion, inflammation status (assessed by the blood C-reactive protein and stool calprotectin) is likely correlated with the abundance of a subgroup of E. coli strains with specific traits. Therefore, strain-level time-series analysis of dominant E. coli strains in a CD patient is highly informative, and motivates a study of a larger cohort of IBD patients.
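The conclusion above ties inflammation status to the abundance of particular strains over time. As a minimal, hypothetical sketch of that kind of comparison (not the authors' pipeline, which relied on dedicated strain-level metagenomic tools), per-strain read counts at each time point can be normalized to relative abundances and one strain's series correlated with a marker series such as stool calprotectin. The function names, the input layout (one dict of strain counts per time point), and the choice of Pearson correlation are all assumptions for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-strain counts at one time point to fractions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned at this time point")
    return {strain: c / total for strain, c in counts.items()}


def dominant_strain(counts: Dict[str, float]) -> str:
    """Return the strain with the largest share at one time point."""
    return max(counts, key=counts.get)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def strain_marker_correlation(samples: List[Dict[str, float]],
                              marker: Sequence[float], strain: str) -> float:
    """Correlate one strain's relative abundance across time points with a
    marker series measured at the same time points (hypothetical helper)."""
    series = [relative_abundance(s).get(strain, 0.0) for s in samples]
    return pearson(series, marker)
```

With only a handful of time points, such a coefficient is descriptive at best, which matches the abstract's own framing of the link as likely and as motivation for a larger cohort.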

16.

metaSPAdes: a new versatile metagenomic assembler.

Nurk, Sergey; Meleshko, Dmitry; Korobeynikov, Anton; Pevzner, Pavel A.

Genome Res ; 27(5): 824-834, 2017 May.

Article in English | MEDLINE | ID: mdl-28298430

ABSTRACT

While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed of dozens of related strains, further amplifying the challenge of metagenomic assembly. metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes. We benchmark metaSPAdes against other state-of-the-art metagenome assemblers and demonstrate that it results in high-quality assemblies across diverse data sets.

Subject(s)

Contig Mapping/methods; Genomics/methods; Metagenome; Sequence Analysis, DNA/methods; Software; Genome, Bacterial

17.

Model-driven discovery of underground metabolic functions in Escherichia coli.

Guzmán, Gabriela I; Utrilla, José; Nurk, Sergey; Brunk, Elizabeth; Monk, Jonathan M; Ebrahim, Ali; Palsson, Bernhard O; Feist, Adam M.

Proc Natl Acad Sci U S A ; 112(3): 929-34, 2015 Jan 20.

Article in English | MEDLINE | ID: mdl-25564669

ABSTRACT

Enzyme promiscuity toward substrates has been discussed in evolutionary terms as providing the flexibility to adapt to novel environments. In the present work, we describe an approach toward exploring such enzyme promiscuity in the space of a metabolic network. This approach leverages genome-scale models, which have been widely used for predicting growth phenotypes in various environments or following a genetic perturbation; however, these predictions occasionally fail. Failed predictions of gene essentiality offer an opportunity for targeted biological discovery, suggesting the presence of unknown underground pathways stemming from enzymatic cross-reactivity. We demonstrate a workflow that couples constraint-based modeling and bioinformatic tools with knockout (KO) strain analysis and adaptive laboratory evolution for the purpose of predicting promiscuity at the genome scale. Three genes that are incorrectly predicted as essential in Escherichia coli (aspC, argD, and gltA) are examined, and isozyme functions are uncovered for each to a different extent. Seven isozyme functions based on genetic and transcriptional evidence are suggested between the genes aspC and tyrB, argD and astC, gabT and puuE, and gltA and prpC. This study demonstrates how a targeted model-driven approach to discovery can systematically fill knowledge gaps, characterize underground metabolism, and elucidate regulatory mechanisms of adaptation in response to gene KO perturbations.

Subject(s)

Escherichia coli/metabolism; Models, Biological; Escherichia coli/genetics; Escherichia coli/growth & development; Genome, Bacterial
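The abstract above rests on one piece of logic: a model predicts that deleting a gene blocks growth, the knockout grows anyway, and an unannotated isozyme is the suspected reason. The study uses constraint-based genome-scale modeling; the sketch below deliberately swaps in a much simpler Boolean technique, network-expansion reachability (no fluxes, no stoichiometry), only to show how an isozyme (OR) gene rule changes an essentiality call. All gene, metabolite, and function names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# A reaction is (substrates, products, isozymes). It can fire if ANY listed gene
# is present (OR rule for isozymes); an empty gene set marks a spontaneous step.
Reaction = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def producible(nutrients: Set[str], reactions: List[Reaction],
               genes: Set[str]) -> Set[str]:
    """Network expansion: keep firing every enabled reaction whose substrates
    are all available until no new metabolite appears."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products, isozymes in reactions:
            enabled = not isozymes or bool(isozymes & genes)
            if enabled and substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def is_essential(gene: str, nutrients: Set[str], reactions: List[Reaction],
                 genes: Set[str], targets: Set[str]) -> bool:
    """Predict a gene as essential if its knockout leaves some target
    (e.g. a biomass precursor) unreachable from the nutrients."""
    return not targets <= producible(nutrients, reactions, genes - {gene})
```

In this toy setting, a gene called essential whose knockout still grows points to a missing alternative reaction; adding that reaction with a second gene in its OR rule flips the prediction, which mirrors, in miniature, how the study reads failed predictions as leads for underground metabolism.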

18.

ExSPAnder: a universal repeat resolver for DNA fragment assembly.

Prjibelski, Andrey D; Vasilinetc, Irina; Bankevich, Anton; Gurevich, Alexey; Krivosheeva, Tatiana; Nurk, Sergey; Pham, Son; Korobeynikov, Anton; Lapidus, Alla; Pevzner, Pavel A.

Bioinformatics ; 30(12): i293-301, 2014 Jun 15.

Article in English | MEDLINE | ID: mdl-24931996

ABSTRACT

UNLABELLED: Next-generation sequencing (NGS) technologies have raised a challenging de novo genome assembly problem that is further amplified in recently emerged single-cell sequencing projects. While various NGS assemblers can use information from several libraries of read-pairs, most of them were originally developed for a single library and do not fully benefit from multiple libraries. Moreover, most assemblers assume uniform read coverage, a condition that does not hold for single-cell projects, where utilization of read-pairs is even more challenging. We have developed the exSPAnder algorithm, which accurately resolves repeats using both single and multiple libraries of read-pairs in both standard and single-cell assembly projects. AVAILABILITY AND IMPLEMENTATION: http://bioinf.spbau.ru/en/spades

Subject(s)

Algorithms; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Actinomycetales/genetics; DNA/chemistry; Gene Library; Genome, Bacterial; Humans; Repetitive Sequences, Nucleic Acid; Staphylococcus aureus/genetics
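The exSPAnder abstract above describes using read-pair libraries to resolve repeats, that is, to decide which edge follows a path once it has passed through a repeat. The sketch below is a heavily simplified, hypothetical illustration of that idea and not the published algorithm, which weighs read pairs by expected insert size and copes with nonuniform coverage: each candidate continuation is scored by read pairs linking it to edges already on the path, and the extension is accepted only if one candidate clearly dominates. The edge names, the `pair_links` structure, and the `min_ratio` threshold are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def choose_extension(path: List[str], candidates: List[str],
                     pair_links: Dict[Tuple[str, str], int],
                     min_ratio: float = 2.0) -> Optional[str]:
    """Pick the next edge after a repeat from read-pair support.

    pair_links[(a, b)] is the (hypothetical) number of read pairs with one
    mate on edge a and the other mate on a downstream edge b. A candidate's
    score is the total support it receives from edges already on the path.
    Returns the winning edge, or None to leave the repeat unresolved.
    """
    scores = Counter({c: 0 for c in candidates})
    for prev in path:
        for c in candidates:
            scores[c] += pair_links.get((prev, c), 0)
    ranked = scores.most_common()
    if not ranked or ranked[0][1] == 0:
        return None  # no read pair supports any continuation
    if len(ranked) == 1:
        return ranked[0][0]
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    # Accept only a clear winner; ambiguous evidence leaves the repeat alone.
    if best_score > second_score and best_score >= min_ratio * second_score:
        return best
    return None
```

Stopping when the evidence is ambiguous, rather than guessing, is the conservative choice here: an unresolved repeat only shortens contigs, whereas a wrong extension creates a misassembly.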

19.

What is the difference between the breakpoint graph and the de Bruijn graph?

Lin, Yu; Nurk, Sergey; Pevzner, Pavel A.

BMC Genomics ; 15 Suppl 6: S6, 2014.

Article in English | MEDLINE | ID: mdl-25572416

ABSTRACT

The breakpoint graph and the de Bruijn graph are two key data structures in the studies of genome rearrangements and genome assembly. However, the classical breakpoint graphs are defined on two genomes (represented as sequences of synteny blocks), while the classical de Bruijn graphs are defined on a single genome (represented as DNA strings). Thus, the connection between these two graph models is not explicit. We generalize the notions of both the breakpoint graph and the de Bruijn graph, and make it transparent that the breakpoint graph and the de Bruijn graph are mathematically equivalent. The explicit description of the connection between these important data structures provides a bridge between two previously separated bioinformatics communities studying genome rearrangements and genome assembly.

Subject(s)

Chromosome Breakpoints; Genome; Genomics/methods; Models, Genetic; Recombination, Genetic; Algorithms; Animals; Humans
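For readers less familiar with the assembly side of the comparison in the abstract above, the classical de Bruijn graph of a single DNA string can be built by turning every k-mer into an edge from its (k-1)-prefix to its (k-1)-suffix. This minimal sketch shows only that classical construction; it does not reproduce the paper's generalized definitions or the breakpoint graph. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def de_bruijn(text: str, k: int) -> Dict[str, List[str]]:
    """Classical de Bruijn graph of one string: each k-mer of text is an edge
    from its (k-1)-prefix to its (k-1)-suffix. A k-mer occurring twice yields
    two parallel edges, so repeats show up as edge multiplicity."""
    if k < 2 or k > len(text):
        raise ValueError("need 2 <= k <= len(text)")
    graph: Dict[str, List[str]] = defaultdict(list)
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

For example, `de_bruijn("ACGTACG", 3)` yields two parallel `AC -> CG` edges because the 3-mer `ACG` occurs twice; the original string corresponds to a walk that uses every edge once.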

20.

Assembling single-cell genomes and mini-metagenomes from chimeric MDA products.

Nurk, Sergey; Bankevich, Anton; Antipov, Dmitry; Gurevich, Alexey A; Korobeynikov, Anton; Lapidus, Alla; Prjibelski, Andrey D; Pyshkin, Alexey; Sirotkin, Alexander; Sirotkin, Yakov; Stepanauskas, Ramunas; Clingenpeel, Scott R; Woyke, Tanja; McLean, Jeffrey S; Lasken, Roger; Tesler, Glenn; Alekseyev, Max A; Pevzner, Pavel A.

J Comput Biol ; 20(10): 714-37, 2013 Oct.

Article in English | MEDLINE | ID: mdl-24093227

ABSTRACT

Recent advances in single-cell genomics provide an alternative to largely gene-centric metagenomics studies, enabling whole-genome sequencing of uncultivated bacteria. However, single-cell assembly projects are challenging due to (i) the highly nonuniform read coverage and (ii) a greatly elevated number of chimeric reads and read pairs. While recently developed single-cell assemblers have addressed the former challenge, methods for assembling highly chimeric reads remain poorly explored. We present algorithms for identifying chimeric edges and resolving complex bulges in de Bruijn graphs, which significantly improve single-cell assemblies. We further describe applications of the single-cell assembler SPAdes to a new approach for capturing and sequencing "microbial dark matter" that forms small pools of randomly selected single cells (called a mini-metagenome) and further sequences all genomes from the mini-metagenome at once. On single-cell bacterial datasets, SPAdes improves on the recently developed E+V-SC and IDBA-UD assemblers specifically designed for single-cell sequencing. For standard (cultivated monostrain) datasets, SPAdes also improves on A5, ABySS, CLC, EULER-SR, Ray, SOAPdenovo, and Velvet. Thus, recently developed single-cell assemblers not only enable single-cell sequencing, but also improve on conventional assemblers on their own turf. SPAdes is available for free online download under a GPLv2 license.

Subject(s)

Contig Mapping/methods; DNA, Bacterial/genetics; DNA, Concatenated/genetics; Algorithms; Base Composition; Computational Biology; Escherichia coli/genetics; Gene Library; Genome, Bacterial; High-Throughput Nucleotide Sequencing; Nucleic Acid Amplification Techniques; Pedobacter/genetics; Prochlorococcus/genetics; Sequence Analysis, DNA; Single-Cell Analysis
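The abstract above mentions resolving complex bulges in de Bruijn graphs. As a hedged illustration of what a bulge is, and not the SPAdes procedure (which also weighs coverage and handles chimeric edges, neither shown here), the sketch below enumerates short simple paths from each node and reports pairs that diverge and reconverge without sharing any intermediate node. The adjacency-list representation, the `max_len` bound, and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # node -> successors (parallel edges may repeat)


def find_bulges(graph: Graph, max_len: int) -> List[Tuple[List[str], List[str]]]:
    """Report pairs of paths of at most max_len edges that start at the same
    node, end at the same node, and share no intermediate nodes."""
    bulges: List[Tuple[List[str], List[str]]] = []
    for start in sorted(graph):
        # Depth-first enumeration of simple paths leaving `start`.
        paths_by_end: Dict[str, List[List[str]]] = defaultdict(list)
        stack = [[start]]
        while stack:
            path = stack.pop()
            if len(path) > 1:
                paths_by_end[path[-1]].append(path)
            if len(path) - 1 == max_len:
                continue
            # set() collapses parallel edges: multiplicity is not a bulge.
            for nxt in sorted(set(graph.get(path[-1], []))):
                if nxt not in path:
                    stack.append(path + [nxt])
        # Two paths to the same end form a bulge if their interiors are disjoint.
        for end in sorted(paths_by_end):
            paths = paths_by_end[end]
            for i in range(len(paths)):
                for j in range(i + 1, len(paths)):
                    if set(paths[i][1:-1]).isdisjoint(paths[j][1:-1]):
                        bulges.append((paths[i], paths[j]))
    return bulges
```

In an assembler, one path of each bulge (typically the lower-coverage one) would then be collapsed into the other; that step is omitted. Path enumeration grows exponentially with `max_len`, so this brute-force approach is only practical for short bulges.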
