Search | VHL Regional Portal

1.

Telomere-to-telomere assembly of diploid chromosomes with Verkko.

Rautiainen, Mikko; Nurk, Sergey; Walenz, Brian P; Logsdon, Glennis A; Porubsky, David; Rhie, Arang; Eichler, Evan E; Phillippy, Adam M; Koren, Sergey.

Nat Biotechnol ; 41(10): 1474-1482, 2023 Oct.

Article in English | MEDLINE | ID: mdl-36797493

ABSTRACT

The Telomere-to-Telomere consortium recently assembled the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on manual integration of ultra-long Oxford Nanopore sequencing reads with a high-resolution assembly graph built from long, accurate PacBio high-fidelity reads. We have improved and automated this strategy in Verkko, an iterative, graph-based pipeline for assembling complete, diploid genomes. Verkko begins with a multiplex de Bruijn graph built from long, accurate reads and progressively simplifies this graph by integrating ultra-long reads and haplotype-specific markers. The result is a phased, diploid assembly of both haplotypes, with many chromosomes automatically assembled from telomere to telomere. Running Verkko on the HG002 human genome resulted in 20 of 46 diploid chromosomes assembled without gaps at 99.9997% accuracy. The complete assembly of diploid genomes is a critical step towards the construction of comprehensive pangenome databases and chromosome-scale comparative genomics.

Subject(s)

Diploidy; Genomics; Humans; Sequence Analysis, DNA/methods; Genomics/methods; Genome, Human/genetics; Telomere/genetics; High-Throughput Nucleotide Sequencing/methods
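The abstract above says Verkko starts from a multiplex de Bruijn graph. As a rough illustration of the underlying idea only, the following sketch builds a plain (single-k) de Bruijn graph from reads. It is not Verkko's multiplex construction; the function name `de_bruijn_graph` and the toy reads are assumptions made for this example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a plain de Bruijn graph (hypothetical helper, not Verkko code).

    Nodes are (k-1)-mers; each k-mer seen in a read adds an edge from its
    prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "CGTACG"]  # made-up reads for illustration
    for node, successors in sorted(de_bruijn_graph(toy_reads, 4).items()):
        print(node, "->", sorted(successors))
```

Overlapping reads contribute the same edges, so shared sequence collapses into shared paths; repeats longer than k-1 bases create branching nodes, which is the ambiguity that longer reads are then used to resolve.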

2.

Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation.

Formenti, Giulio; Rhie, Arang; Walenz, Brian P; Thibaud-Nissen, Françoise; Shafin, Kishwar; Koren, Sergey; Myers, Eugene W; Jarvis, Erich D; Phillippy, Adam M.

Nat Methods ; 19(6): 696-704, 2022 06.

Article in English | MEDLINE | ID: mdl-35361932

ABSTRACT

Variant calling has been widely used for genotyping and for improving the consensus accuracy of long-read assemblies. Variant calls are commonly hard-filtered with user-defined cutoffs. However, it is impossible to define a single set of optimal cutoffs, as the calls heavily depend on the quality of the reads, the variant caller of choice and the quality of the unpolished assembly. Here, we introduce Merfin, a k-mer based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller's internal score. Merfin increased the precision of genotyped calls in several benchmarks, improved consensus accuracy and reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads, including the first complete human genome. Moreover, we introduce assembly quality and completeness metrics that account for the expected genomic copy numbers.

Subject(s)

High-Throughput Nucleotide Sequencing; Nanopores; Genome; Genomics; Humans; Sequence Analysis, DNA

3.

The genomic structure of a human chromosome 22 nucleolar organizer region determined by TAR cloning.

Kim, Jung-Hyun; Noskov, Vladimir N; Ogurtsov, Aleksey Y; Nagaraja, Ramaiah; Petrov, Nikolai; Liskovykh, Mikhail; Walenz, Brian P; Lee, Hee-Sheung; Kouprina, Natalay; Phillippy, Adam M; Shabalina, Svetlana A; Schlessinger, David; Larionov, Vladimir.

Sci Rep ; 11(1): 2997, 2021 02 04.

Article in English | MEDLINE | ID: mdl-33542373

ABSTRACT

The rDNA clusters and flanking sequences on human chromosomes 13, 14, 15, 21 and 22 represent large gaps in the current genomic assembly. The organization and the degree of divergence of the human rDNA units within an individual nucleolar organizer region (NOR) are only partially known. To address this lacuna, we previously applied transformation-associated recombination (TAR) cloning to isolate individual rDNA units from chromosome 21. That approach revealed an unexpectedly high level of heterogeneity in human rDNA, raising the possibility of corresponding variations in ribosome dynamics. We have now applied the same strategy to analyze an entire rDNA array end-to-end from a copy of chromosome 22. Sequencing of TAR isolates provided the entire NOR sequence, including proximal and distal junctions that may be involved in nucleolar function. Comparison of the newly sequenced rDNAs to reference sequence for chromosomes 22 and 21 revealed variants that are shared in human rDNA in individuals from different ethnic groups, many of them at high frequency. Analysis infers comparable intra- and inter-individual divergence of rDNA units on the same and different chromosomes, supporting the concerted evolution of rDNA units. The results provide a route to investigate further the role of rDNA variation in nucleolar formation and in the empirical associations of nucleoli with pathology.

Subject(s)

Chromosomes, Human, Pair 22/genetics; DNA, Ribosomal/genetics; Genome, Human/genetics; Nucleolus Organizer Region/genetics; Cell Nucleolus/genetics; Cloning, Molecular; Genetic Heterogeneity; Genomics; Humans; Molecular Sequence Annotation; Ribosomes/genetics

4.

Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies.

Rhie, Arang; Walenz, Brian P; Koren, Sergey; Phillippy, Adam M.

Genome Biol ; 21(1): 245, 2020 09 14.

Article in English | MEDLINE | ID: mdl-32928274

ABSTRACT

Recent long-read assemblies often exceed the quality and completeness of available reference genomes, making validation challenging. Here we present Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness. For trios, Merqury can also evaluate haplotype-specific accuracy, completeness, phase block continuity, and switch errors. Multiple visualizations, such as k-mer spectrum plots, can be generated for evaluation. We demonstrate on both human and plant genomes that Merqury is a fast and robust method for assembly validation.

Subject(s)

Genomics/methods; Software; Arabidopsis; Genome, Human; Genome, Plant; Humans
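The Merqury abstract describes estimating base-level accuracy by comparing assembly k-mers with read k-mers. The sketch below shows one way such an estimate can be computed. It assumes that an assembly k-mer absent from the reads signals an error, that the per-base error rate is 1 - (shared/total)^(1/k), and that distinct k-mers are counted. The helper names and toy sequences are hypothetical, and this is not Merqury's implementation.

```python
import math


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_qv(assembly, reads, k):
    """Estimate a Phred-scaled consensus quality from k-mer support.

    Assumption for this sketch: error = 1 - (shared / total) ** (1 / k),
    where `total` is the number of distinct assembly k-mers and `shared`
    is how many of them also occur in the reads.
    """
    asm_kmers = kmers(assembly, k)
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    total = len(asm_kmers)
    if total == 0:
        raise ValueError("assembly is shorter than k")
    shared = len(asm_kmers & read_kmers)
    if shared == total:
        return math.inf  # no unsupported k-mers were observed
    if shared == 0:
        return 0.0
    error = 1 - (shared / total) ** (1 / k)
    return -10 * math.log10(error)


if __name__ == "__main__":
    # Toy data: the final assembly base is not supported by the read.
    print(round(kmer_qv("ACGTACGA", ["ACGTACG"], 3), 2))
```

Because the estimate depends only on k-mers, no reference genome is needed, which matches the reference-free framing of the abstract.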

5.

HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

Nurk, Sergey; Walenz, Brian P; Rhie, Arang; Vollger, Mitchell R; Logsdon, Glennis A; Grothe, Robert; Miga, Karen H; Eichler, Evan E; Phillippy, Adam M; Koren, Sergey.

Genome Res ; 30(9): 1291-1305, 2020 09.

Article in English | MEDLINE | ID: mdl-32801147

ABSTRACT

Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.

Subject(s)

Genetic Variation; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Alleles; Animals; Cell Line; Chromosome Duplication; DNA, Neoplasm; DNA, Satellite; Drosophila/genetics; Genome, Human; Haplotypes; Humans; Reproducibility of Results; Software
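Homopolymer compression, one of the techniques named in the HiCanu abstract, collapses each run of identical bases to a single base. A minimal sketch follows; the function name `homopolymer_compress` and the example reads are illustrative assumptions and are not taken from the HiCanu code base.

```python
from itertools import groupby


def homopolymer_compress(seq):
    """Collapse every run of identical bases into one base."""
    return "".join(base for base, _run in groupby(seq))


if __name__ == "__main__":
    # Two made-up reads that differ only in homopolymer run lengths.
    read_a = "ACCCGTTTTA"
    read_b = "ACCGTTTA"
    print(homopolymer_compress(read_a), homopolymer_compress(read_b))
```

Reads that differ only in the length of a homopolymer run become identical after compression, so run-length errors no longer count as differences when reads are compared.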

6.

Weighted minimizer sampling improves long read mapping.

Jain, Chirag; Rhie, Arang; Zhang, Haowen; Chu, Claudia; Walenz, Brian P; Koren, Sergey; Phillippy, Adam M.

Bioinformatics ; 36(Suppl_1): i111-i118, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32657365

ABSTRACT

MOTIVATION: In this era of exponential data growth, minimizer sampling has become a standard algorithmic technique for rapid genome sequence comparison. This technique yields a sub-linear representation of sequences, enabling their comparison in reduced space and time. A key property of the minimizer technique is that if two sequences share a substring of a specified length, then they can be guaranteed to have a matching minimizer. However, because the k-mer distribution in eukaryotic genomes is highly uneven, minimizer-based tools (e.g. Minimap2, Mashmap) opt to discard the most frequently occurring minimizers from the genome to avoid excessive false positives. By doing so, the underlying guarantee is lost and accuracy is reduced in repetitive genomic regions. RESULTS: We introduce a novel weighted-minimizer sampling algorithm. A unique feature of the proposed algorithm is that it performs minimizer sampling while considering a weight for each k-mer; i.e. the higher the weight of a k-mer, the more likely it is to be selected. By down-weighting frequently occurring k-mers, we are able to meet both objectives: (i) avoid excessive false-positive matches and (ii) maintain the minimizer match guarantee. We tested our algorithm, Winnowmap, using both simulated and real long-read data and compared it to a state-of-the-art long read mapper, Minimap2. Our results demonstrate a reduction in the mapping error-rate from 0.14% to 0.06% in the recently finished human X chromosome (154.3 Mbp), and from 3.6% to 0% within the highly repetitive X centromere (3.1 Mbp). Winnowmap improves mapping accuracy within repeats and achieves these results with sparser sampling, leading to better index compression and competitive runtimes. AVAILABILITY AND IMPLEMENTATION: Winnowmap is built on top of the Minimap2 codebase and is available at https://github.com/marbl/winnowmap.

Subject(s)

Data Compression; Software; Algorithms; Genomics; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA
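The matching guarantee described in the Winnowmap abstract can be illustrated with ordinary (unweighted) minimizer sampling: in every window of w consecutive k-mers, keep the smallest one. The sketch below uses lexicographic order and leftmost tie-breaking, which are assumptions chosen for simplicity. It does not implement the weighted scheme the paper introduces, and the function name and toy sequences are hypothetical.

```python
def minimizers(seq, k, w):
    """Return the set of (position, k-mer) minimizers of a sequence.

    Each window of w consecutive k-mers contributes its lexicographically
    smallest k-mer (leftmost on ties). This is plain minimizer sampling,
    not the weighted variant.
    """
    kmer_list = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmer_list) - w + 1):
        window = kmer_list[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        selected.add((start + offset, window[offset]))
    return selected


if __name__ == "__main__":
    k, w = 3, 8
    shared = "GATTACAGAT"  # length w + k - 1 = 10
    seq_a = "CCC" + shared + "TTT"
    seq_b = "GGG" + shared + "AAA"
    kmers_a = {kmer for _pos, kmer in minimizers(seq_a, k, w)}
    kmers_b = {kmer for _pos, kmer in minimizers(seq_b, k, w)}
    print(sorted(kmers_a & kmers_b))
```

Any two sequences that share a substring of length w + k - 1 contain one identical window, so they select at least one common minimizer. Discarding frequent minimizers after sampling is what breaks this property in repeats.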

7.

Effect of sequence depth and length in long-read assembly of the maize inbred NC358.

Ou, Shujun; Liu, Jianing; Chougule, Kapeel M; Fungtammasan, Arkarachai; Seetharam, Arun S; Stein, Joshua C; Llaca, Victor; Manchanda, Nancy; Gilbert, Amanda M; Wei, Sharon; Chin, Chen-Shan; Hufnagel, David E; Pedersen, Sarah; Snodgrass, Samantha J; Fengler, Kevin; Woodhouse, Margaret; Walenz, Brian P; Koren, Sergey; Phillippy, Adam M; Hannigan, Brett T; Dawe, R Kelly; Hirsch, Candice N; Hufford, Matthew B; Ware, Doreen.

Nat Commun ; 11(1): 2288, 2020 05 08.

Article in English | MEDLINE | ID: mdl-32385271

ABSTRACT

Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20 to 75 × genomic depth and with N50 subread lengths of 11-21 kb. Assemblies with ≤30 × depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20 × depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.

Subject(s)

High-Throughput Nucleotide Sequencing/methods; Inbreeding; Zea mays/genetics; Base Sequence; DNA Transposable Elements/genetics; Genome, Plant; Repetitive Sequences, Nucleic Acid/genetics
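Several records in this list report N50 values (for subreads, contigs or scaffolds). For readers unfamiliar with the statistic, here is a minimal sketch of the usual definition: the length L such that sequences of length at least L contain half or more of the total bases. The function name `n50` and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))  # 6 + 5 = 11 covers half of 20, so N50 is 5
```

NG50, which also appears in some abstracts, follows the same procedure but measures against half of an estimated genome size rather than half of the assembled total.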

8.

Erratum: Author Correction: Improved reference genome for the domestic horse increases assembly contiguity and composition.

Kalbfleisch, Theodore S; Rice, Edward S; DePriest, Michael S; Walenz, Brian P; Hestand, Matthew S; Vermeesch, Joris R; O'Connell, Brendan L; Fiddes, Ian T; Vershinina, Alisa O; Saremi, Nedda F; Petersen, Jessica L; Finno, Carrie J; Bellone, Rebecca R; McCue, Molly E; Brooks, Samantha A; Bailey, Ernest; Orlando, Ludovic; Green, Richard E; Miller, Donald C; Antczak, Douglas F; MacLeod, James N.

Commun Biol ; 2: 342, 2019.

Article in English | MEDLINE | ID: mdl-31531403

ABSTRACT

[This corrects the article DOI: 10.1038/s42003-018-0199-z.].

9.

Integrating Hi-C links with assembly graphs for chromosome-scale assembly.

Ghurye, Jay; Rhie, Arang; Walenz, Brian P; Schmitt, Anthony; Selvaraj, Siddarth; Pop, Mihai; Phillippy, Adam M; Koren, Sergey.

PLoS Comput Biol ; 15(8): e1007273, 2019 08.

Article in English | MEDLINE | ID: mdl-31433799

ABSTRACT

Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, there are limited open-source tools available. Errors, particularly inversions and fusions across chromosomes, remain higher than alternate scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.

Subject(s)

Chromosomes, Human/genetics; Genome, Human; Genomics/methods; Algorithms; Animals; Computational Biology; Computer Simulation; Databases, Nucleic Acid/statistics & numerical data; Genomic Library; Genomics/statistics & numerical data; High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; Software

10.

Improved reference genome for the domestic horse increases assembly contiguity and composition.

Kalbfleisch, Theodore S; Rice, Edward S; DePriest, Michael S; Walenz, Brian P; Hestand, Matthew S; Vermeesch, Joris R; O'Connell, Brendan L; Fiddes, Ian T; Vershinina, Alisa O; Saremi, Nedda F; Petersen, Jessica L; Finno, Carrie J; Bellone, Rebecca R; McCue, Molly E; Brooks, Samantha A; Bailey, Ernest; Orlando, Ludovic; Green, Richard E; Miller, Donald C; Antczak, Douglas F; MacLeod, James N.

Commun Biol ; 1: 197, 2018.

Article in English | MEDLINE | ID: mdl-30456315

ABSTRACT

Recent advances in genomic sequencing technology and computational assembly methods have allowed scientists to improve reference genome assemblies in terms of contiguity and composition. EquCab2, a reference genome for the domestic horse, was released in 2007. Although of equal or better quality compared to other first-generation Sanger assemblies, it had many of the shortcomings common to them. In 2014, the equine genomics research community began a project to improve the reference sequence for the horse, building upon the solid foundation of EquCab2 and incorporating new short-read data, long-read data, and proximity ligation data. Here, we present EquCab3. The count of non-N bases in the incorporated chromosomes is improved from 2.33 Gb in EquCab2 to 2.41 Gb in EquCab3. Contiguity has also been improved nearly 40-fold with a contig N50 of 4.5 Mb and scaffold contiguity enhanced to where all but one of the 32 chromosomes is comprised of a single scaffold.

11.

De novo assembly of haplotype-resolved genomes with trio binning.

Koren, Sergey; Rhie, Arang; Walenz, Brian P; Dilthey, Alexander T; Bickhart, Derek M; Kingan, Sarah B; Hiendleder, Stefan; Williams, John L; Smith, Timothy P L; Phillippy, Adam M.

Nat Biotechnol ; 2018 Oct 22.

Article in English | MEDLINE | ID: mdl-30346939

ABSTRACT

Complex allelic variation hampers the assembly of haplotype-resolved sequences from diploid genomes. We developed trio binning, an approach that simplifies haplotype assembly by resolving allelic variation before assembly. In contrast with prior approaches, the effectiveness of our method improved with increasing heterozygosity. Trio binning uses short reads from two parental genomes to first partition long reads from an offspring into haplotype-specific sets. Each haplotype is then assembled independently, resulting in a complete diploid reconstruction. We used trio binning to recover both haplotypes of a diploid human genome and identified complex structural variants missed by alternative approaches. We sequenced an F1 cross between the cattle subspecies Bos taurus taurus and Bos taurus indicus and completely assembled both parental haplotypes with NG50 haplotig sizes of >20 Mb and 99.998% accuracy, surpassing the quality of current cattle reference genomes. We suggest that trio binning improves diploid genome assembly and will facilitate new studies of haplotype variation and inheritance.
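The partitioning step described in this abstract can be sketched as follows: collect k-mers unique to each parent, then assign each offspring read to whichever parent's unique k-mers it contains more of. This is a simplified illustration under stated assumptions (exact k-mer matching, no reverse complements, no error or coverage filtering). The function names and toy sequences are hypothetical and are not the authors' implementation.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets."""
    maternal = set().union(*(kmers(r, k) for r in maternal_reads))
    paternal = set().union(*(kmers(r, k) for r in paternal_reads))
    return maternal - paternal, paternal - maternal


def bin_read(read, maternal_only, paternal_only, k):
    """Assign an offspring read to a haplotype bin by marker k-mer counts."""
    read_kmers = kmers(read, k)
    maternal_hits = len(read_kmers & maternal_only)
    paternal_hits = len(read_kmers & paternal_only)
    if maternal_hits > paternal_hits:
        return "maternal"
    if paternal_hits > maternal_hits:
        return "paternal"
    return "unassigned"  # no markers, or a tie


if __name__ == "__main__":
    mat_only, pat_only = parent_specific_kmers(["AAAACCCC"], ["AAAAGGGG"], 4)
    for read in ["AACCCC", "AAGGGG", "AAAA"]:
        print(read, bin_read(read, mat_only, pat_only, 4))
```

Higher heterozygosity yields more parent-specific k-mers per read, which is consistent with the abstract's remark that the method improves as heterozygosity increases.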

12.

Hybrid assembly with long and short reads improves discovery of gene family expansions.

Miller, Jason R; Zhou, Peng; Mudge, Joann; Gurtowski, James; Lee, Hayan; Ramaraj, Thiruvarangan; Walenz, Brian P; Liu, Junqi; Stupar, Robert M; Denny, Roxanne; Song, Li; Singh, Namrata; Maron, Lyza G; McCouch, Susan R; McCombie, W Richard; Schatz, Michael C; Tiffin, Peter; Young, Nevin D; Silverstein, Kevin A T.

BMC Genomics ; 18(1): 541, 2017 07 19.

Article in English | MEDLINE | ID: mdl-28724409

ABSTRACT

BACKGROUND: Long-read and short-read sequencing technologies offer competing advantages for eukaryotic genome sequencing projects. Combinations of both may be appropriate for surveys of within-species genomic variation. METHODS: We developed a hybrid assembly pipeline called "Alpaca" that can operate on 20X long-read coverage plus about 50X short-insert and 50X long-insert short-read coverage. To preclude collapse of tandem repeats, Alpaca relies on base-call-corrected long reads for contig formation. RESULTS: Compared to two other assembly protocols, Alpaca demonstrated the most reference agreement and repeat capture on the rice genome. On three accessions of the model legume Medicago truncatula, Alpaca generated the most agreement to a conspecific reference and predicted tandemly repeated genes absent from the other assemblies. CONCLUSION: Our results suggest Alpaca is a useful tool for investigating structural and copy number variation within de novo assemblies of sampled populations.

Subject(s)

Genes, Plant/genetics; Genomics/methods; DNA Copy Number Variations; Medicago truncatula/genetics; Multigene Family/genetics; Oryza/genetics; Phenotype; Tandem Repeat Sequences/genetics

13.

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.

Koren, Sergey; Walenz, Brian P; Berlin, Konstantin; Miller, Jason R; Bergman, Nicholas H; Phillippy, Adam M.

Genome Res ; 27(5): 722-736, 2017 05.

Article in English | MEDLINE | ID: mdl-28298431

ABSTRACT

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.

Subject(s)

Contig Mapping/methods; Genomics/methods; Sequence Analysis, DNA/methods; Software; Animals; Contig Mapping/standards; Drosophila melanogaster/genetics; Genome, Bacterial; Genomics/standards; Humans; Repetitive Sequences, Nucleic Acid; Sequence Analysis, DNA/standards
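The Canu abstract mentions an overlapping strategy based on tf-idf weighted MinHash. The sketch below shows only the basic, unweighted MinHash idea: for each of several hash functions keep the minimum hash over a read's k-mers, then estimate the k-mer set similarity of two reads as the fraction of positions where their sketches agree. The tf-idf weighting is not implemented here. The function names, the use of SHA-256 as the hash family, and the toy reads are all assumptions for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer under a given seed."""
    digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
    return int.from_bytes(digest[:8], "little")


def minhash_sketch(seq, k, num_hashes):
    """Return one minimum hash value per seeded hash function."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(kmer, seed) for kmer in kmer_set)
            for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching minima."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


if __name__ == "__main__":
    read_a = "ACGTACGTTAGCTAGGCTA"  # made-up reads
    read_b = "CGTACGTTAGCTAGGCTAC"
    sketch_a = minhash_sketch(read_a, 5, 64)
    sketch_b = minhash_sketch(read_b, 5, 64)
    print(estimate_jaccard(sketch_a, sketch_b))
```

Comparing fixed-size sketches instead of full k-mer sets keeps the cost of screening candidate overlaps independent of read length.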

14.

An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.

Tørresen, Ole K; Star, Bastiaan; Jentoft, Sissel; Reinar, William B; Grove, Harald; Miller, Jason R; Walenz, Brian P; Knight, James; Ekholm, Jenny M; Peluso, Paul; Edvardsen, Rolf B; Tooming-Klunderud, Ave; Skage, Morten; Lien, Sigbjørn; Jakobsen, Kjetill S; Nederbragt, Alexander J.

BMC Genomics ; 18(1): 95, 2017 01 18.

Article in English | MEDLINE | ID: mdl-28100185

ABSTRACT

BACKGROUND: The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies. RESULTS: By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. CONCLUSIONS: The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates a considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.

Subject(s)

Gadus morhua/genetics; Genomics/methods; Tandem Repeat Sequences/genetics; Animals; Heterozygote; Molecular Sequence Annotation; Promoter Regions, Genetic; Sequence Analysis, DNA

15.

The Atlantic salmon genome provides insights into rediploidization.

Lien, Sigbjørn; Koop, Ben F; Sandve, Simen R; Miller, Jason R; Kent, Matthew P; Nome, Torfinn; Hvidsten, Torgeir R; Leong, Jong S; Minkley, David R; Zimin, Aleksey; Grammes, Fabian; Grove, Harald; Gjuvsland, Arne; Walenz, Brian; Hermansen, Russell A; von Schalburg, Kris; Rondeau, Eric B; Di Genova, Alex; Samy, Jeevan K A; Olav Vik, Jon; Vigeland, Magnus D; Caler, Lis; Grimholt, Unni; Jentoft, Sissel; Våge, Dag Inge; de Jong, Pieter; Moen, Thomas; Baranski, Matthew; Palti, Yniv; Smith, Douglas R; Yorke, James A; Nederbragt, Alexander J; Tooming-Klunderud, Ave; Jakobsen, Kjetill S; Jiang, Xuanting; Fan, Dingding; Hu, Yan; Liberles, David A; Vidal, Rodrigo; Iturra, Patricia; Jones, Steven J M; Jonassen, Inge; Maass, Alejandro; Omholt, Stig W; Davidson, William S.

Nature ; 533(7602): 200-5, 2016 05 12.

Article in English | MEDLINE | ID: mdl-27088604

ABSTRACT

The whole-genome duplication 80 million years ago of the common ancestor of salmonids (salmonid-specific fourth vertebrate whole-genome duplication, Ss4R) provides unique opportunities to learn about the evolutionary fate of a duplicated vertebrate genome in 70 extant lineages. Here we present a high-quality genome assembly for Atlantic salmon (Salmo salar), and show that large genomic reorganizations, coinciding with bursts of transposon-mediated repeat expansions, were crucial for the post-Ss4R rediploidization process. Comparisons of duplicate gene expression patterns across a wide range of tissues with orthologous genes from a pre-Ss4R outgroup unexpectedly demonstrate far more instances of neofunctionalization than subfunctionalization. Surprisingly, we find that genes that were retained as duplicates after the teleost-specific whole-genome duplication 320 million years ago were not more likely to be retained after the Ss4R, and that the duplicate retention was not influenced to a great extent by the nature of the predicted protein interactions of the gene products. Finally, we demonstrate that the Atlantic salmon assembly can serve as a reference sequence for the study of other salmonids for a range of purposes.

Subject(s)

Diploidy; Evolution, Molecular; Gene Duplication/genetics; Genes, Duplicate/genetics; Genome/genetics; Salmo salar/genetics; Animals; DNA Transposable Elements/genetics; Female; Genomics; Male; Models, Genetic; Mutagenesis/genetics; Phylogeny; Reference Standards; Salmo salar/classification; Sequence Homology

16.

Genomic insights into the Ixodes scapularis tick vector of Lyme disease.

Gulia-Nuss, Monika; Nuss, Andrew B; Meyer, Jason M; Sonenshine, Daniel E; Roe, R Michael; Waterhouse, Robert M; Sattelle, David B; de la Fuente, José; Ribeiro, Jose M; Megy, Karine; Thimmapuram, Jyothi; Miller, Jason R; Walenz, Brian P; Koren, Sergey; Hostetler, Jessica B; Thiagarajan, Mathangi; Joardar, Vinita S; Hannick, Linda I; Bidwell, Shelby; Hammond, Martin P; Young, Sarah; Zeng, Qiandong; Abrudan, Jenica L; Almeida, Francisca C; Ayllón, Nieves; Bhide, Ketaki; Bissinger, Brooke W; Bonzon-Kulichenko, Elena; Buckingham, Steven D; Caffrey, Daniel R; Caimano, Melissa J; Croset, Vincent; Driscoll, Timothy; Gilbert, Don; Gillespie, Joseph J; Giraldo-Calderón, Gloria I; Grabowski, Jeffrey M; Jiang, David; Khalil, Sayed M S; Kim, Donghun; Kocan, Katherine M; Koci, Juraj; Kuhn, Richard J; Kurtti, Timothy J; Lees, Kristin; Lang, Emma G; Kennedy, Ryan C; Kwon, Hyeogsun; Perera, Rushika; Qi, Yumin.

Nat Commun ; 7: 10507, 2016 Feb 09.

Article in English | MEDLINE | ID: mdl-26856261

ABSTRACT

Ticks transmit more pathogens to humans and animals than any other arthropod. We describe the 2.1 Gbp nuclear genome of the tick, Ixodes scapularis (Say), which vectors pathogens that cause Lyme disease, human granulocytic anaplasmosis, babesiosis and other diseases. The large genome reflects accumulation of repetitive DNA, new lineages of retro-transposons, and gene architecture patterns resembling ancient metazoans rather than pancrustaceans. Annotation of scaffolds representing ~57% of the genome reveals 20,486 protein-coding genes and expansions of gene families associated with tick-host interactions. We report insights from genome analyses into parasitic processes unique to ticks, including host 'questing', prolonged feeding, cuticle synthesis, blood meal concentration, novel methods of haemoglobin digestion, haem detoxification, vitellogenesis and prolonged off-host survival. We identify proteins associated with the agent of human granulocytic anaplasmosis, an emerging disease, and the encephalitis-causing Langat virus, and a population structure correlated to life-history traits and transmission of the Lyme disease agent.

Subject(s)

Anaplasma phagocytophilum; Arachnid Vectors/genetics; Genome/genetics; Ixodes/genetics; Ligand-Gated Ion Channels/genetics; Animals; Gene Expression Profiling; Genomics; Lyme Disease/transmission; Oocytes; Xenopus laevis

17.

The genome of Anopheles darlingi, the main neotropical malaria vector.

Marinotti, Osvaldo; Cerqueira, Gustavo C; de Almeida, Luiz Gonzaga Paula; Ferro, Maria Inês Tiraboschi; Loreto, Elgion Lucio da Silva; Zaha, Arnaldo; Teixeira, Santuza M R; Wespiser, Adam R; Almeida E Silva, Alexandre; Schlindwein, Aline Daiane; Pacheco, Ana Carolina Landim; Silva, Artur Luiz da Costa da; Graveley, Brenton R; Walenz, Brian P; Lima, Bruna de Araujo; Ribeiro, Carlos Alexandre Gomes; Nunes-Silva, Carlos Gustavo; de Carvalho, Carlos Roberto; Soares, Célia Maria de Almeida; de Menezes, Claudia Beatriz Afonso; Matiolli, Cleverson; Caffrey, Daniel; Araújo, Demetrius Antonio M; de Oliveira, Diana Magalhães; Golenbock, Douglas; Grisard, Edmundo Carlos; Fantinatti-Garboggini, Fabiana; de Carvalho, Fabíola Marques; Barcellos, Fernando Gomes; Prosdocimi, Francisco; May, Gemma; Azevedo Junior, Gilson Martins de; Guimarães, Giselle Moura; Goldman, Gustavo Henrique; Padilha, Itácio Q M; Batista, Jacqueline da Silva; Ferro, Jesus Aparecido; Ribeiro, José M C; Fietto, Juliana Lopes Rangel; Dabbas, Karina Maia; Cerdeira, Louise; Agnez-Lima, Lucymara Fassarella; Brocchi, Marcelo; de Carvalho, Marcos Oliveira; Teixeira, Marcus de Melo; Diniz Maia, Maria de Mascena; Goldman, Maria Helena S; Cruz Schneider, Maria Paula; Felipe, Maria Sueli Soares; Hungria, Mariangela.

Nucleic Acids Res ; 41(15): 7387-400, 2013 Aug.

Article in English | MEDLINE | ID: mdl-23761445

ABSTRACT

Anopheles darlingi is the principal neotropical malaria vector, responsible for more than a million cases of malaria per year on the American continent. Anopheles darlingi diverged from the African and Asian malaria vectors ~100 million years ago (mya) and successfully adapted to the New World environment. Here we present an annotated reference A. darlingi genome, sequenced from a wild population of males and females collected in the Brazilian Amazon. A total of 10 481 predicted protein-coding genes were annotated, 72% of which have their closest counterpart in Anopheles gambiae and 21% have highest similarity with other mosquito species. In spite of a long period of divergent evolution, conserved gene synteny was observed between A. darlingi and A. gambiae. More than 10 million single nucleotide polymorphisms and short indels with potential use as genetic markers were identified. Transposable elements correspond to 2.3% of the A. darlingi genome. Genes associated with hematophagy, immunity and insecticide resistance, directly involved in vector-human and vector-parasite interactions, were identified and discussed. This study represents the first effort to sequence the genome of a neotropical malaria vector, and opens a new window through which we can contemplate the evolutionary history of anopheline mosquitoes. It also provides valuable information that may lead to novel strategies to reduce malaria transmission on the South American continent. The A. darlingi genome is accessible at www.labinfo.lncc.br/index.php/anopheles-darlingi.

Subject(s)

Anopheles/genetics; Genome, Insect; Insect Vectors/genetics; Animals; Anopheles/classification; Brazil; Chromosomes, Insect/genetics; DNA Transposable Elements; Evolution, Molecular; Female; Genetic Variation; Host-Parasite Interactions; Insect Proteins/genetics; Insect Vectors/classification; Insecticide Resistance; Insecticides/pharmacology; Malaria/parasitology; Male; Molecular Sequence Annotation; Phylogeny; Synteny; Transcriptome

18.

Hybrid error correction and de novo assembly of single-molecule sequencing reads.

Koren, Sergey; Schatz, Michael C; Walenz, Brian P; Martin, Jeffrey; Howard, Jason T; Ganapathy, Ganeshkumar; Wang, Zhong; Rasko, David A; McCombie, W Richard; Jarvis, Erich D.

Nat Biotechnol ; 30(7): 693-700, 2012 Jul 01.

Article in English | MEDLINE | ID: mdl-22750884

ABSTRACT

Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.

Subject(s)

Computational Biology/methods; Sequence Analysis, RNA/methods; Transcriptome/genetics; Algorithms; Bacteria/genetics; Bacteriophages/genetics; RNA/genetics; Zea mays/genetics

19.

The bonobo genome compared with the chimpanzee and human genomes.

Prüfer, Kay; Munch, Kasper; Hellmann, Ines; Akagi, Keiko; Miller, Jason R; Walenz, Brian; Koren, Sergey; Sutton, Granger; Kodira, Chinnappa; Winer, Roger; Knight, James R; Mullikin, James C; Meader, Stephen J; Ponting, Chris P; Lunter, Gerton; Higashino, Saneyuki; Hobolth, Asger; Dutheil, Julien; Karakoç, Emre; Alkan, Can; Sajjadian, Saba; Catacchio, Claudia Rita; Ventura, Mario; Marques-Bonet, Tomas; Eichler, Evan E; André, Claudine; Atencia, Rebeca; Mugisha, Lawrence; Junhold, Jörg; Patterson, Nick; Siebauer, Michael; Good, Jeffrey M; Fischer, Anne; Ptak, Susan E; Lachmann, Michael; Symer, David E; Mailund, Thomas; Schierup, Mikkel H; Andrés, Aida M; Kelso, Janet; Pääbo, Svante.

Nature ; 486(7404): 527-31, 2012 Jun 28.

Article in English | MEDLINE | ID: mdl-22722832

ABSTRACT

Two African apes are the closest living relatives of humans: the chimpanzee (Pan troglodytes) and the bonobo (Pan paniscus). Although they are similar in many respects, bonobos and chimpanzees differ strikingly in key social and sexual behaviours, and for some of these traits they show more similarity with humans than with each other. Here we report the sequencing and assembly of the bonobo genome to study its evolutionary relationship with the chimpanzee and human genomes. We find that more than three per cent of the human genome is more closely related to either the bonobo or the chimpanzee genome than these are to each other. These regions allow various aspects of the ancestry of the two ape species to be reconstructed. In addition, many of the regions that overlap genes may eventually help us understand the genetic basis of phenotypes that humans share with one of the two apes to the exclusion of the other.

Subject(s)

Evolution, Molecular; Genetic Variation/genetics; Genome, Human/genetics; Genome/genetics; Pan paniscus/genetics; Pan troglodytes/genetics; Animals; DNA Transposable Elements/genetics; Gene Duplication/genetics; Genotype; Humans; Molecular Sequence Data; Phenotype; Phylogeny; Species Specificity

20.

A Rickettsia genome overrun by mobile genetic elements provides insight into the acquisition of genes characteristic of an obligate intracellular lifestyle.

Gillespie, Joseph J; Joardar, Vinita; Williams, Kelly P; Driscoll, Timothy; Hostetler, Jessica B; Nordberg, Eric; Shukla, Maulik; Walenz, Brian; Hill, Catherine A; Nene, Vishvanath M; Azad, Abdu F; Sobral, Bruno W; Caler, Elisabet.

J Bacteriol ; 194(2): 376-94, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22056929

ABSTRACT

We present the draft genome for the Rickettsia endosymbiont of Ixodes scapularis (REIS), a symbiont of the deer tick vector of Lyme disease in North America. Among Rickettsia species (Alphaproteobacteria: Rickettsiales), REIS has the largest genome sequenced to date (>2 Mb) and contains 2,309 genes across the chromosome and four plasmids (pREIS1 to pREIS4). The most remarkable finding within the REIS genome is the extraordinary proliferation of mobile genetic elements (MGEs), which contributes to a limited synteny with other Rickettsia genomes. In particular, an integrative conjugative element named RAGE (for Rickettsiales amplified genetic element), previously identified in scrub typhus rickettsiae (Orientia tsutsugamushi) genomes, is present on both the REIS chromosome and plasmids. Unlike the pseudogene-laden RAGEs of O. tsutsugamushi, REIS encodes nine conserved RAGEs that include F-like type IV secretion systems similar to that of the tra genes encoded in the Rickettsia bellii and R. massiliae genomes. An unparalleled abundance of encoded transposases (>650) relative to genome size, together with the RAGEs and other MGEs, comprise ~35% of the total genome, making REIS one of the most plastic and repetitive bacterial genomes sequenced to date. We present evidence that conserved rickettsial genes associated with an intracellular lifestyle were acquired via MGEs, especially the RAGE, through a continuum of genomic invasions. Robust phylogeny estimation suggests REIS is ancestral to the virulent spotted fever group of rickettsiae. As REIS is not known to invade vertebrate cells and has no known pathogenic effects on I. scapularis, its genome sequence provides insight on the origin of mechanisms of rickettsial pathogenicity.

Subject(s)

Gene Expression Regulation, Bacterial/physiology; Genome, Bacterial; Interspersed Repetitive Sequences; Ixodes/microbiology; Rickettsia/genetics; Animals; Arachnid Vectors/microbiology; Biological Evolution; Chromosome Mapping; Chromosomes, Bacterial; Molecular Sequence Data; Plasmids; Symbiosis
