Search | BVS Bolivia

1.

Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data.

Liu, Zhi; Xie, Zhi; Li, Miaoxin.

Genome Biol ; 25(1): 188, 2024 Jul 15.

Article in English | MEDLINE | ID: mdl-39010145

ABSTRACT

BACKGROUND: Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. RESULTS: This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking . CONCLUSIONS: This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
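The recall and precision figures mentioned in this abstract can be illustrated with a minimal sketch. It is not the benchmarking procedure used in the study: the `SV` record, the `evaluate` function, and the 500 bp breakpoint tolerance are all assumptions introduced here for illustration, and real benchmarks typically also compare SV length and genotype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record used only for this sketch."""
    chrom: str
    pos: int      # breakpoint start position
    svtype: str   # e.g. "DEL", "INS"


def evaluate(calls, truth, tolerance=500):
    """Return (precision, recall, f1) for a call set against a truth set.

    A call matches a truth SV when chromosome and type agree and the
    breakpoints lie within `tolerance` bp (an assumed threshold).
    Each truth SV can be matched at most once.
    """
    unmatched = list(truth)
    true_pos = 0
    for call in calls:
        for i, ref in enumerate(unmatched):
            if (call.chrom == ref.chrom and call.svtype == ref.svtype
                    and abs(call.pos - ref.pos) <= tolerance):
                true_pos += 1
                del unmatched[i]  # consume the matched truth SV
                break
    precision = true_pos / len(calls) if calls else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Precision is the fraction of calls that match a truth SV, and recall is the fraction of truth SVs that are recovered; a pipeline can score well on one while doing poorly on the other, which is why both are reported.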

Subject(s)

High-Throughput Nucleotide Sequencing; Humans; High-Throughput Nucleotide Sequencing/methods; Genomic Structural Variation; Software; Sequence Analysis, DNA/methods

2.

Fine-scale characterization of the soybean rhizosphere microbiome via synthetic long reads and avidity sequencing.

Hale, Brett; Watts, Caitlin; Conatser, Matthew; Brown, Edward; Wijeratne, Asela J.

Environ Microbiome ; 19(1): 46, 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-38997772

ABSTRACT

BACKGROUND: The rhizosphere microbiome displays structural and functional dynamism driven by plant, microbial, and environmental factors. While such plasticity is a well-evidenced determinant of host health, individual and community-level microbial activity within the rhizosphere remain poorly understood, due in part to the insufficient taxonomic resolution achieved through traditional marker gene amplicon sequencing. This limitation necessitates more advanced approaches (e.g., long-read sequencing) to derive ecological inferences with practical application. To this end, the present study coupled synthetic long-read technology with avidity sequencing to investigate eukaryotic and prokaryotic microbiome dynamics within the soybean (Glycine max) rhizosphere under field conditions. RESULTS: Synthetic long-read sequencing permitted de novo reconstruction of the entire 18S-ITS1-ITS2 region of the eukaryotic rRNA operon as well as all nine hypervariable regions of the 16S rRNA gene. All full-length, mapped eukaryotic amplicon sequence variants displayed genus-level classification, and 44.77% achieved species-level classification. The resultant eukaryotic microbiome encompassed five kingdoms (19 genera) of protists in addition to fungi - a depth unattainable with conventional short-read methods. In the prokaryotic fraction, every full-length, mapped amplicon sequence variant was resolved at the species level, and 23.13% at the strain level. Thirteen species of Bradyrhizobium were thereby distinguished in the prokaryotic microbiome, with strain-level identification of the two Bradyrhizobium species most reported to nodulate soybean. 
Moreover, the applied methodology delineated structural and compositional dynamism in response to experimental parameters (i.e., growth stage, cultivar, and biostimulant application), unveiled a saprotroph-rich core microbiome, provided empirical evidence for host selection of mutualistic taxa, and identified key microbial co-occurrence network members likely associated with edaphic and agronomic properties. CONCLUSIONS: This study is the first to combine synthetic long-read technology and avidity sequencing to profile both eukaryotic and prokaryotic fractions of a plant-associated microbiome. Findings herein provide an unparalleled taxonomic resolution of the soybean rhizosphere microbiota and represent significant biological and technological advancements in crop microbiome research.

3.

NanoTrans: an integrated computational framework for comprehensive transcriptome analysis with Nanopore direct RNA sequencing.

Yang, Ludong; Zhang, Xinxin; Wang, Fan; Zhang, Li; Li, Jing; Yue, Jia-Xing.

J Genet Genomics ; 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-39004399

ABSTRACT

Nanopore direct RNA sequencing (DRS) provides direct access to native RNA strands with full-length information, shedding light on rich qualitative and quantitative properties of gene expression profiles. Here, we present NanoTrans, an integrated computational framework that comprehensively covers all major DRS-based application scopes, including isoform clustering and quantification, poly(A) tail length estimation, RNA modification profiling, and fusion gene detection. Beyond providing a streamlined one-stop solution, NanoTrans features a workflow-oriented modular design, batch processing capability, all-in-one tabular and graphic report output, and automatic installation and configuration support. Finally, by applying NanoTrans to real DRS datasets of yeast, Arabidopsis, and human embryonic kidney and cancer cell lines, we further demonstrate its utility and effectiveness across a wide range of DRS-based application settings.

4.

LongTR: genome-wide profiling of genetic variation at tandem repeats from long reads.

Ziaei Jam, Helyaneh; Zook, Justin M; Javadzadeh, Sara; Park, Jonghun; Sehgal, Aarushi; Gymrek, Melissa.

Genome Biol ; 25(1): 176, 2024 Jul 04.

Article in English | MEDLINE | ID: mdl-38965568

ABSTRACT

Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979 .

Subject(s)

Genetic Variation; Genome, Human; Tandem Repeat Sequences; Humans; Software; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; Nanopore Sequencing/methods

5.

Metatranscriptomics-guided genome-scale metabolic reconstruction reveals the carbon flux and trophic interaction in methanogenic communities.

Yan, Weifu; Wang, Dou; Wang, Yubo; Wang, Chunxiao; Chen, Xi; Liu, Lei; Wang, Yulin; Li, Yu-You; Kamagata, Yoichi; Nobu, Masaru K; Zhang, Tong.

Microbiome ; 12(1): 121, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38970122

ABSTRACT

BACKGROUND: Despite rapid advances in genomic-resolved metagenomics and remarkable explosion of metagenome-assembled genomes (MAGs), the function of uncultivated anaerobic lineages and their interactions in carbon mineralization remain largely uncertain, which has profound implications in biotechnology and biogeochemistry. RESULTS: In this study, we combined long-read sequencing and metatranscriptomics-guided metabolic reconstruction to provide a genome-wide perspective of carbon mineralization flow from polymers to methane in an anaerobic bioreactor. Our results showed that incorporating long reads resulted in a substantial improvement in the quality of metagenomic assemblies, enabling the effective recovery of 132 high-quality genomes meeting stringent criteria of minimum information about a metagenome-assembled genome (MIMAG). In addition, hybrid assembly obtained 51% more prokaryotic genes in comparison to the short-read-only assembly. Metatranscriptomics-guided metabolic reconstruction unveiled the remarkable metabolic flexibility of several novel Bacteroidales-affiliated bacteria and populations from Mesotoga sp. in scavenging amino acids and sugars. In addition to recovering two circular genomes of previously known but fragmented syntrophic bacteria, two newly identified bacteria within Syntrophales were found to be highly engaged in fatty acid oxidation through syntrophic relationships with dominant methanogens Methanoregulaceae bin.74 and Methanothrix sp. bin.206. The activity of bin.206 preferring acetate as substrate exceeded that of bin.74 with increasing loading, reinforcing the substrate determinantal role. CONCLUSION: Overall, our study uncovered some key active anaerobic lineages and their metabolic functions in this complex anaerobic ecosystem, offering a framework for understanding carbon transformations in anaerobic digestion. 
These findings advance the understanding of metabolic activities and trophic interactions between anaerobic guilds, providing foundational insights into carbon flux within both engineered and natural ecosystems.

Subject(s)

Carbon; Metagenomics; Methane; Methane/metabolism; Carbon/metabolism; Metagenomics/methods; Bioreactors/microbiology; Metagenome; Bacteria/genetics; Bacteria/metabolism; Bacteria/classification; Phylogeny; Anaerobiosis; Transcriptome; Genome, Bacterial; Microbiota; Gene Expression Profiling

6.

Long-Read Structural and Epigenetic Profiling of a Kidney Tumor-Matched Sample with Nanopore Sequencing and Optical Genome Mapping.

Margalit, Sapir; Tulpová, Zuzana; Detinis Zur, Tahir; Michaeli, Yael; Deek, Jasline; Nifker, Gil; Haldar, Rita; Gnatek, Yehudit; Omer, Dorit; Dekel, Benjamin; Feldman, Hagit Baris; Grunwald, Assaf; Ebenstein, Yuval.

bioRxiv ; 2024 Jun 13.

Article in English | MEDLINE | ID: mdl-38915648

ABSTRACT

Carcinogenesis often involves significant alterations in the cancer genome architecture, marked by large structural and copy number variations (SVs and CNVs) that are difficult to capture with short-read sequencing. Traditionally, cytogenetic techniques are applied to detect such aberrations, but they are limited in resolution and do not cover features smaller than several hundred kilobases. Optical genome mapping and nanopore sequencing are attractive technologies that bridge this resolution gap and offer enhanced performance for cytogenetic applications. These methods profile native, individual DNA molecules, thus capturing epigenetic information. We applied both techniques to characterize a clear cell renal cell carcinoma (ccRCC) tumor's structural and copy number landscape, highlighting the relative strengths of each method in the context of variant size and average read length. Additionally, we assessed their utility for methylome and hydroxymethylome profiling, emphasizing differences in epigenetic analysis applicability.

7.

Characterization of telomere variant repeats using long reads enables allele-specific telomere length estimation.

Stephens, Zachary; Kocher, Jean-Pierre.

BMC Bioinformatics ; 25(1): 194, 2024 May 17.

Article in English | MEDLINE | ID: mdl-38755561

ABSTRACT

Telomeres are regions of repetitive DNA at the ends of linear chromosomes which protect chromosome ends from degradation. Telomere lengths have been extensively studied in the context of aging and disease, though most studies use average telomere lengths which are of limited utility. We present a method for identifying all 92 telomere alleles from long read sequencing data. Individual telomeres are identified using variant repeats proximal to telomere regions, which are unique across alleles. This high-throughput and high-resolution characterization of telomeres could be foundational to future studies investigating the roles of specific telomeres in aging and disease.

Subject(s)

Alleles; Telomere; Telomere/genetics; Humans; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; Repetitive Sequences, Nucleic Acid/genetics

8.

Genome Assembly of the Dyeing Poison Frog Provides Insights into the Dynamics of Transposable Element and Genome-Size Evolution.

Dittrich, Carolin; Hoelzl, Franz; Smith, Steve; Fouilloux, Chloe A; Parker, Darren J; O'Connell, Lauren A; Knowles, Lucy S; Hughes, Margaret; Fewings, Ade; Morgan, Rhys; Rojas, Bibiana; Comeault, Aaron A.

Genome Biol Evol ; 16(6), 2024 Jun 04.

Article in English | MEDLINE | ID: mdl-38753031

ABSTRACT

Genome size varies greatly across the tree of life and transposable elements are an important contributor to this variation. Among vertebrates, amphibians display the greatest variation in genome size, making them ideal models to explore the causes and consequences of genome size variation. However, high-quality genome assemblies for amphibians have, until recently, been rare. Here, we generate a high-quality genome assembly for the dyeing poison frog, Dendrobates tinctorius. We compare this assembly to publicly available frog genomes and find evidence for both large-scale conserved synteny and widespread rearrangements between frog lineages. Comparing conserved orthologs annotated in these genomes revealed a strong correlation between genome size and gene size. To explore the cause of gene-size variation, we quantified the location of transposable elements relative to gene features and find that the accumulation of transposable elements in introns has played an important role in the evolution of gene size in D. tinctorius, while estimates of insertion times suggest that many insertion events are recent and species-specific. Finally, we carry out population-scale mobile-element sequencing and show that the diversity and abundance of transposable elements in poison frog genomes can complicate genotyping from repetitive element sequence anchors. Our results show that transposable elements have clearly played an important role in the evolution of large genome size in D. tinctorius. Future studies are needed to fully understand the dynamics of transposable element evolution and to optimize primer or bait design for cost-effective population-level genotyping in species with large, repetitive genomes.

Subject(s)

Anura; DNA Transposable Elements; Evolution, Molecular; Genome Size; Genome; Animals; Anura/genetics; Poison Frogs

9.

Hybracter: enabling scalable, automated, complete and accurate bacterial genome assemblies.

Bouras, George; Houtak, Ghais; Wick, Ryan R; Mallawaarachchi, Vijini; Roach, Michael J; Papudeshi, Bhavya; Judd, Lousie M; Sheppard, Anna E; Edwards, Robert A; Vreugde, Sarah.

Microb Genom ; 10(5), 2024 May.

Article in English | MEDLINE | ID: mdl-38717808

ABSTRACT

Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-reads) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and genomic variation beyond single nucleotide variants. They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter which allows for the fast, automatic and scalable recovery of near-perfect complete bacterial genomes using a long-read first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read only assembler. We compared Hybracter to existing automated hybrid and long-read only assembly tools using a diverse panel of samples of varying levels of long-read accuracy with manually curated ground truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than the existing gold standard automated hybrid assembler Unicycler. We also show that Hybracter with long-reads only is the most accurate long-read only assembler and is comparable to hybrid methods in accurately recovering small plasmids.

Subject(s)

Algorithms; Genome, Bacterial; Software; Plasmids/genetics; Sequence Analysis, DNA/methods; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Bacteria/genetics; Bacteria/classification

10.

Copy number variation and elevated genetic diversity at immune trait loci in Atlantic and Pacific herring.

Mohamadnejad Sangdehi, Fahime; Jamsandekar, Minal S; Enbody, Erik D; Pettersson, Mats E; Andersson, Leif.

BMC Genomics ; 25(1): 459, 2024 May 10.

Article in English | MEDLINE | ID: mdl-38730342

ABSTRACT

BACKGROUND: Genome-wide comparisons of populations are widely used to explore the patterns of nucleotide diversity and sequence divergence to provide knowledge on how natural selection and genetic drift affect the genome. In this study we have compared whole-genome sequencing data from Atlantic and Pacific herring, two sister species that diverged about 2 million years ago, to explore the pattern of genetic differentiation between the two species. RESULTS: The genome comparison of the two species revealed high genome-wide differentiation but with islands of remarkably low genetic differentiation, as measured by an FST analysis. However, the low FST observed in these islands is not caused by low interspecies sequence divergence (dxy) but rather by exceptionally high estimated intraspecies nucleotide diversity (π). These regions of low differentiation and elevated nucleotide diversity, termed high-diversity regions in this study, are not enriched for repeats but are highly enriched for immune-related genes. This enrichment includes genes from both the adaptive immune system, such as immunoglobulin, T-cell receptor and major histocompatibility complex genes, as well as a substantial number of genes with a role in the innate immune system, e.g. novel immune-type receptor, tripartite motif and tumor necrosis factor receptor genes. Analysis of long-read based assemblies from two Atlantic herring individuals revealed extensive copy number variation in these genomic regions, indicating that the elevated intraspecies nucleotide diversities were partially due to the cross-mapping of short reads. CONCLUSIONS: This study demonstrates that copy number variation is a characteristic feature of immune trait loci in herring. 
Another important implication is that these loci are blind spots in classical genome-wide screens for genetic differentiation using short-read data, not only in herring but likely also in other species harboring qualitatively similar variation at immune trait loci. These loci stood out in this study because of the relatively high genome-wide baseline for FST values between Atlantic and Pacific herring.
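The link between low FST and high intraspecies diversity described in this abstract can be made concrete with one common sequence-based definition of FST (Hudson-style). The abstract does not state which estimator the authors used, so the formula below is an assumption for illustration, and the numerical values are invented rather than taken from the study.

```latex
F_{ST} = 1 - \frac{\pi_{w}}{d_{xy}}, \qquad
\pi_{w} = \tfrac{1}{2}\left(\pi_{1} + \pi_{2}\right)

% Illustrative values only, with d_{xy} held fixed at 0.02:
% \pi_{w} = 0.002 \;\Rightarrow\; F_{ST} = 1 - 0.002/0.02 = 0.90
% \pi_{w} = 0.015 \;\Rightarrow\; F_{ST} = 1 - 0.015/0.02 = 0.25
```

Here \pi_1 and \pi_2 are the nucleotide diversities within each species and d_{xy} is the between-species divergence. Under this definition, inflating the within-species term (for example through cross-mapped reads from copy-number-variable regions) lowers FST even when d_{xy} is unchanged.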

Subject(s)

DNA Copy Number Variations; Fishes; Animals; Fishes/genetics; Fishes/immunology; Genetic Variation; Atlantic Ocean; Quantitative Trait Loci; Whole Genome Sequencing

11.

CAREx: context-aware read extension of paired-end sequencing data.

Kallenborn, Felix; Schmidt, Bertil.

BMC Bioinformatics ; 25(1): 186, 2024 May 10.

Article in English | MEDLINE | ID: mdl-38730374

ABSTRACT

BACKGROUND: Commonly used next-generation sequencing machines typically produce large amounts of short reads of a few hundred base pairs in length. However, many downstream applications would generally benefit from longer reads. RESULTS: We present CAREx, an algorithm for the generation of pseudo-long reads from paired-end short-read Illumina data based on the concept of repeatedly computing multiple-sequence alignments to extend a read until its partner is found. Our performance evaluation on both simulated data and real data shows that CAREx is able to connect significantly more read pairs (up to 99% for simulated data) and to produce more error-free pseudo-long reads than previous approaches. When used prior to assembly, it can achieve superior de novo assembly results. Furthermore, the GPU-accelerated version of CAREx exhibits the fastest execution times among all tested tools. CONCLUSION: CAREx is a new MSA-based algorithm and software for producing pseudo-long reads from paired-end short-read data. It outperforms other state-of-the-art programs in terms of (i) percentage of connected read pairs, (ii) reduction of error rates of filled gaps, (iii) runtime, and (iv) downstream analysis using de novo assembly. CAREx is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at https://github.com/fkallen/CAREx .

Subject(s)

Algorithms; High-Throughput Nucleotide Sequencing; Software; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Humans; Sequence Alignment/methods

12.

The invasive land flatworm Arthurdendyus triangulatus has repeated sequences in the mitogenome, extra-long cox2 gene and paralogous nuclear rRNA clusters.

Gastineau, Romain; Lemieux, Claude; Turmel, Monique; Otis, Christian; Boyle, Brian; Coulis, Mathieu; Gouraud, Clément; Boag, Brian; Murchie, Archie K; Winsor, Leigh; Justine, Jean-Lou.

Sci Rep ; 14(1): 7840, 2024 Apr 03.

Article in English | MEDLINE | ID: mdl-38570596

ABSTRACT

Using a combination of short- and long-read sequencing, we sequenced the complete mitochondrial genome of the invasive 'New Zealand flatworm' Arthurdendyus triangulatus (Geoplanidae, Rhynchodeminae, Caenoplanini) and its two complete paralogous nuclear rRNA gene clusters. The mitogenome has a total length of 20,309 bp and contains repeats, including two types of tandem repeats, that could not be resolved by short-read sequencing. We also sequenced for the first time the mitogenomes of four species of Caenoplana (Caenoplanini). A maximum likelihood phylogeny associated A. triangulatus with the other Caenoplanini, but Parakontikia ventrolineata and Australopacifica atrata were excluded from the Caenoplanini and associated instead with the Rhynchodemini, together with Platydemus manokwari. The mitogenomes of all species of the subfamily Rhynchodeminae share several unusual structural features, including a very long cox2 gene. This is the first time that the complete paralogous rRNA clusters, which differ in length, sequence and seemingly number of copies, have been obtained for a member of the Geoplanidae.

Subject(s)

Genome, Mitochondrial; Platyhelminths; Animals; Platyhelminths/genetics; Genome, Mitochondrial/genetics; Repetitive Sequences, Nucleic Acid; Phylogeny; Sequence Analysis, DNA; RNA, Ribosomal/genetics

13.

The Application of Long-Read Sequencing to Cancer.

Ermini, Luca; Driguez, Patrick.

Cancers (Basel) ; 16(7), 2024 Mar 25.

Article in English | MEDLINE | ID: mdl-38610953

ABSTRACT

Cancer is a multifaceted disease arising from numerous genomic aberrations that have been identified as a result of advancements in sequencing technologies. While next-generation sequencing (NGS), which uses short reads, has transformed cancer research and diagnostics, it is limited by read length. Third-generation sequencing (TGS), led by the Pacific Biosciences and Oxford Nanopore Technologies platforms, employs long-read sequences, which have marked a paradigm shift in cancer research. Cancer genomes often harbour complex events, and TGS, with its ability to span large genomic regions, has facilitated their characterisation, providing a better understanding of how complex rearrangements affect cancer initiation and progression. TGS has also characterised the entire transcriptome of various cancers, revealing cancer-associated isoforms that could serve as biomarkers or therapeutic targets. Furthermore, TGS has advanced cancer research by improving genome assemblies, detecting complex variants, and providing a more complete picture of transcriptomes and epigenomes. This review focuses on TGS and its growing role in cancer research. We investigate its advantages and limitations, providing a rigorous scientific analysis of its use in detecting previously hidden aberrations missed by NGS. This promising technology holds immense potential for both research and clinical applications, with far-reaching implications for cancer diagnosis and treatment.

14.

Advancements in long-read genome sequencing technologies and algorithms.

Espinosa, Elena; Bautista, Rocio; Larrosa, Rafael; Plata, Oscar.

Genomics ; 116(3): 110842, 2024 May.

Article in English | MEDLINE | ID: mdl-38608738

ABSTRACT

The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial improvements in accuracy and computational cost in sequencing genomes. However, de novo whole-genome assembly remains a formidable challenge in terms of both computational demands and result quality. As sequencing accuracy and throughput steadily advance, a continuous stream of new assembly tools enters the field. Navigating this landscape requires a reasoned choice of sequencing platform, depth, and assembly tools to obtain high-quality genome reconstructions. This review examines the interplay between current long-read sequencing technologies, assembly methodologies, and the evolving field of genomics. Focusing on the main challenges and opportunities presented by these advancements, we explore the factors that influence the selection of strategies for achieving robust genome assemblies.

Subject(s)

Algorithms; High-Throughput Nucleotide Sequencing/methods; Genomics/methods; Sequence Analysis, DNA/methods; Humans; Whole Genome Sequencing/methods

15.

NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads.

Hu, Jiang; Wang, Zhuo; Sun, Zongyi; Hu, Benxia; Ayoola, Adeola Oluwakemi; Liang, Fan; Li, Jingjing; Sandoval, José R; Cooper, David N; Ye, Kai; Ruan, Jue; Xiao, Chuan-Le; Wang, Depeng; Wu, Dong-Dong; Wang, Sheng.

Genome Biol ; 25(1): 107, 2024 Apr 26.

Article in English | MEDLINE | ID: mdl-38671502

ABSTRACT

Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.

Subject(s)

DNA Copy Number Variations; Genome, Human; Humans; High-Throughput Nucleotide Sequencing/methods; Software; Nanopore Sequencing/methods; Sequence Analysis, DNA/methods; Genomics/methods

16.

MCSS: microbial community simulator based on structure.

Hui, Xingqi; Yang, Jinbao; Sun, Jinhuan; Liu, Fang; Pan, Weihua.

Front Microbiol ; 15: 1358257, 2024.

Article in English | MEDLINE | ID: mdl-38516019

ABSTRACT

De novo assembly plays a pivotal role in metagenomic analysis, and the incorporation of third-generation sequencing technology can significantly improve the integrity and accuracy of assembly results. Recently, with advancements in sequencing technology (Hi-Fi, ultra-long), several long-read-based bioinformatic tools have been developed. However, validating the performance and reliability of these tools is a crucial concern. To address this gap, we present MCSS (microbial community simulator based on structure), which can generate simulated microbial communities and sequencing datasets based on the structural attributes of real microbiome communities. The evaluation results indicate that it can generate simulated communities that exhibit both diversity and similarity to actual community structures. Additionally, MCSS generates synthetic PacBio Hi-Fi and Oxford Nanopore Technologies (ONT) long reads for the species within the simulated community. This tool provides a valuable resource for benchmarking and refining metagenomic analysis methods. Code available at: https://github.com/panlab-bio/mcss.

17.

Oarfish: Enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification.

Jousheghani, Zahra Zare; Patro, Rob.

bioRxiv ; 2024 Mar 01.

Article in English | MEDLINE | ID: mdl-38464200

ABSTRACT

Motivation: Long read sequencing technology is becoming an increasingly indispensable tool in genomic and transcriptomic analysis. In transcriptomics in particular, long reads offer the possibility of sequencing full-length isoforms, which can vastly simplify the identification of novel transcripts and transcript quantification. However, despite this promise, the focus of much long read method development to date has been on transcript identification, with comparatively little attention paid to quantification. Yet, due to differences in the underlying protocols and technologies, lower throughput (i.e. fewer reads sequenced per sample compared to short read technologies), as well as technical artifacts, long read quantification remains a challenge, motivating the continued development and assessment of quantification methods tailored to this increasingly prevalent type of data. Results: We introduce a new method and software tool for long read transcript quantification called oarfish. Our model incorporates a novel and innovative coverage score, which affects the conditional probability of fragment assignment in the underlying probabilistic model. We demonstrate that by accounting for this coverage information, oarfish is able to produce more accurate quantification estimates than existing long read quantification methods, particularly when one considers the primary isoforms present in a particular cell line or tissue type. Availability and Implementation: Oarfish is implemented in the Rust programming language, and is made available as free and open-source software under the BSD 3-clause license. The source code is available at https://www.github.com/COMBINE-lab/oarfish.

18.

ONT read assembly of the black rhino genome.

Kraaijeveld, Ken; Bossers, Koen; Petrusevski, Nikola; Pieterman, Stef; Bruins-van Sonsbeek, Linda G R; Wittink, Floyd.

BMC Genom Data ; 25(1): 27, 2024 Mar 05.

Article in English | MEDLINE | ID: mdl-38443836

ABSTRACT

OBJECTIVES: The black rhinoceros (Diceros bicornis) is an endangered mammal for which a captive breeding program is part of the conservation effort. Black rhinos in zoos often suffer from chronic infections and haemochromatosis. Furthermore, breeding is hampered by low male fertility. To aid a research project studying these topics, we sequenced and assembled the genome of a captive male black rhino using ONT sequencing data only. DATA DESCRIPTION: This work produced over 100 Gb of whole-genome sequencing reads from whole blood. These were assembled into a 2.47 Gb draft genome consisting of 834 contigs with an N50 of 29.53 Mb. The genome annotation was lifted over from an available genome annotation for black rhino, which resulted in the retrieval of over 99% of gene features. This new genome assembly will be a valuable resource for conservation genetic research in this species.

Subject(s)

Genetic Research , Nose , Male , Animals , Perissodactyla/genetics , Persistent Infection , Research Design

19.

Improved genome assembly of the whiteleg shrimp Penaeus (Litopenaeus) vannamei using long- and short-read sequences from public databases.

Perez-Enriquez, Ricardo; Juárez, Oscar E; Galindo-Torres, Pavel; Vargas-Aguilar, Ana Luisa; Llera-Herrera, Raúl.

J Hered ; 115(3): 302-310, 2024 May 09.

Article in English | MEDLINE | ID: mdl-38451162

ABSTRACT

The Pacific whiteleg shrimp Penaeus (Litopenaeus) vannamei is a highly relevant species for the world's aquaculture development, for which an incomplete genome is available in public databases. In this work, PacBio long reads from 14 publicly available genomic libraries (131.2 Gb) were mined to improve the reference genome assembly. The libraries were assembled, polished using Illumina short reads, and scaffolded with P. vannamei, Fenneropenaeus chinensis, and Penaeus monodon genomes. The reference-guided assembly, organized into 44 pseudo-chromosomes and 15,682 scaffolds, showed an improvement over previous reference genomes, with a genome size of 2.055 Gb, an N50 of 40.14 Mb, an L50 of 21, and a longest scaffold of 65.79 Mb. Most orthologous genes (92.6%) of the Arthropoda_odb10 database were detected as "complete," and BRAKER predicted 21,816 gene models; from these, we detected 1,814 single-copy orthologues conserved across the genomic references for Marsupenaeus japonicus, F. chinensis, and P. monodon. More than 99% of the transcriptomic-assembly data aligned to the new reference-guided assembly. The collinearity analysis of the assembled pseudo-chromosomes against the P. vannamei and P. monodon reference genomes showed high conservation in different sets of pseudo-chromosomes. In addition, more than 21,000 publicly available genetic marker sequences were mapped to single-site positions. This new assembly represents a step forward from previously reported P. vannamei assemblies. It will be helpful as a reference genome for future studies on the evolutionary history of the species, the genetic architecture of physiological and sex-determination traits, and the analysis of the changes in genetic diversity and composition of cultivated stocks.

Subject(s)

Genome , Penaeidae , Penaeidae/genetics , Animals , Genetic Databases , Genomics/methods , Molecular Sequence Annotation
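
The N50 and L50 figures quoted in the abstract above are standard contiguity statistics: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly, and L50 is the number of sequences in that set. A minimal sketch of the calculation follows, assuming the usual definition; the function name `n50_l50` and the toy lengths are illustrative and are not taken from the study.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk from the longest sequence down, accumulating length until
    # at least half of the total assembly size is covered.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50; 'count' is the L50.
            return length, count


# Toy lengths in Mb (hypothetical values, not from the shrimp assembly).
toy_lengths = [40, 30, 20, 10, 5, 5]
print(n50_l50(toy_lengths))  # (30, 2)
```

Read this way, an L50 of 21 with an N50 of 40.14 Mb would mean that the 21 longest scaffolds, each at least 40.14 Mb long, account for at least half of the assembled sequence.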

20.

The long and short of it: benchmarking viromics using Illumina, Nanopore and PacBio sequencing technologies.

Cook, Ryan; Brown, Nathan; Rihtman, Branko; Michniewski, Slawomir; Redgwell, Tamsin; Clokie, Martha; Stekel, Dov J; Chen, Yin; Scanlan, David J; Hobman, Jon L; Nelson, Andrew; Jones, Michael A; Smith, Darren; Millard, Andrew.

Microb Genom ; 10(2)2024 Feb.

Article in English | MEDLINE | ID: mdl-38376377

ABSTRACT

Viral metagenomics has fuelled a rapid change in our understanding of global viral diversity and ecology. Long-read sequencing and hybrid assembly approaches that combine long- and short-read technologies are now being widely implemented in bacterial genomics and metagenomics. However, the use of long-read sequencing to investigate viral communities is still in its infancy. While Nanopore and PacBio technologies have been applied to viral metagenomics, it is not known to what extent different technologies will impact the reconstruction of the viral community. Thus, we constructed a mock bacteriophage community of previously sequenced phage genomes, sequenced it using Illumina, Nanopore and PacBio sequencing technologies, and tested a number of different assembly approaches. When using a single sequencing technology, Illumina assemblies were the best at recovering phage genomes. Nanopore- and PacBio-only assemblies performed poorly in comparison to Illumina in both genome recovery and error rates, which both varied with the assembler used. The best Nanopore assembly had errors that manifested as SNPs and INDELs at frequencies 41 and 157% higher than found in Illumina-only assemblies, respectively. The best PacBio assemblies, by comparison, had SNPs at frequencies 12 and 78% higher than found in Illumina-only assemblies, respectively. Despite high read coverage, long-read-only assemblies recovered a maximum of one complete genome from any assembly, unless reads were down-sampled prior to assembly. Overall, the best approach was assembly by a combination of Illumina and Nanopore reads, which reduced error rates to levels comparable with short-read-only assemblies. When using a single technology, Illumina only was the best approach. The differences in genome recovery and error rates between technology and assembler had downstream impacts on gene prediction, viral prediction, and subsequent estimates of diversity within a sample. These findings will provide a starting point for others in the choice of reads and assembly algorithms for the analysis of viromes.

Subject(s)

Bacteriophages , Nanopores , Benchmarking , Technology , Algorithms
