Search | BVS Enfermagem Research Portal

LongStitch: high-quality genome assembly correction and scaffolding using long reads.

Coombe, Lauren; Li, Janet X; Lo, Theodora; Wong, Johnathan; Nikolic, Vladimir; Warren, René L; Birol, Inanc.

BMC Bioinformatics ; 22(1): 534, 2021 Oct 30.

Article in English | MEDLINE | ID: mdl-34717540

ABSTRACT

BACKGROUND: Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. RESULTS: LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short- and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. CONCLUSIONS: Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects.
The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
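
The abstract states that ntLink joins contigs through lightweight minimizer mappings. The following is a minimal illustrative sketch of the general minimizer idea only; it is not ntLink's implementation. The function names, the lexicographic ordering (real tools typically use a hash), the toy sequences, and the values of `k` and `w` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def minimizers(seq: str, k: int, w: int) -> List[Tuple[str, int]]:
    """Return (k-mer, position) for the smallest k-mer in each window
    of w consecutive k-mers. Ordering is lexicographic (an assumption;
    production tools usually order k-mers by a hash value)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: List[Tuple[str, int]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Leftmost smallest k-mer in this window.
        offset = min(range(w), key=lambda j: window[j])
        entry = (window[offset], start + offset)
        # Consecutive windows often pick the same k-mer; record it once.
        if not picked or picked[-1] != entry:
            picked.append(entry)
    return picked


def shared_minimizers(a: str, b: str, k: int, w: int) -> Set[str]:
    """Minimizer k-mers that two sequences have in common."""
    return ({m for m, _ in minimizers(a, k, w)}
            & {m for m, _ in minimizers(b, k, w)})


# Hypothetical toy data: a "long read" spanning the end of one contig
# and the start of another.
contig_a = "ACGTTGCAAGGCTTACCGATAG"
contig_b = "GGATCCTTAGCAGTCAATGCAC"
read = contig_a[-10:] + contig_b[:10]

print(shared_minimizers(read, contig_a, k=4, w=3))
print(shared_minimizers(read, contig_b, k=4, w=3))
```

Because the toy read shares minimizers with both contigs, it is evidence that the two contigs are adjacent. A scaffolder built on this idea would aggregate such evidence over many reads before joining contigs.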

Subjects

Genomics , High-Throughput Nucleotide Sequencing , Genome , Humans , Repetitive Sequences, Nucleic Acid , Sequence Analysis, DNA

Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim.

Yang, Chen; Lo, Theodora; Nip, Ka Ming; Hafezqorani, Saber; Warren, René L; Birol, Inanc.

Gigascience ; 12, 2023 03 20.

Article in English | MEDLINE | ID: mdl-36939007

ABSTRACT

BACKGROUND: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment. RESULTS: Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task. CONCLUSIONS: The Meta-NanoSim characterization module investigates read features, including chimeric information and abundance levels, while the simulation module simulates large and complex multisample microbial communities with different abundance profiles. All trained models and the software are freely accessible at GitHub: https://github.com/bcgsc/NanoSim.

Subjects

Nanopore Sequencing , Nanopores , Metagenome , Nanopore Sequencing/methods , Sequence Analysis, DNA/methods , Computer Simulation , Metagenomics/methods , Software , Algorithms , High-Throughput Nucleotide Sequencing/methods

Assembly and annotation of the black spruce genome provide insights on spruce phylogeny and evolution of stress response.

Lo, Theodora; Coombe, Lauren; Gagalova, Kristina K; Marr, Alex; Warren, René L; Kirk, Heather; Pandoh, Pawan; Zhao, Yongjun; Moore, Richard A; Mungall, Andrew J; Ritland, Carol; Pavy, Nathalie; Jones, Steven J M; Bohlmann, Joerg; Bousquet, Jean; Birol, Inanç; Thomson, Ashley.

G3 (Bethesda) ; 14(1), 2023 Dec 29.

Article in English | MEDLINE | ID: mdl-37875130

ABSTRACT

Black spruce (Picea mariana [Mill.] B.S.P.) is a dominant conifer species in the North American boreal forest that plays important ecological and economic roles. Here, we present the first genome assembly of P. mariana with a reconstructed genome size of 18.3 Gbp and NG50 scaffold length of 36.0 kbp. A total of 66,332 protein-coding sequences were predicted in silico and annotated based on sequence homology. We analyzed the evolutionary relationships between P. mariana and 5 other spruces for which complete nuclear and organelle genome sequences were available. The phylogenetic tree estimated from mitochondrial genome sequences agrees with biogeography; specifically, P. mariana was strongly supported as a sister lineage to P. glauca and 3 other taxa found in western North America, followed by the European Picea abies. We obtained mixed topologies with weaker statistical support in phylogenetic trees estimated from nuclear and chloroplast genome sequences, indicative of ancient reticulate evolution affecting these 2 genomes. Clustering of protein-coding sequences from the 6 Picea taxa and 2 Pinus species resulted in 34,776 orthogroups, 560 of which appeared to be specific to P. mariana. Analysis of these specific orthogroups and dN/dS analysis of positive selection signatures for 497 single-copy orthogroups identified gene functions mostly related to plant development and stress response. The P. mariana genome assembly and annotation provides a valuable resource for forest genetics research and applications in this broadly distributed species, especially in relation to climate adaptation.
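
The abstract reports contiguity as an NG50 scaffold length. As a minimal sketch of how that statistic is commonly computed (this is not the authors' code, and the function name and toy numbers are illustrative assumptions): NG50 is the length of the shortest scaffold among the longest scaffolds that together cover at least half of the estimated genome size.

```python
from typing import Iterable


def ng50(lengths: Iterable[int], genome_size: int) -> int:
    """Length L such that scaffolds of length >= L cover at least half
    of genome_size. Returns 0 if the assembly never reaches that half."""
    target = genome_size / 2
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0


# Toy scaffold lengths (hypothetical values, not from the paper).
scaffolds = [50, 40, 30, 20, 10]
print(ng50(scaffolds, genome_size=200))
# N50 is the same calculation with the assembly's own total length.
print(ng50(scaffolds, genome_size=sum(scaffolds)))
```

Using the estimated genome size rather than the assembly total makes the statistic comparable between assemblies of the same genome that differ in total length.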

Subjects

Picea , Phylogeny , Picea/genetics , North America

Trans-NanoSim characterizes and simulates nanopore RNA-sequencing data.

Hafezqorani, Saber; Yang, Chen; Lo, Theodora; Nip, Ka Ming; Warren, René L; Birol, Inanc.

Gigascience ; 9(6), 2020 06 01.

Article in English | MEDLINE | ID: mdl-32520350

ABSTRACT

BACKGROUND: Compared with second-generation sequencing technologies, third-generation single-molecule RNA sequencing has unprecedented advantages; the long reads it generates facilitate isoform-level transcript characterization. In particular, the Oxford Nanopore Technology sequencing platforms have become more popular in recent years owing to their relatively high affordability and portability compared with other third-generation sequencing technologies. To aid the development of analytical tools that leverage the power of this technology, simulated data provide a cost-effective solution with ground truth. However, a nanopore sequence simulator targeting transcriptomic data is not available yet. FINDINGS: We introduce Trans-NanoSim, a tool that simulates reads with technical and transcriptome-specific features learnt from nanopore RNA-sequencing data. We comprehensively benchmarked Trans-NanoSim on direct RNA and complementary DNA datasets describing human and mouse transcriptomes. Through comparison against other nanopore read simulators, we show the unique advantage and robustness of Trans-NanoSim in capturing the characteristics of nanopore complementary DNA and direct RNA reads. CONCLUSIONS: As a cost-effective alternative to sequencing real transcriptomes, Trans-NanoSim will facilitate the rapid development of analytical tools for nanopore RNA-sequencing data. Trans-NanoSim and its pre-trained models are freely accessible at https://github.com/bcgsc/NanoSim.

Subjects

Computational Biology/methods , Genomics/methods , High-Throughput Nucleotide Sequencing , Software , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Reproducibility of Results , Transcriptome , Workflow

Complete Chloroplast Genome Sequence of a Black Spruce (Picea mariana) from Eastern Canada.

Lo, Theodora; Coombe, Lauren; Lin, Diana; Warren, René L; Kirk, Heather; Pandoh, Pawan; Zhao, Yongjun; Moore, Richard A; Mungall, Andrew J; Ritland, Carol; Bousquet, Jean; Jones, Steven J M; Bohlmann, Joerg; Thomson, Ashley; Birol, Inanc.

Microbiol Resour Announc ; 9(39), 2020 Sep 24.

Article in English | MEDLINE | ID: mdl-32972944

ABSTRACT

Here, we present the chloroplast genome sequence of black spruce (Picea mariana), a conifer widely distributed throughout North American boreal forests. This complete and annotated chloroplast sequence is 123,961 bp long and will contribute to future studies on the genetic basis of evolutionary change in spruce and adaptation in conifers.
