Search | BVS - Ministry of Health

1.

AsmMix: an efficient haplotype-resolved hybrid de novo genome assembling pipeline.

Liu, Chao; Wu, Pei; Wu, Xue; Zhao, Xia; Chen, Fang; Cheng, Xiaofang; Zhu, Hongmei; Wang, Ou; Xu, Mengyang.

Front Genet ; 15: 1421565, 2024.

Article in English | MEDLINE | ID: mdl-39130747

ABSTRACT

Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigations into the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read and synthetic co-barcoded read sequencing techniques have harnessed long-range information to simplify the assembly graph and improve the assembled genomic sequence. However, it remains methodologically challenging to reconstruct complete haplotypes due to the high sequencing error rates of long reads and the limited capture efficiency of co-barcoded reads. Here we present a pipeline, AsmMix, for generating diploid genomes that are both contiguous and accurate. It first assembles co-barcoded reads to generate accurate haplotype-resolved assemblies that may contain many gaps, while the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall rates for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole genome dataset (HG002), and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.

2.

Comprehensive and deep evaluation of structural variation detection pipelines with third-generation sequencing data.

Liu, Zhi; Xie, Zhi; Li, Miaoxin.

Genome Biol ; 25(1): 188, 2024 Jul 15.

Article in English | MEDLINE | ID: mdl-39010145

ABSTRACT

BACKGROUND: Structural variation (SV) detection methods using third-generation sequencing data are widely employed, yet accurately detecting SVs remains challenging. Different methods often yield inconsistent results for certain SV types, complicating tool selection and revealing biases in detection. RESULTS: This study comprehensively evaluates 53 SV detection pipelines using simulated and real data from PacBio (CLR: Continuous Long Read, CCS: Circular Consensus Sequencing) and Nanopore (ONT) platforms. We assess their performance in detecting various sizes and types of SVs, breakpoint biases, and genotyping accuracy with various sequencing depths. Notably, pipelines such as Minimap2-cuteSV2, NGMLR-SVIM, PBMM2-pbsv, Winnowmap-Sniffles2, and Winnowmap-SVision exhibit comparatively higher recall and precision. Our findings also show that combining multiple pipelines with the same aligner, like pbmm2 or winnowmap, can significantly enhance performance. The individual pipelines' detailed ranking and performance metrics can be viewed in a dynamic table: http://pmglab.top/SVPipelinesRanking . CONCLUSIONS: This study comprehensively characterizes the strengths and weaknesses of numerous pipelines, providing valuable insights that can improve SV detection in third-generation sequencing data and inform SV annotation and function prediction.
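As a rough illustration of the recall and precision figures this abstract refers to, the following Python sketch computes them from counts of matched and unmatched SV calls. The function name and the counts are hypothetical and are not taken from the study; how a call is matched to a truth set (breakpoint tolerance, size similarity, SV type) is assumed to have been decided beforehand.

```python
def sv_call_metrics(true_pos: int, false_pos: int, false_neg: int) -> dict:
    """Precision, recall and F1 from counts of matched and unmatched SV calls."""
    called = true_pos + false_pos   # all SVs reported by the pipeline
    truth = true_pos + false_neg    # all SVs present in the truth set
    precision = true_pos / called if called else 0.0
    recall = true_pos / truth if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for one pipeline at one sequencing depth.
metrics = sv_call_metrics(true_pos=90, false_pos=10, false_neg=30)
print(metrics)
```

Reporting F1 alongside the two rates is one common way to rank pipelines with a single number, although the study's own ranking criteria may differ.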

Subjects

High-Throughput Nucleotide Sequencing , Humans , High-Throughput Nucleotide Sequencing/methods , Genomic Structural Variation , Software , Sequence Analysis, DNA/methods

3.

HIV-1 genotypic resistance testing using single molecule real-time sequencing.

Raymond, Stéphanie; Jeanne, Nicolas; Vellas, Camille; Nicot, Florence; Saune, Karine; Ranger, Noémie; Latour, Justine; Carcenac, Romain; Harter, Agnès; Delobel, Pierre; Izopet, Jacques.

J Clin Virol ; 174: 105717, 2024 Jul 24.

Article in English | MEDLINE | ID: mdl-39068746

ABSTRACT

BACKGROUND: HIV-1 resistance testing is recommended in clinical management and next-generation sequencing (NGS) methods are now available in many virology laboratories. OBJECTIVES: To evaluate the diagnostic performance of Long-Read Single Molecule Real-time (SMRT) sequencing (Sequel, PacBio) for HIV-1 polymerase genotyping. STUDY DESIGN: 111 prospective clinical samples (83 plasma and 28 leukocyte-enriched blood fraction) were analyzed for routine HIV-1 resistance genotyping using Sanger sequencing, Vela NGS, and SMRT sequencing. We developed a SMRT sequencing protocol and a bioinformatics pipeline to infer antiretroviral resistance using both haplotype and variant calling approaches. RESULTS: The polymerase was successfully sequenced by the three platforms in 98 % of plasma RNA samples for viral loads above 4 log copies/mL. The success rate decreased to 83 % using Sanger or Vela sequencing and to 67 % using SMRT sequencing for viral loads of 3 to 4 log copies/mL. Sensitivities of 50 %, 54 % and 61 % were obtained using SMRT, Vela, and Sanger sequencing, respectively, in cellular DNA from patients with prolonged undetectable plasma HIV-1 RNA. Ninety-eight percent of resistance-associated mutations (RAMs) identified with Sanger sequencing were detected using SMRT sequencing. Furthermore, 91 % of RAMs (> 5 % threshold) identified with Vela NGS were detected using SMRT sequencing. RAM quantification using Vela and SMRT sequencing was well correlated (Spearman correlation ρ = 0.82; P < 0.0001). CONCLUSIONS: SMRT sequencing of the full-length HIV-1 polymerase performed well for characterizing HIV-1 genotypic resistance in both RNA and DNA clinical samples. Long-read sequencing is a new tool for mutation haplotyping and resistance analysis.

4.

Characterization of Six Ampeloviruses Infecting Pineapple in Reunion Island Using a Combination of High-Throughput Sequencing Approaches.

Massé, Delphine; Candresse, Thierry; Filloux, Denis; Massart, Sébastien; Cassam, Nathalie; Hostachy, Bruno; Marais, Armelle; Fernandez, Emmanuel; Roumagnac, Philippe; Verdin, Eric; Teycheney, Pierre-Yves; Lett, Jean-Michel; Lefeuvre, Pierre.

Viruses ; 16(7), 2024 Jul 16.

Article in English | MEDLINE | ID: mdl-39066307

ABSTRACT

The cultivation of pineapple (Ananas comosus) is threatened worldwide by mealybug wilt disease of pineapple (MWP), whose etiology is not yet fully elucidated. In this study, we characterized pineapple mealybug wilt-associated ampeloviruses (PMWaVs, family Closteroviridae) from a diseased pineapple plant collected from Reunion Island, using a high-throughput sequencing approach combining Illumina short reads and Nanopore long reads. Co-assembly of the reads resulted in complete or near-complete genomes for six distinct ampeloviruses, including the first complete genome of pineapple mealybug wilt-associated virus 5 (PMWaV5) and that of a new species tentatively named pineapple mealybug wilt-associated virus 7 (PMWaV7). Short-read data provided high genome coverage and sequencing depths for all six viral genomes, unlike the long-read data. The 5' and 3' ends of the genome for most of the six ampeloviruses could be recovered from long reads, providing an alternative to RACE-PCRs. Phylogenetic analyses did not unveil any geographic structuring of the diversity of PMWaV1, PMWaV2 and PMWaV3 isolates, supporting the current hypothesis that PMWaVs were mainly spread by human activity and vegetative propagation.

Subjects

Ananas , Closteroviridae , Genome, Viral , High-Throughput Nucleotide Sequencing , Phylogeny , Plant Diseases , Ananas/virology , Plant Diseases/virology , Closteroviridae/genetics , Closteroviridae/classification , Closteroviridae/isolation & purification , Reunion , RNA, Viral/genetics

5.

Metatranscriptomics-guided genome-scale metabolic reconstruction reveals the carbon flux and trophic interaction in methanogenic communities.

Yan, Weifu; Wang, Dou; Wang, Yubo; Wang, Chunxiao; Chen, Xi; Liu, Lei; Wang, Yulin; Li, Yu-You; Kamagata, Yoichi; Nobu, Masaru K; Zhang, Tong.

Microbiome ; 12(1): 121, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38970122

ABSTRACT

BACKGROUND: Despite rapid advances in genome-resolved metagenomics and the remarkable explosion of metagenome-assembled genomes (MAGs), the function of uncultivated anaerobic lineages and their interactions in carbon mineralization remain largely uncertain, which has profound implications in biotechnology and biogeochemistry. RESULTS: In this study, we combined long-read sequencing and metatranscriptomics-guided metabolic reconstruction to provide a genome-wide perspective of carbon mineralization flow from polymers to methane in an anaerobic bioreactor. Our results showed that incorporating long reads resulted in a substantial improvement in the quality of metagenomic assemblies, enabling the effective recovery of 132 high-quality genomes meeting stringent criteria of minimum information about a metagenome-assembled genome (MIMAG). In addition, hybrid assembly obtained 51% more prokaryotic genes in comparison to the short-read-only assembly. Metatranscriptomics-guided metabolic reconstruction unveiled the remarkable metabolic flexibility of several novel Bacteroidales-affiliated bacteria and populations from Mesotoga sp. in scavenging amino acids and sugars. In addition to recovering two circular genomes of previously known but fragmented syntrophic bacteria, two newly identified bacteria within Syntrophales were found to be highly engaged in fatty acid oxidation through syntrophic relationships with dominant methanogens Methanoregulaceae bin.74 and Methanothrix sp. bin.206. The activity of bin.206 preferring acetate as substrate exceeded that of bin.74 with increasing loading, reinforcing the determining role of the substrate. CONCLUSION: Overall, our study uncovered some key active anaerobic lineages and their metabolic functions in this complex anaerobic ecosystem, offering a framework for understanding carbon transformations in anaerobic digestion. These findings advance the understanding of metabolic activities and trophic interactions between anaerobic guilds, providing foundational insights into carbon flux within both engineered and natural ecosystems. Video Abstract.

Subjects

Carbon , Metagenomics , Methane , Methane/metabolism , Carbon/metabolism , Metagenomics/methods , Bioreactors/microbiology , Metagenome , Bacteria/genetics , Bacteria/metabolism , Bacteria/classification , Phylogeny , Anaerobiosis , Transcriptome , Genome, Bacterial , Microbiota , Gene Expression Profiling

6.

NanoTrans: an integrated computational framework for comprehensive transcriptome analysis with Nanopore direct RNA sequencing.

Yang, Ludong; Zhang, Xinxin; Wang, Fan; Zhang, Li; Li, Jing; Yue, Jia-Xing.

J Genet Genomics ; 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-39004399

ABSTRACT

Nanopore direct RNA sequencing (DRS) provides direct access to native RNA strands with full-length information, shedding light on rich qualitative and quantitative properties of gene expression profiles. Here we present NanoTrans, an integrated computational framework that comprehensively covers all major DRS-based application scopes, including isoform clustering and quantification, poly(A) tail length estimation, RNA modification profiling, and fusion gene detection. In addition to its merit in providing such a streamlined one-stop solution, NanoTrans also stands out for its workflow-orientated modular design, batch processing capability, all-in-one tabular and graphic report output, and automatic installation and configuration support. Finally, by applying NanoTrans to real DRS datasets of yeast, Arabidopsis, as well as human embryonic kidney and cancer cell lines, we further demonstrated its utility, effectiveness, and efficacy across a wide range of DRS-based application settings.

7.

Fine-scale characterization of the soybean rhizosphere microbiome via synthetic long reads and avidity sequencing.

Hale, Brett; Watts, Caitlin; Conatser, Matthew; Brown, Edward; Wijeratne, Asela J.

Environ Microbiome ; 19(1): 46, 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-38997772

ABSTRACT

BACKGROUND: The rhizosphere microbiome displays structural and functional dynamism driven by plant, microbial, and environmental factors. While such plasticity is a well-evidenced determinant of host health, individual and community-level microbial activity within the rhizosphere remain poorly understood, due in part to the insufficient taxonomic resolution achieved through traditional marker gene amplicon sequencing. This limitation necessitates more advanced approaches (e.g., long-read sequencing) to derive ecological inferences with practical application. To this end, the present study coupled synthetic long-read technology with avidity sequencing to investigate eukaryotic and prokaryotic microbiome dynamics within the soybean (Glycine max) rhizosphere under field conditions. RESULTS: Synthetic long-read sequencing permitted de novo reconstruction of the entire 18S-ITS1-ITS2 region of the eukaryotic rRNA operon as well as all nine hypervariable regions of the 16S rRNA gene. All full-length, mapped eukaryotic amplicon sequence variants displayed genus-level classification, and 44.77% achieved species-level classification. The resultant eukaryotic microbiome encompassed five kingdoms (19 genera) of protists in addition to fungi - a depth unattainable with conventional short-read methods. In the prokaryotic fraction, every full-length, mapped amplicon sequence variant was resolved at the species level, and 23.13% at the strain level. Thirteen species of Bradyrhizobium were thereby distinguished in the prokaryotic microbiome, with strain-level identification of the two Bradyrhizobium species most reported to nodulate soybean. Moreover, the applied methodology delineated structural and compositional dynamism in response to experimental parameters (i.e., growth stage, cultivar, and biostimulant application), unveiled a saprotroph-rich core microbiome, provided empirical evidence for host selection of mutualistic taxa, and identified key microbial co-occurrence network members likely associated with edaphic and agronomic properties. CONCLUSIONS: This study is the first to combine synthetic long-read technology and avidity sequencing to profile both eukaryotic and prokaryotic fractions of a plant-associated microbiome. Findings herein provide an unparalleled taxonomic resolution of the soybean rhizosphere microbiota and represent significant biological and technological advancements in crop microbiome research.

8.

LongTR: genome-wide profiling of genetic variation at tandem repeats from long reads.

Ziaei Jam, Helyaneh; Zook, Justin M; Javadzadeh, Sara; Park, Jonghun; Sehgal, Aarushi; Gymrek, Melissa.

Genome Biol ; 25(1): 176, 2024 Jul 04.

Article in English | MEDLINE | ID: mdl-38965568

ABSTRACT

Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979 .

Subjects

Genetic Variation , Genome, Human , Tandem Repeat Sequences , Humans , Software , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Nanopore Sequencing/methods

9.

Long-Read Structural and Epigenetic Profiling of a Kidney Tumor-Matched Sample with Nanopore Sequencing and Optical Genome Mapping.

Margalit, Sapir; Tulpová, Zuzana; Detinis Zur, Tahir; Michaeli, Yael; Deek, Jasline; Nifker, Gil; Haldar, Rita; Gnatek, Yehudit; Omer, Dorit; Dekel, Benjamin; Feldman, Hagit Baris; Grunwald, Assaf; Ebenstein, Yuval.

bioRxiv ; 2024 Jun 13.

Article in English | MEDLINE | ID: mdl-38915648

ABSTRACT

Carcinogenesis often involves significant alterations in the cancer genome architecture, marked by large structural and copy number variations (SVs and CNVs) that are difficult to capture with short-read sequencing. Traditionally, cytogenetic techniques are applied to detect such aberrations, but they are limited in resolution and do not cover features smaller than several hundred kilobases. Optical genome mapping and nanopore sequencing are attractive technologies that bridge this resolution gap and offer enhanced performance for cytogenetic applications. These methods profile native, individual DNA molecules, thus capturing epigenetic information. We applied both techniques to characterize a clear cell renal cell carcinoma (ccRCC) tumor's structural and copy number landscape, highlighting the relative strengths of each method in the context of variant size and average read length. Additionally, we assessed their utility for methylome and hydroxymethylome profiling, emphasizing differences in epigenetic analysis applicability.

10.

Hybracter: enabling scalable, automated, complete and accurate bacterial genome assemblies.

Bouras, George; Houtak, Ghais; Wick, Ryan R; Mallawaarachchi, Vijini; Roach, Michael J; Papudeshi, Bhavya; Judd, Louise M; Sheppard, Anna E; Edwards, Robert A; Vreugde, Sarah.

Microb Genom ; 10(5), 2024 May.

Article in English | MEDLINE | ID: mdl-38717808

ABSTRACT

Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-read) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and genomic variation beyond single nucleotide variants. They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter, which allows for the fast, automatic and scalable recovery of near-perfect complete bacterial genomes using a long-read-first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read-only assembler. We compared Hybracter to existing automated hybrid and long-read-only assembly tools using a diverse panel of samples of varying levels of long-read accuracy with manually curated ground truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than the existing gold standard automated hybrid assembler Unicycler. We also show that Hybracter with long reads only is the most accurate long-read-only assembler and is comparable to hybrid methods in accurately recovering small plasmids.

Subjects

Algorithms , Genome, Bacterial , Software , Plasmids/genetics , Sequence Analysis, DNA/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Bacteria/genetics , Bacteria/classification

11.

Copy number variation and elevated genetic diversity at immune trait loci in Atlantic and Pacific herring.

Mohamadnejad Sangdehi, Fahime; Jamsandekar, Minal S; Enbody, Erik D; Pettersson, Mats E; Andersson, Leif.

BMC Genomics ; 25(1): 459, 2024 May 10.

Article in English | MEDLINE | ID: mdl-38730342

ABSTRACT

BACKGROUND: Genome-wide comparisons of populations are widely used to explore the patterns of nucleotide diversity and sequence divergence to provide knowledge on how natural selection and genetic drift affect the genome. In this study we have compared whole-genome sequencing data from Atlantic and Pacific herring, two sister species that diverged about 2 million years ago, to explore the pattern of genetic differentiation between the two species. RESULTS: The genome comparison of the two species revealed high genome-wide differentiation but with islands of remarkably low genetic differentiation, as measured by an FST analysis. However, the low FST observed in these islands is not caused by low interspecies sequence divergence (dxy) but rather by exceptionally high estimated intraspecies nucleotide diversity (π). These regions of low differentiation and elevated nucleotide diversity, termed high-diversity regions in this study, are not enriched for repeats but are highly enriched for immune-related genes. This enrichment includes genes from both the adaptive immune system, such as immunoglobulin, T-cell receptor and major histocompatibility complex genes, as well as a substantial number of genes with a role in the innate immune system, e.g. novel immune-type receptor, tripartite motif and tumor necrosis factor receptor genes. Analysis of long-read based assemblies from two Atlantic herring individuals revealed extensive copy number variation in these genomic regions, indicating that the elevated intraspecies nucleotide diversities were partially due to the cross-mapping of short reads. CONCLUSIONS: This study demonstrates that copy number variation is a characteristic feature of immune trait loci in herring. Another important implication is that these loci are blind spots in classical genome-wide screens for genetic differentiation using short-read data, not only in herring but likely also in other species harboring qualitatively similar variation at immune trait loci. These loci stood out in this study because of the relatively high genome-wide baseline for FST values between Atlantic and Pacific herring.
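The relationship between FST, dxy and π described in this abstract can be illustrated with a small numerical sketch. The abstract does not say which FST estimator was used; the Python example below assumes Hudson's formulation, FST = 1 - (mean within-population diversity) / dxy, and all values are hypothetical, chosen only to show how inflated π lowers FST while dxy stays the same.

```python
def hudson_fst(pi_x: float, pi_y: float, dxy: float) -> float:
    """Hudson-style FST: 1 minus mean within-population diversity over dxy."""
    if dxy <= 0:
        raise ValueError("dxy must be positive")
    pi_within = (pi_x + pi_y) / 2
    return 1 - pi_within / dxy


# Hypothetical genome-wide background: low diversity within each species.
background = hudson_fst(pi_x=0.003, pi_y=0.003, dxy=0.012)

# Hypothetical high-diversity region: same divergence, inflated pi
# (for example through cross-mapping of short reads from gene copies).
island = hudson_fst(pi_x=0.010, pi_y=0.010, dxy=0.012)

print(round(background, 3), round(island, 3))
```

With identical dxy in both cases, the drop in FST comes entirely from the within-species term, which is the pattern the abstract attributes to the high-diversity regions.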

Subjects

DNA Copy Number Variations , Fishes , Animals , Fishes/genetics , Fishes/immunology , Genetic Variation , Atlantic Ocean , Quantitative Trait Loci , Whole Genome Sequencing

12.

CAREx: context-aware read extension of paired-end sequencing data.

Kallenborn, Felix; Schmidt, Bertil.

BMC Bioinformatics ; 25(1): 186, 2024 May 10.

Article in English | MEDLINE | ID: mdl-38730374

ABSTRACT

BACKGROUND: Commonly used next generation sequencing machines typically produce large amounts of short reads of a few hundred base-pairs in length. However, many downstream applications would generally benefit from longer reads. RESULTS: We present CAREx, an algorithm for the generation of pseudo-long reads from paired-end short-read Illumina data based on the concept of repeatedly computing multiple sequence alignments to extend a read until its partner is found. Our performance evaluation on both simulated data and real data shows that CAREx is able to connect significantly more read pairs (up to 99 % for simulated data) and to produce more error-free pseudo-long reads than previous approaches. When used prior to assembly it can achieve superior de novo assembly results. Furthermore, the GPU-accelerated version of CAREx exhibits the fastest execution times among all tested tools. CONCLUSION: CAREx is a new MSA-based algorithm and software for producing pseudo-long reads from paired-end short-read data. It outperforms other state-of-the-art programs in terms of (i) percentage of connected read pairs, (ii) reduction of error rates of filled gaps, (iii) runtime, and (iv) downstream analysis using de novo assembly. CAREx is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at ( https://github.com/fkallen/CAREx ).
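The idea of extending a read until its partner is found can be sketched in a few lines of Python. This is a deliberately simplified toy and not the CAREx algorithm: it replaces the multiple sequence alignment with an exact k-mer anchor and a majority vote on the next base, assumes error-free reads, and assumes the mate is already oriented on the same strand as the read. The function name, parameters and sequences are hypothetical.

```python
from collections import Counter


def extend_until_mate(read, mate, pool, k=6, max_len=500):
    """Extend `read` base by base until it ends with `mate`; None on failure."""
    extended = read
    while len(extended) < max_len:
        if extended.endswith(mate):
            return extended
        anchor = extended[-k:]          # last k bases act as the overlap seed
        votes = Counter()
        for other in pool:
            pos = other.find(anchor)
            while pos != -1:
                nxt = pos + k
                if nxt < len(other):    # a base follows the anchor in this read
                    votes[other[nxt]] += 1
                pos = other.find(anchor, pos + 1)
        if not votes:                   # no overlapping read: gap cannot be filled
            return None
        extended += votes.most_common(1)[0][0]
    return None


# Toy fragment and error-free reads tiled across it (all hypothetical).
genome = "ATGCGTACCTGAAGTCTAGGCATTCGGATCCAAGTGCTTAC"
pool = [genome[i:i + 16] for i in range(0, 26)]
read, mate = genome[:12], genome[29:]
result = extend_until_mate(read, mate, pool)
print(result == genome)
```

A column-wise vote over aligned reads, as in an MSA, would tolerate sequencing errors in a way that this exact-match toy cannot, which is presumably one reason the authors build on alignments.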

Subjects

Algorithms , High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Humans , Sequence Alignment/methods

13.

Characterization of telomere variant repeats using long reads enables allele-specific telomere length estimation.

Stephens, Zachary; Kocher, Jean-Pierre.

BMC Bioinformatics ; 25(1): 194, 2024 May 17.

Article in English | MEDLINE | ID: mdl-38755561

ABSTRACT

Telomeres are regions of repetitive DNA at the ends of linear chromosomes which protect chromosome ends from degradation. Telomere lengths have been extensively studied in the context of aging and disease, though most studies use average telomere lengths which are of limited utility. We present a method for identifying all 92 telomere alleles from long read sequencing data. Individual telomeres are identified using variant repeats proximal to telomere regions, which are unique across alleles. This high-throughput and high-resolution characterization of telomeres could be foundational to future studies investigating the roles of specific telomeres in aging and disease.

Subjects

Alleles , Telomere , Telomere/genetics , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Repetitive Sequences, Nucleic Acid/genetics

14.

Genome Assembly of the Dyeing Poison Frog Provides Insights into the Dynamics of Transposable Element and Genome-Size Evolution.

Dittrich, Carolin; Hoelzl, Franz; Smith, Steve; Fouilloux, Chloe A; Parker, Darren J; O'Connell, Lauren A; Knowles, Lucy S; Hughes, Margaret; Fewings, Ade; Morgan, Rhys; Rojas, Bibiana; Comeault, Aaron A.

Genome Biol Evol ; 16(6), 2024 Jun 04.

Article in English | MEDLINE | ID: mdl-38753031

ABSTRACT

Genome size varies greatly across the tree of life and transposable elements are an important contributor to this variation. Among vertebrates, amphibians display the greatest variation in genome size, making them ideal models to explore the causes and consequences of genome size variation. However, high-quality genome assemblies for amphibians have, until recently, been rare. Here, we generate a high-quality genome assembly for the dyeing poison frog, Dendrobates tinctorius. We compare this assembly to publicly available frog genomes and find evidence for both large-scale conserved synteny and widespread rearrangements between frog lineages. Comparing conserved orthologs annotated in these genomes revealed a strong correlation between genome size and gene size. To explore the cause of gene-size variation, we quantified the location of transposable elements relative to gene features and find that the accumulation of transposable elements in introns has played an important role in the evolution of gene size in D. tinctorius, while estimates of insertion times suggest that many insertion events are recent and species-specific. Finally, we carry out population-scale mobile-element sequencing and show that the diversity and abundance of transposable elements in poison frog genomes can complicate genotyping from repetitive element sequence anchors. Our results show that transposable elements have clearly played an important role in the evolution of large genome size in D. tinctorius. Future studies are needed to fully understand the dynamics of transposable element evolution and to optimize primer or bait design for cost-effective population-level genotyping in species with large, repetitive genomes.

Subjects

Anura , DNA Transposable Elements , Evolution, Molecular , Genome Size , Genome , Animals , Anura/genetics , Poison Frogs

15.

NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads.

Hu, Jiang; Wang, Zhuo; Sun, Zongyi; Hu, Benxia; Ayoola, Adeola Oluwakemi; Liang, Fan; Li, Jingjing; Sandoval, José R; Cooper, David N; Ye, Kai; Ruan, Jue; Xiao, Chuan-Le; Wang, Depeng; Wu, Dong-Dong; Wang, Sheng.

Genome Biol ; 25(1): 107, 2024 Apr 26.

Article in English | MEDLINE | ID: mdl-38671502

ABSTRACT

Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.

Subjects

DNA Copy Number Variations , Genome, Human , Humans , High-Throughput Nucleotide Sequencing/methods , Software , Nanopore Sequencing/methods , Sequence Analysis, DNA/methods , Genomics/methods

16.

Advancements in long-read genome sequencing technologies and algorithms.

Espinosa, Elena; Bautista, Rocio; Larrosa, Rafael; Plata, Oscar.

Genomics ; 116(3): 110842, 2024 May.

Article in English | MEDLINE | ID: mdl-38608738

ABSTRACT

The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore technology (ONT), has led to substantial improvements in accuracy and computational cost in sequencing genomes. However, de novo whole-genome assembly still presents significant challenges related to the quality of the results. Pursuing de novo whole-genome assembly remains a formidable challenge, underscored by intricate considerations surrounding computational demands and result quality. As sequencing accuracy and throughput steadily advance, a continuous stream of innovative assembly tools floods the field. Navigating this dynamic landscape necessitates a reasonable choice of sequencing platform, depth, and assembly tools to orchestrate high-quality genome reconstructions. This comprehensive review delves into the intricate interplay between cutting-edge long-read sequencing technologies, assembly methodologies, and the ever-evolving field of genomics. With a focus on addressing the pivotal challenges and harnessing the opportunities presented by these advancements, we provide an in-depth exploration of the crucial factors influencing the selection of optimal strategies for achieving robust and insightful genome assemblies.

Subjects

Algorithms , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Sequence Analysis, DNA/methods , Humans , Whole Genome Sequencing/methods

17.

The invasive land flatworm Arthurdendyus triangulatus has repeated sequences in the mitogenome, extra-long cox2 gene and paralogous nuclear rRNA clusters.

Gastineau, Romain; Lemieux, Claude; Turmel, Monique; Otis, Christian; Boyle, Brian; Coulis, Mathieu; Gouraud, Clément; Boag, Brian; Murchie, Archie K; Winsor, Leigh; Justine, Jean-Lou.

Sci Rep ; 14(1): 7840, 2024 Apr 03.

Article in English | MEDLINE | ID: mdl-38570596

ABSTRACT

Using a combination of short- and long-read sequencing, we were able to sequence the complete mitochondrial genome of the invasive 'New Zealand flatworm' Arthurdendyus triangulatus (Geoplanidae, Rhynchodeminae, Caenoplanini) and its two complete paralogous nuclear rRNA gene clusters. The mitogenome has a total length of 20,309 bp and contains repetitions that include two types of tandem repeats that could not be resolved by short-read sequencing. We also sequenced for the first time the mitogenomes of four species of Caenoplana (Caenoplanini). A maximum likelihood phylogeny associated A. triangulatus with the other Caenoplanini but Parakontikia ventrolineata and Australopacifica atrata were rejected from the Caenoplanini and associated instead with the Rhynchodemini, with Platydemus manokwari. It was found that the mitogenomes of all species of the subfamily Rhynchodeminae share several unusual structural features, including a very long cox2 gene. This is the first time that the complete paralogous rRNA clusters, which differ in length, sequence and seemingly number of copies, were obtained for a Geoplanidae.

Subjects

Genome, Mitochondrial , Platyhelminths , Animals , Platyhelminths/genetics , Genome, Mitochondrial/genetics , Repetitive Sequences, Nucleic Acid , Phylogeny , Sequence Analysis, DNA , RNA, Ribosomal/genetics

18.

The Application of Long-Read Sequencing to Cancer.

Ermini, Luca; Driguez, Patrick.

Cancers (Basel) ; 16(7), 2024 Mar 25.

Article in English | MEDLINE | ID: mdl-38610953

ABSTRACT

Cancer is a multifaceted disease arising from numerous genomic aberrations that have been identified as a result of advancements in sequencing technologies. While next-generation sequencing (NGS), which uses short reads, has transformed cancer research and diagnostics, it is limited by read length. Third-generation sequencing (TGS), led by the Pacific Biosciences and Oxford Nanopore Technologies platforms, employs long-read sequences, which have marked a paradigm shift in cancer research. Cancer genomes often harbour complex events, and TGS, with its ability to span large genomic regions, has facilitated their characterisation, providing a better understanding of how complex rearrangements affect cancer initiation and progression. TGS has also characterised the entire transcriptome of various cancers, revealing cancer-associated isoforms that could serve as biomarkers or therapeutic targets. Furthermore, TGS has advanced cancer research by improving genome assemblies, detecting complex variants, and providing a more complete picture of transcriptomes and epigenomes. This review focuses on TGS and its growing role in cancer research. We investigate its advantages and limitations, providing a rigorous scientific analysis of its use in detecting previously hidden aberrations missed by NGS. This promising technology holds immense potential for both research and clinical applications, with far-reaching implications for cancer diagnosis and treatment.

19.

Improved genome assembly of the whiteleg shrimp Penaeus (Litopenaeus) vannamei using long- and short-read sequences from public databases.

Perez-Enriquez, Ricardo; Juárez, Oscar E; Galindo-Torres, Pavel; Vargas-Aguilar, Ana Luisa; Llera-Herrera, Raúl.

J Hered ; 115(3): 302-310, 2024 May 09.

Article in English | MEDLINE | ID: mdl-38451162

ABSTRACT

The Pacific whiteleg shrimp Penaeus (Litopenaeus) vannamei is a highly relevant species for the world's aquaculture development, for which an incomplete genome is available in public databases. In this work, PacBio long reads from 14 publicly available genomic libraries (131.2 Gb) were mined to improve the reference genome assembly. The libraries were assembled, polished using Illumina short reads, and scaffolded with P. vannamei, Fenneropenaeus chinensis, and Penaeus monodon genomes. The reference-guided assembly, organized into 44 pseudo-chromosomes and 15,682 scaffolds, improved on previous reference genomes, with a genome size of 2.055 Gb, an N50 of 40.14 Mb, an L50 of 21, and a longest scaffold of 65.79 Mb. Most orthologous genes (92.6%) of the Arthropoda_odb10 database were detected as "complete," and BRAKER predicted 21,816 gene models; among these, we detected 1,814 single-copy orthologues conserved across the genomic references for Marsupenaeus japonicus, F. chinensis, and P. monodon. More than 99% of the transcriptomic-assembly data aligned to the new reference-guided assembly. The collinearity analysis of the assembled pseudo-chromosomes against the P. vannamei and P. monodon reference genomes showed high conservation in different sets of pseudo-chromosomes. In addition, more than 21,000 publicly available genetic marker sequences were mapped to single-site positions. This new assembly represents a step forward from previously reported P. vannamei assemblies. It will be helpful as a reference genome for future studies on the evolutionary history of the species, the genetic architecture of physiological and sex-determination traits, and the analysis of changes in the genetic diversity and composition of cultivated stocks.
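For readers unfamiliar with the contiguity statistics quoted in this abstract, the following is a minimal sketch of how N50 and L50 are conventionally computed from a list of sequence lengths. It is not taken from the paper: the function name `n50_l50` and the toy lengths are hypothetical, and individual assembly-statistics tools may differ in details such as minimum-length filtering.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    assembly length; L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example with made-up lengths (hypothetical, not data from the paper).
toy_lengths = [80, 70, 50, 40, 30, 20]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

Read this way, the reported N50 of 40.14 Mb with an L50 of 21 means that the 21 longest scaffolds, each at least 40.14 Mb long, together cover at least half of the assembly.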

Subjects

Genome , Penaeidae , Penaeidae/genetics , Animals , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation

20.

ONT read assembly of the black rhino genome.

Kraaijeveld, Ken; Bossers, Koen; Petrusevski, Nikola; Pieterman, Stef; Bruins-van Sonsbeek, Linda G R; Wittink, Floyd.

BMC Genom Data ; 25(1): 27, 2024 Mar 05.

Article in English | MEDLINE | ID: mdl-38443836

ABSTRACT

OBJECTIVES: The black rhinoceros (Diceros bicornis) is an endangered mammal for which a captive breeding program is part of the conservation effort. Black rhinos in zoos often suffer from chronic infections and haemochromatosis. Furthermore, breeding is hampered by low male fertility. To aid a research project studying these topics, we sequenced and assembled the genome of a captive male black rhino using ONT sequencing data only. DATA DESCRIPTION: This work produced over 100 Gb of whole-genome sequencing reads from whole blood. These were assembled into a 2.47 Gb draft genome consisting of 834 contigs with an N50 of 29.53 Mb. The genome annotation was lifted over from an available genome annotation for black rhino, which resulted in the retrieval of over 99% of gene features. This new genome assembly will be a valuable resource for conservation genetic research in this species.

Subjects

Genetic Research , Nose , Male , Animals , Perissodactyla/genetics , Persistent Infection , Research Design
