Results 1 - 20 of 201
1.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37529913

ABSTRACT

MOTIVATION: Multiple displacement amplification (MDA) has become the most commonly used method of whole genome amplification, generating a vast amount of DNA with higher molecular weight and greater genome coverage. Coupling with long-read sequencing, it is possible to sequence the amplicons of over 20 kb in length. However, the formation of chimeric sequences (chimeras, expressed as structural errors in sequencing data) in MDA seriously interferes with the bioinformatics analysis but its influence on long-read sequencing data is unknown. RESULTS: We sequenced the phi29 DNA polymerase-mediated MDA amplicons on the PacBio platform and analyzed chimeras within the generated data. The 3rd-ChimeraMiner has been constructed as a pipeline for recognizing and restoring chimeras into the original structures in long-read sequencing data, improving the efficiency of using TGS data. Five long-read datasets and one high-fidelity long-read dataset with various amplification folds were analyzed. The result reveals that the mis-priming events in amplification are more frequently occurring than widely perceived, and the proportion gradually accumulates from 42% to over 78% as the amplification continues. In total, 99.92% of recognized chimeric sequences were demonstrated to be artifacts, whose structures were wrongly formed in MDA instead of existing in original genomes. By restoring chimeras to their original structures, the vast majority of supplementary alignments that introduce false-positive structural variants are recycled, removing 97% of inversions on average and contributing to the analysis of structural variation in MDA-amplified samples. The impact of chimeras in long-read sequencing data analysis should be emphasized, and the 3rd-ChimeraMiner can help to quantify and reduce the influence of chimeras. AVAILABILITY AND IMPLEMENTATION: The 3rd-ChimeraMiner is available on GitHub, https://github.com/dulunar/3rdChimeraMiner.


Subject(s)
Computational Biology , Genome , Sequence Analysis, DNA/methods , DNA , High-Throughput Nucleotide Sequencing/methods
2.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36804804

ABSTRACT

Recent technological and computational advances have made metagenomic assembly a viable approach to achieving high-resolution views of complex microbial communities. In previous benchmarking, short-read (SR) metagenomic assemblers had the highest accuracy, long-read (LR) assemblers generated the most contiguous sequences and hybrid (HY) assemblers balanced length and accuracy. However, no assessments have specifically compared the performance of these assemblers on low-abundance species, which include clinically relevant organisms in the gut. We generated semi-synthetic LR and SR datasets by spiking small and increasing amounts of Escherichia coli isolate reads into fecal metagenomes and, using different assemblers, examined E. coli contigs and the presence of antibiotic resistance genes (ARGs). For ARG assembly, although SR assemblers recovered more ARGs with high accuracy, even at low coverages, LR assemblies allowed for the placement of ARGs within longer, E. coli-specific contigs, thus pinpointing their taxonomic origin. HY assemblies identified resistance genes with high accuracy and had lower contiguity than LR assemblies. Each assembler type's strengths were maintained even when our isolate was spiked in with a competing strain, which fragmented and reduced the accuracy of all assemblies. For strain characterization and determining gene context, LR assembly is optimal, while for base-accurate gene identification, SR assemblers outperform other options. HY assembly offers contiguity and base accuracy, but requires generating data on multiple platforms, and may suffer high misassembly rates when strain diversity exists. Our results highlight the trade-offs associated with each approach for recovering low-abundance taxa, and that the optimal approach is goal-dependent.


Subject(s)
Metagenome , Microbiota , Sequence Analysis, DNA/methods , Escherichia coli/genetics , Microbiota/genetics , Metagenomics/methods , High-Throughput Nucleotide Sequencing/methods
3.
Genomics ; 116(3): 110842, 2024 05.
Article in English | MEDLINE | ID: mdl-38608738

ABSTRACT

The recent advent of long read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore technology (ONT), has led to substantial improvements in accuracy and computational cost in sequencing genomes. However, de novo whole-genome assembly remains a formidable challenge, with intricate considerations surrounding computational demands and result quality. As sequencing accuracy and throughput steadily advance, a continuous stream of innovative assembly tools floods the field. Navigating this dynamic landscape necessitates a reasonable choice of sequencing platform, depth, and assembly tools to orchestrate high-quality genome reconstructions. This comprehensive review delves into the intricate interplay between cutting-edge long read sequencing technologies, assembly methodologies, and the ever-evolving field of genomics. With a focus on addressing the pivotal challenges and harnessing the opportunities presented by these advancements, we provide an in-depth exploration of the crucial factors influencing the selection of optimal strategies for achieving robust and insightful genome assemblies.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Sequence Analysis, DNA/methods , Humans , Whole Genome Sequencing/methods
4.
BMC Bioinformatics ; 25(1): 194, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38755561

ABSTRACT

Telomeres are regions of repetitive DNA at the ends of linear chromosomes which protect chromosome ends from degradation. Telomere lengths have been extensively studied in the context of aging and disease, though most studies use average telomere lengths which are of limited utility. We present a method for identifying all 92 telomere alleles from long read sequencing data. Individual telomeres are identified using variant repeats proximal to telomere regions, which are unique across alleles. This high-throughput and high-resolution characterization of telomeres could be foundational to future studies investigating the roles of specific telomeres in aging and disease.


Subject(s)
Alleles , Telomere , Telomere/genetics , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Repetitive Sequences, Nucleic Acid/genetics
5.
BMC Bioinformatics ; 25(1): 186, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730374

ABSTRACT

BACKGROUND: Commonly used next generation sequencing machines typically produce large amounts of short reads of a few hundred base-pairs in length. However, many downstream applications would generally benefit from longer reads. RESULTS: We present CAREx, an algorithm for the generation of pseudo-long reads from paired-end short-read Illumina data based on the concept of repeatedly computing multiple-sequence-alignments to extend a read until its partner is found. Our performance evaluation on both simulated data and real data shows that CAREx is able to connect significantly more read pairs (up to 99% for simulated data) and to produce more error-free pseudo-long reads than previous approaches. When used prior to assembly it can achieve superior de novo assembly results. Furthermore, the GPU-accelerated version of CAREx exhibits the fastest execution times among all tested tools. CONCLUSION: CAREx is a new MSA-based algorithm and software for producing pseudo-long reads from paired-end short read data. It outperforms other state-of-the-art programs in terms of (i) percentage of connected read pairs, (ii) reduction of error rates of filled gaps, (iii) runtime, and (iv) downstream analysis using de novo assembly. CAREx is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at https://github.com/fkallen/CAREx.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Humans , Sequence Alignment/methods
6.
BMC Genomics ; 25(1): 459, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730342

ABSTRACT

BACKGROUND: Genome-wide comparisons of populations are widely used to explore the patterns of nucleotide diversity and sequence divergence to provide knowledge on how natural selection and genetic drift affect the genome. In this study we have compared whole-genome sequencing data from Atlantic and Pacific herring, two sister species that diverged about 2 million years ago, to explore the pattern of genetic differentiation between the two species. RESULTS: The genome comparison of the two species revealed high genome-wide differentiation but with islands of remarkably low genetic differentiation, as measured by an FST analysis. However, the low FST observed in these islands is not caused by low interspecies sequence divergence (dxy) but rather by exceptionally high estimated intraspecies nucleotide diversity (π). These regions of low differentiation and elevated nucleotide diversity, termed high-diversity regions in this study, are not enriched for repeats but are highly enriched for immune-related genes. This enrichment includes genes from both the adaptive immune system, such as immunoglobulin, T-cell receptor and major histocompatibility complex genes, as well as a substantial number of genes with a role in the innate immune system, e.g. novel immune-type receptor, tripartite motif and tumor necrosis factor receptor genes. Analysis of long-read based assemblies from two Atlantic herring individuals revealed extensive copy number variation in these genomic regions, indicating that the elevated intraspecies nucleotide diversities were partially due to the cross-mapping of short reads. CONCLUSIONS: This study demonstrates that copy number variation is a characteristic feature of immune trait loci in herring. 
Another important implication is that these loci are blind spots in classical genome-wide screens for genetic differentiation using short-read data, not only in herring but likely also in other species harboring qualitatively similar variation at immune trait loci. These loci stood out in this study because of the relatively high genome-wide baseline for FST values between Atlantic and Pacific herring.
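
The relationship between FST, dxy and π reported in this abstract can be illustrated with a small sketch. The abstract does not state which FST estimator was used; a Hudson-style form, FST = 1 - π_within / dxy, is assumed here purely for illustration, and the function name and all numbers are hypothetical.

```python
def hudson_fst(pi_within: float, dxy: float) -> float:
    """Hudson-style FST from mean within-population diversity (pi_within)
    and between-population divergence (dxy).

    Assumed estimator for illustration; not necessarily the one used in the study.
    """
    if dxy <= 0:
        raise ValueError("dxy must be positive")
    return 1.0 - pi_within / dxy


# Hypothetical values: identical dxy, different within-species diversity.
baseline = hudson_fst(pi_within=0.003, dxy=0.012)
# Inflated diversity, e.g. from short reads cross-mapping between gene copies.
high_diversity = hudson_fst(pi_within=0.010, dxy=0.012)

print(f"baseline FST: {baseline:.3f}")
print(f"high-diversity region FST: {high_diversity:.3f}")
```

Under this assumption, raising within-species diversity alone lowers FST even when dxy is unchanged, which is the pattern the abstract attributes to the high-diversity regions.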


Subject(s)
DNA Copy Number Variations , Fishes , Animals , Fishes/genetics , Fishes/immunology , Genetic Variation , Atlantic Ocean , Quantitative Trait Loci , Whole Genome Sequencing
7.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34619757

ABSTRACT

Long-read sequencing technology enables significant progress in de novo genome assembly. However, the high error rate and the wide error distribution of raw reads result in a large number of errors in the assembly. Polishing is a procedure to fix errors in the draft assembly and improve the reliability of genomic analysis. However, existing methods treat all the regions of the assembly equally while there are fundamental differences between the error distributions of these regions. How to achieve very high accuracy in genome assembly is still a challenging problem. Motivated by the uneven errors in different regions of the assembly, we propose a novel polishing workflow named BlockPolish. In this method, we divide contigs into blocks with low complexity and high complexity according to statistics of aligned nucleotide bases. Multiple sequence alignment is applied to realign raw reads in complex blocks and optimize the alignment result. Due to the different distributions of error rates in trivial and complex blocks, two multitask bidirectional long short-term memory (LSTM) networks are proposed to predict the consensus sequences. In the whole-genome assemblies of NA12878 assembled by Wtdbg2 and Flye using Nanopore data, BlockPolish has a higher polishing accuracy than other state-of-the-art tools, including Racon, Medaka and MarginPolish & HELEN. In all assemblies, errors are predominantly indels and BlockPolish has a good performance in correcting them. In addition to the Nanopore assemblies, we further demonstrate that BlockPolish can also reduce the errors in the PacBio assemblies. The source code of BlockPolish is freely available on GitHub (https://github.com/huangnengCSU/BlockPolish).


Subject(s)
High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Reproducibility of Results , Sequence Alignment , Sequence Analysis, DNA/methods
8.
Hum Genomics ; 17(1): 21, 2023 03 09.
Article in English | MEDLINE | ID: mdl-36895025

ABSTRACT

BACKGROUND: Long-read sequencing technologies have the potential to overcome the limitations of short reads and provide a comprehensive picture of the human genome. However, the characterization of repetitive sequences by reconstructing genomic structures at high resolution solely from long reads remains difficult. Here, we developed a localized assembly method (LoMA) that constructs highly accurate consensus sequences (CSs) from long reads. METHODS: We developed LoMA by combining minimap2, MAFFT, and our algorithm, which classifies diploid haplotypes based on structural variants and CSs. Using this tool, we analyzed two human samples (NA18943 and NA19240) sequenced with the Oxford Nanopore sequencer. We defined target regions in each genome based on mapping patterns and then constructed a high-quality catalog of the human insertion solely from the long-read data. RESULTS: The assessment of LoMA showed a high accuracy of CSs (error rate < 0.3%) compared with raw data (error rate > 8%) and superiority to a previous study. The genome-wide analysis of NA18943 and NA19240 identified 5516 and 6542 insertions (≥ 100 bp), respectively. Most insertions (~ 80%) were derived from tandem repeats and transposable elements. We also detected processed pseudogenes, insertions in transposable elements, and long insertions (> 10 kbp). Finally, our analysis suggested that short tandem duplications are associated with gene expression and transposons. CONCLUSIONS: Our analysis showed that LoMA constructs high-quality sequences from long reads with substantial errors. This study revealed the true structures of the insertions with high accuracy and inferred the mechanisms for the insertions, thus contributing to future human genome studies. LoMA is available at our GitHub page: https://github.com/kolikem/loma .


Subject(s)
DNA Transposable Elements , Genome, Human , Humans , Sequence Analysis, DNA/methods , Genome, Human/genetics , DNA Transposable Elements/genetics , High-Throughput Nucleotide Sequencing/methods , Genomics
9.
J Hered ; 115(3): 302-310, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38451162

ABSTRACT

The Pacific whiteleg shrimp Penaeus (Litopenaeus) vannamei is a highly relevant species for the world's aquaculture development, for which an incomplete genome is available in public databases. In this work, PacBio long-reads from 14 publicly available genomic libraries (131.2 Gb) were mined to improve the reference genome assembly. The libraries were assembled, polished using Illumina short-reads, and scaffolded with P. vannamei, Feneropenaeus chinensis, and Penaeus monodon genomes. The reference-guided assembly, organized into 44 pseudo-chromosomes and 15,682 scaffolds, showed an improvement from previous reference genomes with a genome size of 2.055 Gb, N50 of 40.14 Mb, L50 of 21, and the longest scaffold of 65.79 Mb. Most orthologous genes (92.6%) of the Arthropoda_odb10 database were detected as "complete," and BRAKER predicted 21,816 gene models; from these, we detected 1,814 single-copy orthologues conserved across the genomic references for Marsupenaeus japonicus, F. chinensis, and P. monodon. Transcriptomic-assembly data aligned in more than 99% to the new reference-guided assembly. The collinearity analysis of the assembled pseudo-chromosomes against the P. vannamei and P. monodon reference genomes showed high conservation in different sets of pseudo-chromosomes. In addition, more than 21,000 publicly available genetic marker sequences were mapped to single-site positions. This new assembly represents a step forward to previously reported P. vannamei assemblies. It will be helpful as a reference genome for future studies on the evolutionary history of the species, the genetic architecture of physiological and sex-determination traits, and the analysis of the changes in genetic diversity and composition of cultivated stocks.
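
The N50 and L50 figures quoted in this abstract follow the usual assembly-statistics definitions, which can be sketched as below. This is a minimal illustration under the standard definition (N50 is the length of the shortest sequence in the smallest set of longest sequences covering at least half the assembly; L50 is the size of that set); the function name and the toy lengths are hypothetical and unrelated to the shrimp assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example with made-up scaffold lengths.
toy_lengths = [80, 70, 50, 40, 30, 20]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)
```

For the toy lengths, the total is 290, so the two longest sequences (80 + 70 = 150) already cover half of it.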


Subject(s)
Genome , Penaeidae , Penaeidae/genetics , Animals , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation
10.
Proc Natl Acad Sci U S A ; 118(6)2021 02 09.
Article in English | MEDLINE | ID: mdl-33526659

ABSTRACT

It is well established that plasmids play an important role in the dissemination of antimicrobial resistance (AMR) genes; however, little is known about the role of the underlying interactions between different plasmid categories and other mobile genetic elements (MGEs) in shaping the promiscuous spread of AMR genes. Here, we developed a tool designed for plasmid classification, AMR gene annotation, and plasmid visualization and found that most plasmid-borne AMR genes, including those localized on class 1 integrons, are enriched in conjugative plasmids. Notably, we report the discovery and characterization of a massive insertion sequence (IS)-associated AMR gene transfer network (245 combinations covering 59 AMR gene subtypes and 53 ISs) linking conjugative plasmids and phylogenetically distant pathogens, suggesting a general evolutionary mechanism for the horizontal transfer of AMR genes mediated by the interaction between conjugative plasmids and ISs. Moreover, our experimental results confirmed the importance of the observed interactions in aiding the horizontal transfer and expanding the genetic range of AMR genes within complex microbial communities.


Subject(s)
Conjugation, Genetic , Drug Resistance, Bacterial/genetics , Gene Transfer, Horizontal/genetics , Genes, Bacterial , Mutagenesis, Insertional/genetics , Plasmids/genetics , Chromosomes, Bacterial/genetics , Mosaicism , Phylogeny , Synteny/genetics
11.
J Nematol ; 56(1): 20240029, 2024 Mar.
Article in English | MEDLINE | ID: mdl-39221107

ABSTRACT

The hop cyst nematode, Heterodera humuli, is the most common plant-parasitic nematode associated with hop worldwide. This study reports the draft genome of H. humuli generated on the PacBio Sequel IIe System with the ultra-low DNA input HiFi sequencing method, and the corresponding genome annotation. This genome resource will help further studies on H. humuli and other cyst nematodes.

12.
BMC Bioinformatics ; 24(1): 288, 2023 Jul 18.
Article in English | MEDLINE | ID: mdl-37464285

ABSTRACT

BACKGROUND: PacBio high fidelity (HiFi) sequencing reads are both long (15-20 kb) and highly accurate (> Q20). Because of these properties, they have revolutionised genome assembly leading to more accurate and contiguous genomes. In eukaryotes the mitochondrial genome is sequenced alongside the nuclear genome often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing. RESULTS: MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence fasta file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats. CONCLUSIONS: MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub (https://github.com/marcelauliano/MitoHiFi). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).
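
The "> Q20" accuracy figure in this abstract uses the Phred quality scale, where Q = -10 * log10(p) and p is the per-base error probability. The sketch below shows the conversion in both directions; the function names are hypothetical and the code is not part of MitoHiFi.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a per-base error probability (0 < p <= 1) to a Phred score."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


# Q20 corresponds to an error probability of about 1 in 100 (99% accuracy).
print(phred_to_error_probability(20))
print(error_probability_to_phred(0.001))
```

On this scale, each additional 10 points corresponds to a tenfold lower error probability.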


Subject(s)
Genome, Mitochondrial , Phylogeny , RNA , Eukaryota , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing
13.
BMC Bioinformatics ; 24(1): 119, 2023 Mar 28.
Article in English | MEDLINE | ID: mdl-36977976

ABSTRACT

BACKGROUND: Genomic structural variant detection is a significant and challenging issue in genome analysis. The existing long-read based structural variant detection methods still have space for improvement in detecting multi-type structural variants. RESULTS: In this paper, we propose a method called cnnLSV to obtain detection results with higher quality by eliminating false positives in the detection results merged from the callsets of existing methods. We design an encoding strategy for four types of structural variants to represent long-read alignment information around structural variants into images, input the images into a constructed convolutional neural network to train a filter model, and load the trained model to remove the false positives to improve the detection performance. We also eliminate mislabeled training samples in the training model phase by using principal component analysis algorithm and unsupervised clustering algorithm k-means. Experimental results on both simulated and real datasets show that our proposed method outperforms existing methods overall in detecting insertions, deletions, inversions, and duplications. The program of cnnLSV is available at https://github.com/mhuidong/cnnLSV . CONCLUSIONS: The proposed cnnLSV can detect structural variants by using long-read alignment information and convolutional neural network to achieve overall higher performance, and effectively eliminate incorrectly labeled samples by using the principal component analysis and k-means algorithms in training model stage.


Subject(s)
High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Algorithms , Genome , Neural Networks, Computer
14.
BMC Genomics ; 24(1): 727, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38041056

ABSTRACT

BACKGROUND: While genome-resolved metagenomics has revolutionized our understanding of microbial and genetic diversity in environmental samples, assemblies of short-reads often result in incomplete and/or highly fragmented metagenome-assembled genomes (MAGs), hampering in-depth genomics. Although Nanopore sequencing has increasingly been used in microbial metagenomics as long reads greatly improve the assembly quality of MAGs, the recommended DNA quantity usually exceeds the recoverable amount of DNA of environmental samples. Here, we evaluated lower-than-recommended DNA quantities for Nanopore library preparation by determining sequencing quality, community composition, assembly quality and recovery of MAGs. RESULTS: We generated 27 Nanopore metagenomes using the commercially available ZYMO mock community and varied the amount of input DNA from 1000 ng (the recommended minimum) down to 1 ng in eight steps. The quality of the generated reads remained stable across all input levels. The read mapping accuracy, which reflects how well the reads match a known reference genome, was consistently high across all libraries. The relative abundance of the species in the metagenomes was stable down to input levels of 50 ng. High-quality MAGs (> 95% completeness, ≤ 5% contamination) could be recovered from metagenomes down to 35 ng of input material. When combined with publicly available Illumina reads for the mock community, Nanopore reads from input quantities as low as 1 ng improved the quality of hybrid assemblies. CONCLUSION: Our results show that the recommended DNA amount for Nanopore library preparation can be substantially reduced without any adverse effects to genome recovery and still bolster hybrid assemblies when combined with short-read data. We posit that the results presented herein will enable studies to improve genome recovery from low-biomass environments, enhancing microbiome understanding.
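
The high-quality MAG criterion quoted in this abstract (> 95% completeness, ≤ 5% contamination) can be expressed as a simple filter. This is a minimal sketch using only the two thresholds given in the abstract; the function name and the example values are hypothetical, and how the authors estimated completeness and contamination is not stated here.

```python
def is_high_quality_mag(completeness: float, contamination: float) -> bool:
    """Apply the thresholds quoted in the abstract.

    Both arguments are percentages: completeness must exceed 95,
    contamination must not exceed 5.
    """
    return completeness > 95.0 and contamination <= 5.0


# Hypothetical bins: (name, completeness %, contamination %).
bins = [("bin_1", 97.2, 1.3), ("bin_2", 95.0, 0.8), ("bin_3", 99.1, 5.4)]
high_quality = [name for name, comp, cont in bins if is_high_quality_mag(comp, cont)]
print(high_quality)
```

Note that the completeness bound is strict while the contamination bound is inclusive, matching the wording of the abstract.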


Subject(s)
Dancing , Nanopores , Sequence Analysis, DNA/methods , Metagenomics/methods , Metagenome , Genome, Bacterial , DNA , High-Throughput Nucleotide Sequencing/methods
15.
BMC Genomics ; 24(1): 148, 2023 Mar 27.
Article in English | MEDLINE | ID: mdl-36973656

ABSTRACT

BACKGROUND: Recent advances in long-read sequencing technologies have enabled accurate identification of all genetic variants in individuals or cells; this procedure is known as variant calling. However, benchmarking studies on variant calling using different long-read sequencing technologies are still lacking. RESULTS: We used two Caenorhabditis elegans strains to measure several variant calling metrics. These two strains shared true-positive genetic variants that were introduced during strain generation. In addition, both strains contained common and distinguishable variants induced by DNA damage, possibly leading to false-positive estimation. We obtained accurate and noisy long reads from both strains using high-fidelity (HiFi) and continuous long-read (CLR) sequencing platforms, and compared the variant calling performance of the two platforms. HiFi identified a 1.65-fold higher number of true-positive variants on average, with 60% fewer false-positive variants, than CLR did. We also compared read-based and assembly-based variant calling methods in combination with subsampling of various sequencing depths and demonstrated that variant calling after genome assembly was particularly effective for detection of large insertions, even with 10 × sequencing depth of accurate long-read sequencing data. CONCLUSIONS: By directly comparing the two long-read sequencing technologies, we demonstrated that variant calling after genome assembly with 10 × or more depth of accurate long-read sequencing data allowed reliable detection of true-positive variants. Considering the high cost of HiFi sequencing, we herein propose appropriate methodologies for performing cost-effective and high-quality variant calling: 10 × assembly-based variant calling. The results of the present study may facilitate the development of methods for identifying all genetic variants at the population level.


Subject(s)
Benchmarking , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods
16.
Plant Biotechnol J ; 21(6): 1240-1253, 2023 06.
Article in English | MEDLINE | ID: mdl-36807472

ABSTRACT

Rapid adaptation of weeds to herbicide applications in agriculture through resistance development is a widespread phenomenon. In particular, the grass Alopecurus myosuroides is an extremely problematic weed in cereal crops with the potential to manifest resistance in only a few generations. Target-site resistances (TSRs), with their strong phenotypic response, play an important role in this rapid adaptive response. Recently, using PacBio's long-read amplicon sequencing technology in hundreds of individuals, we were able to decipher the genomic context in which TSR mutations occur. However, sequencing individual amplicons are costly and time-consuming, thus impractical to implement for other resistance loci or applications. Alternatively, pool-based approaches overcome these limitations and provide reliable allele frequencies, although at the expense of not preserving haplotype information. In this proof-of-concept study, we sequenced with PacBio High Fidelity (HiFi) reads long-range amplicons (13.2 kb), encompassing the entire ACCase gene in pools of over 100 individuals, and resolved them into haplotypes using the clustering algorithm PacBio amplicon analysis (pbaa), a new application for pools in plants and other organisms. From these amplicon pools, we were able to recover most haplotypes from previously sequenced individuals of the same population. In addition, we analysed new pools from a Germany-wide collection of A. myosuroides populations and found that TSR mutations originating from soft sweeps of independent origin were common. Forward-in-time simulations indicate that TSR haplotypes will persist for decades even at relatively low frequencies and without selection, highlighting the importance of accurate measurement of TSR haplotype prevalence for weed management.


Subject(s)
Acetyl-CoA Carboxylase , Herbicide Resistance , Poaceae , Acetyl-CoA Carboxylase/genetics , Agriculture , Gene Frequency/genetics , Haplotypes/genetics , Herbicide Resistance/genetics , Herbicides/pharmacology , Mutation , Poaceae/genetics
17.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33429431

ABSTRACT

With the rapid progress of sequencing technologies, various types of sequencing reads and assembly algorithms have been designed to construct genome assemblies. Although recent studies have attempted to evaluate the appropriate type of sequencing reads and algorithms for assembling high-quality genomes, it is still a challenge to set the correct combination for constructing animal genomes. Here, we present a comparative performance assessment of 14 assembly combinations: 9 software programs with different short and long reads of Duroc pig. Based on the results of the optimization process for genome construction, we designed an integrated hybrid de novo assembly pipeline, HSCG, and constructed a draft genome for Duroc pig. Comparison between the new genome and Sus scrofa 11.1 revealed important breakpoints in two S. scrofa 11.1 genes. Our findings may provide new insights into the pan-genome analysis studies of agricultural animals, and the integrated assembly pipeline may serve as a guide for the assembly of other animal genomes.


Subject(s)
Algorithms , Chromosome Mapping/methods , Contig Mapping/methods , Genome , Swine/genetics , Animals , Gene Library , High-Throughput Nucleotide Sequencing , Male , Sequence Analysis, DNA , Software
18.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34453168

ABSTRACT

Real-world evaluations of metagenomic reconstructions are challenged by distinguishing reconstruction artifacts from genes and proteins present in situ. Here, we evaluate short-read-only, long-read-only and hybrid assembly approaches on four different metagenomic samples of varying complexity. We demonstrate how different assembly approaches affect gene and protein inference, which is particularly relevant for downstream functional analyses. For a human gut microbiome sample, we use complementary metatranscriptomic and metaproteomic data to assess the metagenomic data-based protein predictions. Our findings pave the way for critical assessments of metagenomic reconstructions. We propose a reference-independent solution, which exploits the synergistic effects of multi-omic data integration for the in situ study of microbiomes using long-read sequencing data.


Subject(s)
Computational Biology/methods; Metagenome; Metagenomics/methods; Drug Resistance, Microbial; High-Throughput Nucleotide Sequencing/methods; Humans
19.
New Phytol ; 238(3): 1245-1262, 2023 May.
Article in English | MEDLINE | ID: mdl-36751914

ABSTRACT

Fructans in angiosperms play essential roles in physiological functions and environmental adaptations. As a major source of industrial fructans (especially inulin-type), chicory (Cichorium intybus L.) is a model species for studying fructan biosynthesis. However, the genes underlying this process and their evolutionary history in angiosperms remain elusive. We combined multiple sequencing technologies to assemble and annotate the chicory genome and scan its (epi)genomic features, such as genomic components, DNA methylation, and three-dimensional (3D) structure. We also performed a comparative genomics analysis to uncover the associations between key traits and gene families. We achieved a nearly complete chicory genome assembly and found that continuous bursts of a few highly active retrotransposon families largely shaped the (epi)genomic characteristics. The highly methylated genome with its unique 3D structure potentially influences critical biological processes. Our comprehensive comparative genomics analysis deciphered the genetic basis for the rich sesquiterpene content in chicory and indicated that the fructan-accumulating trait resulted from convergent evolution in angiosperms due to shifts in critical sites of fructan-active enzymes. The highly characterized chicory genome provides insight into Asteraceae evolution and fructan biosynthesis in angiosperms.


Subject(s)
Cichorium intybus; Fructans; Magnoliopsida; Asteraceae/genetics; Carbohydrate Metabolism; Cichorium intybus/genetics; Fructans/biosynthesis; Magnoliopsida/genetics
20.
Mol Ecol ; 32(6): 1271-1287, 2023 03.
Article in English | MEDLINE | ID: mdl-35810343

ABSTRACT

Synteny, the ordering of sequences within homologous chromosomes, must be maintained within the genomes of sexually reproducing species for the sharing of alleles and production of viable, reproducing offspring. However, when the genomes of closely related species are compared, a loss of synteny is often observed. Unequal homologous recombination is the primary mechanism behind synteny loss, occurring more often in transposon rich regions, and resulting in the formation of chromosomal rearrangements. To examine patterns of synteny among three closely related, interbreeding, and wild Eucalyptus species, we assembled their genomes using long-read DNA sequencing and de novo assembly. We identify syntenic and rearranged regions between these genomes and estimate that ~48% of our genomes remain syntenic while ~36% is rearranged. We observed that rearrangements highly fragment microsynteny. Our results suggest that synteny between these species is primarily lost through small-scale rearrangements, not through sequence loss, gain, or sequence divergence. Further examination of identified rearrangements suggests that rearrangements may be altering the phenotypes of Eucalyptus species. Our study also underscores that the use of single reference genomes in genomic variation studies could lead to reference bias, especially given the scale at which we show potentially adaptive loci have highly diverged, deleted, duplicated and/or rearranged. This study provides an unbiased framework to look at potential speciation and adaptive loci among a rapidly radiating foundation species of woodland trees that are free from selective breeding seen in most crop species.


Subject(s)
Eucalyptus; Eucalyptus/genetics; Genome; Synteny/genetics; Chromosomes; Sequence Analysis, DNA/methods