Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 406
Filtrar
1.
Bioinformatics ; 38(10): 2675-2682, 2022 05 13.
Artigo em Inglês | MEDLINE | ID: mdl-35561180

RESUMO

MOTIVATION: Crucial to the correctness of a genome assembly is the accuracy of the underlying scaffolds that specify the orders and orientations of contigs together with the gap distances between contigs. The current methods construct scaffolds based on the alignments of 'linking' reads against contigs. We found that some 'optimal' alignments are mistaken due to factors such as the contig boundary effect, particularly in the presence of repeats. Occasionally, the incorrect alignments can even overwhelm the correct ones. The detection of the incorrect linking information is challenging in any existing methods. RESULTS: In this study, we present a novel scaffolding method RegScaf. It first examines the distribution of distances between contigs from read alignment by the kernel density. When multiple modes are shown in a density, orientation-supported links are grouped into clusters, each of which defines a linking distance corresponding to a mode. The linear model parameterizes contigs by their positions on the genome; then each linking distance between a pair of contigs is taken as an observation on the difference of their positions. The parameters are estimated by minimizing a global loss function, which is a version of trimmed sum of squares. The least trimmed squares estimate has such a high breakdown value that it can automatically remove the mistaken linking distances. The results on both synthetic and real datasets demonstrate that RegScaf outperforms some popular scaffolders, especially in the accuracy of gap estimates by substantially reducing extremely abnormal errors. Its strength in resolving repeat regions is exemplified by a real case. Its adaptability to large genomes and TGS long reads is validated as well. AVAILABILITY AND IMPLEMENTATION: RegScaf is publicly available at https://github.com/lemontealala/RegScaf.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Algoritmos , Software , Mapeamento de Sequências Contíguas/métodos , Genoma , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Análise de Sequência de DNA/métodos
2.
Genome Biol ; 23(1): 29, 2022 01 20.
Artigo em Inglês | MEDLINE | ID: mdl-35057847

RESUMO

Haplotype-resolved de novo assembly of highly diverse virus genomes is critical in prevention, control and treatment of viral diseases. Current methods either can handle only relatively accurate short read data, or collapse haplotype-specific variations into consensus sequence. Here, we present Strainline, a novel approach to assemble viral haplotypes from noisy long reads without a reference genome. Strainline is the first approach to provide strain-resolved, full-length de novo assemblies of viral quasispecies from noisy third-generation sequencing data. Benchmarking on simulated and real datasets of varying complexity and diversity confirm this novelty and demonstrate the superiority of Strainline.


Assuntos
Mapeamento de Sequências Contíguas/métodos , Genoma Viral , Haplótipos , SARS-CoV-2/genética , Software , Benchmarking , COVID-19/virologia , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , SARS-CoV-2/classificação , Análise de Sequência de DNA
3.
Nucleic Acids Res ; 49(20): e117, 2021 11 18.
Artigo em Inglês | MEDLINE | ID: mdl-34417615

RESUMO

Scaffolding, i.e. ordering and orienting contigs is an important step in genome assembly. We present a method for scaffolding using second generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called Swalo with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes more or similar number of correct joins as other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.


Assuntos
Mapeamento de Sequências Contíguas/métodos , Análise de Sequência de DNA/métodos , Software , Bactérias , Humanos , Funções Verossimilhança
4.
PLoS Genet ; 17(8): e1009705, 2021 08.
Artigo em Inglês | MEDLINE | ID: mdl-34437539

RESUMO

Whole-genome duplication and genome compaction are thought to have played important roles in teleost fish evolution. Ayu (or sweetfish), Plecoglossus altivelis, belongs to the superorder Stomiati, order Osmeriformes. Stomiati is phylogenetically classified as sister taxa of Neoteleostei. Thus, ayu holds an important position in the fish tree of life. Although ayu is economically important for the food industry and recreational fishing in Japan, few genomic resources are available for this species. To address this problem, we produced a draft genome sequence of ayu by whole-genome shotgun sequencing and constructed linkage maps using a genotyping-by-sequencing approach. Syntenic analyses of ayu and other teleost fish provided information about chromosomal rearrangements during the divergence of Stomiati, Protacanthopterygii and Neoteleostei. The size of the ayu genome indicates that genome compaction occurred after the divergence of the family Osmeridae. Ayu has an XX/XY sex-determination system for which we identified sex-associated loci by a genome-wide association study by genotyping-by-sequencing and whole-genome resequencing using wild populations. Genome-wide association mapping using wild ayu populations revealed three sex-linked scaffolds (total, 2.03 Mb). Comparison of whole-genome resequencing mapping coverage between males and females identified male-specific regions in sex-linked scaffolds. A duplicate copy of the anti-Müllerian hormone type-II receptor gene (amhr2bY) was found within these male-specific regions, distinct from the autosomal copy of amhr2. Expression of the Y-linked amhr2 gene was male-specific in sox9b-positive somatic cells surrounding germ cells in undifferentiated gonads, whereas autosomal amhr2 transcripts were detected in somatic cells in sexually undifferentiated gonads of both genetic males and females. Loss-of-function mutation for amhr2bY induced male to female sex reversal. Taken together with the known role of Amh and Amhr2 in sex differentiation, these results indicate that the paralog of amhr2 on the ayu Y chromosome determines genetic sex, and the male-specific amh-amhr2 pathway is critical for testicular differentiation in ayu.


Assuntos
Mapeamento de Sequências Contíguas/métodos , Osmeriformes/genética , Receptores de Peptídeos/genética , Receptores de Fatores de Crescimento Transformadores beta/genética , Sequenciamento Completo do Genoma/métodos , Animais , Feminino , Proteínas de Peixes/genética , Mutação com Perda de Função , Masculino , Caracteres Sexuais , Sintenia
5.
Sci Rep ; 11(1): 14944, 2021 07 22.
Artigo em Inglês | MEDLINE | ID: mdl-34294764

RESUMO

Picrorhiza kurrooa is an endangered medicinal herb which is distributed across the Himalayan region at an altitude between 3000-5000 m above mean sea level. The medicinal properties of P. kurrooa are attributed to monoterpenoid picrosides present in leaf, rhizome and root of the plant. However, no genomic information is currently available for P. kurrooa, which limits our understanding about its molecular systems and associated responses. The present study brings the first assembled draft genome of P. kurrooa by using 227 Gb of raw data generated by Illumina and PacBio RS II sequencing platforms. The assembled genome has a size of n = ~ 1.7 Gb with 12,924 scaffolds. Four pronged assembly quality validations studies, including experimentally reported ESTs mapping and directed sequencing of the assembled contigs, confirmed high reliability of the assembly. About 76% of the genome is covered by complex repeats alone. Annotation revealed 24,798 protein coding and 9789 non-coding genes. Using the assembled genome, a total of 710 miRNAs were discovered, many of which were found responsible for molecular response against temperature changes. The miRNAs and targets were validated experimentally. The availability of draft genome sequence will aid in genetic improvement and conservation of P. kurrooa. Also, this study provided an efficient approach for assembling complex genomes while dealing with repeats when regular assemblers failed to progress due to repeats.


Assuntos
Mapeamento de Sequências Contíguas/métodos , Genoma de Planta , Picrorhiza/genética , Análise de Sequência de DNA/métodos , Espécies em Perigo de Extinção , Tamanho do Genoma , Sequenciamento de Nucleotídeos em Larga Escala , Plantas Medicinais/genética
7.
Genes (Basel) ; 12(5)2021 04 26.
Artigo em Inglês | MEDLINE | ID: mdl-33926025

RESUMO

Sequencing of whole microbial genomes has become a standard procedure for cluster detection, source tracking, outbreak investigation and surveillance of many microorganisms. An increasing number of laboratories are currently in a transition phase from classical methods towards next generation sequencing, generating unprecedented amounts of data. Since the precision of downstream analyses depends significantly on the quality of raw data generated on the sequencing instrument, a comprehensive, meaningful primary quality control is indispensable. Here, we present AQUAMIS, a Snakemake workflow for an extensive quality control and assembly of raw Illumina sequencing data, allowing laboratories to automatize the initial analysis of their microbial whole-genome sequencing data. AQUAMIS performs all steps of primary sequence analysis, consisting of read trimming, read quality control (QC), taxonomic classification, de-novo assembly, reference identification, assembly QC and contamination detection, both on the read and assembly level. The results are visualized in an interactive HTML report including species-specific QC thresholds, allowing non-bioinformaticians to assess the quality of sequencing experiments at a glance. All results are also available as a standard-compliant JSON file, facilitating easy downstream analyses and data exchange. We have applied AQUAMIS to analyze ~13,000 microbial isolates as well as ~1000 in-silico contaminated datasets, proving the workflow's ability to perform in high throughput routine sequencing environments and reliably predict contaminations. We found that intergenus and intragenus contaminations can be detected most accurately using a combination of different QC metrics available within AQUAMIS.


Assuntos
Genoma Bacteriano , Controle de Qualidade , Sequenciamento Completo do Genoma/métodos , Mapeamento de Sequências Contíguas/métodos , Mapeamento de Sequências Contíguas/normas , Contaminação por DNA , Escherichia coli , Listeria monocytogenes , Salmonella enterica , Sensibilidade e Especificidade , Software , Especificidade da Espécie , Sequenciamento Completo do Genoma/normas , Fluxo de Trabalho
8.
PLoS One ; 16(4): e0249850, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-33844699

RESUMO

In this article, we present QuASeR, a reference-free DNA sequence reconstruction implementation via de novo assembly on both gate-based and quantum annealing platforms. This is the first time this important application in bioinformatics is modeled using quantum computation. Each one of the four steps of the implementation (TSP, QUBO, Hamiltonians and QAOA) is explained with a proof-of-concept example to target both the genomics research community and quantum application developers in a self-contained manner. The implementation and results on executing the algorithm from a set of DNA reads to a reconstructed sequence, on a gate-based quantum simulator, the D-Wave quantum annealing simulator and hardware are detailed. We also highlight the limitations of current classical simulation and available quantum hardware systems. The implementation is open-source and can be found on https://github.com/QE-Lab/QuASeR.


Assuntos
Análise de Sequência de DNA/métodos , Software , Animais , Mapeamento de Sequências Contíguas/métodos , Humanos
9.
Nat Commun ; 12(1): 1935, 2021 04 28.
Artigo em Inglês | MEDLINE | ID: mdl-33911078

RESUMO

Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells where haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend phase blocks of partially-phased diploid assembles to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, skipping variant calling, and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), including human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available using the trio-based approach. FALCON-Phase is accurate without having parental data and performance is better in samples with higher heterozygosity. For cow and zebra finch the accuracy is 97% compared to 80-91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.


Assuntos
Mapeamento de Sequências Contíguas/métodos , Genoma Humano/genética , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Análise de Sequência de DNA/métodos , Algoritmos , Animais , Bovinos , Haplótipos/genética , Humanos , Polimorfismo de Nucleotídeo Único/genética , Peixe-Zebra/genética
10.
Nat Genet ; 53(4): 574-584, 2021 04.
Artigo em Inglês | MEDLINE | ID: mdl-33737755

RESUMO

Rye is a valuable food and forage crop, an important genetic resource for wheat and triticale improvement and an indispensable material for efficient comparative genomic studies in grasses. Here, we sequenced the genome of Weining rye, an elite Chinese rye variety. The assembled contigs (7.74 Gb) accounted for 98.47% of the estimated genome size (7.86 Gb), with 93.67% of the contigs (7.25 Gb) assigned to seven chromosomes. Repetitive elements constituted 90.31% of the assembled genome. Compared to previously sequenced Triticeae genomes, Daniela, Sumaya and Sumana retrotransposons showed strong expansion in rye. Further analyses of the Weining assembly shed new light on genome-wide gene duplications and their impact on starch biosynthesis genes, physical organization of complex prolamin loci, gene expression features underlying early heading trait and putative domestication-associated chromosomal regions and loci in rye. This genome sequence promises to accelerate genomic and breeding studies in rye and related cereal crops.


Assuntos
Mapeamento de Sequências Contíguas/métodos , Produtos Agrícolas/genética , Genoma de Planta , Proteínas de Plantas/genética , Característica Quantitativa Herdável , Secale/genética , Duplicação Gênica , Regulação da Expressão Gênica de Plantas , Loci Gênicos , Tamanho do Genoma , Sequenciamento de Nucleotídeos em Larga Escala , Melhoramento Vegetal , Proteínas de Plantas/metabolismo , Retroelementos , Amido/biossíntese , Triticum/genética
11.
Nat Commun ; 12(1): 1485, 2021 03 05.
Artigo em Inglês | MEDLINE | ID: mdl-33674578

RESUMO

Yeast whole genome sequencing (WGS) lacks end-to-end workflows that identify genetic engineering. Here we present Prymetime, a tool that assembles yeast plasmids and chromosomes and annotates genetic engineering sequences. It is a hybrid workflow-it uses short and long reads as inputs to perform separate linear and circular assembly steps. This structure is necessary to accurately resolve genetic engineering sequences in plasmids and the genome. We show this by assembling diverse engineered yeasts, in some cases revealing unintended deletions and integrations. Furthermore, the resulting whole genomes are high quality, although the underlying assembly software does not consistently resolve highly repetitive genome features. Finally, we assemble plasmids and genome integrations from metagenomic sequencing, even with 1 engineered cell in 1000. This work is a blueprint for building WGS workflows and establishes WGS-based identification of yeast genetic engineering.


Assuntos
Engenharia Genética/métodos , Genoma Fúngico , Saccharomyces cerevisiae/genética , Sequenciamento Completo do Genoma/métodos , Sequência de Bases , Cromossomos , Cromossomos Artificiais de Levedura , Clonagem Molecular , Simulação por Computador , Mapeamento de Sequências Contíguas/métodos , Metagenoma , Metagenômica , Plasmídeos , Software , Transformação Genética
12.
Brief Bioinform ; 22(5)2021 09 02.
Artigo em Inglês | MEDLINE | ID: mdl-33634311

RESUMO

In the field of genome assembly, scaffolding methods make it possible to obtain a more complete and contiguous reference genome, which is the cornerstone of genomic research. Scaffolding methods typically utilize the alignments between contigs and sequencing data (reads) to determine the orientation and order among contigs and to produce longer scaffolds, which are helpful for genomic downstream analysis. With the rapid development of high-throughput sequencing technologies, diverse types of reads have emerged over the past decade, especially in long-range sequencing, which have greatly enhanced the assembly quality of scaffolding methods. As the number of scaffolding methods increases, biology and bioinformatics researchers need to perform in-depth analyses of state-of-the-art scaffolding methods. In this article, we focus on the difficulties in scaffolding, the differences in characteristics among various kinds of reads, the methods by which current scaffolding methods address these difficulties, and future research opportunities. We hope this work will benefit the design of new scaffolding methods and the selection of appropriate scaffolding methods for specific biological studies.


Assuntos
Biologia Computacional/métodos , Mapeamento de Sequências Contíguas/métodos , Genoma , Software , Animais , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Análise de Sequência de DNA
13.
Brief Bioinform ; 22(5)2021 09 02.
Artigo em Inglês | MEDLINE | ID: mdl-33621981

RESUMO

Contigs assembled from the third-generation sequencing long reads are usually more complete than the second-generation short reads. However, the current algorithms still have difficulty in assembling the long reads into the ideal complete and accurate genome, or the theoretical best result [1]. To improve the long read contigs and with more and more fully sequenced genomes available, it could still be possible to use the similar genome-assisted reassembly method [2], which was initially proposed for the short reads making use of a closely related genome (similar genome) to the sequencing genome (target genome). The method aligns the contigs and reads to the similar genome, and then extends and refines the aligned contigs with the aligned reads. Here, we introduce AlignGraph2, a similar genome-assisted reassembly pipeline for the PacBio long reads. The AlignGraph2 pipeline is the second version of AlignGraph algorithm proposed by us but completely redesigned, can be inputted with either error-prone or HiFi long reads, and contains four novel algorithms: similarity-aware alignment algorithm and alignment filtration algorithm for alignment of the long reads and preassembled contigs to the similar genome, and reassembly algorithm and weight-adjusted consensus algorithm for extension and refinement of the preassembled contigs. In our performance tests on both error-prone and HiFi long reads, AlignGraph2 can align 5.7-27.2% more long reads and 7.3-56.0% more bases than some current alignment algorithm and is more efficient or comparable to the others. For contigs assembled with various de novo algorithms and aligned to similar genomes (aligned contigs), AlignGraph2 can extend 8.7-94.7% of them (extendable contigs), and obtain contigs of 7.0-249.6% larger N50 value and 5.2-87.7% smaller number of indels per 100 kbp (extended contigs). With genomes of decreased similarities, AlignGraph2 also has relatively stable performance. The AlignGraph2 software can be downloaded for free from this site: https://github.com/huangs001/AlignGraph2.


Assuntos
Algoritmos , Arabidopsis/genética , Ilhas de CpG/genética , Genoma Fúngico , Genoma Humano , Genoma de Planta , Saccharomyces cerevisiae/genética , Alinhamento de Sequência/métodos , Software , Mapeamento de Sequências Contíguas/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Humanos , Sequenciamento Completo do Genoma/métodos
14.
Brief Bioinform ; 22(5)2021 09 02.
Artigo em Inglês | MEDLINE | ID: mdl-33429431

RESUMO

With the rapid progress of sequencing technologies, various types of sequencing reads and assembly algorithms have been designed to construct genome assemblies. Although recent studies have attempted to evaluate the appropriate type of sequencing reads and algorithms for assembling high-quality genomes, it is still a challenge to set the correct combination for constructing animal genomes. Here, we present a comparative performance assessment of 14 assembly combinations-9 software programs with different short and long reads of Duroc pig. Based on the results of the optimization process for genome construction, we designed an integrated hybrid de novo assembly pipeline, HSCG, and constructed a draft genome for Duroc pig. Comparison between the new genome and Sus scrofa 11.1 revealed important breakpoints in two S. scrofa 11.1 genes. Our findings may provide new insights into the pan-genome analysis studies of agricultural animals, and the integrated assembly pipeline may serve as a guide for the assembly of other animal genomes.


Assuntos
Algoritmos , Mapeamento Cromossômico/métodos , Mapeamento de Sequências Contíguas/métodos , Genoma , Suínos/genética , Animais , Biblioteca Gênica , Sequenciamento de Nucleotídeos em Larga Escala , Masculino , Análise de Sequência de DNA , Software
15.
Nat Biotechnol ; 39(4): 422-430, 2021 04.
Artigo em Inglês | MEDLINE | ID: mdl-33318652

RESUMO

Generating accurate genome assemblies of large, repeat-rich human genomes has proved difficult using only long, error-prone reads, and most human genomes assembled from long reads add accurate short reads to polish the consensus sequence. Here we report an algorithm for hybrid assembly, WENGAN, that provides very high quality at low computational cost. We demonstrate de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology. WENGAN implements efficient algorithms to improve assembly contiguity as well as consensus quality. The resulting genome assemblies have high contiguity (contig NG50: 17.24-80.64 Mb), few assembly errors (contig NGA50: 11.8-59.59 Mb), good consensus quality (QV: 27.84-42.88) and high gene completeness (BUSCO complete: 94.6-95.2%), while consuming low computational resources (CPU hours: 187-1,200). In particular, the WENGAN assembly of the haploid CHM13 sample achieved a contig NG50 of 80.64 Mb (NGA50: 59.59 Mb), which surpasses the contiguity of the current human reference genome (GRCh38 contig NG50: 57.88 Mb).


Assuntos
Biologia Computacional/métodos , Mapeamento de Sequências Contíguas/métodos , Genoma Humano , Algoritmos , Haploidia , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Análise de Sequência de DNA
16.
BMC Genomics ; 21(1): 631, 2020 Sep 14.
Artigo em Inglês | MEDLINE | ID: mdl-32928108

RESUMO

BACKGROUND: We benchmarked the hybrid assembly approaches of MaSuRCA, SPAdes, and Unicycler for bacterial pathogens using Illumina and Oxford Nanopore sequencing by determining genome completeness and accuracy, antimicrobial resistance (AMR), virulence potential, multilocus sequence typing (MLST), phylogeny, and pan genome. Ten bacterial species (10 strains) were tested for simulated reads of both mediocre- and low-quality, whereas 11 bacterial species (12 strains) were tested for real reads. RESULTS: Unicycler performed the best for achieving contiguous genomes, closely followed by MaSuRCA, while all SPAdes assemblies were incomplete. MaSuRCA was less tolerant of low-quality long reads than SPAdes and Unicycler. The hybrid assemblies of five antimicrobial-resistant strains with simulated reads provided consistent AMR genotypes with the reference genomes. The MaSuRCA assembly of Staphylococcus aureus with real reads contained msr(A) and tet(K), while the reference genome and SPAdes and Unicycler assemblies harbored blaZ. The AMR genotypes of the reference genomes and hybrid assemblies were consistent for the other five antimicrobial-resistant strains with real reads. The numbers of virulence genes in all hybrid assemblies were similar to those of the reference genomes, irrespective of simulated or real reads. Only one exception existed that the reference genome and hybrid assemblies of Pseudomonas aeruginosa with mediocre-quality long reads carried 241 virulence genes, whereas 184 virulence genes were identified in the hybrid assemblies of low-quality long reads. The MaSuRCA assemblies of Escherichia coli O157:H7 and Salmonella Typhimurium with mediocre-quality long reads contained 126 and 118 virulence genes, respectively, while 110 and 107 virulence genes were detected in their MaSuRCA assemblies of low-quality long reads, respectively. All approaches performed well in our MLST and phylogenetic analyses. The pan genomes of the hybrid assemblies of S. Typhimurium with mediocre-quality long reads were similar to that of the reference genome, while SPAdes and Unicycler were more tolerant of low-quality long reads than MaSuRCA for the pan-genome analysis. All approaches functioned well in the pan-genome analysis of Campylobacter jejuni with real reads. CONCLUSIONS: Our research demonstrates the hybrid assembly pipeline of Unicycler as a superior approach for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing.


Assuntos
Genoma Bacteriano , Genômica/métodos , Sequenciamento por Nanoporos/métodos , Benchmarking , Campylobacter jejuni , Mapeamento de Sequências Contíguas/métodos , Mapeamento de Sequências Contíguas/normas , Cronobacter sakazakii , Farmacorresistência Bacteriana , Genômica/normas , Listeria monocytogenes , Sequenciamento por Nanoporos/normas , Pseudomonas aeruginosa , Salmonella typhimurium , Virulência
17.
Genetics ; 216(2): 599-608, 2020 10.
Artigo em Inglês | MEDLINE | ID: mdl-32796007

RESUMO

Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome-scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of nongap sequence anchored to chromosomes, which is 1.2 Gbps more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2000 genes that were previously unplaced. We also discovered >5700 additional gene copies, facilitating the accurate annotation of functional gene duplications including at the Ppd-B1 photoperiod response locus.


Assuntos
Cromossomos de Plantas/genética , Mapeamento de Sequências Contíguas/métodos , Dosagem de Genes , Triticum/genética , Mapeamento de Sequências Contíguas/normas , Genoma de Planta , Genômica/métodos , Genômica/normas , Padrões de Referência
18.
Genes (Basel) ; 11(7)2020 07 15.
Artigo em Inglês | MEDLINE | ID: mdl-32679846

RESUMO

RPGR exon ORF15 variants are one of the most frequent causes for inherited retinal disorders (IRDs), in particular retinitis pigmentosa. The low sequence complexity of this mutation hotspot makes it prone to indels and challenging for sequence data analysis. Whole-exome sequencing generally fails to provide adequate coverage in this region. Therefore, complementary methods are needed to avoid false positives as well as negative results. In this study, next-generation sequencing (NGS) was used to sequence long-range PCR amplicons for an IRD cohort of African ancestry. By developing a novel secondary analysis pipeline based on de novo assembly, we were able to avoid the miscalling of variants generated by standard NGS analysis tools. We identified pathogenic variants in 11 patients (13% of the cohort), two of which have not been reported previously. We provide a novel and alternative end-to-end secondary analysis pipeline for targeted NGS of ORF15 that is less prone to false positive and negative variant calls.


Assuntos
Proteínas do Olho/genética , Testes Genéticos/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Retinose Pigmentar/genética , População Negra/genética , Mapeamento de Sequências Contíguas/métodos , Mapeamento de Sequências Contíguas/normas , Éxons , Feminino , Testes Genéticos/normas , Sequenciamento de Nucleotídeos em Larga Escala/normas , Humanos , Povos Indígenas/genética , Masculino , Mutação , Linhagem , Retinose Pigmentar/diagnóstico , Sensibilidade e Especificidade , Análise de Sequência de DNA/métodos , Análise de Sequência de DNA/normas
19.
Microb Genom ; 6(10)2020 10.
Artigo em Inglês | MEDLINE | ID: mdl-32579097

RESUMO

Plasmids are extrachromosomal genetic elements that replicate independently of the chromosome and play a vital role in the environmental adaptation of bacteria. Due to potential mobilization or conjugation capabilities, plasmids are important genetic vehicles for antimicrobial resistance genes and virulence factors with huge and increasing clinical implications. They are therefore subject to large genomic studies within the scientific community worldwide. As a result of rapidly improving next-generation sequencing methods, the quantity of sequenced bacterial genomes is constantly increasing, in turn raising the need for specialized tools to (i) extract plasmid sequences from draft assemblies, (ii) derive their origin and distribution, and (iii) further investigate their genetic repertoire. Recently, several bioinformatic methods and tools have emerged to tackle this issue; however, a combination of high sensitivity and specificity in plasmid sequence identification is rarely achieved in a taxon-independent manner. In addition, many software tools are not appropriate for large high-throughput analyses or cannot be included in existing software pipelines due to their technical design or software implementation. In this study, we investigated differences in the replicon distributions of protein-coding genes on a large scale as a new approach to distinguish plasmid-borne from chromosome-borne contigs. We defined and computed statistical discrimination thresholds for a new metric: the replicon distribution score (RDS), which achieved an accuracy of 96.6 %. The final performance was further improved by the combination of the RDS metric with heuristics exploiting several plasmid-specific higher-level contig characterizations. We implemented this workflow in a new high-throughput taxon-independent bioinformatics software tool called Platon for the recruitment and characterization of plasmid-borne contigs from short-read draft assemblies. Compared to PlasFlow, Platon achieved a higher accuracy (97.5 %) and more balanced predictions (F1=82.6 %) tested on a broad range of bacterial taxa and better or equal performance against the targeted tools PlasmidFinder and PlaScope on sequenced Escherichia coli isolates. Platon is available at: http://platon.computational.bio/.


Assuntos
Proteínas de Bactérias/genética , Biologia Computacional/métodos , Escherichia coli/genética , Genoma Bacteriano/genética , Plasmídeos/genética , Cromossomos Bacterianos/genética , Mapeamento de Sequências Contíguas/métodos , Farmacorresistência Bacteriana Múltipla/genética , Sequenciamento de Nucleotídeos em Larga Escala , Software , Sequenciamento Completo do Genoma
20.
PLoS One ; 15(5): e0225808, 2020.
Artigo em Inglês | MEDLINE | ID: mdl-32396560

RESUMO

Peronospora effusa (previously known as P. farinosa f. sp. spinaciae, and here referred to as Pfs) is an obligate biotrophic oomycete that causes downy mildew on spinach (Spinacia oleracea). To combat this destructive many disease resistant cultivars have been bred and used. However, new Pfs races rapidly break the employed resistance genes. To get insight into the gene repertoire of Pfs and identify infection-related genes, the genome of the first reference race, Pfs1, was sequenced, assembled, and annotated. Due to the obligate biotrophic nature of this pathogen, material for DNA isolation can only be collected from infected spinach leaves that, however, also contain many other microorganisms. The obtained sequences can, therefore, be considered a metagenome. To filter and obtain Pfs sequences we utilized the CAT tool to taxonomically annotate ORFs residing on long sequences of a genome pre-assembly. This study is the first to show that CAT filtering performs well on eukaryotic contigs. Based on the taxonomy, determined on multiple ORFs, contaminating long sequences and corresponding reads were removed from the metagenome. Filtered reads were re-assembled to provide a clean and improved Pfs genome sequence of 32.4 Mbp consisting of 8,635 scaffolds. Transcript sequencing of a range of infection time points aided the prediction of a total of 13,277 gene models, including 99 RxLR(-like) effector, and 14 putative Crinkler genes. Comparative analysis identified common features in the predicted secretomes of different obligate biotrophic oomycetes, regardless of their phylogenetic distance. Their secretomes are generally smaller, compared to hemi-biotrophic and necrotrophic oomycete species. We observe a reduction in proteins involved in cell wall degradation, in Nep1-like proteins (NLPs), proteins with PAN/apple domains, and host translocated effectors. The genome of Pfs1 will be instrumental in studying downy mildew virulence and for understanding the molecular adaptations by which new isolates break spinach resistance.


Assuntos
Metagenoma , Peronospora/genética , Doenças das Plantas/microbiologia , Spinacia oleracea/microbiologia , Mapeamento de Sequências Contíguas/métodos , Peronospora/patogenicidade , Virulência
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...