Results 1 - 12 of 12
1.
G3 (Bethesda) ; 10(6): 1823-1827, 2020 Jun 1.
Article in English | MEDLINE | ID: mdl-32241919

ABSTRACT

Barley (Hordeum vulgare) is one of the most important crops worldwide and is also considered a research model for the large-genome, small-grain temperate cereals. Although genomic resources are constantly improving, they remain limited for cv Golden Promise, the most efficient genotype for genetic transformation. We have developed a cv Golden Promise reference assembly by integrating Illumina paired-end reads, long mate-pair reads, Dovetail Chicago in vitro proximity ligation libraries and chromosome conformation capture sequencing (Hi-C) libraries into a contiguous reference assembly. The assembled genome comprises 7 chromosomes totalling 4.13 Gb, has a super-scaffold N50 of 4.14 Mb after integration of the Chicago libraries, and contains only 2.2% gaps. Evaluated with BUSCO (Benchmarking Universal Single-Copy Orthologs), the assembly contains 95.2% complete, single-copy genes from the plant database. A high-quality Golden Promise reference assembly will be useful to the whole barley research community and will prove particularly useful for CRISPR-Cas9 experiments.
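As a rough illustration of the "2.2% gaps" figure, the sketch below computes the share of assembled bases that are gap placeholders. It assumes, as is conventional but not stated in the abstract, that gaps are written as runs of `N` in a FASTA file; the function `gap_fraction` and the toy records are hypothetical.

```python
def gap_fraction(fasta_lines):
    """Return the fraction of sequence characters that are N/n gap placeholders.

    Assumes gaps are encoded as N runs (a common convention, assumed here).
    """
    total = 0
    gaps = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        total += len(line)
        gaps += line.upper().count("N")
    return gaps / total if total else 0.0


# Hypothetical toy records: 4 gap bases out of 22 sequence characters.
example = [">chr1H", "ACGTNNNNACGT", ">chr2H", "ACGTACGTAC"]
print(f"{gap_fraction(example):.1%}")  # 18.2%
```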


Subjects
Hordeum; Genome; Genomics; Genotype; High-Throughput Nucleotide Sequencing; Hordeum/genetics
2.
F1000Res ; 8: 1490, 2019.
Article in English | MEDLINE | ID: mdl-31723420

ABSTRACT

The Sequence Distance Graph (SDG) framework works with genome assembly graphs and raw data from paired, linked and long reads. It includes a simple de Bruijn graph module and can import graphs in the Graphical Fragment Assembly (GFA) format. It also maps raw reads onto graphs and provides a Python application programming interface (API) to navigate the graph, access the mapped and raw data, and perform interactive or scripted analyses. Its complete workspace can be dumped to and loaded from disk, decoupling mapping from analysis and supporting multi-stage pipelines. We present the design and implementation of the framework, together with two example analyses: scaffolding a short-read graph with long reads, and navigating paths in a heterozygous graph for a simulated parent-offspring trio dataset. SDG is freely available under the MIT license at https://github.com/bioinfologics/sdg.
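The abstract does not describe how SDG's de Bruijn module works internally, so the following is only a generic, hedged illustration of the data structure, not SDG's API or implementation: each k-mer in the reads contributes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. This toy version ignores reverse complements and sequencing errors, both of which a real assembler must handle; `de_bruijn` is a hypothetical name.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a node-centric de Bruijn graph: (k-1)-mer -> set of successor (k-1)-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge from prefix to suffix
    return graph


g = de_bruijn(["ACGTC", "CGTCA"], k=3)
for node in sorted(g):
    print(node, "->", sorted(g[node]))
# AC -> ['CG']
# CG -> ['GT']
# GT -> ['TC']
# TC -> ['CA']
```

The two overlapping reads collapse into a single unbranched path, which is the property assemblers exploit when they compact such paths into unitigs.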


Subjects
Sequence Analysis, DNA; Software; Genomics
3.
Genome Biol ; 20(1): 69, 2019 Apr 15.
Article in English | MEDLINE | ID: mdl-30982471

ABSTRACT

BACKGROUND: Sequence exchange between homologous chromosomes through crossing over and gene conversion is highly conserved among eukaryotes, contributing to genome stability and genetic diversity. A lack of recombination limits breeding efforts in crops; therefore, increasing recombination rates can reduce linkage drag and generate new genetic combinations. RESULTS: We use computational analysis of 13 recombinant inbred mapping populations to assess crossover and gene conversion frequency in the hexaploid genome of wheat (Triticum aestivum). We observe that high-frequency crossover sites are shared between populations and that closely related parents lead to populations with more similar crossover patterns. We demonstrate that gene conversion is more prevalent and covers more of the genome in wheat than in other plants, making it a critical process in the generation of new haplotypes, particularly in centromeric regions where crossovers are rare. We identify quantitative trait loci for altered gene conversion and crossover frequency and confirm the function of a novel RecQ helicase gene that belongs to an ancient clade missing in some plant lineages, including Arabidopsis. CONCLUSIONS: This is the first gene demonstrated to be involved in gene conversion in wheat. Harnessing the RecQ helicase has the potential to break linkage drag by exploiting widespread gene conversion.
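To make the distinction between crossovers and gene conversions concrete, the sketch below scores one chromosome of a recombinant inbred line. It is a simplified illustration, not the authors' pipeline: markers are assumed to be called as parent `A` or `B` with heterozygous and missing calls removed, a lasting switch in parental origin counts as a crossover, and a short tract of one parent flanked by the other parent on both sides counts as a gene-conversion candidate. The function name `classify_events` and the `max_tract` threshold are hypothetical.

```python
from itertools import groupby


def classify_events(genotypes, max_tract=2):
    """Count crossovers and gene-conversion candidates along one RIL chromosome.

    genotypes: ordered parental calls ('A' or 'B') at successive markers;
    heterozygous and missing calls are assumed to have been removed.
    max_tract: longest run (in markers) still treated as a conversion tract;
    an arbitrary illustrative threshold.
    """
    # Collapse consecutive identical calls into (parent, run_length) blocks.
    runs = [(parent, len(list(block))) for parent, block in groupby(genotypes)]

    # A short interior block flanked by the same parent on both sides is
    # scored as a gene-conversion candidate rather than as two crossovers.
    is_conversion = [False] * len(runs)
    for i in range(1, len(runs) - 1):
        _, length = runs[i]
        if length <= max_tract and runs[i - 1][0] == runs[i + 1][0]:
            is_conversion[i] = True

    # Remove conversion tracts, merge the now-adjacent blocks of the same
    # parent, and count the remaining parental switches as crossovers.
    backbone = [parent for (parent, _), conv in zip(runs, is_conversion) if not conv]
    merged = [parent for parent, _ in groupby(backbone)]
    crossovers = max(len(merged) - 1, 0)
    return crossovers, sum(is_conversion)


print(classify_events(list("AAAAABBAAAAABBBBBB")))  # (1, 1)
```

A single genotyping error looks exactly like a one-marker conversion tract under this rule, so any real analysis needs error filtering or supporting evidence before calling conversions.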


Subjects
Crossing Over, Genetic; Gene Conversion; Triticum/genetics; Genome, Plant; Polyploidy; Whole Genome Sequencing
4.
Nat Ecol Evol ; 2(6): 1000-1008, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29686237

ABSTRACT

Accelerating international trade and climate change make pathogen spread an increasing concern. Hymenoscyphus fraxineus, the causal agent of ash dieback, is a fungal pathogen that has been moving across continents and hosts from Asian to European ash. Most European common ash trees (Fraxinus excelsior) are highly susceptible to H. fraxineus, although a minority (~5%) have partial resistance to dieback. Here, we assemble and annotate an H. fraxineus draft genome, which approaches chromosome scale. Analysis of pathogen genetic diversity across Europe and in Japan reveals a strong bottleneck in Europe, although a signal of adaptive diversity remains in key host-interaction genes. We find that the European population was founded by two divergent haploid individuals. Divergence between these haplotypes represents the ancestral polymorphism within a large source population. Subsequent introduction from this source would greatly increase the adaptive potential of the pathogen. Thus, further introgression of H. fraxineus into Europe represents a potential threat, and Europe-wide biological security measures are needed to manage this disease.


Subjects
Ascomycota/genetics; Fraxinus/microbiology; Genome, Fungal; Plant Diseases/microbiology; Europe; Haplotypes/genetics
5.
Gigascience ; 6(11): 1-7, 2017 Nov 1.
Article in English | MEDLINE | ID: mdl-29069494

ABSTRACT

Common bread wheat, Triticum aestivum, has one of the most complex genomes known to science, with 6 copies of each chromosome, enormous numbers of near-identical sequences scattered throughout, and an overall haploid size of more than 15 billion bases. Multiple past attempts to assemble the genome have produced assemblies that were well short of the estimated genome size. Here we report the first near-complete assembly of T. aestivum, using deep sequencing coverage from a combination of short Illumina reads and very long Pacific Biosciences reads. The final assembly contains 15 344 693 583 bases and has a weighted average (N50) contig size of 232 659 bases. This represents by far the most complete and contiguous assembly of the wheat genome to date, providing a strong foundation for future genetic studies of this important food crop. We also report how we used the recently published genome of Aegilops tauschii, the diploid ancestor of the wheat D genome, to identify 4 179 762 575 bp of T. aestivum that correspond to its D genome components.
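N50 is commonly defined as the length L such that contigs of length L or longer together contain at least half of the assembled bases. The sketch below implements that standard definition; it is not the authors' code, and the contig lengths in the example are invented.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length  # cumulative bases in contigs at least this long
        if running >= half:
            return length
    return 0  # empty input


print(n50([100, 80, 70, 50, 30, 20]))  # 80
```

Here the total is 350 bases; the two longest contigs (100 + 80 = 180) already cover at least half, so the N50 is 80.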


Subjects
Genome, Plant; Triticum/genetics; Molecular Sequence Annotation; Polyploidy; Whole Genome Sequencing
6.
Genome Res ; 27(5): 885-896, 2017 May.
Article in English | MEDLINE | ID: mdl-28420692

ABSTRACT

Advances in genome sequencing and assembly technologies are generating many high-quality genome sequences, but assemblies of large, repeat-rich polyploid genomes, such as that of bread wheat, remain fragmented and incomplete. We have generated a new wheat whole-genome shotgun sequence assembly using a combination of optimized data types and an assembly algorithm designed to deal with large and complex genomes. The new assembly represents >78% of the genome with a scaffold N50 of 88.8 kb and has high fidelity to the input data. Our new annotation combines strand-specific Illumina RNA-seq and Pacific Biosciences (PacBio) full-length cDNAs to identify 104,091 high-confidence protein-coding genes and 10,156 noncoding RNA genes. We confirmed three known genome rearrangements and identified one novel rearrangement. Our approach enables the rapid and scalable assembly of wheat genomes, the identification of structural variants, and the definition of complete gene models, all powerful resources for trait analysis and breeding of this key global crop.


Subjects
Contig Mapping/methods; Genome, Plant; Molecular Sequence Annotation/methods; Plant Proteins/genetics; Translocation, Genetic; Triticum/genetics; Algorithms; Contig Mapping/standards; Molecular Sequence Annotation/standards; Polymorphism, Genetic; Polyploidy
7.
Nature ; 541(7636): 212-216, 2017 Jan 12.
Article in English | MEDLINE | ID: mdl-28024298

ABSTRACT

Ash trees (genus Fraxinus, family Oleaceae) are widespread throughout the Northern Hemisphere, but are being devastated in Europe by the fungus Hymenoscyphus fraxineus, causing ash dieback, and in North America by the herbivorous beetle Agrilus planipennis. Here we sequence the genome of a low-heterozygosity Fraxinus excelsior tree from Gloucestershire, UK, annotating 38,852 protein-coding genes of which 25% appear ash specific when compared with the genomes of ten other plant species. Analyses of paralogous genes suggest a whole-genome duplication shared with olive (Olea europaea, Oleaceae). We also re-sequence 37 F. excelsior trees from Europe, finding evidence for apparent long-term decline in effective population size. Using our reference sequence, we re-analyse association transcriptomic data, yielding improved markers for reduced susceptibility to ash dieback. Surveys of these markers in British populations suggest that reduced susceptibility to ash dieback may be more widespread in Great Britain than in Denmark. We also present evidence that susceptibility of trees to H. fraxineus is associated with their iridoid glycoside levels. This rapid, integrated, multidisciplinary research response to an emerging health threat in a non-model organism opens the way for mitigation of the epidemic.


Subjects
Fraxinus/genetics; Genetic Predisposition to Disease/genetics; Genetic Variation; Genome, Plant/genetics; Plant Diseases/genetics; Trees/genetics; Ascomycota/pathogenicity; Conserved Sequence/genetics; Denmark; Fraxinus/microbiology; Genes, Plant/genetics; Genomics; Iridoid Glycosides/metabolism; Plant Diseases/microbiology; Plant Diseases/prevention & control; Plant Proteins/genetics; Population Density; Sequence Analysis, DNA; Species Specificity; Transcriptome; Trees/microbiology; United Kingdom
8.
Bioinformatics ; 33(4): 574-576, 2017 Feb 15.
Article in English | MEDLINE | ID: mdl-27797770

ABSTRACT

Motivation: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better understanding of the WGS data, and how that data is utilized by assemblers, provides useful insights that can inform the assembly process and result in better assemblies. Results: We present the K-mer Analysis Toolkit (KAT): a multi-purpose software toolkit for reference-free quality control (QC) of WGS reads and de novo genome assemblies, primarily via their k-mer frequencies and GC composition. KAT enables users to assess levels of errors, bias and contamination at various stages of the assembly process. In this paper we highlight KAT's ability to provide valuable insights into assembly composition and quality of genome assemblies through pairwise comparison of k-mers present in both input reads and the assemblies. Availability and Implementation: KAT is available under the GPLv3 license at: https://github.com/TGAC/KAT . Contact: bernardo.clavijo@earlham.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
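KAT's own implementation is not shown here. As a hedged, minimal sketch of the underlying idea, the code below counts canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) in a set of reads and builds a k-mer spectrum: how many distinct k-mers occur once, twice, and so on. The helper names `canonical` and `kmer_spectrum` are hypothetical and the toy reads are invented.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_spectrum(reads, k):
    """Count canonical k-mers, then histogram how many distinct k-mers occur n times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers spanning ambiguous bases
                counts[canonical(kmer)] += 1
    return Counter(counts.values())  # multiplicity -> number of distinct k-mers


spectrum = kmer_spectrum(["ACGTACGT", "ACGTACGA"], k=4)
print(sorted(spectrum.items()))  # [(1, 1), (2, 1), (3, 1), (4, 1)]
```

On real whole-genome reads, the low-multiplicity end of such a spectrum is dominated by sequencing errors, while the main peak reflects genome coverage; comparing read k-mers with assembly k-mers extends the same counting to assembly QC.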


Subjects
Genome, Plant; High-Throughput Nucleotide Sequencing/standards; Quality Control; Sequence Analysis, DNA/standards; Software; Fraxinus/genetics; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods
9.
Plant J ; 82(4): 680-92, 2015 May.
Article in English | MEDLINE | ID: mdl-25759247

ABSTRACT

The medicinal plant Madagascar periwinkle, Catharanthus roseus (L.) G. Don, produces hundreds of biologically active monoterpene-derived indole alkaloid (MIA) metabolites and is the sole source of the potent, expensive anti-cancer compounds vinblastine and vincristine. Access to a genome sequence would enable insights into the biochemistry, control, and evolution of genes responsible for MIA biosynthesis. However, generating a near-complete, scaffolded genome is prohibitive for small research communities because of the expense, time, and expertise required. In this study, we generated a genome assembly for C. roseus that provides a near-comprehensive representation of the genic space and revealed the genomic context of key points within the MIA biosynthetic pathway, including physically clustered genes, tandem gene duplication, expression sub-functionalization, and putative neo-functionalization. The genome sequence also facilitated high-resolution co-expression analyses that revealed three distinct clusters of co-expression within the components of the MIA pathway. Coordinated biosynthesis of precursors and intermediates throughout the pathway appears to be a feature of vinblastine/vincristine biosynthesis. The C. roseus genome also revealed localization of enzyme-rich genic regions and transporters near known biosynthetic enzymes, highlighting how even a draft genome sequence can empower the study of high-value specialized metabolites.


Subjects
Biological Products/metabolism; Catharanthus/metabolism; Gene Expression Regulation, Plant; Genome, Plant/genetics; Vinblastine/metabolism
10.
Bioinformatics ; 30(4): 566-8, 2014 Feb 15.
Article in English | MEDLINE | ID: mdl-24297520

ABSTRACT

SUMMARY: Illumina's recently released Nextera Long Mate Pair (LMP) kit enables production of jumping libraries of up to 12 kb. The LMP libraries are an invaluable resource for carrying out complex assemblies and other downstream bioinformatics analyses such as the characterization of structural variants. However, LMP libraries are intrinsically noisy and to maximize their value, post-sequencing data analysis is required. Standardizing laboratory protocols and the selection of sequenced reads for downstream analysis are non-trivial tasks. NextClip is a tool for analyzing reads from LMP libraries, generating a comprehensive quality report and extracting good quality trimmed and deduplicated reads. AVAILABILITY AND IMPLEMENTATION: Source code, user guide and example data are available from https://github.com/richardmleggett/nextclip/.
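In Nextera LMP libraries, the informative junction between the two ends of a fragment is marked by an adaptor, and reads are commonly trimmed at that adaptor before use as mate pairs. The sketch below illustrates only that trimming step and is not NextClip's algorithm, which (per the abstract above) also produces a quality report and deduplicates reads. It assumes exact matching and the commonly used Nextera adaptor sequence `CTGTCTCTTATACACATCT` plus its reverse complement; verify the sequence against your kit documentation. The function `trim_at_junction` and the `min_length` threshold are hypothetical.

```python
# Assumed Nextera adaptor sequence; check the kit documentation before real use.
JUNCTION = "CTGTCTCTTATACACATCT"
JUNCTION_RC = JUNCTION.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def trim_at_junction(read, min_length=25):
    """Trim a read at the first exact adaptor match in either orientation.

    Returns (trimmed_read, found); trimmed_read is None if the remaining
    sequence is shorter than the hypothetical min_length threshold.
    """
    hits = [pos for pos in (read.find(JUNCTION), read.find(JUNCTION_RC)) if pos != -1]
    found = bool(hits)
    trimmed = read[:min(hits)] if hits else read
    return (trimmed if len(trimmed) >= min_length else None), found


trimmed, found = trim_at_junction("A" * 30 + JUNCTION + "G" * 20)
print(len(trimmed), found)  # 30 True
```

Real reads carry sequencing errors and may contain only part of the adaptor near their ends, so a practical tool needs approximate matching; exact `str.find` keeps the example short and readable.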


Subjects
Arabidopsis Proteins/genetics; Computational Biology/methods; Genomic Library; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Software; Arabidopsis/genetics
11.
Brief Bioinform ; 14(5): 548-55, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23793381

ABSTRACT

Next-generation sequencing (NGS) is increasingly being adopted as the backbone of biomedical research. With the commercialization of various affordable desktop sequencers, NGS will be adopted by increasing numbers of cellular and molecular biologists, necessitating community consensus on bioinformatics protocols to tackle the exponential increase in the quantity of sequence data. The current resources for NGS informatics are extremely fragmented, and finding a centralized synthesis is difficult. A multitude of tools exist for NGS data analysis; however, none of them satisfies all possible uses and needs. This gap in functionality could be filled by integrating different methods in customized pipelines, an approach helped by the open-source nature of many NGS programmes. Drawing on community spirit and using the Wikipedia framework, we have initiated a collaborative NGS resource: the NGS WikiBook. We have collected a sufficient amount of text to incentivize a broader community to contribute to it. Users can search, browse, edit and create new content, so as to facilitate self-learning and feedback to the community. The overall structure and style of this dynamic material are designed for bench biologists and non-bioinformaticians. The flexibility of online material allows readers to skip details on a first read, yet have immediate access to the information they need. Each chapter comes with practical exercises so that readers can familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.


Subjects
Computational Biology/education; Computer-Assisted Instruction/methods; High-Throughput Nucleotide Sequencing/statistics & numerical data; Cooperative Behavior; Internet; Teaching
12.
Front Genet ; 4: 288, 2013 Dec 17.
Article in English | MEDLINE | ID: mdl-24381581

ABSTRACT

The processes of quality assessment and control are an active area of research at The Genome Analysis Centre (TGAC). Unlike other sequencing centers that often concentrate on a certain species or technology, TGAC applies expertise in genomics and bioinformatics to a wide range of projects, often requiring bespoke wet lab and in silico workflows. TGAC is fortunate to have access to a diverse range of sequencing and analysis platforms, and we are at the forefront of investigations into library quality and sequence data assessment. We have developed and implemented a number of algorithms, tools, pipelines and packages to ascertain, store, and expose quality metrics across a number of next-generation sequencing platforms, allowing rapid and in-depth cross-platform Quality Control (QC) bioinformatics. In this review, we describe these tools as a vehicle for data-driven informatics, offering the potential to provide richer context for downstream analysis and to inform experimental design.
