Results 1 - 20 of 23
1.
Brief Bioinform ; 20(2): 565-571, 2019 03 25.
Article in English | MEDLINE | ID: mdl-29659709

ABSTRACT

Improving productivity of the staple crops wheat and rice is essential to feed the growing global population, particularly in the context of a changing climate. However, current rates of yield gain are insufficient to support the predicted population growth. New approaches are required to accelerate the breeding process, and many of these are driven by the application of large-scale crop data. To leverage the substantial volumes and types of data that can be applied for precision breeding, the wheat and rice research communities are working towards the development of integrated systems to access and standardize the dispersed, heterogeneous available data. Here, we outline the initiatives of the International Wheat Information System (WheatIS) and the International Rice Informatics Consortium (IRIC) to establish Web-based single-access systems and data mining tools to make the available resources more accessible, drive discovery and accelerate the production of new crop varieties. We discuss the progress of WheatIS and IRIC towards unifying specialized wheat and rice databases and building custom software platforms to manage and interrogate these data. Single-access crop information systems will strengthen scientific collaboration, optimize the use of public research funds and help achieve the required yield gains in the two most important global food crops.


Subjects
Crops, Agricultural/growth & development , Information Systems , Oryza/growth & development , Triticum/growth & development
2.
Funct Integr Genomics ; 19(2): 363-371, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30483906

ABSTRACT

Next-generation DNA sequencing technologies, such as RNA-Seq, currently dominate genome-wide gene expression studies. A standard approach to analyse this data requires mapping sequence reads to a reference and counting the number of reads which map to each gene. However, for many transcriptome studies, a suitable reference genome is unavailable, especially for meta-transcriptome studies which assay gene expression from mixed populations of organisms. Where a reference is unavailable, it is possible to generate a reference by the de novo assembly of the sequence reads. However, the high cost of generating high-coverage data for de novo assembly hinders this approach and more importantly the accurate assembly of such data is challenging, especially for meta-transcriptome data, and resulting assemblies frequently suffer from collapsed regions or chimeric sequences. As an alternative to the standard reference mapping approach, we have developed a k-mer-based analysis pipeline (DiffKAP) to identify differentially expressed reads between RNA-Seq datasets without the requirement for a reference. We compared the DiffKAP approach with the traditional Tophat/Cuffdiff method using RNA-Seq data from soybean, which has a suitable reference genome. We subsequently examined differential gene expression for a coral meta-transcriptome where no reference is available, and validated the results using qRT-PCR. We conclude that DiffKAP is an accurate method to study differential gene expression in complex meta-transcriptomes without the requirement of a reference genome.
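The reference-free idea in this abstract can be illustrated with a minimal sketch: count k-mers in two read sets, normalise by the total k-mer count, and compare abundances. This is not the DiffKAP implementation; the function names, the default `k`, and the pseudocount are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer of length k across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_log2_ratios(reads_a, reads_b, k=5, pseudocount=1.0):
    """Return {kmer: log2(normalised abundance in B / normalised abundance in A)}.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for k-mers seen in only one dataset.
    """
    counts_a = count_kmers(reads_a, k)
    counts_b = count_kmers(reads_b, k)
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1
    ratios = {}
    for kmer in set(counts_a) | set(counts_b):
        freq_a = (counts_a[kmer] + pseudocount) / total_a
        freq_b = (counts_b[kmer] + pseudocount) / total_b
        ratios[kmer] = math.log2(freq_b / freq_a)
    return ratios
```

Reads carrying many k-mers with large positive or negative ratios would then be candidates for differential expression; a real pipeline would also need error filtering and statistical testing, which this sketch omits.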


Subjects
Gene Expression Profiling/methods , Metagenome , Sequence Analysis, RNA/methods , Transcriptome , Algorithms , Animals , Anthozoa/genetics , Datasets as Topic , Gene Expression Profiling/standards , Reference Standards , Sequence Analysis, RNA/standards
3.
Plant Biotechnol J ; 17(4): 789-800, 2019 04.
Article in English | MEDLINE | ID: mdl-30230187

ABSTRACT

Brassica oleracea is an important agricultural species encompassing many vegetable crops including cabbage, cauliflower, broccoli and kale; however, it can be susceptible to a variety of fungal diseases such as clubroot, blackleg, leaf spot and downy mildew. Resistance to these diseases is mediated by specific disease resistance gene analogs (RGAs), which are differently distributed across B. oleracea lines. The sequenced reference cultivar does not contain all B. oleracea genes due to gene presence/absence variation between individuals, which makes it necessary to search for RGA candidates in the B. oleracea pangenome. Here we present a comparative analysis of RGA candidates in the pangenome of B. oleracea. We show that the presence of RGA candidates differs between lines and suggest that in B. oleracea, SNPs and presence/absence variation drive RGA diversity using separate mechanisms. We identified 59 RGA candidates linked to Sclerotinia, clubroot, and Fusarium wilt resistance QTL, and these findings have implications for crop breeding in B. oleracea, which may also be applicable in other crop species.


Subjects
Ascomycota/physiology , Brassica/genetics , Disease Resistance/genetics , Fusarium/physiology , Genome, Plant/genetics , Plant Diseases/immunology , Brassica/immunology , Brassica/microbiology , Crops, Agricultural , Plant Breeding , Plant Diseases/microbiology , Quantitative Trait Loci/genetics
4.
Plant J ; 90(5): 1007-1013, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28231383

ABSTRACT

There is an increasing understanding that variation in gene presence-absence plays an important role in the heritability of agronomic traits; however, there have been relatively few studies on variation in gene presence-absence in crop species. Hexaploid wheat is one of the most important food crops in the world and intensive breeding has reduced the genetic diversity of elite cultivars. Major efforts have produced draft genome assemblies for the cultivar Chinese Spring, but it is unknown how well this represents the genome diversity found in current modern elite cultivars. In this study we build an improved reference for Chinese Spring and explore gene diversity across 18 wheat cultivars. We predict a pangenome size of 140 500 ± 102 genes, a core genome of 81 070 ± 1631 genes and an average of 128 656 genes in each cultivar. Functional annotation of the variable gene set suggests that it is enriched for genes that may be associated with important agronomic traits. In addition to variation in gene presence, more than 36 million intervarietal single nucleotide polymorphisms were identified across the pangenome. This study of the wheat pangenome provides insight into genome diversity in elite wheat as a basis for genomics-based improvement of this important crop. A wheat pangenome, GBrowse, is available at http://appliedbioinformatics.com.au/cgi-bin/gb2/gbrowse/WheatPan/, and data are available to download from http://wheatgenome.info/wheat_genome_databases.php.
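The core/variable split described in this abstract can be sketched from a gene presence/absence table. The following is a minimal illustration, assuming presence calls are already available as one set of gene identifiers per cultivar; the cultivar and gene names in the usage note are hypothetical, and the published estimates additionally rely on modelling that is not reproduced here.

```python
def classify_genes(presence):
    """Split a pangenome into core and variable genes.

    presence: dict mapping cultivar name -> set of gene ids called present.
    Returns (core, variable, pan) as sets.
    """
    gene_sets = list(presence.values())
    # Pangenome: every gene seen in at least one cultivar.
    pan = set().union(*gene_sets)
    # Core genome: genes present in every cultivar.
    core = set.intersection(*gene_sets) if gene_sets else set()
    # Variable genes show presence/absence variation.
    variable = pan - core
    return core, variable, pan
```

For example, with `{"cv1": {"g1", "g2"}, "cv2": {"g1", "g3"}}` the core set is `{"g1"}` and the variable set is `{"g2", "g3"}`.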


Subjects
Genome, Plant/genetics , Triticum/genetics , Chromosomes, Plant/genetics , Genetic Variation/genetics , Polymorphism, Single Nucleotide/genetics
5.
Plant Biotechnol J ; 16(7): 1265-1274, 2018 07.
Article in English | MEDLINE | ID: mdl-29205771

ABSTRACT

Homoeologous exchanges (HEs) have been shown to generate novel gene combinations and phenotypes in a range of polyploid species. Gene presence/absence variation (PAV) is also a major contributor to genetic diversity. In this study, we show that there is an association between these two events, particularly in recent Brassica napus synthetic accessions, and that these represent a novel source of genetic diversity, which can be captured for the improvement of this important crop species. By assembling the pangenome of B. napus, we show that 38% of the genes display PAV behaviour, with some of these variable genes predicted to be involved in important agronomic traits including flowering time, disease resistance, acyl lipid metabolism and glucosinolate metabolism. This study is a first and provides a detailed characterization of the association between HEs and PAVs in B. napus at the pangenome level.


Subjects
Brassica napus/genetics , Gene Conversion/genetics , Genes, Plant/genetics , Diploidy , Gene Deletion , Gene Duplication , Genetic Variation/genetics , Genome, Plant/genetics , Quantitative Trait, Heritable
6.
J Exp Bot ; 69(15): 3689-3702, 2018 06 27.
Article in English | MEDLINE | ID: mdl-29912443

ABSTRACT

Seagrasses are marine angiosperms that live fully submerged in the sea. They evolved from land plant ancestors, with multiple species representing at least three independent return-to-the-sea events. This raises the question of whether these marine angiosperms followed the same adaptation pathway to allow them to live and reproduce under the hostile marine conditions. To compare the basis of marine adaptation between seagrass lineages, we generated genomic data for Halophila ovalis and compared this with recently published genomes for two members of Zosteraceae, as well as genomes of five non-marine plant species (Arabidopsis, Oryza sativa, Phoenix dactylifera, Musa acuminata, and Spirodela polyrhiza). Halophila and Zosteraceae represent two independent seagrass lineages separated by around 30 million years. Genes that were lost or conserved in both lineages were identified. All three species lost genes associated with ethylene and terpenoid biosynthesis, and retained genes related to salinity adaptation, such as those for osmoregulation. In contrast, the loss of the NADH dehydrogenase-like complex is unique to H. ovalis. Through comparison of two independent return-to-the-sea events, this study further describes marine adaptation characteristics common to seagrass families, identifies species-specific gene loss, and provides molecular evidence for convergent evolution in seagrass lineages.


Subjects
Evolution, Molecular , Genomics , Hydrocharitaceae/genetics , Magnoliopsida/genetics , Zosteraceae/genetics , Adaptation, Physiological , Ecosystem , Species Specificity
7.
BMC Bioinformatics ; 18(1): 323, 2017 Jun 30.
Article in English | MEDLINE | ID: mdl-28666410

ABSTRACT

BACKGROUND: Reference genome assemblies are valuable, as they provide insights into gene content, genetic evolution and domestication. The higher the quality of a reference genome assembly the more accurate the downstream analysis will be. During the last few years, major efforts have been made towards improving the quality of genome assemblies. However, erroneous and incomplete assemblies are still common. Complementary to DNA sequencing technologies, optical mapping has advanced genomic studies by facilitating the production of genome scaffolds and assessing structural variation. However, there are few tools available to comprehensively examine misassemblies in reference genome sequences using optical map data. RESULTS: We present BioNanoAnalyst, a software package to examine genome assemblies based on restriction endonuclease cut sites and optical map data. A graphical user interface (GUI) allows users to assess reference genome sequences on different computer platforms without the requirement of programming knowledge. The zoom function makes visualisation convenient, while a GFF3 format output file gives an option to directly visualise questionable assembly regions by location and nucleotides following import into a local genome browser. CONCLUSIONS: BioNanoAnalyst is a tool to identify misassemblies in a reference genome sequence using optical map data. With the reported information, users can rapidly identify assembly errors and correct them using other software tools, which could facilitate an accurate downstream analysis.
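Comparing an assembly with an optical map starts from the positions of enzyme recognition sites predicted in the sequence. The sketch below shows only that in silico step, under stated assumptions: it scans the forward strand for an exact motif and reports the spacing between consecutive sites. The function names are hypothetical and this is not BioNanoAnalyst's code.

```python
def cut_sites(sequence, motif):
    """Return 0-based start positions of every motif occurrence (overlaps allowed)."""
    sites = []
    start = sequence.find(motif)
    while start != -1:
        sites.append(start)
        start = sequence.find(motif, start + 1)
    return sites


def site_spacings(sequence, motif):
    """Return distances (in bp) between consecutive recognition sites."""
    positions = cut_sites(sequence, motif)
    return [b - a for a, b in zip(positions, positions[1:])]
```

Spacings predicted from the assembly could then be compared with the spacings measured on the optical map; regions where the two disagree are candidates for misassembly. A fuller treatment would also scan the reverse strand and allow for measurement error.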


Subjects
Genomics , User-Computer Interface , Chromosomes, Human, Pair 1/genetics , Chromosomes, Human, Pair 1/metabolism , DNA Restriction Enzymes/metabolism , Genome, Human , Humans , Internet
8.
Plant Biotechnol J ; 15(12): 1602-1610, 2017 Dec.
Article in English | MEDLINE | ID: mdl-28403535

ABSTRACT

As an increasing number of plant genome sequences become available, it is clear that gene content varies between individuals, and the challenge arises to predict the gene content of a species. However, genome comparison is often confounded by variation in assembly and annotation. Differentiating between true gene absence and variation in assembly or annotation is essential for the accurate identification of conserved and variable genes in a species. Here, we present the de novo assembly of the B. napus cultivar Tapidor and comparison with an improved assembly of the Brassica napus cultivar Darmor-bzh. Both cultivars were annotated using the same method to allow comparison of gene content. We identified genes unique to each cultivar and differentiate these from artefacts due to variation in the assembly and annotation. We demonstrate that using a common annotation pipeline can result in different gene predictions, even for closely related cultivars, and repeat regions which collapse during assembly impact whole genome comparison. After accounting for differences in assembly and annotation, we demonstrate that the genome of Darmor-bzh contains a greater number of genes than the genome of Tapidor. Our results are the first step towards comparison of the true differences between B. napus genomes and highlight the potential sources of error in future production of a B. napus pangenome.


Subjects
Genome, Plant , Brassica napus/genetics , Expressed Sequence Tags , Genes, Plant , Molecular Sequence Annotation , Repetitive Sequences, Nucleic Acid
9.
Plant Physiol ; 172(1): 272-83, 2016 09.
Article in English | MEDLINE | ID: mdl-27373688

ABSTRACT

Seagrasses are marine angiosperms that evolved from land plants but returned to the sea around 140 million years ago during the early evolution of monocotyledonous plants. They successfully adapted to abiotic stresses associated with growth in the marine environment, and today, seagrasses are distributed in coastal waters worldwide. Seagrass meadows are an important oceanic carbon sink and provide food and breeding grounds for diverse marine species. Here, we report the assembly and characterization of the Zostera muelleri genome, a southern hemisphere temperate species. Multiple genes were lost or modified in Z. muelleri compared with terrestrial or floating aquatic plants that are associated with their adaptation to life in the ocean. These include genes for hormone biosynthesis and signaling and cell wall catabolism. There is evidence of whole-genome duplication in Z. muelleri; however, an ancient pan-commelinid duplication event is absent, highlighting the early divergence of this species from the main monocot lineages.


Subjects
Adaptation, Physiological/genetics , Ecosystem , Genome, Plant/genetics , Zosteraceae/genetics , Aquatic Organisms/genetics , Gene Duplication , Gene Ontology , Genes, Plant/genetics , Molecular Sequence Annotation , Oceans and Seas , Plant Proteins/genetics , Sequence Analysis, RNA
10.
Plant Biotechnol J ; 13(1): 97-104, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25147022

ABSTRACT

Although wheat is a major international crop, our understanding of its genome remains relatively poor because of the genome's large size and complexity. To gain a greater understanding of wheat genome diversity, we have identified single nucleotide polymorphisms between 16 Australian bread wheat varieties. Whole-genome shotgun Illumina paired read sequence data were mapped to the draft assemblies of chromosomes 7A, 7B and 7D to identify more than 4 million intervarietal SNPs. SNP density varied between the three genomes, with much greater density observed on the A and B genomes than the D genome. This variation may be a result of substantial gene flow from the tetraploid Triticum turgidum, which possesses A and B genomes, during early co-cultivation of tetraploid and hexaploid wheat. In addition, we examined SNP density variation along the chromosome syntenic builds and identified genes in low-density regions which may have been selected during domestication and breeding. This study highlights the impact of evolution and breeding on the bread wheat genome and provides a substantial resource for trait association and crop improvement. All SNP data are publicly available on a generic genome browser (GBrowse) at www.wheatgenome.info.


Subjects
Bread , Chromosomes, Plant/genetics , Polymorphism, Single Nucleotide/genetics , Triticum/genetics , Australia , Genome, Plant , Phylogeny , Reproducibility of Results
11.
J Exp Bot ; 66(5): 1489-98, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25563969

ABSTRACT

Seagrasses are flowering plants which grow fully submerged in the marine environment. They have evolved a range of adaptations to environmental challenges including light attenuation through water, the physical stress of wave action and tidal currents, high concentrations of salt, oxygen deficiency in marine sediment, and water-borne pollination. Although seagrasses are a keystone species of coastal ecosystems, many questions regarding seagrass biology and evolution remain unanswered. Genome sequence data for the widespread Australian seagrass species Zostera muelleri were generated and the unassembled data were compared with the annotated genes of five sequenced plant species (Arabidopsis thaliana, Oryza sativa, Phoenix dactylifera, Musa acuminata, and Spirodela polyrhiza). Genes which are conserved between Z. muelleri and the five plant species were identified, together with genes that have been lost in Z. muelleri. The effect of gene loss on biological processes was assessed on the gene ontology classification level. Gene loss in Z. muelleri appears to influence some core biological processes such as ethylene biosynthesis. This study provides a foundation for further studies of seagrass evolution as well as the hormonal regulation of plant growth and development.


Subjects
Ethylenes/metabolism , Genome, Plant , Zosteraceae/genetics , Ecosystem , Genomics , Photosynthesis , Plant Proteins/genetics , Plant Proteins/metabolism , Zosteraceae/metabolism
12.
Theor Appl Genet ; 128(6): 1039-47, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25754422

ABSTRACT

KEY MESSAGE: We characterise the distribution of crossover and non-crossover recombination in Brassica napus and Cicer arietinum using a low-coverage genotyping by sequencing pipeline SkimGBS. The growth of next-generation DNA sequencing technologies has led to a rapid increase in sequence-based genotyping for applications including diversity assessment, genome structure validation and gene-trait association. We have established a skim-based genotyping by sequencing method for crop plants and applied this approach to genotype-segregating populations of Brassica napus and Cicer arietinum. Comparison of progeny genotypes with those of the parental individuals allowed the identification of crossover and non-crossover (gene conversion) events. Our results identify the positions of recombination events with high resolution, permitting the mapping and frequency assessment of recombination in segregating populations.
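The comparison of progeny genotypes with parental genotypes can be sketched as follows. This is a simplified illustration rather than the SkimGBS pipeline: it assumes each marker along one chromosome has already been assigned to parent `'A'` or `'B'`, treats short internal runs of the other parent as candidate non-crossover (gene conversion) tracts, and counts the remaining parental switches as crossovers. The function name and the `max_conversion_markers` threshold are hypothetical.

```python
from itertools import groupby


def call_recombination(calls, max_conversion_markers=2):
    """Classify parental switches along one chromosome.

    calls: sequence of 'A'/'B' parental assignments ordered by position.
    Returns (crossovers, conversions):
      crossovers  - marker indices where the parental background switches
      conversions - (first_index, last_index) of short internal tracts
    """
    # Run-length encode the calls as (parent, start_index, length).
    runs = []
    pos = 0
    for parent, group in groupby(calls):
        length = len(list(group))
        runs.append((parent, pos, length))
        pos += length

    conversions = []
    background = []  # runs kept after removing short internal tracts
    for i, (parent, start, length) in enumerate(runs):
        internal = 0 < i < len(runs) - 1
        if internal and length <= max_conversion_markers:
            conversions.append((start, start + length - 1))
        else:
            background.append((parent, start))

    # A crossover is a change of parent between consecutive background runs.
    crossovers = [
        background[j][1]
        for j in range(1, len(background))
        if background[j][0] != background[j - 1][0]
    ]
    return crossovers, conversions
```

With low-coverage data, real calls also contain missing and erroneous genotypes, so a practical method would need smoothing or probabilistic calling before this step.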


Subjects
Brassica napus/genetics , Cicer/genetics , Crossing Over, Genetic , Gene Conversion , Genotyping Techniques , Chromosome Mapping , Genome, Plant , Genotype , High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide
13.
BMC Genomics ; 15: 1052, 2014 Dec 02.
Article in English | MEDLINE | ID: mdl-25467196

ABSTRACT

BACKGROUND: Changes to the environment as a result of human activities can result in a range of impacts on reef building corals that include coral bleaching (reduced concentrations of algal symbionts), decreased coral growth and calcification, and increased incidence of diseases and mortality. Understanding how elevated temperatures and nutrient concentration affect early transcriptional changes in corals and their algal endosymbionts is critically important for evaluating the responses of coral reefs to global changes happening in the environment. Here, we investigated the expression of genes in colonies of the reef-building coral Acropora aspera exposed to short-term sub-lethal levels of thermal (+6°C) and nutrient stress (ammonium-enrichment: 20 µM). RESULTS: The RNA-Seq data provided hundreds of differentially expressed genes (DEGs) corresponding to various stress regimes, with 115 up- and 78 down-regulated genes common to all stress regimes. A list of DEGs included up-regulated coral genes like cytochrome c oxidase and NADH-ubiquinone oxidoreductase and up-regulated photosynthetic genes of algal origin, whereas coral GFP-like fluorescent chromoprotein and sodium/potassium-transporting ATPase showed reduced transcript levels. Taxonomic analyses of the coral holobiont disclosed the dominant presence of transcripts from coral (~70%) and Symbiodinium (~10-12%), as well as ~15-20% of unknown sequences which lacked sequence identity to known genes. Gene ontology analyses revealed enriched pathways, which led to changes in the dynamics of protein networks affecting growth, cellular processes, and energy requirement. CONCLUSIONS: In corals with preserved symbiont physiological performance (based on Fv/Fm, photo-pigment and symbiont density), transcriptomic changes and DEGs provided important insight into early stages of the stress response in the coral holobiont. 
Although there were no signs of coral bleaching after exposure to short-term thermal and nutrient stress conditions, we managed to detect oxidative stress and apoptotic changes on a molecular level and provide a list of prospective stress biomarkers for both partners in symbiosis. Consequently, our findings are important for understanding and anticipating impacts of anthropogenic global climate change on coral reefs.


Subjects
Anthozoa/genetics , Gene Expression Regulation , Stress, Physiological/genetics , Transcription, Genetic , Animals , Anthozoa/metabolism , Computational Biology , Coral Reefs , Energy Metabolism , Gene Expression Profiling , Molecular Sequence Annotation , Oxidation-Reduction , Photosynthesis , Temperature
14.
Plant Biotechnol J ; 12(6): 778-86, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24702794

ABSTRACT

With the expansion of next-generation sequencing technology and advanced bioinformatics, there has been a rapid growth of genome sequencing projects. However, while this technology enables the rapid and cost-effective assembly of draft genomes, the quality of these assemblies usually falls short of gold standard genome assemblies produced using the more traditional BAC by BAC and Sanger sequencing approaches. Assembly validation is often performed by the physical anchoring of genetically mapped markers, but this is prone to errors and the resolution is usually low, especially towards centromeric regions where recombination is limited. New approaches are required to validate reference genome assemblies. The ability to isolate individual chromosomes combined with next-generation sequencing permits the validation of genome assemblies at the chromosome level. We demonstrate this approach by the assessment of the recently published chickpea kabuli and desi genomes. While previous genetic analysis suggests that these genomes should be very similar, a comparison of their chromosome sizes and published assemblies highlights significant differences. Our chromosomal genomics analysis highlights short defined regions that appear to have been misassembled in the kabuli genome and identifies large-scale misassembly in the draft desi genome. The integration of chromosomal genomics tools within genome sequencing projects has the potential to significantly improve the construction and validation of genome assemblies. The approach could be applied both for new genome assemblies as well as published assemblies, and complements currently applied genome assembly strategies.


Subjects
Chromosomes, Plant/genetics , Cicer/genetics , Genome, Plant/genetics , Genomics/methods , Cell Nucleus/genetics , DNA, Plant/genetics , Flow Cytometry , Fluorescence , Genome Size , Reproducibility of Results , Sequence Analysis, DNA
15.
G3 (Bethesda) ; 12(4)2022 04 04.
Article in English | MEDLINE | ID: mdl-35143647

ABSTRACT

Shrimp are a valuable aquaculture species globally; however, disease remains a major hindrance to shrimp aquaculture sustainability and growth. Mechanisms mediated by endogenous viral elements have been proposed as a means by which shrimp that encounter a new virus start to accommodate rather than succumb to infection over time. However, evidence on the nature of such endogenous viral elements and how they mediate viral accommodation is limited. More extensive genomic data on Penaeid shrimp from different geographical locations should assist in exposing the diversity of endogenous viral elements. In this context, reported here is a PacBio Sequel-based draft genome assembly of an Australian black tiger shrimp (Penaeus monodon) inbred for 1 generation. The 1.89 Gbp draft genome is comprised of 31,922 scaffolds (N50: 496,398 bp) covering 85.9% of the projected genome size. The genome repeat content (61.8% with 30% representing simple sequence repeats) is almost the highest identified for any species. The functional annotation identified 35,517 gene models, of which 25,809 were protein-coding and 17,158 were annotated using interproscan. Scaffold scanning for specific endogenous viral elements identified an element comprised of a 9,045-bp stretch of repeated, inverted, and jumbled genome fragments of infectious hypodermal and hematopoietic necrosis virus bounded by a repeated 591/590 bp host sequence. As only near complete linear ∼4 kb infectious hypodermal and hematopoietic necrosis virus genomes have been found integrated in the genome of P. monodon previously, its discovery has implications regarding the validity of PCR tests designed to specifically detect such linear endogenous viral element types. The existence of joined inverted infectious hypodermal and hematopoietic necrosis virus genome fragments also provides a means by which hairpin double-stranded RNA could be expressed and processed by the shrimp RNA interference machinery.
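The scaffold N50 quoted in this abstract is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation, using made-up lengths rather than the shrimp assembly itself:

```python
def n50(lengths):
    """Return the N50 of a list of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


print(n50([2, 3, 4, 5, 6]))  # prints 5: 6 + 5 = 11 covers at least half of 20
```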


Subjects
Densovirinae , Penaeidae , Animals , Australia , Densovirinae/genetics , Genome, Viral , Penaeidae/genetics , Polymerase Chain Reaction
16.
BMC Bioinformatics ; 9: 215, 2008 Apr 28.
Article in English | MEDLINE | ID: mdl-18442374

ABSTRACT

BACKGROUND: In metagenomic studies, a process called binning is necessary to assign contigs that belong to multiple species to their respective phylogenetic groups. Most of the current methods of binning, such as BLAST, k-mer and PhyloPythia, involve assigning sequence fragments by comparing sequence similarity or sequence composition with already-sequenced genomes that are still far from comprehensive. We propose a semi-supervised seeding method for binning that does not depend on knowledge of completed genomes. Instead, it extracts the flanking sequences of highly conserved 16S rRNA from the metagenome and uses them as seeds (labels) to assign other reads based on their compositional similarity. RESULTS: The proposed seeding method is implemented on an unsupervised Growing Self-Organising Map (GSOM), and called Seeded GSOM (S-GSOM). We compared it with four well-known semi-supervised learning methods in a preliminary test, separating random-length prokaryotic sequence fragments sampled from the NCBI genome database. We identified the flanking sequences of the highly conserved 16S rRNA as suitable seeds that could be used to group the sequence fragments according to their species. S-GSOM showed superior performance compared to the semi-supervised methods tested. Additionally, S-GSOM may also be used to visually identify some species that do not have seeds. The proposed method was then applied to simulated metagenomic datasets using two different confidence threshold settings and compared with PhyloPythia, k-mer and BLAST. At the reference taxonomic level Order, S-GSOM outperformed all k-mer and BLAST results and showed comparable results with PhyloPythia for each of the corresponding confidence settings, where S-GSOM performed better than PhyloPythia in the ≥ 10 reads datasets and comparable in the ≥ 8 kb benchmark tests. CONCLUSION: In the task of binning using semi-supervised learning methods, results indicate S-GSOM to be the best of the methods tested.
Most importantly, the proposed method does not require knowledge from known genomes and uses only very few labels (one per species is sufficient in most cases), which are extracted from the metagenome itself. These advantages make it a very attractive binning method. S-GSOM outperformed the binning methods that depend on already-sequenced genomes, and compares well to the current most advanced binning method, PhyloPythia.


Subjects
Database Management Systems , Pattern Recognition, Automated/methods , Phylogeny , RNA, Archaeal/classification , RNA, Bacterial/classification , Algorithms , Artificial Intelligence , Base Sequence , Confidence Intervals , Databases, Genetic , Genes, rRNA , Genomics/methods , Information Storage and Retrieval/methods , Information Storage and Retrieval/statistics & numerical data , Pattern Recognition, Automated/statistics & numerical data , RNA, Archaeal/analysis , RNA, Bacterial/analysis , Sample Size , Sequence Analysis, RNA , Species Specificity , Uncertainty
17.
J Biomed Biotechnol ; 2008: 513701, 2008.
Article in English | MEDLINE | ID: mdl-18288261

ABSTRACT

Metagenomic projects using whole-genome shotgun (WGS) sequencing produce many unassembled DNA sequences and small contigs. The step of clustering these sequences, based on biological and molecular features, is called binning. A reported strategy for binning that combines oligonucleotide frequency and self-organising maps (SOM) shows high potential. We improve this strategy by identifying suitable training features, implementing a better clustering algorithm, and defining quantitative measures for assessing results. We investigated the suitability of each of di-, tri-, tetra-, and pentanucleotide frequencies. The results show that dinucleotide frequency is not a sufficiently strong signature for binning 10 kb long DNA sequences, compared to the other three. Furthermore, we observed that increased order of oligonucleotide frequency may deteriorate the assignment result in some cases, which indicates the possible existence of optimal species-specific oligonucleotide frequency. We replaced SOM with a growing self-organising map (GSOM), which gave comparable results with a 7%-15% speed improvement.
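The oligonucleotide-frequency signature used as the training feature can be sketched as a fixed-length vector of normalised k-mer counts (k = 2 to 5 for the di- to pentanucleotide cases discussed above). This is an illustrative sketch only: the function name is hypothetical, windows containing ambiguous bases are skipped by assumption, and reverse-complement merging and the SOM/GSOM clustering step are left out.

```python
from itertools import product


def oligo_frequencies(seq, k=4):
    """Return the normalised k-mer frequency vector of a DNA sequence.

    The vector has 4**k entries ordered alphabetically over A, C, G, T.
    Windows containing any other character (for example N) are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    counted = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            counted += 1
    return [counts[kmer] / counted if counted else 0.0 for kmer in kmers]
```

Each sequence fragment then becomes a point in a 4^k-dimensional space (256 dimensions for tetranucleotides), which is the kind of input a clustering method such as a self-organising map operates on.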


Subjects
Algorithms , Artificial Intelligence , Chromosome Mapping/methods , Data Interpretation, Statistical , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Cluster Analysis
18.
Methods Mol Biol ; 1679: 277-291, 2017.
Article in English | MEDLINE | ID: mdl-28913808

ABSTRACT

The genomics revolution brought on by advances in high-throughput sequencing has led to the production of vast amounts of data. Databases play an essential role in storing and managing this information to make it available to researchers and crop breeders. This chapter provides an outline of how to use databases and tools for wheat genome research.


Subjects
Databases, Genetic , Genome, Plant , Genomics , Triticum/genetics , Computational Biology/methods , Genomics/methods , Plant Breeding , User-Computer Interface , Web Browser
19.
Methods Mol Biol ; 1374: 339-61, 2016.
Article in English | MEDLINE | ID: mdl-26519415

ABSTRACT

The recent advances in high throughput RNA sequencing (RNA-Seq) have generated huge amounts of data in a very short span of time for a single sample. These data have required the parallel advancement of computing tools to organize and interpret them meaningfully in terms of biological implications, at the same time using minimum computing resources to reduce computation costs. Here we describe the method of analyzing RNA-seq data using the set of open source software programs of the Tuxedo suite: TopHat and Cufflinks. TopHat is designed to align RNA-seq reads to a reference genome, while Cufflinks assembles these mapped reads into possible transcripts and then generates a final transcriptome assembly. Cufflinks also includes Cuffdiff, which accepts the reads assembled from two or more biological conditions and analyzes their differential expression of genes and transcripts, thus aiding in the investigation of their transcriptional and post transcriptional regulation under different conditions. We also describe the use of an accessory tool called CummeRbund, which processes the output files of Cuffdiff and gives an output of publication quality plots and figures of the user's choice. We demonstrate the effectiveness of the Tuxedo suite by analyzing RNA-Seq datasets of Arabidopsis thaliana root subjected to two different conditions.


Subjects
Computational Biology/methods , Genomics/methods , Sequence Analysis, RNA/methods , Software , Gene Expression Profiling/methods , Transcriptome
20.
Plant Methods ; 12: 2, 2016.
Article in English | MEDLINE | ID: mdl-26793268

ABSTRACT

BACKGROUND: There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. RESULTS: We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. CONCLUSIONS: We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.
