Search | Virtual Health Library (Biblioteca Virtual em Saúde)

1.

An 'eFP-Seq Browser' for visualizing and exploring RNA sequencing data.

Sullivan, Alexander; Purohit, Priyank K; Freese, Nowlan H; Pasha, Asher; Esteban, Eddi; Waese, Jamie; Wu, Alison; Chen, Michelle; Chin, Chih Y; Song, Richard; Watharkar, Sneha R; Chan, Agnes P; Krishnakumar, Vivek; Vaughn, Matthew W; Town, Chris; Loraine, Ann E; Provart, Nicholas J.

Plant J ; 100(3): 641-654, 2019 Nov.

Article in English | MEDLINE | ID: mdl-31350781

ABSTRACT

Improvements in next-generation sequencing technologies have resulted in dramatically reduced sequencing costs. This has led to an explosion of '-seq'-based methods, of which RNA sequencing (RNA-seq) for generating transcriptomic data is the most popular. By analysing global patterns of gene expression in organs/tissues/cells of interest or in response to chemical or environmental perturbations, researchers can better understand an organism's biology. Tools designed to work with large RNA-seq data sets enable analyses and visualizations to help generate hypotheses about a gene's function. We present here a user-friendly RNA-seq data exploration tool, called the 'eFP-Seq Browser', that shows the read map coverage of a gene of interest in each of the samples along with 'electronic fluorescent pictographic' (eFP) images that serve as visual representations of expression levels. The tool also summarizes the details of each RNA-seq experiment, providing links to archival databases and publications. It automatically computes the reads per kilobase per million reads mapped expression-level summaries and point biserial correlation scores to sort the samples based on a gene's expression level or by how dissimilar the read map profile is from a gene splice variant, to quickly identify samples with the strongest expression level or where alternative splicing might be occurring. Links to the Integrated Genome Browser desktop visualization tool allow researchers to visualize and explore the details of RNA-seq alignments summarized in eFP-Seq Browser as coverage graphs. We present four cases of use of the eFP-Seq Browser for ABI3, SR34, SR45a and U2AF65B, where we examine expression levels and identify alternative splicing. The URL for the browser is https://bar.utoronto.ca/eFP-Seq_Browser/. OPEN RESEARCH BADGES: This article has earned an Open Data Badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. 
Tool is at https://bar.utoronto.ca/eFP-Seq_Browser/; RNA-seq data at https://s3.amazonaws.com/iplant-cdn/iplant/home/araport/rnaseq_bam/ and https://s3.amazonaws.com/iplant-cdn/iplant/home/araport/rnaseq_bam/Klepikova/. Code is available at https://github.com/BioAnalyticResource/eFP-Seq-Browser.
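The two summary statistics named in the abstract can be sketched as follows. This is a minimal illustration using the textbook definitions of RPKM and the point-biserial correlation, not the eFP-Seq Browser's own code; the function names, the example numbers, and the choice of a 0/1 mask marking exonic positions are assumptions made here for illustration.

```python
from math import sqrt


def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 vector and a numeric vector."""
    n = len(values)
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    if not group1 or not group0:
        raise ValueError("both groups must be non-empty")
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    mean_all = sum(values) / n
    # Population standard deviation of all values.
    sd = sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("values must not be constant")
    return (mean1 - mean0) / sd * sqrt(len(group1) * len(group0) / n ** 2)


# Hypothetical example: 500 reads on a 2,000-bp gene, 20 million mapped reads.
expression = rpkm(500, 2000, 20_000_000)

# Hypothetical per-position data: 1 marks a position inside an exon of the
# splice variant, 0 marks an intronic position; coverage is the read depth.
exon_mask = [1, 1, 1, 0, 0, 0, 1, 1, 1]
coverage = [10, 12, 11, 0, 1, 0, 9, 10, 11]
similarity = point_biserial(exon_mask, coverage)
print(expression, round(similarity, 3))
```

Under these assumptions, a correlation near 1 means the coverage profile follows the annotated exon structure, while a lower value flags a sample whose read map departs from that splice variant.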

Subjects

Arabidopsis/genetics, Data Visualization, Plant Genome/genetics, Transcriptome, Web Browser, Alternative Splicing, Arabidopsis/growth & development, Arabidopsis/physiology, Gene Expression Profiling, High-Throughput Nucleotide Sequencing, Plant RNA/genetics, Sequence Alignment, RNA Sequence Analysis, Physiological Stress, Temperature

2.

Hidden genomic evolution in a morphospecies-The landscape of rapidly evolving genes in Tetrahymena.

Xiong, Jie; Yang, Wentao; Chen, Kai; Jiang, Chuanqi; Ma, Yang; Chai, Xiaocui; Yan, Guanxiong; Wang, Guangying; Yuan, Dongxia; Liu, Yifan; Bidwell, Shelby L; Zafar, Nikhat; Hadjithomas, Michalis; Krishnakumar, Vivek; Coyne, Robert S; Orias, Eduardo; Miao, Wei.

PLoS Biol ; 17(6): e3000294, 2019 Jun.

Article in English | MEDLINE | ID: mdl-31158217

ABSTRACT

A morphospecies is defined as a taxonomic species based wholly on morphology, but often morphospecies consist of clusters of cryptic species that can be identified genetically or molecularly. The nature of the evolutionary novelty that accompanies speciation in a morphospecies is an intriguing question. Morphospecies are particularly common among ciliates, a group of unicellular eukaryotes that separates 2 kinds of nuclei-the silenced germline nucleus (micronucleus [MIC]) and the actively expressed somatic nucleus (macronucleus [MAC])-within a common cytoplasm. Because of their very similar morphologies, members of the Tetrahymena genus are considered a morphospecies. We explored the hidden genomic evolution within this genus by performing a comprehensive comparative analysis of the somatic genomes of 10 species and the germline genomes of 2 species of Tetrahymena. These species show high genetic divergence; phylogenomic analysis suggests that the genus originated about 300 million years ago (Mya). Seven universal protein domains are preferentially included among the species-specific (i.e., the youngest) Tetrahymena genes. In particular, leucine-rich repeat (LRR) genes make the largest contribution to the high level of genome divergence of the 10 species. LRR genes can be sorted into 3 different age groups. Parallel evolutionary trajectories have independently occurred among LRR genes in the different Tetrahymena species. Thousands of young LRR genes contain tandem arrays of exactly 90-bp exons. The introns separating these exons show a unique, extreme phase 2 bias, suggesting a clonal origin and successive expansions of 90-bp-exon LRR genes. Identifying LRR gene age groups allowed us to document a Tetrahymena intron length cycle. The youngest 90-bp exon LRR genes in T. thermophila are concentrated in pericentromeric and subtelomeric regions of the 5 micronuclear chromosomes, suggesting that these regions act as genome innovation centers. 
Copies of a Tetrahymena Long interspersed element (LINE)-like retrotransposon are very frequently found physically adjacent to 90-bp exon/intron repeat units of the youngest LRR genes. We propose that Tetrahymena species have used a massive exon-shuffling mechanism, involving unequal crossing over possibly in concert with retrotransposition, to create the unique 90-bp exon array LRR genes.
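Why a tandem array of 90-bp exons yields introns of a single phase follows from simple arithmetic: an intron's phase is the upstream coding length modulo 3, and 90 is a multiple of 3. The sketch below uses that standard definition of intron phase; the function name and the example gene are hypothetical and are not taken from the paper.

```python
def intron_phases(exon_coding_lengths):
    """Phase of the intron that follows each exon except the last one.

    Phase is the number of codon bases already read when the intron
    interrupts the coding sequence: cumulative coding length modulo 3.
    """
    phases = []
    upstream = 0
    for length in exon_coding_lengths[:-1]:
        upstream += length
        phases.append(upstream % 3)
    return phases


# HYPOTHETICAL gene: a 92-bp first coding exon, then a tandem array of
# 90-bp exons, then a final exon of arbitrary length.
print(intron_phases([92, 90, 90, 90, 90, 120]))  # [2, 2, 2, 2, 2]
```

Once the first intron of the array is phase 2, every further 90-bp exon leaves the phase unchanged, which is the pattern expected if the array grew by repeated duplication of one exon/intron unit.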

Subjects

Genomics/methods, Species Specificity, Tetrahymena/genetics, Biological Evolution, Molecular Evolution, Exons, Protozoan Genome, Introns, Leucine-Rich Repeat Proteins, Phylogeny, Proteins/genetics, Tetrahymena/metabolism

3.

ePlant: Visualizing and Exploring Multiple Levels of Data for Hypothesis Generation in Plant Biology.

Waese, Jamie; Fan, Jim; Pasha, Asher; Yu, Hans; Fucile, Geoffrey; Shi, Ruian; Cumming, Matthew; Kelley, Lawrence A; Sternberg, Michael J; Krishnakumar, Vivek; Ferlanti, Erik; Miller, Jason; Town, Chris; Stuerzlinger, Wolfgang; Provart, Nicholas J.

Plant Cell ; 29(8): 1806-1821, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28808136

ABSTRACT

A big challenge in current systems biology research arises when different types of data must be accessed from separate sources and visualized using separate tools. The high cognitive load required to navigate such a workflow is detrimental to hypothesis generation. Accordingly, there is a need for a robust research platform that incorporates all data and provides integrated search, analysis, and visualization features through a single portal. Here, we present ePlant (http://bar.utoronto.ca/eplant), a visual analytic tool for exploring multiple levels of Arabidopsis thaliana data through a zoomable user interface. ePlant connects to several publicly available web services to download genome, proteome, interactome, transcriptome, and 3D molecular structure data for one or more genes or gene products of interest. Data are displayed with a set of visualization tools that are presented using a conceptual hierarchy from big to small, and many of the tools combine information from more than one data type. We describe the development of ePlant in this article and present several examples illustrating its integrative features for hypothesis generation. We also describe the process of deploying ePlant as an "app" on Araport. Building on readily available web services, the code for ePlant is freely available for any other biological species research.

Subjects

Botany, Software, Statistics as Topic, Systems Biology, Base Sequence, Plant Chromosomes/genetics, Plant Gene Expression Regulation, Subcellular Fractions/metabolism, User-Computer Interface

4.

ThaleMine: A Warehouse for Arabidopsis Data Integration and Discovery.

Krishnakumar, Vivek; Contrino, Sergio; Cheng, Chia-Yi; Belyaeva, Irina; Ferlanti, Erik S; Miller, Jason R; Vaughn, Matthew W; Micklem, Gos; Town, Christopher D; Chan, Agnes P.

Plant Cell Physiol ; 58(1): e4, 2017 Jan 01.

Article in English | MEDLINE | ID: mdl-28013278

ABSTRACT

ThaleMine (https://apps.araport.org/thalemine/) is a comprehensive data warehouse that integrates a wide array of genomic information of the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and array expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes. The data are collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport, https://www.araport.org/), ThaleMine uses the InterMine software framework, which builds well-structured data, and provides powerful data query and analysis functionality. The warehoused data can be accessed by users via graphical interfaces, as well as programmatically via web-services. Here we describe recent developments in ThaleMine including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by the model organism research community including nematode, rat, mouse, zebrafish, budding yeast, the modENCODE project, as well as being used for human data. ThaleMine is the first InterMine developed for a plant model. As additional new plant InterMines are developed by the legume and other plant research communities, the potential of cross-organism integrative data analysis will be further enabled.

Subjects

Arabidopsis Proteins/genetics, Arabidopsis/genetics, Genetic Databases, Gene Expression Profiling, Plant Gene Expression Regulation/genetics, Arabidopsis Proteins/metabolism, Computational Biology/methods, Gene Ontology, Genomics/methods, Information Storage and Retrieval/methods, Internet, Protein Interaction Mapping/methods, Protein Interaction Maps/genetics, Reproducibility of Results, RNA Sequence Analysis

5.

Araport11: a complete reannotation of the Arabidopsis thaliana reference genome.

Cheng, Chia-Yi; Krishnakumar, Vivek; Chan, Agnes P; Thibaud-Nissen, Françoise; Schobel, Seth; Town, Christopher D.

Plant J ; 89(4): 789-804, 2017 Feb.

Article in English | MEDLINE | ID: mdl-27862469

ABSTRACT

The flowering plant Arabidopsis thaliana is a dicot model organism for research in many aspects of plant biology. A comprehensive annotation of its genome paves the way for understanding the functions and activities of all types of transcripts, including mRNA, the various classes of non-coding RNA, and small RNA. The TAIR10 annotation update had a profound impact on Arabidopsis research but was released more than 5 years ago. Maintaining the accuracy of the annotation continues to be a prerequisite for future progress. Using an integrative annotation pipeline, we assembled tissue-specific RNA-Seq libraries from 113 datasets and constructed 48 359 transcript models of protein-coding genes in eleven tissues. In addition, we annotated various classes of non-coding RNA including microRNA, long intergenic RNA, small nucleolar RNA, natural antisense transcript, small nuclear RNA, and small RNA using published datasets and in-house analytic results. Altogether, we identified 635 novel protein-coding genes, 508 novel transcribed regions, 5178 non-coding RNAs, and 35 846 small RNA loci that were formerly unannotated. Analysis of the splicing events and RNA-Seq based expression profiles revealed the landscapes of gene structures, untranslated regions, and splicing activities to be more intricate than previously appreciated. Furthermore, we present 692 uniformly expressed housekeeping genes, 43% of whose human orthologs are also housekeeping genes. This updated Arabidopsis genome annotation with a substantially increased resolution of gene models will not only further our understanding of the biological processes of this plant model but also of other species.

Subjects

Arabidopsis Proteins/genetics, Arabidopsis/genetics, Gene Expression Profiling, Plant Gene Expression Regulation/genetics, Plant Genome/genetics, Plant RNA/genetics, Transcriptome/genetics

6.

Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome.

Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S.

Elife ; 5, 2016 Nov 28.

Article in English | MEDLINE | ID: mdl-27892853

ABSTRACT

The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome has shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena's germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum.

Subjects

Gene Rearrangement, Protozoan Genome, Tetrahymena thermophila/genetics, DNA Sequence Analysis

7.

MarkerMiner 1.0: A new application for phylogenetic marker development using angiosperm transcriptomes.

Chamala, Srikar; García, Nicolás; Godden, Grant T; Krishnakumar, Vivek; Jordon-Thaden, Ingrid E; De Smet, Riet; Barbazuk, W Brad; Soltis, Douglas E; Soltis, Pamela S.

Appl Plant Sci ; 3(4), 2015 Apr.

Article in English | MEDLINE | ID: mdl-25909041

ABSTRACT

PREMISE OF THE STUDY: Targeted sequencing using next-generation sequencing (NGS) platforms offers enormous potential for plant systematics by enabling economical acquisition of multilocus data sets that can resolve difficult phylogenetic problems. However, because discovery of single-copy nuclear (SCN) loci from NGS data requires both bioinformatics skills and access to high-performance computing resources, the application of NGS data has been limited. METHODS AND RESULTS: We developed MarkerMiner 1.0, a fully automated, open-access bioinformatic workflow and application for discovery of SCN loci in angiosperms. Our new tool identified as many as 1993 SCN loci from transcriptomic data sampled as part of four independent test cases representing marker development projects at different phylogenetic scales. CONCLUSIONS: MarkerMiner is an easy-to-use and effective tool for discovery of putative SCN loci. It can be run locally or via the Web, and its tabular and alignment outputs facilitate efficient downstream assessments of phylogenetic utility, locus selection, intron-exon boundary prediction, and primer or probe development.

8.

Polyribosomal RNA-Seq reveals the decreased complexity and diversity of the Arabidopsis translatome.

Zhang, Xingtan; Rosen, Benjamin D; Tang, Haibao; Krishnakumar, Vivek; Town, Christopher D.

PLoS One ; 10(2): e0117699, 2015.

Article in English | MEDLINE | ID: mdl-25706651

ABSTRACT

Recent RNA-seq studies reveal that the transcriptomes in animals and plants are more complex than previously thought, leading to the inclusion of many more splice isoforms in annotated genomes. However, it is possible that a significant proportion of the transcripts are spurious isoforms that do not contribute to functional proteins. One of the current hypotheses is that commonly used mRNA extraction methods isolate both pre-mature (nuclear) mRNA and mature (cytoplasmic) mRNA, and these incompletely spliced pre-mature mRNAs may contribute to a large proportion of these spurious transcripts. To investigate this, we compared a traditional RNA-seq dataset (total RNA-seq) and a ribosome-bound RNA-seq dataset (polyribosomal RNA-seq) from Arabidopsis thaliana. An integrative framework that combined de novo assembly and genome-guided assembly was applied to reconstruct transcriptomes for the two datasets. Up to 44.8% of the de novo assembled transcripts in total RNA-seq sample were of low abundance, whereas only 0.09% in polyribosomal RNA-seq de novo assembly were of low abundance. The final round of assembly using PASA (Program to Assemble Spliced Alignments) resulted in more transcript assemblies in the total RNA-seq than those in polyribosomal sample. Comparison of alternative splicing (AS) patterns between total and polyribosomal RNA-seq showed a significant difference (G-test, p-value<0.01) in intron retention events: 46.4% of AS events in the total sample were intron retention, whereas only 23.5% showed evidence of intron retention in the polyribosomal sample. It is likely that a large proportion of retained introns in total RNA-seq result from incompletely spliced pre-mature mRNA. Overall, this study demonstrated that polyribosomal RNA-seq technology decreased the complexity and diversity of the coding transcriptome by eliminating pre-mature mRNAs, especially those of low abundance.
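The kind of G-test mentioned in the abstract can be sketched on a 2x2 table of event counts. The abstract reports only percentages, so the counts below are HYPOTHETICAL (1,000 AS events per sample, split according to the quoted 46.4% and 23.5%); the function name is also an assumption, and the resulting statistic is illustrative rather than a reproduction of the paper's value.

```python
from math import erfc, log, sqrt


def g_test_2x2(table):
    """G-test of independence for a 2x2 table of counts; returns (G, p)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # a zero cell contributes nothing to the sum
                g += 2 * observed * log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Chi-square survival function with 1 degree of freedom.
    p_value = erfc(sqrt(g / 2))
    return g, p_value


# HYPOTHETICAL counts: 1000 AS events per sample, split using the
# percentages quoted in the abstract (46.4% and 23.5% intron retention).
counts = [
    [464, 536],  # total RNA-seq: intron retention, other AS events
    [235, 765],  # polyribosomal RNA-seq: intron retention, other AS events
]
g_stat, p = g_test_2x2(counts)
print(f"G = {g_stat:.1f}, p = {p:.3g}")
```

With real data, the table would hold the observed numbers of intron-retention and other AS events in each sample; the p-value depends strongly on those totals, which is why the hypothetical result here should not be compared with the reported threshold.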

Subjects

Arabidopsis/genetics, Genetic Variation/genetics, Polyribosomes/genetics, Messenger RNA/genetics, Plant RNA/genetics, Transcriptome/genetics, Alternative Splicing/genetics, Arabidopsis Proteins/genetics, Introns/genetics, RNA Sequence Analysis/methods

9.

Araport: the Arabidopsis information portal.

Krishnakumar, Vivek; Hanlon, Matthew R; Contrino, Sergio; Ferlanti, Erik S; Karamycheva, Svetlana; Kim, Maria; Rosen, Benjamin D; Cheng, Chia-Yi; Moreira, Walter; Mock, Stephen A; Stubbs, Joseph; Sullivan, Julie M; Krampis, Konstantinos; Miller, Jason R; Micklem, Gos; Vaughn, Matthew; Town, Christopher D.

Nucleic Acids Res ; 43(Database issue): D1003-9, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25414324

ABSTRACT

The Arabidopsis Information Portal (https://www.araport.org) is a new online resource for plant biology research. It houses the Arabidopsis thaliana genome sequence and associated annotation. It was conceived as a framework that allows the research community to develop and release 'modules' that integrate, analyze and visualize Arabidopsis data that may reside at remote sites. The current implementation provides an indexed database of core genomic information. These data are made available through feature-rich web applications that provide search, data mining, and genome browser functionality, and also by bulk download and web services. Araport uses software from the InterMine and JBrowse projects to expose curated data from TAIR, GO, BAR, EBI, UniProt, PubMed and EPIC CoGe. The site also hosts 'science apps,' developed as prototypes for community modules that use dynamic web pages to present data obtained on-demand from third-party servers via RESTful web services. Designed for sustainability, the Arabidopsis Information Portal strategy exploits existing scientific computing infrastructure, adopts a practical mixture of data integration technologies and encourages collaborative enhancement of the resource by its user community.

Subjects

Arabidopsis/genetics, Genetic Databases, Plant Genome, Data Mining, Internet, Software

10.

MTGD: The Medicago truncatula genome database.

Krishnakumar, Vivek; Kim, Maria; Rosen, Benjamin D; Karamycheva, Svetlana; Bidwell, Shelby L; Tang, Haibao; Town, Christopher D.

Plant Cell Physiol ; 56(1): e1, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25432968

ABSTRACT

Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. J. Craig Venter Institute (JCVI; formerly TIGR) has been involved in M. truncatula genome sequencing and annotation since 2002 and has maintained a web-based resource providing data to the community for this entire period. The website (http://www.MedicagoGenome.org) has seen major updates in the past year, where it currently hosts the latest version of the genome (Mt4.0), associated data and legacy project information, presented to users via a rich set of open-source tools. A JBrowse-based genome browser interface exposes tracks for visualization. Mutant gene symbols originally assembled and curated by the Frugoli lab are now hosted at JCVI and tie into our community annotation interface, Medicago EuCAP (to be integrated soon with our implementation of WebApollo). Literature pertinent to M. truncatula is indexed and made searchable via the Textpresso search engine. The site also implements MedicMine, an instance of InterMine that offers interconnectivity with other plant 'mines' such as ThaleMine and PhytoMine, and other model organism databases (MODs). In addition to these new features, we continue to provide keyword- and locus identifier-based searches served via a Chado-backed Tripal Instance, a BLAST search interface and bulk downloads of data sets from the iPlant Data Store (iDS). Finally, we maintain an E-mail helpdesk, facilitated by a JIRA issue tracking system, where we receive and respond to questions about the website and requests for specific data sets from the community.

Subjects

Computational Biology, Genetic Databases, Plant Genome/genetics, Medicago truncatula/genetics, User-Computer Interface, Information Storage and Retrieval, Internet

11.

A maize database resource that captures tissue-specific and subcellular-localized gene expression, via fluorescent tags and confocal imaging (Maize Cell Genomics Database).

Krishnakumar, Vivek; Choi, Yongwook; Beck, Erin; Wu, Qingyu; Luo, Anding; Sylvester, Anne; Jackson, David; Chan, Agnes P.

Plant Cell Physiol ; 56(1): e12, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25432973

ABSTRACT

Maize is a global crop and a powerful system among grain crops for genetic and genomic studies. However, the development of novel biological tools and resources to aid in the functional identification of gene sequences is greatly needed. Towards this goal, we have developed a collection of maize marker lines for studying native gene expression in specific cell types and subcellular compartments using fluorescent proteins (FPs). To catalog FP expression, we have developed a public repository, the Maize Cell Genomics (MCG) Database, (http://maize.jcvi.org/cellgenomics), to organize a large data set of confocal images generated from the maize marker lines. To date, the collection represents major subcellular structures and also developmentally important progenitor cell populations. The resource is available to the research community, for example to study protein localization or interactions under various experimental conditions or mutant backgrounds. A subset of the marker lines can also be used to induce misexpression of target genes through a transactivation system. For future directions, the image repository can be expanded to accept new image submissions from the research community, and to perform customized large-scale computational image analysis. This community resource will provide a suite of new tools for gaining biological insights by following the dynamics of protein expression at the subcellular, cellular and tissue levels.

Subjects

Factual Databases, Plant Genome/genetics, Genomics, Proteomics, Zea mays/metabolism, Biomarkers/metabolism, Gene Expression, Luminescent Proteins, Organ Specificity, Protein Transport, Zea mays/cytology, Zea mays/genetics

12.

Transcriptome and methylome profiling reveals relics of genome dominance in the mesopolyploid Brassica oleracea.

Parkin, Isobel A P; Koh, Chushin; Tang, Haibao; Robinson, Stephen J; Kagale, Sateesh; Clarke, Wayne E; Town, Chris D; Nixon, John; Krishnakumar, Vivek; Bidwell, Shelby L; Denoeud, France; Belcram, Harry; Links, Matthew G; Just, Jérémy; Clarke, Carling; Bender, Tricia; Huebert, Terry; Mason, Annaliese S; Pires, J Chris; Barker, Guy; Moore, Jonathan; Walley, Peter G; Manoli, Sahana; Batley, Jacqueline; Edwards, David; Nelson, Matthew N; Wang, Xiyin; Paterson, Andrew H; King, Graham; Bancroft, Ian; Chalhoub, Boulos; Sharpe, Andrew G.

Genome Biol ; 15(6): R77, 2014 Jun 10.

Article in English | MEDLINE | ID: mdl-24916971

ABSTRACT

BACKGROUND: Brassica oleracea is a valuable vegetable species that has contributed to human health and nutrition for hundreds of years and comprises multiple distinct cultivar groups with diverse morphological and phytochemical attributes. In addition to this phenotypic wealth, B. oleracea offers unique insights into polyploid evolution, as it results from multiple ancestral polyploidy events and a final Brassiceae-specific triplication event. Further, B. oleracea represents one of the diploid genomes that formed the economically important allopolyploid oilseed, Brassica napus. A deeper understanding of B. oleracea genome architecture provides a foundation for crop improvement strategies throughout the Brassica genus. RESULTS: We generate an assembly representing 75% of the predicted B. oleracea genome using a hybrid Illumina/Roche 454 approach. Two dense genetic maps are generated to anchor almost 92% of the assembled scaffolds to nine pseudo-chromosomes. Over 50,000 genes are annotated and 40% of the genome predicted to be repetitive, thus contributing to the increased genome size of B. oleracea compared to its close relative B. rapa. A snapshot of both the leaf transcriptome and methylome allows comparisons to be made across the triplicated sub-genomes, which resulted from the most recent Brassiceae-specific polyploidy event. CONCLUSIONS: Differential expression of the triplicated syntelogs and cytosine methylation levels across the sub-genomes suggest residual marks of the genome dominance that led to the current genome architecture. Although cytosine methylation does not correlate with individual gene dominance, the independent methylation patterns of triplicated copies suggest epigenetic mechanisms play a role in the functional diversification of duplicate genes.

Subjects

Brassica/genetics, Plant Genome, Transcriptome, Aneuploidy, Brassica/metabolism, Chromosome Mapping, DNA Methylation, Genetic Epigenesis, Molecular Evolution, Plant Gene Expression Regulation, Molecular Sequence Annotation, Molecular Sequence Data, DNA Sequence Analysis

13.

An improved genome release (version Mt4.0) for the model legume Medicago truncatula.

Tang, Haibao; Krishnakumar, Vivek; Bidwell, Shelby; Rosen, Benjamin; Chan, Agnes; Zhou, Shiguo; Gentzbittel, Laurent; Childs, Kevin L; Yandell, Mark; Gundlach, Heidrun; Mayer, Klaus F X; Schwartz, David C; Town, Christopher D.

BMC Genomics ; 15: 312, 2014 Apr 27.

Article in English | MEDLINE | ID: mdl-24767513

ABSTRACT

BACKGROUND: Medicago truncatula, a close relative of alfalfa, is a preeminent model for studying nitrogen fixation, symbiosis, and legume genomics. The Medicago sequencing project began in 2003 with the goal to decipher sequences originated from the euchromatic portion of the genome. The initial sequencing approach was based on a BAC tiling path, culminating in a BAC-based assembly (Mt3.5) as well as an in-depth analysis of the genome published in 2011. RESULTS: Here we describe a further improved and refined version of the M. truncatula genome (Mt4.0) based on de novo whole genome shotgun assembly of a majority of Illumina and 454 reads using ALLPATHS-LG. The ALLPATHS-LG scaffolds were anchored onto the pseudomolecules on the basis of alignments to both the optical map and the genotyping-by-sequencing (GBS) map. The Mt4.0 pseudomolecules encompass ~360 Mb of actual sequences spanning 390 Mb of which ~330 Mb align perfectly with the optical map, presenting a drastic improvement over the BAC-based Mt3.5 which only contained 70% sequences (~250 Mb) of the current version. Most of the sequences and genes that previously resided on the unanchored portion of Mt3.5 have now been incorporated into the Mt4.0 pseudomolecules, with the exception of ~28 Mb of unplaced sequences. With regard to gene annotation, the genome has been re-annotated through our gene prediction pipeline, which integrates EST, RNA-seq, protein and gene prediction evidences. A total of 50,894 genes (31,661 high confidence and 19,233 low confidence) are included in Mt4.0 which overlapped with ~82% of the gene loci annotated in Mt3.5. Of the remaining genes, 14% of the Mt3.5 genes have been deprecated to an "unsupported" status and 4% are absent from the Mt4.0 predictions. CONCLUSIONS: Mt4.0 and its associated resources, such as genome browsers, BLAST-able datasets and gene information pages, can be found on the JCVI Medicago web site (http://www.jcvi.org/medicago). 
The assembly and annotation have been deposited in GenBank (BioProject: PRJNA10791). The heavily curated chromosomal sequences and associated gene models of Medicago will serve as a better reference for legume biology and comparative genomics.

Subjects

Plant Genome, Medicago truncatula/genetics, Bacterial Artificial Chromosomes

14.

The Medicago genome provides insight into the evolution of rhizobial symbioses.

Young, Nevin D; Debellé, Frédéric; Oldroyd, Giles E D; Geurts, Rene; Cannon, Steven B; Udvardi, Michael K; Benedito, Vagner A; Mayer, Klaus F X; Gouzy, Jérôme; Schoof, Heiko; Van de Peer, Yves; Proost, Sebastian; Cook, Douglas R; Meyers, Blake C; Spannagl, Manuel; Cheung, Foo; De Mita, Stéphane; Krishnakumar, Vivek; Gundlach, Heidrun; Zhou, Shiguo; Mudge, Joann; Bharti, Arvind K; Murray, Jeremy D; Naoumkina, Marina A; Rosen, Benjamin; Silverstein, Kevin A T; Tang, Haibao; Rombauts, Stephane; Zhao, Patrick X; Zhou, Peng; Barbe, Valérie; Bardou, Philippe; Bechner, Michael; Bellec, Arnaud; Berger, Anne; Bergès, Hélène; Bidwell, Shelby; Bisseling, Ton; Choisne, Nathalie; Couloux, Arnaud; Denny, Roxanne; Deshpande, Shweta; Dai, Xinbin; Doyle, Jeff J; Dudez, Anne-Marie; Farmer, Andrew D; Fouteau, Stéphanie; Franken, Carolien; Gibelin, Chrystel; Gish, John.

Nature ; 480(7378): 520-4, 2011 Nov 16.

Article in English | MEDLINE | ID: mdl-22089132

ABSTRACT

Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation. Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Myr ago). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species. Medicago truncatula is a long-established model for the study of legume biology. Here we describe the draft sequence of the M. truncatula euchromatin based on a recently completed BAC assembly supplemented with Illumina shotgun sequence, together capturing ~94% of all M. truncatula genes. A whole-genome duplication (WGD) approximately 58 Myr ago had a major role in shaping the M. truncatula genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the M. truncatula genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max and Lotus japonicus. M. truncatula is a close relative of alfalfa (Medicago sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the M. truncatula genome sequence provides significant opportunities to expand alfalfa's genomic toolbox.

Subjects

Biological Evolution, Plant Genome, Medicago truncatula/genetics, Medicago truncatula/microbiology, Rhizobium/physiology, Symbiosis, Molecular Sequence Data, Nitrogen Fixation/genetics, Glycine max/genetics, Synteny, Vitis/genetics

15.

A web-based software system for dynamic gene cluster comparison across multiple genomes.

Revanna, Kashi Vishwanath; Krishnakumar, Vivek; Dong, Qunfeng.

Bioinformatics ; 25(7): 956-7, 2009 Apr 01.

Article in English | MEDLINE | ID: mdl-19208612

ABSTRACT

SUMMARY: Investigating the conservation of gene clusters across multiple genomes has become a standard practice in the era of comparative genomics. However, all existing software and databases rely heavily on pre-computation to identify homologous genes by genome-wide comparisons. Such pre-computing strategies lack accuracy and updating the data is computationally intensive. Since most molecular biologists are often interested only in a small cluster of genes, catering to this need, we have developed a web-based software system that allows users to upload a list of genes, perform dynamic search against the genomes of their choices and interactively visualize the gene cluster conservation using a novel multi-genome browser. Our approach avoids expensive genome-wide pre-computing and allows users to dynamically change the search criteria to fit their genes of interest. Our system can be customized for any genome sequences. We have applied it to both prokaryotic and eukaryotic genomes to illustrate its usability. AVAILABILITY: Our software is freely available at http://cgcv.cgb.indiana.edu/cgi-bin/index.cgi.

Subjects

Genome/genetics, Multigene Family/genetics, Software, Internet
