Search | BVS Regional Portal

Breedbase: a digital ecosystem for modern plant breeding.

Morales, Nicolas; Ogbonna, Alex C; Ellerbrock, Bryan J; Bauchet, Guillaume J; Tantikanjana, Titima; Tecle, Isaak Y; Powell, Adrian F; Lyon, David; Menda, Naama; Simoes, Christiano C; Saha, Surya; Hosmani, Prashant; Flores, Mirella; Panitz, Naftali; Preble, Ryan S; Agbona, Afolabi; Rabbi, Ismail; Kulakow, Peter; Peteti, Prasad; Kawuki, Robert; Esuma, Williams; Kanaabi, Micheal; Chelangat, Doreen M; Uba, Ezenwanyi; Olojede, Adeyemi; Onyeka, Joseph; Shah, Trushar; Karanja, Margaret; Egesi, Chiedozie; Tufan, Hale; Paterne, Agre; Asfaw, Asrat; Jannink, Jean-Luc; Wolfe, Marnin; Birkett, Clay L; Waring, David J; Hershberger, Jenna M; Gore, Michael A; Robbins, Kelly R; Rife, Trevor; Courtney, Chaney; Poland, Jesse; Arnaud, Elizabeth; Laporte, Marie-Angélique; Kulembeka, Heneriko; Salum, Kasele; Mrema, Emmanuel; Brown, Allan; Bayo, Stanley; Uwimana, Brigitte.

G3 (Bethesda) ; 12(7), 2022 Jul 06.

Article in English | MEDLINE | ID: mdl-35385099

ABSTRACT

Modern breeding methods integrate next-generation sequencing and phenomics to identify plants with the best characteristics and greatest genetic merit for use as parents in subsequent breeding cycles to ultimately create improved cultivars able to sustain high adoption rates by farmers. This data-driven approach hinges on strong foundations in data management, quality control, and analytics. Of crucial importance is a central database able to (1) track breeding materials, (2) store experimental evaluations, (3) record phenotypic measurements using consistent ontologies, (4) store genotypic information, and (5) implement algorithms for analysis, prediction, and selection decisions. Because of the complexity of the breeding process, breeding databases also tend to be complex, difficult, and expensive to implement and maintain. Here, we present a breeding database system, Breedbase (https://breedbase.org/, last accessed 4/18/2022). Originally initiated as Cassavabase (https://cassavabase.org/, last accessed 4/18/2022) with the NextGen Cassava project (https://www.nextgencassava.org/, last accessed 4/18/2022), and later developed into a crop-agnostic system, it is presently used for dozens of different crops and projects. The system is web based and is available as open source software. It is available on GitHub (https://github.com/solgenomics/, last accessed 4/18/2022) and packaged in a Docker image for deployment (https://hub.docker.com/u/breedbase, last accessed 4/18/2022). The Breedbase system enables breeding programs to better manage and leverage their data for decision making within a fully integrated digital ecosystem.

Subjects

Ecosystem; Plant Breeding; Algorithms; Crops, Agricultural/genetics; Software

High density genotype storage for plant breeding in the Chado schema of Breedbase.

Morales, Nicolas; Bauchet, Guillaume J; Tantikanjana, Titima; Powell, Adrian F; Ellerbrock, Bryan J; Tecle, Isaak Y; Mueller, Lukas A.

PLoS One ; 15(11): e0240059, 2020.

Article in English | MEDLINE | ID: mdl-33175872

ABSTRACT

Modern breeding programs routinely use genome-wide information for selecting individuals to advance. The large volumes of genotypic information required present a challenge for data storage and query efficiency. Major use cases require genotyping data to be linked with trait phenotyping data. In contrast to phenotyping data that are often stored in relational database schemas, next-generation genotyping data are traditionally stored in non-relational storage systems due to their extremely large scope. This study presents a novel data model implemented in Breedbase (https://breedbase.org/) for uniting relational phenotyping data and non-relational genotyping data within the open-source PostgreSQL database engine. Breedbase is an open-source web database designed to manage all of a breeder's informatics needs: management of field experiments, phenotypic and genotypic data collection and storage, and statistical analyses. The genotyping data are stored in a PostgreSQL data type known as binary JavaScript Object Notation (JSONb), where the JSON structures closely follow the Variant Call Format (VCF) data model. The Breedbase genotyping data model can handle different ploidy levels, structural variants, and any genotype encoded in VCF. JSONb is both compressed and indexed, resulting in a space- and time-efficient system. Furthermore, file caching maximizes data retrieval performance. Integration of all breeding data within the Chado database schema retains referential integrity that may be lost when genotyping and phenotyping data are stored in separate systems. Benchmarking demonstrates that the system is fast enough for computation of a genomic relationship matrix (GRM) and genome-wide association study (GWAS) for datasets involving 1,325 diploid Zea mays, 314 triploid Musa acuminata, and 924 diploid Manihot esculenta samples genotyped with 955,690, 142,119, and 287,952 genotyping-by-sequencing (GBS) markers, respectively.
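The abstract describes genotype calls stored as JSONb structures that follow the VCF data model, and benchmarks GRM computation on them. The sketch below only illustrates that pipeline under stated assumptions: the JSON layout (sample, then marker, then a `{"GT": ...}` object) is hypothetical and not Breedbase's documented schema, and the GRM uses VanRaden's method 1 with the denominator generalized by ploidy, which may differ from what Breedbase actually computes. It converts VCF `GT` strings to alternate-allele dosages and assumes no missing calls.

```python
import json
from typing import Dict, List, Optional

# Hypothetical JSON layout, loosely modeled on the VCF FORMAT/GT field.
# Breedbase's actual JSONb structure may differ; this is an assumption.
RAW = """{
  "S1": {"m1": {"GT": "0/0"}, "m2": {"GT": "0/1"}},
  "S2": {"m1": {"GT": "0/1"}, "m2": {"GT": "1/1"}},
  "S3": {"m1": {"GT": "1/1"}, "m2": {"GT": "0/1"}}
}"""


def gt_to_dosage(gt: str) -> Optional[int]:
    """Count non-reference alleles in a VCF GT string ('0/1', '1|1', '0/0/1', ...)."""
    alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    if "." in alleles:
        return None  # missing call
    return sum(1 for a in alleles if a != "0")


def dosage_matrix(records: Dict[str, Dict[str, Dict[str, str]]],
                  samples: List[str], markers: List[str]) -> List[List[int]]:
    """Build a samples x markers dosage matrix; this sketch rejects missing calls."""
    rows = []
    for s in samples:
        row = []
        for m in markers:
            d = gt_to_dosage(records[s][m]["GT"])
            if d is None:
                raise ValueError(f"missing call for {s}/{m}")
            row.append(d)
        rows.append(row)
    return rows


def vanraden_grm(dosages: List[List[int]], ploidy: int = 2) -> List[List[float]]:
    """VanRaden-style GRM: G = Z Z' / (ploidy * sum_j p_j * (1 - p_j))."""
    n, m = len(dosages), len(dosages[0])
    # Alternate-allele frequency of each marker across all samples.
    p = [sum(row[j] for row in dosages) / (ploidy * n) for j in range(m)]
    # Center each dosage by its expected value, ploidy * p_j.
    z = [[row[j] - ploidy * p[j] for j in range(m)] for row in dosages]
    denom = ploidy * sum(pj * (1 - pj) for pj in p)
    return [[sum(z[a][k] * z[b][k] for k in range(m)) / denom
             for b in range(n)] for a in range(n)]


records = json.loads(RAW)
G = vanraden_grm(dosage_matrix(records, ["S1", "S2", "S3"], ["m1", "m2"]))
```

With the three hypothetical samples above, `G[0][0]` is 20/17 (about 1.176), and every row of `G` sums to zero because the dosages are centered per marker. The same `gt_to_dosage` handles the triploid case (`"0/0/1"` gives 1), which is why `ploidy` is a parameter rather than a hard-coded 2.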

Subjects

Databases, Genetic; Manihot/genetics; Musa/genetics; Zea mays/genetics; Data Analysis; Genotype; Plant Breeding; Plants

The Sol Genomics Network (SGN)--from genotype to phenotype to breeding.

Fernandez-Pozo, Noe; Menda, Naama; Edwards, Jeremy D; Saha, Surya; Tecle, Isaak Y; Strickler, Susan R; Bombarely, Aureliano; Fisher-York, Thomas; Pujar, Anuradha; Foerster, Hartmut; Yan, Aimin; Mueller, Lukas A.

Nucleic Acids Res ; 43(Database issue): D1036-41, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25428362

ABSTRACT

The Sol Genomics Network (SGN, http://solgenomics.net) is a web portal offering genomic and phenotypic data and analysis tools for the Solanaceae family and close relatives. SGN hosts whole genome data for an increasing number of Solanaceae family members including tomato, potato, pepper, eggplant, tobacco and Nicotiana benthamiana. The database also stores loci and phenotype data, which researchers can upload and edit with user-friendly web interfaces. Available tools include BLAST, the GBrowse and JBrowse genome browsers, expression and map data viewers, a locus community annotation system, and a QTL analysis tool. A new tool, the SGN VIGS tool, was recently implemented to improve Virus-Induced Gene Silencing (VIGS) constructs. As its genomic and phenotypic data grow, SGN is now developing new web-based breeding tools and adapting its code and database structure for other species- or clade-specific databases.

Subjects

Databases, Nucleic Acid; Genome, Plant; Solanaceae/genetics; Breeding; Crosses, Genetic; Genomics; Genotype; Internet; Phenotype; Solanaceae/metabolism

solGS: a web-based tool for genomic selection.

Tecle, Isaak Y; Edwards, Jeremy D; Menda, Naama; Egesi, Chiedozie; Rabbi, Ismail Y; Kulakow, Peter; Kawuki, Robert; Jannink, Jean-Luc; Mueller, Lukas A.

BMC Bioinformatics ; 15: 398, 2014 Dec 14.

Article in English | MEDLINE | ID: mdl-25495537

ABSTRACT

BACKGROUND: Genomic selection (GS) promises to improve accuracy in estimating breeding values and genetic gain for quantitative traits compared to traditional breeding methods. Its reliance on high-throughput genome-wide markers and its statistical complexity, however, pose a serious challenge in data management, analysis, and sharing. A bioinformatics infrastructure for data storage and access, and a user-friendly web-based tool for analysis and sharing output, are needed to make GS more practical for breeders. RESULTS: We have developed a web-based tool, called solGS, for predicting genomic estimated breeding values (GEBVs) of individuals, using a Ridge-Regression Best Linear Unbiased Predictor (RR-BLUP) model. It has an intuitive web interface for selecting a training population for modeling and estimating GEBVs of selection candidates. It estimates phenotypic correlation and heritability of traits and selection indices of individuals. Raw data is stored in a generic database schema, Chado Natural Diversity, co-developed by multiple database groups. Analysis output is graphically visualized and can be interactively explored online or downloaded in text format. An instance of its implementation can be accessed at the NEXTGEN Cassava breeding database, http://cassavabase.org/solgs. CONCLUSIONS: solGS enables breeders to store raw data and estimate GEBVs of individuals online, in an intuitive and interactive workflow. It can be adapted to any breeding program.
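solGS predicts GEBVs with an RR-BLUP model. As a minimal sketch of that technique (not solGS's own code, which the abstract does not show), RR-BLUP can be written as ridge regression of phenotypes on column-centered marker dosages. The shrinkage parameter `lam` (residual variance over marker-effect variance) is assumed known here rather than estimated from variance components, and the helper names `solve`, `rrblup_effects`, and `gebv` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def rrblup_effects(X: Matrix, y: List[float], lam: float) -> Tuple[List[float], List[float]]:
    """Ridge solution of y = mu + Z beta + e, with Z the column-centered marker matrix.

    lam = sigma_e^2 / sigma_beta^2 is taken as given; in practice it is usually
    estimated (for example by REML), which this sketch does not do.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    ybar = sum(y) / n  # with centered Z, the mean of y estimates mu
    yc = [v - ybar for v in y]
    # Normal equations: (Z'Z + lam I) beta = Z' yc.
    A = [[sum(Z[i][j] * Z[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(Z[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return means, solve(A, rhs)


def gebv(markers: List[float], means: List[float], beta: List[float]) -> float:
    """GEBV of a selection candidate: centered dosages times estimated marker effects."""
    return sum((x - mu) * b for x, mu, b in zip(markers, means, beta))
```

Fitting `rrblup_effects` on a training population and applying `gebv` to each selection candidate gives the ranking a breeder would select from. Because `lam` is added to the diagonal, the system stays solvable even with more markers than individuals, and larger values shrink all marker effects toward zero.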

Subjects

Breeding; Manihot/genetics; Software; Genomics; Internet; Manihot/physiology; Quantitative Trait Loci

The Sol Genomics Network (solgenomics.net): growing tomatoes using Perl.

Bombarely, Aureliano; Menda, Naama; Tecle, Isaak Y; Buels, Robert M; Strickler, Susan; Fischer-York, Thomas; Pujar, Anuradha; Leto, Jonathan; Gosselin, Joseph; Mueller, Lukas A.

Nucleic Acids Res ; 39(Database issue): D1149-55, 2011 Jan.

Article in English | MEDLINE | ID: mdl-20935049

ABSTRACT

The Sol Genomics Network (SGN; http://solgenomics.net/) is a clade-oriented database (COD) containing biological data for species in the Solanaceae and their close relatives, with data types ranging from chromosomes and genes to phenotypes and accessions. SGN hosts several genome maps and sequences, including a pre-release of the tomato (Solanum lycopersicum cv Heinz 1706) reference genome. A new transcriptome component has been added to store RNA-seq and microarray data. SGN is also an open source software project, continuously developing and improving a complex system for storing, integrating and analyzing data. All code and development work is publicly visible on GitHub (http://github.com). The database architecture combines SGN-specific schemas and the community-developed Chado schema (http://gmod.org/wiki/Chado) for compatibility with other genome databases. The SGN curation model is community-driven, allowing researchers to add and edit information using simple web tools. Currently, over a hundred community annotators help curate the database. SGN can be accessed at http://solgenomics.net/.

Subjects

Databases, Genetic; Genome, Plant; Solanum lycopersicum/genetics; Gene Expression Profiling; Genomics; Solanum lycopersicum/growth & development; Solanum lycopersicum/metabolism; Plant Proteins/genetics; Software

solQTL: a tool for QTL analysis, visualization and linking to genomes at SGN database.

Tecle, Isaak Y; Menda, Naama; Buels, Robert M; van der Knaap, Esther; Mueller, Lukas A.

BMC Bioinformatics ; 11: 525, 2010 Oct 21.

Article in English | MEDLINE | ID: mdl-20964836

ABSTRACT

BACKGROUND: A common approach to understanding the genetic basis of complex traits is through identification of associated quantitative trait loci (QTL). Fine mapping QTLs requires several generations of backcrosses and analysis of large populations, which is a time-consuming and costly effort. Furthermore, as entire genomes are being sequenced and an increasing amount of genetic and expression data are being generated, a challenge remains: linking phenotypic variation to the underlying genomic variation. To identify candidate genes and understand the molecular basis underlying the phenotypic variation of traits, bioinformatic approaches are needed to exploit information such as genetic map, expression and whole genome sequence data of organisms in biological databases. DESCRIPTION: The Sol Genomics Network (SGN, http://solgenomics.net) is a primary repository for phenotypic, genetic, genomic, expression and metabolic data for the Solanaceae family and other related Asterids species and houses a variety of bioinformatics tools. SGN has implemented a new approach to QTL data organization, storage, analysis, and cross-links with other relevant data in internal and external databases. The new QTL module, solQTL, http://solgenomics.net/qtl/, employs a user-friendly web interface for uploading raw phenotype and genotype data to the database, R/qtl mapping software for on-the-fly QTL analysis, and algorithms for online visualization and cross-referencing of QTLs to relevant datasets and tools such as the SGN Comparative Map Viewer and Genome Browser. Here, we describe the development of the solQTL module and demonstrate its application. CONCLUSIONS: solQTL allows Solanaceae researchers to upload raw genotype and phenotype data to SGN, perform QTL analysis and dynamically cross-link to relevant genetic, expression and genome annotations. Exploration and synthesis of the relevant data is expected to help facilitate identification of candidate genes underlying phenotypic variation and markers more closely linked to QTLs. solQTL is freely available on SGN and can be used in private or public mode.
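solQTL hands the actual mapping to R/qtl. To illustrate what a QTL scan computes, here is a sketch of the simplest method, single-marker regression, which scores each marker with LOD = (n / 2) * log10(RSS_null / RSS_marker) under a normal model. This is a swapped-in simplification, not necessarily the interval-mapping method solQTL runs; the function names are hypothetical, missing genotypes are not handled, and a marker that explains all variation (zero residual) would divide by zero.

```python
import math
from typing import Dict, List, Sequence


def marker_regression_lod(genotypes: Sequence[int], phenotypes: Sequence[float]) -> float:
    """LOD score for one marker: (n / 2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    ybar = sum(phenotypes) / n
    # Null model: a single overall mean, no QTL.
    rss_null = sum((y - ybar) ** 2 for y in phenotypes)
    # Marker model: a separate mean per genotype class (e.g. 0/1 in a backcross).
    groups: Dict[int, List[float]] = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    class_mean = {g: sum(v) / len(v) for g, v in groups.items()}
    rss_marker = sum((y - class_mean[g]) ** 2 for g, y in zip(genotypes, phenotypes))
    return (n / 2) * math.log10(rss_null / rss_marker)


def scan(markers: Dict[str, Sequence[int]], phenotypes: Sequence[float]) -> Dict[str, float]:
    """LOD profile across markers (no interpolation between markers)."""
    return {name: marker_regression_lod(g, phenotypes) for name, g in markers.items()}
```

Peaks in the returned profile mark candidate QTL positions; interval mapping refines this by also evaluating positions between markers, which is what a dedicated package such as R/qtl adds.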

Subjects

Genome, Plant; Genomics/methods; Quantitative Trait Loci/genetics; Software; Algorithms; Databases, Factual; Databases, Genetic; Phenotype; Solanaceae/genetics

Gramene QTL database: development, content and applications.

Ni, Junjian; Pujar, Anuradha; Youens-Clark, Ken; Yap, Immanuel; Jaiswal, Pankaj; Tecle, Isaak; Tung, Chih-Wei; Ren, Liya; Spooner, William; Wei, Xuehong; Avraham, Shuly; Ware, Doreen; Stein, Lincoln; McCouch, Susan.

Database (Oxford) ; 2009: bap005, 2009.

Article in English | MEDLINE | ID: mdl-20157478

ABSTRACT

Gramene is a comparative information resource for plants that integrates data across diverse data domains. In this article, we describe the development of a quantitative trait loci (QTL) database and illustrate how it can be used to facilitate both forward and reverse genetics research. The QTL database contains the largest online collection of rice QTL data in the world. Using flanking markers as anchors, QTLs originally reported on individual genetic maps have been systematically aligned to the rice sequence where they can be searched as standard genomic features. Researchers can determine whether a QTL co-localizes with other QTLs detected in independent experiments and can combine data from multiple studies to improve the resolution of a QTL position. Candidate genes falling within a QTL interval can be identified and their relationship to particular phenotypes can be inferred based on functional annotations provided by ontology terms. Mutations identified in functional genomics populations and association mapping panels can be aligned with QTL regions to facilitate fine mapping and validation of gene-phenotype associations. By assembling and integrating diverse types of data and information across species and levels of biological complexity, the QTL database enhances the potential to understand and utilize QTL information in biological research.
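The abstract describes three interval operations: anchoring a QTL to the sequence through its flanking markers, testing whether two QTLs co-localize, and listing candidate genes inside a QTL interval. A minimal sketch of that logic follows; the marker names, gene names, and coordinates are hypothetical, and Gramene's actual alignment pipeline is not described in the abstract.

```python
from typing import Dict, List, NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def anchor_qtl(left: str, right: str,
               marker_pos: Dict[str, Interval]) -> Optional[Interval]:
    """Project a QTL onto the sequence using its two flanking markers."""
    a, b = marker_pos.get(left), marker_pos.get(right)
    if a is None or b is None or a.chrom != b.chrom:
        return None  # a marker is unplaced, or the flanks map to different chromosomes
    # Span both markers regardless of the order in which they were reported.
    return Interval(a.chrom, min(a.start, b.start), max(a.end, b.end))


def overlaps(x: Interval, y: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base (co-localization)."""
    return x.chrom == y.chrom and x.start <= y.end and y.start <= x.end


def candidate_genes(qtl: Interval, genes: Dict[str, Interval]) -> List[str]:
    """Names of genes whose annotated span overlaps the anchored QTL interval."""
    return sorted(name for name, span in genes.items() if overlaps(qtl, span))
```

For example, with hypothetical markers `M1` at chr1:1000-1200 and `M2` at chr1:5000-5150, `anchor_qtl("M2", "M1", ...)` yields chr1:1000-5150, and `candidate_genes` then keeps only genes on chr1 that intersect that span.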

A community-based annotation framework for linking solanaceae genomes with phenomes.

Menda, Naama; Buels, Robert M; Tecle, Isaak; Mueller, Lukas A.

Plant Physiol ; 147(4): 1788-99, 2008 Aug.

Article in English | MEDLINE | ID: mdl-18539779

ABSTRACT

The amount of biological data available in the public domain is growing exponentially, and there is an increasing need for infrastructural and human resources to organize, store, and present the data in a proper context. Model organism databases (MODs) invest great efforts to functionally annotate genomes and phenomes by in-house curators. The SOL Genomics Network (SGN; http://www.sgn.cornell.edu) is a clade-oriented database (COD), which provides a more scalable and comparative framework for biological information. SGN has recently spearheaded a new approach by developing community annotation tools to expand its curational capacity. These tools effectively allow some curation to be delegated to qualified researchers, while, at the same time, preserving the in-house curators' full editorial control. Here we describe the background, features, implementation, results, and development road map of SGN's community annotation tools for curating genotypes and phenotypes. Since the inception of this project in late 2006, interest and participation from the Solanaceae research community has been strong and growing continuously to the extent that we plan to expand the framework to accommodate more plant taxa. All data, tools, and code developed at SGN are freely available to download and adapt.

Subjects

Databases, Genetic; Genome, Plant; Phenotype; Solanaceae/genetics; User-Computer Interface

Gramene: a growing plant comparative genomics resource.

Liang, Chengzhi; Jaiswal, Pankaj; Hebbard, Claire; Avraham, Shuly; Buckler, Edward S; Casstevens, Terry; Hurwitz, Bonnie; McCouch, Susan; Ni, Junjian; Pujar, Anuradha; Ravenscroft, Dean; Ren, Liya; Spooner, William; Tecle, Isaak; Thomason, Jim; Tung, Chih-wei; Wei, Xuehong; Yap, Immanuel; Youens-Clark, Ken; Ware, Doreen; Stein, Lincoln.

Nucleic Acids Res ; 36(Database issue): D947-53, 2008 Jan.

Article in English | MEDLINE | ID: mdl-17984077

ABSTRACT

Gramene (www.gramene.org) is a curated resource for genetic, genomic and comparative genomics data for the major crop species, including rice, maize, wheat and many other plant (mainly grass) species. Gramene is an open-source project. All data and software are freely downloadable through the ftp site (ftp.gramene.org/pub/gramene) and available for use without restriction. Gramene's core data types include genome assembly and annotations, other DNA/mRNA sequences, genetic and physical maps/markers, genes, quantitative trait loci (QTLs), proteins, ontologies, literature and comparative mappings. Since our last NAR publication 2 years ago, we have updated these data types to include new datasets and new connections among them. Completely new features include rice pathways for functional annotation of rice genes; genetic diversity data from rice, maize and wheat to show genetic variations among different germplasms; large-scale genome comparisons among Oryza sativa and its wild relatives for evolutionary studies; and the creation of orthologous gene sets and phylogenetic trees among rice, Arabidopsis thaliana, maize, poplar and several animal species (for reference purposes). We have significantly improved the web interface in order to provide a more user-friendly browsing experience, including a dropdown navigation menu system, a unified web page for markers, genes, QTLs and proteins, and enhanced quick search functions.

Subjects

Crops, Agricultural/genetics; Databases, Genetic; Genome, Plant; Arabidopsis/genetics; Chromosome Mapping; Crops, Agricultural/metabolism; Genetic Markers; Genetic Variation; Genomics; Internet; Oryza/genetics; Poaceae/genetics; Triticum/genetics; User-Computer Interface; Zea mays/genetics

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA