High density genotype storage for plant breeding in the Chado schema of Breedbase.
Morales, Nicolas; Bauchet, Guillaume J; Tantikanjana, Titima; Powell, Adrian F; Ellerbrock, Bryan J; Tecle, Isaak Y; Mueller, Lukas A.
Affiliation
  • Morales N; Plant Breeding and Genetics, Cornell University, Ithaca, NY, United States of America.
  • Bauchet GJ; Boyce Thompson Institute, Ithaca, NY, United States of America.
  • Tantikanjana T; Boyce Thompson Institute, Ithaca, NY, United States of America.
  • Powell AF; Boyce Thompson Institute, Ithaca, NY, United States of America.
  • Ellerbrock BJ; Boyce Thompson Institute, Ithaca, NY, United States of America.
  • Tecle IY; Boyce Thompson Institute, Ithaca, NY, United States of America.
  • Mueller LA; Boyce Thompson Institute, Ithaca, NY, United States of America.
PLoS One; 15(11): e0240059, 2020.
Article in English | MEDLINE | ID: mdl-33175872
ABSTRACT
Modern breeding programs routinely use genome-wide information to select individuals to advance. The large volumes of genotypic information required present a challenge for data storage and query efficiency. Major use cases require genotyping data to be linked with trait phenotyping data. In contrast to phenotyping data, which are often stored in relational database schemas, next-generation genotyping data are traditionally stored in non-relational storage systems due to their extremely large scope. This study presents a novel data model implemented in Breedbase (https://breedbase.org/) for uniting relational phenotyping data and non-relational genotyping data within the open-source PostgreSQL database engine. Breedbase is an open-source web database designed to manage all of a breeder's informatics needs: management of field experiments, phenotypic and genotypic data collection and storage, and statistical analyses. The genotyping data are stored in a PostgreSQL data type known as binary JavaScript Object Notation (JSONb), where the JSON structures closely follow the Variant Call Format (VCF) data model. The Breedbase genotyping data model can handle different ploidy levels, structural variants, and any genotype encoded in VCF. JSONb is both compressed and indexed, resulting in a space- and time-efficient system. Furthermore, file caching maximizes data retrieval performance. Integrating all breeding data within the Chado database schema retains referential integrity that may be lost when genotyping and phenotyping data are stored in separate systems. Benchmarking demonstrates that the system is fast enough for computation of a genomic relationship matrix (GRM) and a genome-wide association study (GWAS) for datasets involving 1,325 diploid Zea mays, 314 triploid Musa acuminata, and 924 diploid Manihot esculenta samples genotyped with 955,690, 142,119, and 287,952 genotyping-by-sequencing (GBS) markers, respectively.
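To make the idea of VCF-like JSON structures that accommodate different ploidy levels more concrete, the following is a minimal sketch in Python. The marker names, the per-marker field layout (`GT`, `DP`), and the helper `alt_dosage` are illustrative assumptions; they are not taken from the actual Breedbase schema, which this abstract does not spell out. The sketch only shows how a VCF-style genotype string of arbitrary ploidy could be kept in a JSON document and reduced to an alternate-allele dosage.

```python
import json
import re

# Hypothetical per-sample genotype document. Marker names and the field
# layout are illustrative assumptions, not the actual Breedbase schema.
sample_genotypes = {
    "S1_1001": {"GT": "0/1", "DP": 12},    # diploid heterozygote
    "S1_2002": {"GT": "1/1/0", "DP": 30},  # triploid call
    "S1_3003": {"GT": "./.", "DP": 0},     # missing call
}


def alt_dosage(gt):
    """Count non-reference alleles in a VCF GT string of any ploidy.

    Alleles may be separated by '/' (unphased) or '|' (phased).
    Returns None when any allele is missing ('.').
    """
    alleles = re.split(r"[/|]", gt)
    if any(allele == "." for allele in alleles):
        return None
    return sum(1 for allele in alleles if allele != "0")


# Round-trip through JSON text, as would happen when a document is
# written to and read back from a JSON-typed column.
stored = json.dumps(sample_genotypes)
loaded = json.loads(stored)

dosages = {marker: alt_dosage(call["GT"]) for marker, call in loaded.items()}
print(dosages)
```

Because the dosage is derived by counting alleles rather than by assuming two of them, the same function covers the diploid and triploid cases mentioned in the abstract without any ploidy-specific branches.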
Subjects

Full text: 1 | Database: MEDLINE | Main subject: Manihot / Zea mays / Musa / Databases, Genetic | Language: English | Journal: PLoS One | Journal subject: SCIENCE / MEDICINE | Year of publication: 2020 | Document type: Article | Country of affiliation: United States
