Search | VHL Infectious and Parasitic Diseases

Tripal, a community update after 10 years of supporting open source, standards-based genetic, genomic and breeding databases.

Staton, Margaret; Cannon, Ethalinda; Sanderson, Lacey-Anne; Wegrzyn, Jill; Anderson, Tavis; Buehler, Sean; Cobo-Simón, Irene; Faaberg, Kay; Grau, Emily; Guignon, Valentin; Gunoskey, Jessica; Inderski, Blake; Jung, Sook; Lager, Kelly; Main, Dorrie; Poelchau, Monica; Ramnath, Risharde; Richter, Peter; West, Joe; Ficklin, Stephen.

Brief Bioinform ; 22(6), 2021 11 05.

Article in English | MEDLINE | ID: mdl-34251419

ABSTRACT

Online, open access databases for biological knowledge serve as central repositories for research communities to store, find and analyze integrated, multi-disciplinary datasets. With increasing volumes, complexity and the need to integrate genomic, transcriptomic, metabolomic, proteomic, phenomic and environmental data, community databases face tremendous challenges in ongoing maintenance, expansion and upgrades. A common infrastructure framework using community standards shared by many databases can reduce development burden, provide interoperability, ensure use of common standards and support long-term sustainability. Tripal is a mature, open source platform built to meet this need. With ongoing improvement since its first release in 2009, Tripal provides full functionality for searching, browsing, loading and curating numerous types of data and is a primary technology powering at least 31 publicly available databases spanning plants, animals and human data, primarily storing genomics, genetics and breeding data. Tripal software development is managed by a shared, inclusive governance structure including both project management and advisory teams. Here, we report on the most important and innovative aspects of Tripal after 11 years of development, including integration of diverse types of biological data, successful collaborative projects across member databases, and support for implementing FAIR principles.

Subjects

Breeding; Computational Biology/methods; Genetic Databases; Genomics/methods; Plants/genetics; Software; Agricultural Crops/genetics; Genetic Variation; Phylogeny; Plants/metabolism; Proteomics; Web Browser

Cyberinfrastructure and resources to enable an integrative approach to studying forest trees.

Wegrzyn, Jill L; Falk, Taylor; Grau, Emily; Buehler, Sean; Ramnath, Risharde; Herndon, Nic.

Evol Appl ; 13(1): 228-241, 2020 Jan.

Article in English | MEDLINE | ID: mdl-31892954

ABSTRACT

Sequencing technologies and bioinformatic approaches are now available to resolve the challenges associated with complex and heterozygous genomes. Increased access to less expensive and more effective instrumentation will contribute to a wealth of high-quality plant genomes in the next few years. In the meantime, more than 370 tree species are associated with public projects in primary repositories that are interrogating expression profiles, identifying variants, or analyzing targeted capture without a high-quality reference genome. Genomic data from these projects generates sequences that represent intermediate assemblies for transcriptomes and genomes. These data contribute to forest tree biology, but the associated sequence remains trapped in supplemental files that are poorly integrated in plant community databases and comparative genomic platforms. Successful implementation of life science cyberinfrastructure is improving data standards, ontologies, analytic workflows, and integrated database platforms for both model and non-model plant species. Unique to forest trees with large populations that are long-lived, outcrossing, and genetically diverse, the phenotypic and environmental metrics associated with georeferenced populations are just as important as the genomic data sampled for each individual. To address questions related to forest health and productivity, cyberinfrastructure must keep pace with the magnitude of genomic and phenomic sampling of larger populations. This review examines the current landscape of cyberinfrastructure, with an emphasis on best practices and resources to align community data with the Findable, Accessible, Interoperable, and Reusable (FAIR) guidelines.

Cyberinfrastructure to Improve Forest Health and Productivity: The Role of Tree Databases in Connecting Genomes, Phenomes, and the Environment.

Wegrzyn, Jill L; Staton, Margaret A; Street, Nathaniel R; Main, Dorrie; Grau, Emily; Herndon, Nic; Buehler, Sean; Falk, Taylor; Zaman, Sumaira; Ramnath, Risharde; Richter, Peter; Sun, Lang; Condon, Bradford; Almsaeed, Abdullah; Chen, Ming; Mannapperuma, Chanaka; Jung, Sook; Ficklin, Stephen.

Front Plant Sci ; 10: 813, 2019.

Article in English | MEDLINE | ID: mdl-31293610

ABSTRACT

Despite tremendous advancements in high throughput sequencing, the vast majority of tree genomes, and in particular, forest trees, remain elusive. Although primary databases store genetic resources for just over 2,000 forest tree species, these are largely focused on sequence storage, basic genome assemblies, and functional assignment through existing pipelines. The tree databases reviewed here serve as secondary repositories for community data. They vary in their focal species, the data they curate, and the analytics provided, but they are united in moving toward a goal of centralizing both data access and analysis. They provide frameworks to view and update annotations for complex genomes, interrogate systems level expression profiles, curate data for comparative genomics, and perform real-time analysis with genotype and phenotype data. The organism databases of today are no longer simply catalogs or containers of genetic information. These repositories represent integrated cyberinfrastructure that supports cross-site queries and analysis in web-based environments. These resources are striving to integrate across diverse experimental designs, sequence types, and related measures through ontologies, community standards, and web services. Efficient, simple, and robust platforms that enhance the data generated by the research community contribute to improving forest health and productivity.

Growing and cultivating the forest genomics database, TreeGenes.

Falk, Taylor; Herndon, Nic; Grau, Emily; Buehler, Sean; Richter, Peter; Zaman, Sumaira; Baker, Eliza M; Ramnath, Risharde; Ficklin, Stephen; Staton, Margaret; Feltus, Frank A; Jung, Sook; Main, Doreen; Wegrzyn, Jill L.

Database (Oxford) ; 2018: 1-11, 2018 01 01.

Article in English | MEDLINE | ID: mdl-30239664

ABSTRACT

Forest trees are valued sources of pulp, timber and biofuels, and serve a role in carbon sequestration, biodiversity maintenance and watershed stability. Examining the relationships among genetic, phenotypic and environmental factors for these species provides insight on the areas of concern for breeders and researchers alike. The TreeGenes database is a web-based repository that is home to 1790 tree species and over 1500 registered users. The database provides a curated archive for high-throughput genomics, including reference genomes, transcriptomes, genetic maps and variant data. These resources are paired with extensive phenotypic information and environmental layers. TreeGenes recently migrated to Tripal, an integrated and open-source database schema and content management system. This migration enabled developments focused on data exchange, data transfer and improved analytical capacity, as well as providing TreeGenes the opportunity to communicate with the following partner databases: Hardwood Genomics Web, Genome Database for Rosaceae, and the Citrus Genome Database. Recent development in TreeGenes has focused on coordinating information for georeferenced accessions, including metadata acquisition and ontological frameworks, to improve integration across studies combining genetic, phenotypic and environmental data. This focus was paired with the development of tools to enable comparative genomics and data visualization. By combining advanced data importers, relevant metadata standards and integrated analytical frameworks, TreeGenes provides a platform for researchers to store, submit and analyze forest tree data.

Subjects

Genetic Databases; Forests; Genomics; Data Mining; Gene Ontology; Phenotype; Phylogeny; Search Engine; Software; Trees/genetics; Trees/growth & development

AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture.

Harper, Lisa; Campbell, Jacqueline; Cannon, Ethalinda K S; Jung, Sook; Poelchau, Monica; Walls, Ramona; Andorf, Carson; Arnaud, Elizabeth; Berardini, Tanya Z; Birkett, Clayton; Cannon, Steve; Carson, James; Condon, Bradford; Cooper, Laurel; Dunn, Nathan; Elsik, Christine G; Farmer, Andrew; Ficklin, Stephen P; Grant, David; Grau, Emily; Herndon, Nic; Hu, Zhi-Liang; Humann, Jodi; Jaiswal, Pankaj; Jonquet, Clement; Laporte, Marie-Angélique; Larmande, Pierre; Lazo, Gerard; McCarthy, Fiona; Menda, Naama; Mungall, Christopher J; Munoz-Torres, Monica C; Naithani, Sushma; Nelson, Rex; Nesdill, Daureen; Park, Carissa; Reecy, James; Reiser, Leonore; Sanderson, Lacey-Anne; Sen, Taner Z; Staton, Margaret; Subramaniam, Sabarinath; Tello-Ruiz, Marcela Karey; Unda, Victor; Unni, Deepak; Wang, Liya; Ware, Doreen; Wegrzyn, Jill; Williams, Jason; Woodhouse, Margaret.

Database (Oxford) ; 2018, 2018 01 01.

Article in English | MEDLINE | ID: mdl-30239679

ABSTRACT

The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgebases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data. As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices.

Subjects

Agriculture; Genetic Databases; Genomics; Breeding; Gene Ontology; Metadata; Surveys and Questionnaires

Growing and cultivating the forest genomics database, TreeGenes.

Database (Oxford) ; 2019, 2019 01 01.

Article in English | MEDLINE | ID: mdl-30865259
