Results 1 - 20 of 50
1.
PLoS One ; 19(6): e0306187, 2024.
Article in English | MEDLINE | ID: mdl-38905271

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0297015.].

2.
PLoS One ; 19(3): e0297015, 2024.
Article in English | MEDLINE | ID: mdl-38446822

ABSTRACT

Gene expression is highly impacted by the environment and can be reflective of past events that affected developmental processes. It is therefore expected that gene expression can serve as a signal of current or future phenotypic traits. In this paper we identify sets of genes, which we call Prognostic Transcriptomic Biomarkers (PTBs), that can predict firmness in Malus domestica (apple) fruits. In apples, all individuals of a cultivar are clones, and differences in fruit quality are due to the environment. The apple transcriptome responds to these differences in environment, which makes PTBs an attractive predictor of future fruit quality. PTBs have the potential to enhance supply chain efficiency, reduce crop loss, and provide higher and more consistent quality for consumers. However, several questions must be addressed. In this paper we ask which of two common modeling approaches, Random Forest or ElasticNet, outperforms the other. We ask whether PTBs with few genes are efficient at predicting traits, which is important because qPCR is performed on only a few genes. We also ask whether qPCR is a cost-effective assay as input for PTBs modeled using high-throughput RNA-seq. To do this, we conducted a pilot study using fruit texture in the 'Gala' variety of apples across several postharvest storage regimens. Fruit texture in 'Gala' apples is highly controllable by postharvest treatments and is therefore a good candidate to explore the use of PTBs. We find that the Random Forest model is more consistent than an ElasticNet model and is predictive of firmness (r² = 0.78) with as few as 15 genes. We also show that qPCR is reasonably consistent with RNA-seq in a follow-up experiment. Results are promising for PTBs, yet more work is needed to ensure that PTBs are robust across various environmental conditions and storage treatments.


Subject(s)
Malus , Humans , Malus/genetics , Fruit/genetics , Transcriptome , Pilot Projects , Gene Expression Profiling
3.
G3 (Bethesda) ; 14(3), 2024 03 06.
Article in English | MEDLINE | ID: mdl-38190814

ABSTRACT

Cultivated pear consists of several Pyrus species with Pyrus communis (European pear) representing a large fraction of worldwide production. As a relatively recently domesticated crop and perennial tree, pear can benefit from genome-assisted breeding. Additionally, comparative genomics within Rosaceae promises greater understanding of evolution within this economically important family. Here, we generate a fully phased chromosome-scale genome assembly of P. communis 'd'Anjou.' Using PacBio HiFi and Dovetail Omni-C reads, the genome is resolved into the expected 17 chromosomes, with each haplotype totaling nearly 540 Megabases and a contig N50 of nearly 14 Mb. Both haplotypes are highly syntenic to each other and to the Malus domestica 'Honeycrisp' apple genome. Nearly 45,000 genes were annotated in each haplotype, over 90% of which have direct RNA-seq expression evidence. We detect signatures of the known whole-genome duplication shared between apple and pear, and we estimate 57% of d'Anjou genes are retained in duplicate derived from this event. This genome highlights the value of generating phased diploid assemblies for recovering the full allelic complement in highly heterozygous crop species.
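The contig N50 quoted above is a standard assembly statistic: the contig length at which contigs of that length or longer cover at least half of the total assembly. A minimal sketch of the calculation follows; the function name `n50` and the toy lengths are illustrative assumptions, not values from the d'Anjou assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in Mb (made-up values for illustration only).
toy_contigs = [20, 14, 10, 6, 4, 3, 2, 1]
toy_n50 = n50(toy_contigs)
```

With these toy lengths the total is 60 Mb, and the two longest contigs (20 + 14 = 34 Mb) already pass the halfway mark, so the N50 is 14 Mb.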


Subject(s)
Malus , Pyrus , Pyrus/genetics , Genome, Plant , Plant Breeding , Malus/genetics , Chromosomes
4.
Database (Oxford) ; 2023, 2023 11 15.
Article in English | MEDLINE | ID: mdl-37971715

ABSTRACT

Over the last couple of decades, there has been a rapid growth in the number and scope of agricultural genetics, genomics and breeding databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases, and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards.
Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means. Database URL: https://www.agbiodata.org/databases.


Subject(s)
Data Management , Plant Breeding , Animals , Genomics/methods , Databases, Factual , Information Dissemination
5.
BMC Genomics ; 23(1): 350, 2022 May 06.
Article in English | MEDLINE | ID: mdl-35524179

ABSTRACT

BACKGROUND: Lung cancer is the leading cause of cancer death in both men and women. The most common lung cancer subtype is non-small cell lung carcinoma (NSCLC) comprising about 85% of all cases. NSCLC can be further divided into three subtypes: adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), and large cell lung carcinoma. Specific genetic mutations and epigenetic aberrations play an important role in the developmental transition to a specific tumor subtype. The elucidation of normal lung versus lung tumor gene expression patterns and regulatory targets yields biomarker systems that discriminate lung phenotypes (i.e., biomarkers) and provide a foundation for the discovery of normal and aberrant gene regulatory mechanisms. RESULTS: We built condition-specific gene co-expression networks (csGCNs) for normal lung, LUAD, and LUSC conditions. Then, we integrated normal lung tissue-specific gene regulatory networks (tsGRNs) to elucidate control-target biomarker systems for normal and cancerous lung tissue. We characterized co-expressed gene edges, possibly under common regulatory control, for relevance in lung cancer. CONCLUSIONS: Our approach demonstrates the ability to elucidate csGCN:tsGRN merged biomarker systems based on gene expression correlation and regulation. The biomarker systems we describe can be used to classify and further describe lung specimens. Our approach is generalizable and can be used to discover and interpret complex gene expression patterns for any condition or species.


Subject(s)
Adenocarcinoma of Lung , Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/pathology , Biomarkers , Biomarkers, Tumor/genetics , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Female , Gene Expression Regulation, Neoplastic , Humans , Lung/pathology , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Prognosis
6.
BMC Bioinformatics ; 23(1): 156, 2022 May 02.
Article in English | MEDLINE | ID: mdl-35501696

ABSTRACT

BACKGROUND: Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analysis such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are becoming larger, and combining multiple experiments from sequence repositories can result in datasets with thousands of samples. Processing hundreds to thousands of RNA-seq samples can result in challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility. Processing of larger and deeper RNA-seq experiments will become more common as sequencing technology matures. RESULTS: GEMmaker is an nf-core compliant Nextflow workflow that quantifies gene expression from small to massive RNA-seq datasets. GEMmaker ensures results are highly reproducible through the use of versioned containerized software that can be executed on a single workstation, institutional compute cluster, Kubernetes platform or the cloud. GEMmaker supports popular alignment and quantification tools, providing results in raw and normalized formats. GEMmaker is unique in that it can scale to process thousands of locally or remotely stored samples without exceeding available data storage. CONCLUSIONS: Workflows that quantify gene expression are not new, and many already address issues of portability, reusability, and scale in terms of access to CPUs. GEMmaker provides these benefits and adds the ability to scale despite limited data storage infrastructure. This allows users to process hundreds to thousands of RNA-seq samples even when data storage resources are limited. GEMmaker is freely available and fully documented with step-by-step setup and execution instructions.
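One general way to process many samples without exceeding available storage is to work in small batches and delete raw files as soon as their results are extracted. The sketch below illustrates that idea only; it is not GEMmaker's implementation. The function `process_in_batches`, the stand-in `fake_fetch` and `fake_quantify` helpers, and the sample names are all hypothetical.

```python
import tempfile
from pathlib import Path


def process_in_batches(sample_ids, batch_size, fetch, quantify):
    """Process samples in fixed-size batches, discarding raw data after each batch.

    fetch(sample_id, directory) -> Path of the raw file written into directory
    quantify(raw_path)          -> small result kept in memory
    Returns (results, peak_files), where peak_files is the largest number
    of raw files that were on disk at the same time.
    """
    results = {}
    peak_files = 0
    for start in range(0, len(sample_ids), batch_size):
        batch = sample_ids[start:start + batch_size]
        # The temporary directory and its contents are removed when the block exits.
        with tempfile.TemporaryDirectory() as workdir:
            raw_paths = [fetch(sid, Path(workdir)) for sid in batch]
            peak_files = max(peak_files, len(raw_paths))
            for sid, raw in zip(batch, raw_paths):
                results[sid] = quantify(raw)
    return results, peak_files


# Hypothetical stand-ins for downloading and quantifying a sample.
def fake_fetch(sample_id, directory):
    path = directory / f"{sample_id}.fastq"
    path.write_text("ACGT" * 10)
    return path


def fake_quantify(raw_path):
    return len(raw_path.read_text())


samples = [f"sample{i}" for i in range(7)]
results, peak = process_in_batches(samples, 3, fake_fetch, fake_quantify)
```

Here seven samples are processed with at most three raw files on disk at once, so peak storage depends on the batch size and not on the total number of samples.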


Subject(s)
High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , RNA-Seq , Reproducibility of Results , Sequence Analysis, RNA/methods
7.
Brief Bioinform ; 23(1), 2022 01 17.
Article in English | MEDLINE | ID: mdl-34850822

ABSTRACT

Gene co-expression networks (GCNs) provide multiple benefits to molecular research including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly larger studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. Such experiments with larger numbers of variables confound discovery of true network edges, exclude edges and inhibit discovery of context (or condition) specific network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pairwise comparison is a function of multiple confounding variables. The 'one-size-fits-all' approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet assumptions and confounding variables are addressed. Additionally, it can associate experimental context for each edge of the network resulting in context-specific GCNs (csGCNs). To help researchers recognize such challenges in GCN construction, and the creation of csGCNs, we provide a review of the workflow.
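A toy calculation shows how pooling samples across conditions can hide a condition-specific edge. In this sketch two hypothetical genes are perfectly correlated in one condition and perfectly anti-correlated in another, so the pooled Pearson correlation is zero. The data, the condition labels, and the helper names are invented for illustration. The labels are assumed to be known in advance, and this is not the algorithm used by the Knowledge Independent Network Construction toolkit.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_by_condition(gene_a, gene_b, labels):
    """Compute the correlation separately within each labeled condition."""
    groups = {}
    for a, b, label in zip(gene_a, gene_b, labels):
        xs, ys = groups.setdefault(label, ([], []))
        xs.append(a)
        ys.append(b)
    return {label: pearson(xs, ys) for label, (xs, ys) in groups.items()}


# Made-up expression values for two genes across eight samples.
gene_a = [1, 2, 3, 4, 1, 2, 3, 4]
gene_b = [1, 2, 3, 4, 4, 3, 2, 1]
labels = ["control"] * 4 + ["treated"] * 4

pooled = pearson(gene_a, gene_b)
per_condition = correlation_by_condition(gene_a, gene_b, labels)
```

The pooled value suggests no relationship, while the per-condition values reveal a strong edge with opposite sign in each condition.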


Subject(s)
Gene Regulatory Networks , Transcriptome
8.
Front Plant Sci ; 12: 609684, 2021.
Article in English | MEDLINE | ID: mdl-34220875

ABSTRACT

Estimating maturity in pome fruits is a critical task that directs virtually all postharvest supply chain decisions. This is especially important for European pear (Pyrus communis) cultivars because losses due to spoilage and senescence must be minimized while ensuring proper ripening capacity is achieved (in part by satisfying a fruit chilling requirement). Reliable methods are lacking for accurate estimation of pear fruit maturity, and because ripening is maturity dependent, predicting ripening capacity is a challenge. In this study of the European pear cultivar 'd'Anjou', we sorted fruit at harvest based upon on-tree fruit position to build contrasts of maturity. Our sorting scheme showed clear contrasts of maturity between canopy positions, yet there was substantial overlap in the distribution of values for the index of absorbance difference (I_AD), a non-destructive spectroscopic measurement that has been used as a proxy for pome fruit maturity. This presented an opportunity to explore a contrast of maturity that was more subtle than I_AD could differentiate, and thus guided our subsequent transcriptome analysis of tissue samples taken at harvest and during storage. Using a novel approach that tests for condition-specific differences of co-expressed genes, we discovered genes with a phased character that mirrored our sorting scheme. The expression patterns of these genes are associated with fruit quality and ripening differences across the experiment. Functional profiles of these co-expressed genes are concordant with previous findings, and also offer new clues, and thus hypotheses, about genes involved in pear fruit quality, maturity, and ripening. This work may lead to new tools for enhanced postharvest management based on activity of gene co-expression modules, rather than individual genes. Further, our results indicate that modules may have utility within specific windows of time during postharvest management of 'd'Anjou' pear.

9.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34251419

ABSTRACT

Online, open access databases for biological knowledge serve as central repositories for research communities to store, find and analyze integrated, multi-disciplinary datasets. With increasing volumes, complexity and the need to integrate genomic, transcriptomic, metabolomic, proteomic, phenomic and environmental data, community databases face tremendous challenges in ongoing maintenance, expansion and upgrades. A common infrastructure framework using community standards shared by many databases can reduce development burden, provide interoperability, ensure use of common standards and support long-term sustainability. Tripal is a mature, open source platform built to meet this need. With ongoing improvement since its first release in 2009, Tripal provides full functionality for searching, browsing, loading and curating numerous types of data and is a primary technology powering at least 31 publicly available databases spanning plants, animals and human data, primarily storing genomics, genetics and breeding data. Tripal software development is managed by a shared, inclusive governance structure including both project management and advisory teams. Here, we report on the most important and innovative aspects of Tripal after 11 years of development, including integration of diverse types of biological data, successful collaborative projects across member databases, and support for implementing FAIR principles.


Subject(s)
Breeding , Computational Biology/methods , Databases, Genetic , Genomics/methods , Plants/genetics , Software , Crops, Agricultural/genetics , Genetic Variation , Phylogeny , Plants/metabolism , Proteomics , Web Browser
10.
BMC Genom Data ; 22(1): 17, 2021 05 27.
Article in English | MEDLINE | ID: mdl-34044788

ABSTRACT

BACKGROUND: Gene expression is potentially an important heritable quantitative trait that mediates between genetic variation and higher-level complex phenotypes through time and condition-dependent regulatory interactions. Therefore, we sought to explore both the genomic and condition-specific characteristics of gene expression heritability within the context of chromosomal structure. RESULTS: Heritability was estimated for biological gene expression using a diverse, 84-line, Oryza sativa (rice) population under optimal and salt-stressed conditions. Overall, 5936 genes were found to have heritable expression regardless of condition and 1377 genes were found to have heritable expression only during salt stress. These genes with salt-specific heritable expression are enriched for functional terms associated with response to stimulus and transcription factor activity. Additionally, we discovered that highly and lowly expressed genes, and genes with heritable expression, are distributed differently along the chromosomes in patterns that follow previously identified high-throughput chromosomal conformation capture (Hi-C) A/B chromatin compartments. Furthermore, multiple genomic hotspots enriched for genes with salt-specific heritability were identified on chromosomes 1, 4, 6, and 8. These hotspots were found to contain genes functionally enriched for transcriptional regulation and to overlap with a previously identified major QTL for salt tolerance in rice. CONCLUSIONS: Investigating the heritability of traits, and in particular gene expression traits, is important towards developing a basic understanding of how regulatory networks behave across a population. This work provides insights into spatial patterns of heritable gene expression at the chromosomal level.


Subject(s)
Chromosomes, Plant/genetics , Gene Expression Regulation, Plant , Genome, Plant/genetics , Oryza/genetics , Salt Stress/genetics , Quantitative Trait Loci/genetics
11.
Front Big Data ; 4: 582468, 2021.
Article in English | MEDLINE | ID: mdl-33748749

ABSTRACT

Advanced imaging and DNA sequencing technologies now enable the diverse biology community to routinely generate and analyze terabytes of high resolution biological data. The community is rapidly heading toward the petascale in single investigator laboratory settings. As evidence, the single NCBI SRA central DNA sequence repository contains over 45 petabytes of biological data. Given the geometric growth of this and other genomics repositories, an exabyte of mineable biological data is imminent. The challenges of effectively utilizing these datasets are enormous as they are not only large in size but also stored in geographically distributed repositories such as National Center for Biotechnology Information (NCBI), DNA Data Bank of Japan (DDBJ), European Bioinformatics Institute (EBI), and NASA's GeneLab. In this work, we first systematically point out the data-management challenges of the genomics community. We then introduce Named Data Networking (NDN), a novel but well-researched Internet architecture that is capable of solving these challenges at the network layer. NDN performs all operations such as forwarding requests to data sources, content discovery, access, and retrieval using content names (that are similar to traditional filenames or filepaths) and eliminates the need for a location layer (the IP address) for data management. Utilizing NDN for genomics workflows simplifies data discovery, speeds up data retrieval using in-network caching of popular datasets, and allows the community to create infrastructure that supports operations such as creating federation of content repositories, retrieval from multiple sources, remote data subsetting, and others. Name-based operations also streamline deployment and integration of workflows with various cloud platforms.
Our contributions in this work are as follows: 1) we enumerate the cyberinfrastructure challenges of the genomics community that NDN can alleviate, and 2) we describe our efforts in applying NDN for a contemporary genomics workflow (GEMmaker) and quantify the improvements. The preliminary evaluation shows a sixfold speedup in data insertion into the workflow. 3) As a pilot, we have used an NDN naming scheme (agreed upon by the community and discussed in Section 4) to publish data from broadly used data repositories including the NCBI SRA. We have loaded the NDN testbed with these pre-processed genomes that can be accessed over NDN and used by anyone interested in those datasets. Finally, we discuss our continued effort in integrating NDN with cloud computing platforms, such as the Pacific Research Platform (PRP). The reader should note that the goal of this paper is to introduce NDN to the genomics community and discuss NDN's properties that can benefit the genomics community. We do not present an extensive performance evaluation of NDN; we are working on extending and evaluating our pilot deployment and will present systematic results in a future work.
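The idea of retrieving data by content name with in-network caching can be illustrated with a toy model. The sketch below is a simplification under stated assumptions: the `Node` class, its methods, and the content name `/example/genomics/sample-1` are invented for illustration and are not the naming scheme from the paper. Real NDN forwarders also maintain pending-interest and forwarding tables and verify signed data, all of which is omitted here.

```python
class Node:
    """A toy forwarder that answers requests by content name, not by host address."""

    def __init__(self, upstream=None, published=None):
        self.upstream = upstream                     # next node toward the data source
        self.content_store = dict(published or {})   # name -> data (local cache)
        self.upstream_requests = 0                   # how often this node had to forward

    def request(self, name):
        # Serve from the local content store when the name is already cached.
        if name in self.content_store:
            return self.content_store[name]
        if self.upstream is None:
            raise KeyError(name)
        # Otherwise forward the request upstream and cache the returned data.
        self.upstream_requests += 1
        data = self.upstream.request(name)
        self.content_store[name] = data
        return data


# Hypothetical content name and payload, for illustration only.
repository = Node(published={"/example/genomics/sample-1": b"ACGT"})
edge = Node(upstream=repository)

first = edge.request("/example/genomics/sample-1")   # fetched from the repository
second = edge.request("/example/genomics/sample-1")  # served from the edge cache
```

Only the first request reaches the repository. The second is answered from the edge node's cache, which is the kind of saving in-network caching of popular datasets is meant to provide.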

12.
Front Vet Sci ; 7: 559279, 2020.
Article in English | MEDLINE | ID: mdl-33195534

ABSTRACT

Specifically designed gene expression studies can be used to prioritize candidate genes and identify novel biomarkers affecting resilience against mastitis and other diseases in dairy cattle. The primary goal of this study was to assess whether specific peripheral leukocyte genes expressed differentially in a previous study of dairy cattle with postpartum disease would also be expressed differentially in peripheral leukocytes from a diverse set of different dairy cattle with moderate to severe clinical mastitis. Four genes were selected for this study due to their differential expression in a previous transcriptomic analysis of circulating leukocytes from dairy cows with and without evidence of early postpartum disease. An additional 15 genes were included based on their cellular, immunologic, and inflammatory functions associated with resistance and tolerance to mastitis. This fixed cohort study was conducted on a conventional dairy in Washington state. Cows >50 days in milk (DIM) with mastitis (n = 12) were enrolled along with healthy cows (n = 8) selected to match the DIM and lactation numbers of mastitic cows. Blood was collected for a complete blood count (CBC), serum biochemistry, leukocyte isolation, and RNA extraction on the day of enrollment and twice more at 6- to 8-day intervals. Latent class analysis was performed to discriminate healthy vs. mastitic cows and to describe disease resolution. RNA samples were processed by the Primate Diagnostic Services Laboratory (University of Washington, Seattle, WA). Gene expression analysis was performed using the Nanostring System (Nanostring Technologies, Seattle, Washington, USA). Of the four genes (C5AR1, CATHL6, LCN2, and PGLYRP1) with evidence of upregulation in cows with mastitis, three of those genes (CATHL6, LCN2, and PGLYRP1) were investigated due to their previously identified association with postpartum disease.
These genes are responsible for immunomodulatory molecules that selectively enhance or alter host innate immune defense mechanisms and modulate pathogen-induced inflammatory responses. Although further research is warranted to explain their functional mechanisms and bioactivity in cattle, our findings suggest that these conserved elements of innate immunity have the potential to bridge disease states and target tissues in diverse dairy populations.

13.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32621602

ABSTRACT

Online biological databases housing genomics, genetic and breeding data can be constructed using the Tripal toolkit. Tripal is an open-source, internationally developed framework that implements FAIR data principles and is meant to ease the burden of constructing such websites for research communities. Use of a common, open framework improves the sustainability and manageability of such a site. Site developers can create extensions for their site and in turn share those extensions with others. One challenge that community databases often face is the need to provide tools for their users that analyze increasingly larger datasets using multiple software tools strung together in a scientific workflow on complicated computational resources. The Tripal Galaxy module, a 'plug-in' for Tripal, meets this need through integration of Tripal with the Galaxy Project workflow management system. Site developers can create workflows appropriate to the needs of their community using Galaxy and then share those for execution on their Tripal sites via automatically constructed, but configurable, web forms or using an application programming interface to power web-based analytical applications. The Tripal Galaxy module helps reduce duplication of effort by allowing site developers to spend time constructing workflows and building their applications rather than rebuilding infrastructure for job management of multi-step applications.


Subject(s)
Database Management Systems , Databases, Genetic , Internet , Software , Computational Biology
14.
Int J Mol Sci ; 21(6), 2020 Mar 20.
Article in English | MEDLINE | ID: mdl-32244875

ABSTRACT

Lentil (Lens culinaris Medikus) is an important source of protein for people in developing countries. Aphanomyces root rot (ARR) has emerged as one of the most devastating diseases affecting lentil production. In this study, we applied two complementary quantitative trait loci (QTL) analysis approaches to unravel the genetic architecture underlying this complex trait. A recombinant inbred line (RIL) population and an association mapping population were genotyped using genotyping by sequencing (GBS) to discover novel single nucleotide polymorphisms (SNPs). QTL mapping identified 19 QTL associated with ARR resistance, while association mapping detected 38 QTL and highlighted accumulation of favorable haplotypes in most of the resistant accessions. Seven QTL clusters were discovered on six chromosomes, and 15 putative genes were identified within the QTL clusters. To validate QTL mapping and genome-wide association study (GWAS) results, expression analysis of five selected genes was conducted on partially resistant and susceptible accessions. Three of the genes were differentially expressed at early stages of infection, two of which may be associated with ARR resistance. Our findings provide valuable insight into the genetic control of ARR, and genetic and genomic resources developed here can be used to accelerate development of lentil cultivars with high levels of partial resistance to ARR.


Subject(s)
Aphanomyces/physiology , Chromosome Mapping , Disease Resistance/genetics , Genome-Wide Association Study , Lens Plant/genetics , Lens Plant/microbiology , Plant Diseases/genetics , Quantitative Trait Loci/genetics , Data Analysis , Gene Expression Regulation, Plant , Genetics, Population , Haplotypes/genetics , Linkage Disequilibrium/genetics , Phenotype , Plant Diseases/microbiology
15.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-31688940

ABSTRACT

Tripal is an open-source, resource-efficient toolkit for construction of genomic, genetic and breeding databases. It facilitates development of biological websites by providing tools to integrate and display biological data using the generic database schema, Chado, together with Drupal, a popular website creation and content management system. Tripal MapViewer is a new interactive tool for visualizing genetic map data. Developed as a Tripal replacement for Comparative Map Viewer (CMap), it enables visualization of entire maps or linkage groups and features such as molecular markers, quantitative trait loci (QTLs) and heritable phenotypic markers. It also provides graphical comparison of maps sharing the same markers as well as dot plot and correspondence matrices. MapViewer integrates directly with the Tripal application programming interface framework, improving data searching capability and providing a more seamless experience for site visitors. The Tripal MapViewer interface can be integrated in any Tripal map page and linked from any Tripal page for markers, QTLs, heritable morphological markers or genes. Configuration of the display is available through a control panel and the administration interface. The administration interface also allows configuration of the custom database query for building materialized views, providing better performance and flexibility in the way data is stored in the Chado database schema. MapViewer is implemented with the D3.js technology and is currently being used at the Genome Database for Rosaceae (https://www.rosaceae.org), CottonGen (https://www.cottongen.org), Citrus Genome Database (https://citrusgenomedb.org), Vaccinium Genome Database (https://www.vaccinium.org) and Cool Season Food Legume Database (https://www.coolseasonfoodlegume.org). It is also currently in development on the Hardwood Genomics Web (https://hardwoodgenomics.org) and TreeGenes (https://treegenesdb.org). 
Database URL: https://gitlab.com/mainlabwsu/tripal_map.


Subject(s)
Databases, Genetic , Genome, Plant , Internet , Quantitative Trait Loci , Rosaceae/genetics , User-Computer Interface , Genomics
16.
Front Plant Sci ; 10: 813, 2019.
Article in English | MEDLINE | ID: mdl-31293610

ABSTRACT

Despite tremendous advancements in high throughput sequencing, the vast majority of tree genomes, and in particular those of forest trees, remain elusive. Although primary databases store genetic resources for just over 2,000 forest tree species, these are largely focused on sequence storage, basic genome assemblies, and functional assignment through existing pipelines. The tree databases reviewed here serve as secondary repositories for community data. They vary in their focal species, the data they curate, and the analytics provided, but they are united in moving toward a goal of centralizing both data access and analysis. They provide frameworks to view and update annotations for complex genomes, interrogate systems level expression profiles, curate data for comparative genomics, and perform real-time analysis with genotype and phenotype data. The organism databases of today are no longer simply catalogs or containers of genetic information. These repositories represent integrated cyberinfrastructure that supports cross-site queries and analysis in web-based environments. These resources are striving to integrate across diverse experimental designs, sequence types, and related measures through ontologies, community standards, and web services. Efficient, simple, and robust platforms that enhance the data generated by the research community contribute to improving forest health and productivity.

17.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-31328773

ABSTRACT

Community biological databases provide an important online resource for both public and private data, analysis tools and community engagement. These sites house genomic, transcriptomic, genetic, breeding and ancillary data for specific species, families or clades. Due to the complexity and increasing quantities of these data, construction of online resources is increasingly difficult, especially with limited funding and access to technical expertise. Furthermore, online repositories are expected to promote FAIR data principles (findable, accessible, interoperable and reusable), which presents additional challenges. The open-source Tripal database toolkit seeks to mitigate these challenges by creating both the software and an interactive community of developers for construction of online community databases. Additionally, through coordinated, distributed co-development, Tripal sites encourage community-wide sustainability. Here, we report the release of Tripal version 3, which improves data accessibility and data sharing through systematic use of controlled vocabularies (CVs). Tripal uses the community-developed Chado database as a default data store, but now provides tools to support other data stores, while ensuring that CVs remain the central organizational structure for the data. A new site developer can use Tripal to develop a basic site with little to no programming, with the ability to integrate other data types using extension modules and the Tripal application programming interface. A thorough online User's Guide and Developer's Handbook are available at http://tripal.info, providing download, installation and step-by-step setup instructions.


Subject(s)
Biota/genetics , Databases, Genetic , Information Dissemination , Internet , Software , Transcriptome , Genomics
18.
Methods Mol Biol ; 1962: 29-51, 2019.
Article in English | MEDLINE | ID: mdl-31020553

ABSTRACT

The Genome Sequence Annotation Server (GenSAS, https://www.gensas.org) is a secure, web-based genome annotation platform for structural and functional annotation, as well as manual curation. Requiring no installation by users, GenSAS integrates popular command-line annotation tools under a single, easy-to-use online interface. GenSAS integrates JBrowse and Apollo, so users can view annotation data and manually curate gene models. Users are guided step by step through the annotation process by embedded instructions and a more in-depth GenSAS User's Guide. In addition to a genome assembly file, users can also upload organism-specific transcript, protein, and RNA-seq read evidence for use in the annotation process. The latest versions of the NCBI RefSeq transcript and protein databases and the SwissProt and TrEMBL protein databases are provided for all users. GenSAS projects can be shared with other GenSAS users, enabling collaborative annotation. Once annotation is complete, GenSAS generates the final files of the annotated gene models in common file formats for use with other annotation tools, submission to a repository, and use in publications.


Subject(s)
Databases, Genetic , Genome , Molecular Sequence Annotation/methods , Software , Data Curation , Eukaryota , Internet , Sequence Analysis, RNA , Species Specificity , User-Computer Interface
20.
Nucleic Acids Res ; 47(D1): D1137-D1145, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357347

ABSTRACT

The Genome Database for Rosaceae (GDR, https://www.rosaceae.org) is an integrated web-based community database resource providing access to publicly available genomics, genetics and breeding data and data-mining tools to facilitate basic, translational and applied research in Rosaceae. The volume of data in GDR has increased greatly over the last 5 years. The GDR now houses multiple versions of whole genome assembly and annotation data from 14 species, made available by recent advances in sequencing technology. Annotated and searchable reference transcriptomes, RefTrans, combining peer-reviewed published RNA-Seq as well as EST datasets, are newly available for major crop species. Significantly more quantitative trait loci, genetic maps and markers are available in MapViewer, a new visualization tool that better integrates with other pages in GDR. Pathways can be accessed through the new GDR Cyc Pathways databases, and synteny among the newest genome assemblies from eight species can be viewed through the new synteny browser, SynView. Collated single-nucleotide polymorphism diversity data and phenotypic data from publicly available breeding datasets are integrated with other relevant data. Also, the new Breeding Information Management System allows breeders to upload, manage and analyze their private breeding data within the secure GDR server with an option to release data publicly.


Subject(s)
Computational Biology/methods , Databases, Genetic , Genome, Plant/genetics , Genomics/methods , Rosaceae/genetics , Computational Biology/statistics & numerical data , Gene Expression Profiling/methods , Genes, Plant/genetics , Information Storage and Retrieval/methods , Internet , Plant Breeding/methods , Quantitative Trait Loci/genetics , Rosaceae/classification , Species Specificity , Synteny , Time Factors , User-Computer Interface