Results 1 - 20 of 118
1.
Methods Mol Biol ; 2703: 3-22, 2023.
Article in English | MEDLINE | ID: mdl-37646933

ABSTRACT

The FAIR data principle, as a commitment to support long-term research data management, is widely accepted in the scientific community. However, although many established infrastructures provide comprehensive and long-term stable services and platforms, a large quantity of research data is still hidden. Currently, high-throughput plant genomics and phenomics technologies are producing research data in abundance, the storage of which is not covered by established core databases. This concerns the data volume, for example, time series of images or high-resolution hyperspectral data; the quality of data formatting and annotation, e.g., with regard to structure and annotation specifications of core databases; uncovered data domains; or organizational constraints prohibiting primary data storage outside institutional boundaries. To share these potentially dark data in a FAIR way and to master these challenges, the ELIXIR Germany/de.NBI service Plant Genomic and Phenomics Research Data Repository (PGP) implements an on-premise approach, which allows research data to be kept in place and wrapped in FAIR-aware software infrastructure. In this chapter, the e!DAL infrastructure software and the PGP repository are presented as a best practice for easily setting up FAIR-compliant and intuitive research data services.


Subject(s)
Genomics , Phenomics , Data Management , Databases, Factual , Germany
2.
Sci Data ; 9(1): 784, 2022 12 26.
Article in English | MEDLINE | ID: mdl-36572688

ABSTRACT

Plant genetic resources (PGR) stored at genebanks are humanity's crop diversity savings for the future. Information on PGR contrasted with modern cultivars is key to selecting PGR parents for pre-breeding. Genotyping-by-sequencing was performed for 7,745 winter wheat PGR samples from the German Federal ex situ genebank at IPK Gatersleben and for 325 modern cultivars. Whole-genome shotgun sequencing was carried out for 446 diverse PGR samples and 322 modern cultivars and lines. In 19 field trials, 7,683 PGR and 232 elite cultivars were characterized for resistance to yellow rust, one of the major threats to wheat worldwide. Yield breeding values of 707 PGR were estimated using hybrid crosses with 36 cultivars, an approach that reduces the lack of agronomic adaptation of PGR and provides better estimates of their contribution to yield breeding. Cross-validations support the interoperability between genomic and phenotypic data. The data presented here are a stepping stone to unlock the functional variation of PGR for European pre-breeding and are the basis for future breeding and research activities.


Subject(s)
Plant Breeding , Triticum , Genotype , Seasons , Triticum/genetics
3.
Nat Genet ; 54(10): 1544-1552, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36195758

ABSTRACT

The great efforts spent in the maintenance of past diversity in genebanks are rationalized by the potential role of plant genetic resources (PGR) in future crop improvement, a concept whose practical implementation has fallen short of expectations. Here, we implement a genomics-informed prebreeding strategy for wheat improvement that does not discriminate against nonadapted germplasm. We collect and analyze dense genetic profiles for a large winter wheat collection and evaluate grain yield and resistance to yellow rust (YR) in bespoke core sets. Breeders already profit from wild introgressions, but PGR still offer useful, yet unused, diversity. Potential donors of resistance sources not yet deployed in breeding were detected, while the prebreeding contribution of PGR to yield was estimated through 'Elite × PGR' F1 crosses. Genomic prediction within and across genebanks identified the best parents to be used in crosses with elite cultivars whose advanced progenies can outyield current wheat varieties in multiple field trials.


Subject(s)
Plant Breeding , Triticum , Genomics , Plants , Triticum/genetics
4.
J Integr Bioinform ; 19(4)2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36065132

ABSTRACT

Over the last years, progress in data collection in the life sciences has created increasing demand for, and opportunities in, advanced bioinformatics. This includes data management as well as individual data analysis, and often covers the entire data life cycle. A variety of tools have been developed to store, share, or reuse the data produced in different domains such as genotyping. Imputation in particular, as a subfield of genotyping, requires good Research Data Management (RDM) strategies to enable use and re-use of genotypic data. Sustainable software requires tools and surrounding ecosystems that are reusable and maintainable. For streamlined tools, reusability can be achieved, e.g., by standardizing the input and output of the different tools and adopting open and broadly used file formats. Using such established file formats also allows the tools to be connected with others, improving the overall interoperability of the software. Finally, it is important to build strong communities that maintain the tools by developing and contributing new features and maintenance updates. In this article, such concepts are presented for an imputation service.


Subject(s)
Computational Biology , Ecosystem , Genotype , Software
5.
F1000Res ; 11, 2022.
Article in English | MEDLINE | ID: mdl-35811804

ABSTRACT

In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of (meta)data in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous machine-actionable data flow, generic elements need to be further specified. We strongly support the merits of the FAIR principles and see the need to facilitate them also through technical implementation specifications. VCF files are an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant call data (for example, the HapMap format and the gVCF format), but none currently have the reach of VCF. In VCF, only the sites of variation are described, whereas in gVCF, all positions are listed, and confidence values are also provided. For the sake of simplicity, we will only discuss VCF and our recommendations for its use. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifier or recommended content. In practice, often only sparse (if any) descriptive metadata is included. When descriptive metadata is provided, proprietary metadata fields are frequently added that have not been agreed upon within the community, which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from the plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains.


Subject(s)
Metadata , Software , Genotype
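The point about VCF meta-information can be made concrete. The Python sketch below is an illustration only, not code from the article: it collects the `##` header lines of a VCF file, splits structured entries such as `##INFO=<ID=...,Description="...">` into key/value pairs, and lists INFO or FORMAT IDs whose description is missing. The function names `parse_vcf_meta` and `undocumented_ids` are hypothetical, and the parser deliberately covers only the common cases of the header syntax.

```python
import re
from typing import Dict, List, Union

# key=value pairs inside "<...>"; a value is a quoted string or runs to the next comma
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')

MetaValue = Union[str, Dict[str, str]]


def parse_vcf_meta(lines: List[str]) -> Dict[str, List[MetaValue]]:
    """Group '##' meta-information lines by key (e.g. 'fileformat', 'INFO')."""
    meta: Dict[str, List[MetaValue]] = {}
    for line in lines:
        if not line.startswith("##"):
            break  # the '#CHROM' column header (or data) ends the meta section
        key, _, value = line[2:].rstrip("\n").partition("=")
        if value.startswith("<") and value.endswith(">"):
            # structured entry, e.g. ##INFO=<ID=DP,Number=1,Type=Integer,Description="...">
            entry = {k: v.strip('"') for k, v in _PAIR.findall(value[1:-1])}
            meta.setdefault(key, []).append(entry)
        else:
            # unstructured entry, e.g. ##fileformat=VCFv4.2
            meta.setdefault(key, []).append(value)
    return meta


def undocumented_ids(meta: Dict[str, List[MetaValue]], section: str = "INFO") -> List[str]:
    """IDs in a structured section whose Description is missing or empty."""
    return [
        e.get("ID", "?")
        for e in meta.get(section, [])
        if isinstance(e, dict) and not e.get("Description")
    ]


header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth, all samples">',
    '##INFO=<ID=XX,Number=1,Type=Float,Description="">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
]
meta = parse_vcf_meta(header)
print(undocumented_ids(meta, "INFO"))  # ['XX']
```

A check like `undocumented_ids` is one simple way to flag the sparse or ad hoc metadata the article describes; agreeing on a shared vocabulary for those fields is a separate, community-level step.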
6.
Plant J ; 111(2): 335-347, 2022 07.
Article in English | MEDLINE | ID: mdl-35535481

ABSTRACT

The research data life cycle from project planning to data publishing is an integral part of current research. Until the last decade, researchers were responsible for all associated phases in addition to the actual research and were assisted only at certain points by IT or bioinformaticians. Starting with advances in sequencing, the automation of analytical methods in all life science fields, including in plant phenotyping, has led to ever-increasing amounts of ever more complex data. The tasks associated with these challenges now often exceed the expertise of and infrastructure available to scientists, leading to an increased risk of data loss over time. The IPK Gatersleben has one of the world's largest germplasm collections and two decades of experience in crop plant research data management. In this article we show how challenges in modern, data-driven research can be addressed by data stewards. Based on concrete use cases, data management processes and best practices from plant phenotyping, we describe which expertise and skills are required and how data stewards as an integral actor can enhance the quality of a necessary digital transformation in progressive research.


Subject(s)
Big Data , Phenomics , Plants , Crops, Agricultural/genetics , Plants/genetics
7.
Gigascience ; 12, 2022 12 28.
Article in English | MEDLINE | ID: mdl-37083938

ABSTRACT

BACKGROUND: The sequencing of whole genomes is becoming increasingly affordable. In this context, large-scale sequencing projects are generating ever larger datasets of species-specific genomic diversity. As a consequence, more and more genomic data need to be made easily accessible and analyzable to the scientific community. FINDINGS: We present DivBrowse, a web application for interactive visualization and exploratory analysis of genomic diversity data stored in Variant Call Format (VCF) files of any size. By seamlessly combining BLAST as an entry point together with interactive data analysis features such as principal component analysis in one graphical user interface, DivBrowse provides a novel and unique set of exploratory data analysis capabilities for genomic biodiversity datasets. The capability to integrate DivBrowse into existing web applications supports interoperability between different web applications. Built-in interactive computation of principal component analysis allows users to perform ad hoc analysis of the population structure based on specific genetic elements such as genes and exons. Data interoperability is supported by the ability to export genomic diversity data in VCF and General Feature Format 3 files. CONCLUSION: DivBrowse offers a novel approach for interactive visualization and analysis of genomic diversity data and optionally also gene annotation data by including features like interactive calculation of variant frequencies and principal component analysis. The use of established standard file formats for data input supports interoperability and seamless deployment of application instances based on the data output of established bioinformatics pipelines.


Subject(s)
Genomics , Software , Computational Biology , Genome , Molecular Sequence Annotation
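DivBrowse computes principal component analysis interactively on genotype data. As a rough illustration of the underlying technique only (the abstract does not describe DivBrowse's own implementation), the NumPy sketch below runs a PCA via singular value decomposition on a samples × variants matrix of alternate-allele dosages (0, 1, 2), assuming there are no missing calls. The function name `genotype_pca` and the toy matrix are hypothetical.

```python
import numpy as np


def genotype_pca(dosages: np.ndarray, n_components: int = 2):
    """PCA of a samples x variants matrix of alternate-allele dosages (0, 1, 2).

    Assumes no missing values. Monomorphic variants are dropped and each
    remaining variant is mean-centred. Returns (scores, explained_variance_ratio).
    """
    X = np.asarray(dosages, dtype=float)
    X = X[:, X.std(axis=0) > 0]   # drop variants with no variation
    X = X - X.mean(axis=0)        # centre each variant
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on the PCs
    ratio = (S**2 / np.sum(S**2))[:n_components]     # share of total variance per PC
    return scores, ratio


# Toy data: two groups of three samples with largely opposite alleles.
pop_a = np.array([[0, 0, 2, 1], [0, 1, 2, 1], [0, 0, 2, 1]])
pop_b = np.array([[2, 2, 0, 1], [2, 1, 0, 1], [2, 2, 0, 1]])
scores, ratio = genotype_pca(np.vstack([pop_a, pop_b]))
print(scores[:, 0], ratio)
```

Restricting the input columns to the variants of one gene or exon before calling the function corresponds to the ad hoc, region-specific analysis the abstract mentions; how DivBrowse selects and encodes those variants is not stated there.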
8.
F1000Res ; 11: 12, 2022.
Article in English | MEDLINE | ID: mdl-36636476

ABSTRACT

With the ongoing cost decrease of genotyping and sequencing technologies, accurate and fast phenotyping remains the bottleneck in the utilization of plant genetic resources for breeding and breeding research. Although cost-efficient high-throughput phenotyping platforms are emerging for specific traits and/or species, manual phenotyping is still widely used and is a time- and money-consuming step. Approaches that improve data recording, processing or handling are pivotal steps towards the efficient use of genetic resources and are demanded by the research community. Therefore, we developed PhenoApp, an open-source Android app for tablets and smartphones to facilitate the digital recording of phenotypic data in the field and in greenhouses. It is a versatile tool that makes it possible to fully customize the descriptors/scales for any scenario, also in accordance with international information standards such as MIAPPE (Minimum Information About a Plant Phenotyping Experiment) and the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Furthermore, PhenoApp enables the use of pre-integrated, ready-to-use BBCH (Biologische Bundesanstalt für Land- und Forstwirtschaft, Bundessortenamt und CHemische Industrie) scales for apple, cereals, grapevine, maize, potato, rapeseed and rice. Additional BBCH scales can easily be added. The simple and adaptable structure of input and output files enables easy data handling with spreadsheet software or even integration into the workflow of laboratory information management systems (LIMS). PhenoApp therefore makes a decisive contribution to increasing the efficiency of digital data acquisition in genebank management, and also contributes to breeding and breeding research by accelerating the labour-intensive and time-consuming acquisition of phenotyping data.


Subject(s)
Plant Breeding , Plants , Software , Phenotype
9.
Gigascience ; 10(12)2021 12 29.
Article in English | MEDLINE | ID: mdl-34966925

ABSTRACT

BACKGROUND: Linking nucleotide sequence data (NSD) to scientific publication citations can enhance understanding of NSD provenance, scientific use, and reuse in the community. By connecting publications with NSD records, NSD geographical provenance information, and author geographical information, it becomes possible to assess the contribution of NSD to infer trends in scientific knowledge gain at the global level. FINDINGS: We extracted and linked records from the European Nucleotide Archive to citations in open-access publications aggregated at Europe PubMed Central. A total of 8,464,292 ENA accessions with geographical provenance information were associated with publications. We conducted a data quality review to uncover potential issues in publication citation information extraction and author affiliation tagging and developed and implemented best-practice recommendations for citation extraction. We constructed flat data tables and a data warehouse with an interactive web application to enable ad hoc exploration of NSD use and summary statistics. CONCLUSIONS: The extraction and linking of NSD with associated publication citations enables transparency. The quality review contributes to enhanced text mining methods for identifier extraction and use. Furthermore, the global provision and use of NSD enable scientists worldwide to join literature and sequence databases in a multidimensional fashion. As a concrete use case, we visualized statistics of country clusters concerning NSD access in the context of discussions around digital sequence information under the United Nations Convention on Biological Diversity.


Subject(s)
Data Mining , Nucleotides , Base Sequence , Databases, Nucleic Acid , Europe
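The abstract refers to text mining for identifier extraction. A minimal sketch of that idea in Python: a regular expression that picks out a few common INSDC/ENA accession shapes from free text. The pattern is an assumption of this example and deliberately simplified (BioProject `PRJ...`, read-archive objects such as `ERR...` or `SRR...`, and nucleotide sequence accessions of one letter plus five digits or two letters plus six or eight digits, optionally versioned). It is not the extraction method used in the study, and it will also match unrelated tokens that happen to have the same shape.

```python
import re
from typing import List

# Simplified accession shapes (an assumption of this sketch, not an exhaustive list).
ACCESSION = re.compile(
    r"\b("
    r"PRJ[EDN][AB]\d+"                      # BioProject, e.g. PRJEB...
    r"|[EDS]R[APRSXZ]\d{6,}"                # read-archive objects, e.g. ERR..., SRR...
    r"|[A-Z]{2}\d{6}(?:\d{2})?(?:\.\d+)?"   # sequence: 2 letters + 6 or 8 digits
    r"|[A-Z]\d{5}(?:\.\d+)?"                # sequence: 1 letter + 5 digits
    r")\b"
)


def extract_accessions(text: str) -> List[str]:
    """Return accession-like tokens in order of first appearance, without duplicates."""
    return list(dict.fromkeys(ACCESSION.findall(text)))


sentence = ("Reads were deposited under PRJEB12345 (runs ERR1234567, ERR1234568); "
            "the plastid sequence is AB123456.1 and the reference is X12345.")
print(extract_accessions(sentence))
# ['PRJEB12345', 'ERR1234567', 'ERR1234568', 'AB123456.1', 'X12345']
```

In practice, candidate matches would still need to be validated against the archive, which is one reason a data quality review of the extracted citations, as described in the abstract, is needed.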
10.
Fukushima J Med Sci ; 67(2): 89-93, 2021.
Article in English | MEDLINE | ID: mdl-34456223

ABSTRACT

This paper reports on the IAEA's Consultancy Meeting on "low-dose radiation for patients and population: Science, Technology and Society (STS) concepts for communication and perception among medical doctors and stakeholders", which was held on October 21 and 22, 2020. The meeting consisted of seven presentation sessions, with a total of 27 presentations and 39 participants from seven countries. The meeting focused on various areas including environmental, food, and personal dosimetry; radiation and other secondary health effects after nuclear disasters; communication between medical professionals and patients or residents; and medical education on nuclear accidents. This meeting was convened to discuss STS perspectives related to nuclear emergencies and to share the findings of the Fukushima Health Management Survey and the current situation in Fukushima with international experts. The meeting confirmed the importance of coordinated recovery of affected areas and global preparedness in the aftermath of nuclear accidents.

11.
Sci Adv ; 7(24)2021 Jun.
Article in English | MEDLINE | ID: mdl-34117061

ABSTRACT

The potential of big data to support businesses has been demonstrated in financial services, manufacturing, and telecommunications. Here, we report on efforts to enter a new data era in plant breeding by collecting genomic and phenotypic information from 12,858 wheat genotypes representing 6575 single-cross hybrids and 6283 inbred lines that were evaluated in six experimental series for yield in field trials encompassing ~125,000 plots. Integrating data resulted in twofold higher prediction ability compared with cases in which hybrid performance was predicted across individual experimental series. Our results suggest that combining data across breeding programs is a particularly appropriate strategy to exploit the potential of big data for predictive plant breeding. This paradigm shift can contribute to increasing yield and resilience, which is needed to feed the growing world population.
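In genomic prediction, prediction ability is commonly reported as the correlation between observed and predicted phenotypes in a validation set. The sketch below illustrates that metric with ridge regression on marker dosages (an RR-BLUP-style model) and simulated data. The model, the penalty `lam`, the simulation settings (heritability of about 0.8) and all function names are assumptions for illustration and are not the models used in the study.

```python
import numpy as np


def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float):
    """Ridge regression on centred markers: solve (Xc'Xc + lam*I) beta = Xc'(y - mean(y))."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return beta, x_mean, y_mean


def ridge_predict(model, X: np.ndarray) -> np.ndarray:
    """Predict phenotypes for new genotypes using the training means and marker effects."""
    beta, x_mean, y_mean = model
    return (X - x_mean) @ beta + y_mean


def prediction_ability(y_obs: np.ndarray, y_pred: np.ndarray) -> float:
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])


# Simulated data (assumption of this sketch): 300 genotypes x 50 markers coded 0/1/2,
# an additive trait, and noise scaled so that heritability is about 0.8.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(300, 50)).astype(float)
g = X @ rng.normal(0.0, 1.0, size=50)
y = g + rng.normal(0.0, 0.5 * g.std(), size=300)

train, test = slice(0, 200), slice(200, 300)
model = ridge_fit(X[train], y[train], lam=10.0)
pa = prediction_ability(y[test], ridge_predict(model, X[test]))
print(f"prediction ability: {pa:.2f}")
```

Comparing `pa` for a model trained on one experimental series against one trained on pooled series is the kind of contrast the abstract reports; the sketch only shows how the metric itself is computed.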

12.
Plant Cell ; 33(6): 1888-1906, 2021 07 19.
Article in English | MEDLINE | ID: mdl-33710295

ABSTRACT

Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Hordeum/genetics , Computational Biology/methods , DNA, Intergenic , Genome, Plant , Molecular Sequence Annotation , Retroelements , Sequence Analysis, DNA , Terminal Repeat Sequences
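The coverage figures in the abstract rest on simple arithmetic: mean depth is total sequenced bases divided by genome size, and downsampling to a lower depth keeps a proportional random subset of reads. The Python sketch below assumes independent (Bernoulli) read sampling; the genome size and sequencing yield in the example are hypothetical round numbers, not values from the study, and the function names are illustrative.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def downsample_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of reads to keep to go from current_cov to target_cov."""
    if not 0 < target_cov <= current_cov:
        raise ValueError("target coverage must be positive and not above the current coverage")
    return target_cov / current_cov


def downsample_reads(reads: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [r for r in reads if rng.random() < fraction]


# Hypothetical numbers: 200 Gb of reads over a 4 Gb genome.
depth = coverage(200e9, 4e9)                 # 50.0
keep_20x = downsample_fraction(depth, 20.0)  # 0.4
keep_5x = downsample_fraction(depth, 5.0)    # 0.1
```

Random per-read sampling keeps the read-length distribution roughly unchanged, so the downsampled set approximates a shallower sequencing run of the same library.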
14.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33589928

ABSTRACT

This article describes some use case studies and self-assessments of the FAIR status of de.NBI services to illustrate the challenges and requirements involved in adhering to the FAIR (findable, accessible, interoperable and reusable) data principles in a large distributed bioinformatics infrastructure. We address the challenge of heterogeneity of wet lab technologies, data, metadata, software, computational workflows and the levels of implementation and monitoring of FAIR principles within the different bioinformatics sub-disciplines united in de.NBI. On the one hand, this broad service landscape and the excellent network of experts are a strong basis for the development of useful research data management plans. On the other hand, the large number of tools and techniques maintained by distributed teams renders FAIR compliance challenging.


Subject(s)
Data Management/methods , Metadata , Neural Networks, Computer , Proteomics/methods , Software , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , International Cooperation , Phenotype , Plants/genetics , Proteome , Self-Assessment , Workflow
15.
Nature ; 588(7837): 284-289, 2020 12.
Article in English | MEDLINE | ID: mdl-33239781

ABSTRACT

Genetic diversity is key to crop improvement. Owing to pervasive genomic structural variation, a single reference genome assembly cannot capture the full complement of sequence diversity of a crop species (known as the 'pan-genome'[1]). Multiple high-quality sequence assemblies are an indispensable component of a pan-genome infrastructure. Barley (Hordeum vulgare L.) is an important cereal crop with a long history of cultivation that is adapted to a wide range of agro-climatic conditions[2]. Here we report the construction of chromosome-scale sequence assemblies for the genotypes of 20 varieties of barley (comprising landraces, cultivars and a wild barley) that were selected as representatives of global barley diversity. We catalogued genomic presence/absence variants and explored the use of structural variants for quantitative genetic analysis through whole-genome shotgun sequencing of 300 gene bank accessions. We discovered abundant large inversion polymorphisms and analysed in detail two inversions that are frequently found in current elite barley germplasm; one is probably the product of mutation breeding and the other is tightly linked to a locus that is involved in the expansion of geographical range. This first-generation barley pan-genome makes previously hidden genetic variation accessible to genetic studies and breeding.


Subject(s)
Chromosomes, Plant/genetics , Genome, Plant/genetics , Hordeum/genetics , Internationality , Mutation , Plant Breeding , Chromosome Inversion/genetics , Chromosome Mapping , Genetic Loci/genetics , Genotype , Hordeum/classification , Polymorphism, Genetic/genetics , Reference Standards , Seed Bank , Sequence Inversion , Whole Genome Sequencing
16.
Sci Rep ; 10(1): 19230, 2020 11 05.
Article in English | MEDLINE | ID: mdl-33154426

ABSTRACT

Duckweeds are small, free-floating, morphologically highly reduced organisms belonging to the monocot order Alismatales. They display the most rapid growth among flowering plants, vary ~14-fold in genome size and comprise five genera. Spirodela is the phylogenetically oldest genus with only two mainly asexually propagating species: S. polyrhiza (2n = 40; 160 Mbp/1C) and S. intermedia (2n = 36; 160 Mbp/1C). This study combined comparative cytogenetics and de novo genome assembly based on PacBio, Illumina and Oxford Nanopore (ON) reads to obtain the first genome reference for S. intermedia and to compare its genomic features with those of the sister species S. polyrhiza. Both species' genomes revealed little more than 20,000 putative protein-coding genes, very low rDNA copy numbers and a low amount of repetitive sequences, mainly Ty3/gypsy retroelements. The detection of a few new small chromosome rearrangements between both Spirodela species refined the karyotype and the chromosomal sequence assignment for S. intermedia.


Subject(s)
Araceae/genetics , Chromosomes, Plant , Genome, Plant , Chromosome Mapping , Karyotype , Karyotyping , Nanopores
17.
Gigascience ; 9(10)2020 10 22.
Article in English | MEDLINE | ID: mdl-33090199

ABSTRACT

BACKGROUND: The FAIR data principle, as a commitment to support long-term research data management, is widely accepted in the scientific community. Although the ELIXIR Core Data Resources and other established infrastructures provide comprehensive and long-term stable services and platforms for FAIR data management, a large quantity of research data is still hidden or at risk of getting lost. Currently, high-throughput plant genomics and phenomics technologies are producing research data in abundance, the storage of which is not covered by established core databases. This concerns the data volume, e.g., time series of images or high-resolution hyper-spectral data; the quality of data formatting and annotation, e.g., with regard to structure and annotation specifications of core databases; uncovered data domains; or organizational constraints prohibiting primary data storage outside institutional boundaries. RESULTS: To share these potentially dark data in a FAIR way and to master these challenges, the ELIXIR Germany/de.NBI service Plant Genomic and Phenomics Research Data Repository (PGP) implements a "bring the infrastructure to the data" approach, which allows research data to be kept in place and wrapped in a FAIR-aware software infrastructure. This article presents new features of the e!DAL infrastructure software and the PGP repository as a best practice for easily setting up FAIR-compliant and intuitive research data services. Furthermore, the integration of the ELIXIR Authentication and Authorization Infrastructure (AAI) and of data discovery services is introduced as a means to lower technical barriers and to increase the visibility of research data. CONCLUSION: The e!DAL software has matured into a powerful and FAIR-compliant infrastructure, while keeping the focus on flexible setup and integration into existing infrastructures and into the daily research process.


Subject(s)
Information Dissemination , Software , Databases, Factual , Genomics , Plants
19.
Sci Adv ; 6(24): eaay4897, 2020 06.
Article in English | MEDLINE | ID: mdl-32582844

ABSTRACT

The genetics underlying heterosis, the difference in performance of crosses compared with midparents, is hypothesized to vary with relatedness between parents. We established a unique germplasm comprising three hybrid wheat sets differing in the degree of divergence between parents and devised a genetic distance measure giving weight to heterotic loci. Heterosis increased steadily with heterotic genetic distance for all 1903 hybrids. Midparent heterosis, however, was significantly lower in the hybrids including crosses between elite and exotic lines than in crosses among elite lines. The analysis of the genetic architecture of heterosis revealed this to be caused by a higher portion of negative dominance and dominance-by-dominance epistatic effects. Collectively, these results expand our understanding of heterosis in crops, an important pillar toward global food security.
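Midparent heterosis, as defined in the abstract, is the difference between the performance of a cross and the mean of its two parents: MPH = F1 - (P1 + P2)/2. The short Python sketch below computes it, together with the relative form (MPH divided by the midparent value), which is a common convention rather than something stated in the abstract. The function names and yield values are hypothetical.

```python
def midparent_value(p1: float, p2: float) -> float:
    """Mean performance of the two parents."""
    return (p1 + p2) / 2


def midparent_heterosis(f1: float, p1: float, p2: float) -> float:
    """Absolute midparent heterosis: F1 performance minus the midparent value."""
    return f1 - midparent_value(p1, p2)


def relative_midparent_heterosis(f1: float, p1: float, p2: float) -> float:
    """Midparent heterosis expressed as a fraction of the midparent value."""
    mp = midparent_value(p1, p2)
    if mp == 0:
        raise ValueError("midparent value is zero")
    return (f1 - mp) / mp


# Hypothetical grain yields of a hybrid and its two parents.
f1, p1, p2 = 10.0, 8.0, 6.0
print(midparent_heterosis(f1, p1, p2))           # 3.0
print(relative_midparent_heterosis(f1, p1, p2))  # 3/7, about 0.43
```

Negative dominance or epistatic effects, as reported for the elite × exotic crosses, would show up as a smaller (or negative) MPH for a given parental pair.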

20.
Front Plant Sci ; 11: 701, 2020.
Article in English | MEDLINE | ID: mdl-32595658

ABSTRACT

Genebanks harbor a large treasure trove of untapped plant genetic diversity. A growing world population and a changing climate require an increase in the production and development of stress resistant plant cultivars while decreasing the acreage. These requirements for improved plant cultivars can be supported by the broader exploitation of plant genetic resources (PGR) as inputs for genomics-assisted breeding. To support this process we have developed BRIDGE, a data warehouse and exploratory data analysis tool for genebank genomics of barley (Hordeum vulgare L.). Using efficient technologies for data storage, data transfer and web development, we facilitate access to digital genebank resources of barley by prioritizing the interactive and visual analysis of integrated genotypic and phenotypic data. The underlying data resulted from a barley genebank genomics study cataloging sequence and morphological data of 22,626 barley accessions, mainly from the German Federal ex situ genebank. BRIDGE consists of interactively coupled modules to visualize integrated, curated and quality checked data, such as variation data, results of dimensionality reduction and genome wide association studies (GWAS), phenotyping results, passport data as well as the geographic distribution of germplasm samples. The core component is a manager for custom collections of germplasm. A search module to find and select germplasm by passport and phenotypic attributes is included as well as modules to export genotypic data in gzip-compressed variant call format (VCF) files and phenotypic data in MIAPPE-compliant ISA-Tab files. BRIDGE is accessible at the following URL: https://bridge.ipk-gatersleben.de.
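BRIDGE exports genotypic data as gzip-compressed VCF. As a minimal sketch of that file type (not BRIDGE's export code), the Python example below writes a tiny VCF through the standard-library `gzip` module and counts its data records. The sample names, coordinates and genotypes are made up. Note that `gzip` writes plain gzip, not the block-gzip (BGZF) variant produced by `bgzip`, which tabix indexing requires; reading works for both, because BGZF is a series of gzip members.

```python
import gzip
import os
import tempfile
from typing import Iterable, Sequence

VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def write_vcf_gz(path: str, meta_lines: Iterable[str], samples: Sequence[str],
                 records: Iterable[Sequence[object]]) -> None:
    """Write meta lines, the column header and tab-separated records to a .vcf.gz file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for line in meta_lines:
            fh.write(line + "\n")
        fh.write("\t".join(VCF_COLUMNS + list(samples)) + "\n")
        for rec in records:
            fh.write("\t".join(str(field) for field in rec) + "\n")


def count_records(path: str) -> int:
    """Count data lines, i.e. every non-empty line that is not a '#' header line."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip() and not line.startswith("#"))


path = os.path.join(tempfile.mkdtemp(), "example.vcf.gz")
write_vcf_gz(
    path,
    ["##fileformat=VCFv4.2",
     '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'],
    ["sample_A", "sample_B"],  # hypothetical sample names
    [("chr1H", 1042, ".", "A", "G", ".", "PASS", ".", "GT", "0/0", "1/1"),
     ("chr1H", 2087, ".", "C", "T", ".", "PASS", ".", "GT", "0/1", "0/0")],
)
print(count_records(path))  # 2
```

For large exports that need random access by genomic region, a BGZF writer plus a tabix index would be the more practical choice.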
