Results 1 - 20 of 25
1.
Methods Mol Biol ; 2443: 415-427, 2022.
Article in English | MEDLINE | ID: mdl-35037218

ABSTRACT

Next-generation sequencing technologies have enabled high-density genotyping for large numbers of samples. Nowadays, SNP calling pipelines produce up to millions of markers, which then need to be filtered in various ways according to the type of analysis. One of the main challenges still lies in managing an increasing volume of genotyping files that are difficult to handle for many applications. Here, we provide a practical guide for efficiently managing large genomic variation data using Gigwa, a user-friendly, scalable and versatile application that may be deployed either remotely on web servers or on a local machine.


Subject(s)
High-Throughput Nucleotide Sequencing , Software , Genomics , Genotype , Genotyping Techniques , Polymorphism, Single Nucleotide
2.
Methods Mol Biol ; 2443: 527-540, 2022.
Article in English | MEDLINE | ID: mdl-35037225

ABSTRACT

Recent advances in high-throughput technologies have resulted in a tremendous increase in the amount of data in the agronomic domain. There is an urgent need to effectively integrate complementary information to understand the biological system in its entirety. We have developed AgroLD, a knowledge graph that exploits Semantic Web technology and some of the relevant standard domain ontologies to integrate information on plant species, thereby facilitating the formulation of new scientific hypotheses. This chapter outlines some integration results of the project, which initially focused on genomics, proteomics and phenomics.


Subject(s)
Genomics , Pattern Recognition, Automated , Databases, Factual , Genomics/methods , Plants/genetics , Proteomics
3.
Genomics Inform ; 19(3): e27, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34638174

ABSTRACT

Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pre-trained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.
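The dictionary-based annotation step described in this abstract can be illustrated with a minimal sketch. It assumes a dictionary that maps gene names to database identifiers; the two gene names are borrowed from another abstract in this list, and the identifiers, the function name `annotate`, and the sample sentence are hypothetical. It is not the OryzaGP pipeline itself.

```python
import re

# Hypothetical dictionary: surface form -> database identifier.
# The identifiers are placeholders, not real database accessions.
GENE_DICTIONARY = {
    "PROG1": "EXAMPLE:0001",
    "SH5": "EXAMPLE:0002",
}


def annotate(text, dictionary):
    """Return (start, end, surface form, identifier) for each dictionary hit."""
    # Try longer names first so a long name wins over a shorter one it contains.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (m.start(), m.end(), m.group(1), dictionary[m.group(1)])
        for m in pattern.finditer(text)
    ]


# Made-up sentence used only to exercise the function.
abstract = "PROG1 controls plant architecture, whereas SH5 affects shattering."
annotations = annotate(abstract, GENE_DICTIONARY)
```

Each annotation keeps character offsets, so the recognised span (the NER part) and its identifier (the NEN part) can be stored side by side.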

4.
Bioinformatics ; 37(7): 1037-1038, 2021 05 17.
Article in English | MEDLINE | ID: mdl-32735312

ABSTRACT

SUMMARY: Currently, gene information available for Oryza sativa species is located in various online heterogeneous data sources. Moreover, methods of access are also diverse, mostly web-based and sometimes query APIs, which might not always be straightforward for domain experts. The challenge is to collect information quickly from these applications and combine it logically, to facilitate scientific research. We developed a Python package named PyRice, a unified programming API to access all supported databases at the same time with consistent output. The PyRice design is modular and implements a smart query system, which fits the computing resources to optimize the query speed. As a result, PyRice is easy to use and produces intuitive results. AVAILABILITY AND IMPLEMENTATION: https://github.com/SouthGreenPlatform/PyRice. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Oryza , Software , Databases, Factual , Information Storage and Retrieval , Oryza/genetics
5.
Front Public Health ; 8: 563247, 2020.
Article in English | MEDLINE | ID: mdl-33072700

ABSTRACT

Since its emergence in China, the COVID-19 pandemic has spread rapidly around the world. Faced with this unknown disease, public health authorities were forced to experiment, in a short period of time, with various combinations of interventions at different scales. However, as the pandemic progresses, there is an urgent need for tools and methodologies to quickly analyze the effectiveness of responses against COVID-19 in different communities and contexts. In this perspective, computer modeling appears to be an invaluable lever as it allows for the in silico exploration of a range of intervention strategies prior to the potential field implementation phase. More specifically, we argue that, in order to take into account important dimensions of policy actions, such as the heterogeneity of the individual response or the spatial aspect of containment strategies, the branch of computer modeling known as agent-based modeling is of immense interest. We present in this paper an agent-based modeling framework called COVID-19 Modeling Kit (COMOKIT), designed to be generic, scalable and thus portable in a variety of social and geographical contexts. COMOKIT combines models of person-to-person and environmental transmission, a model of individual epidemiological status evolution, an agenda-based 1-h time step model of human mobility, and an intervention model. It is designed to be modular and flexible enough to allow modelers and users to represent different strategies and study their impacts in multiple social, epidemiological or economic scenarios. Several large-scale experiments are analyzed in this paper and allow us to show the potentialities of COMOKIT in terms of analysis and comparison of the impacts of public health policies in a realistic case study.


Subject(s)
COVID-19 , Pandemics , China/epidemiology , Cities , Humans , SARS-CoV-2
6.
Genomics Inform ; 18(2): e19, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32634873

ABSTRACT

In semantic annotation, semantic concepts are linked to natural language. Semantic annotation helps in boosting the ability to search and access resources and can be used in information retrieval systems to augment the queries from the user. In the research described in this paper, we aimed to identify ontological concepts in scientific text contained in spreadsheets. We developed a tool, Table2Annotation, that can handle various types of spreadsheets. Furthermore, we used the NCBO Annotator API provided by BioPortal to enhance the semantic annotation functionality to cover spreadsheet data. Table2Annotation has strengths in certain criteria such as speed, error handling, and complex concept matching.

8.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-31508797

ABSTRACT

MOTIVATION: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. Making effective use of this information requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses need to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This makes it challenging to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. RESULTS: We found that data extraction times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix. AVAILABILITY: http://gobiin1.bti.cornell.edu:6083/projects/GBM/repos/benchmarking/browse.
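The finding that extraction time depends on storage orientation can be illustrated with a toy sketch. It does not reproduce the benchmark or any of the six systems tested; the matrix values and the names `marker_offsets` and `is_contiguous` are assumptions made for illustration. The same allele matrix is flattened in two orientations, and the sketch shows which positions must be read to extract all calls for one marker.

```python
# Toy allele matrix: rows are markers, columns are samples.
# The values are made up for illustration only.
matrix = [
    ["A/A", "A/T", "T/T"],  # marker 0
    ["C/C", "C/C", "C/G"],  # marker 1
]
n_markers, n_samples = len(matrix), len(matrix[0])

# Marker-major layout: all calls for one marker are stored contiguously.
marker_major = [call for row in matrix for call in row]
# Sample-major layout: all calls for one sample are stored contiguously.
sample_major = [matrix[m][s] for s in range(n_samples) for m in range(n_markers)]


def marker_offsets(m, layout):
    """Positions that must be read to extract every call for marker m."""
    if layout == "marker_major":
        return [m * n_samples + s for s in range(n_samples)]
    return [s * n_markers + m for s in range(n_samples)]


def is_contiguous(offsets):
    """True when the positions form one unbroken run (a single sequential read)."""
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))
```

In the marker-major layout, one marker is a single sequential read, whereas in the sample-major layout the same query jumps across the whole store. Extracting all markers for one sample has the opposite behaviour, which is one plausible reason why a format that handles both orientations well would extract data faster.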


Subject(s)
Databases, Genetic , Genomics , Genotype , Genotyping Techniques , Information Storage and Retrieval , Software
9.
Genomics Inform ; 17(2): e17, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31307132

ABSTRACT

Text mining has become an important research method in biology, with its original purpose being to extract biological entities, such as genes, proteins and phenotypic traits, to extend knowledge from scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Since few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extract information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts extracted from scientific papers focusing on the rice species and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.

10.
Gigascience ; 8(5), 2019 05 01.
Article in English | MEDLINE | ID: mdl-31107941

ABSTRACT

BACKGROUND: Rice molecular genetics, breeding, genetic diversity, and allied research (such as rice-pathogen interaction) have adopted sequencing technologies and high-density genotyping platforms for genome variation analysis and gene discovery. Germplasm collections representing rice diversity, improved varieties, and elite breeding materials are accessible through rice gene banks for use in research and breeding, with many having genome sequences and high-density genotype data available. Combining phenotypic and genotypic information on these accessions enables genome-wide association analysis, which is driving quantitative trait loci discovery and molecular marker development. Comparative sequence analyses across quantitative trait loci regions facilitate the discovery of novel alleles. Analyses involving DNA sequences and large genotyping matrices for thousands of samples, however, pose a challenge to non-computer-savvy rice researchers. FINDINGS: The Rice Galaxy resource has shared datasets that include high-density genotypes from the 3,000 Rice Genomes project and sequences with corresponding annotations from 9 published rice genomes. The Rice Galaxy web server and deployment installer include tools for designing single-nucleotide polymorphism assays, analyzing genome-wide association studies, population diversity, rice-bacterial pathogen diagnostics, and a suite of published genomic prediction methods. A prototype Rice Galaxy compliant with Open Access, Open Data, and Findable, Accessible, Interoperable, and Reproducible principles is also presented. CONCLUSIONS: Rice Galaxy is a freely available resource that empowers the plant research community to perform state-of-the-art analyses and utilize publicly available big datasets for both fundamental and applied science.


Subject(s)
Databases, Genetic , Genomics/methods , Oryza/genetics , Plant Breeding/methods , Software , Seed Bank
11.
Gigascience ; 8(5), 2019 05 01.
Article in English | MEDLINE | ID: mdl-31077313

ABSTRACT

BACKGROUND: The study of genetic variations is the basis of many research domains in biology. From genome structure to population dynamics, many applications involve the use of genetic variants. The advent of next-generation sequencing technologies led to such a flood of data that the daily work of scientists is often more focused on data management than data analysis. This mass of genotyping data poses several computational challenges in terms of storage, search, sharing, analysis, and visualization. While existing tools try to solve these challenges, few of them offer a comprehensive and scalable solution. RESULTS: Gigwa v2 is an easy-to-use, species-agnostic web application for managing and exploring high-density genotyping data. It can handle multiple databases and may be installed on a local computer or deployed as an online data portal. It supports various standard import and export formats, provides advanced filtering options, and offers means to visualize density charts or push selected data into various stand-alone or online tools. It implements 2 standard RESTful application programming interfaces, GA4GH, which is health-oriented, and BrAPI, which is breeding-oriented, thus offering wide possibilities of interaction with third-party applications. The project home page provides a list of live instances allowing users to test the system on public data (or reasonably sized user-provided data). CONCLUSIONS: This new version of Gigwa provides a more intuitive and more powerful way to explore large amounts of genotyping data by offering a scalable solution to search for genotype patterns, functional annotations, or more complex filtering. Furthermore, its user-friendliness and interoperability make it widely accessible to the life science community.


Subject(s)
Computational Biology , Genomics , Genotype , Software , Databases, Genetic , Genetic Variation/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Internet , Polymorphism, Single Nucleotide/genetics , User-Computer Interface
12.
Bioinformatics ; 35(20): 4147-4155, 2019 10 15.
Article in English | MEDLINE | ID: mdl-30903186

ABSTRACT

MOTIVATION: Modern genomic breeding methods rely heavily on very large amounts of phenotyping and genotyping data, presenting new challenges in effective data management and integration. Recently, the size and complexity of datasets have increased significantly, with the result that data are often stored on multiple systems. As analyses of interest increasingly require aggregation of datasets from diverse sources, data exchange between disparate systems becomes a challenge. RESULTS: To facilitate interoperability among breeding applications, we present the public plant Breeding Application Programming Interface (BrAPI). BrAPI is a standardized web service API specification. The development of BrAPI is a collaborative, community-based initiative involving a growing global community of over a hundred participants representing several dozen institutions and companies. Development of such a standard is recognized as critical to a number of important large breeding system initiatives as a foundational technology. The focus of the first version of the API is on providing services for connecting systems and retrieving basic breeding data including germplasm, study, observation, and marker data. A number of BrAPI-enabled applications, termed BrAPPs, have been written, that take advantage of the emerging support of BrAPI by many databases. AVAILABILITY AND IMPLEMENTATION: More information on BrAPI, including links to the specification, test suites, BrAPPs, and sample implementations is available at https://brapi.org/. The BrAPI specification and the developer tools are provided as free and open source.
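As a rough illustration of what a standardized web service API gives a client, the sketch below builds a paged request URL and reads one page of a response. It assumes the response envelope of the BrAPI v1 specification as commonly described (a `metadata.pagination` object and a `result.data` list) and the `page` and `pageSize` query parameters; the server URL, the helper names and the mocked response are hypothetical, and the specification at https://brapi.org/ remains the authority on the exact fields.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://example.org/brapi/v1"  # hypothetical server, not a real instance


def build_url(resource, page=0, page_size=100):
    """Build a paged request URL for a BrAPI-style resource such as 'germplasm'."""
    query = urlencode({"page": page, "pageSize": page_size})
    return f"{BASE_URL}/{resource}?{query}"


def parse_page(body):
    """Return (records, has_more) from a BrAPI-style JSON response body."""
    payload = json.loads(body)
    pagination = payload["metadata"]["pagination"]
    # Pages are zero-based, so the last page has index totalPages - 1.
    has_more = pagination["currentPage"] + 1 < pagination["totalPages"]
    return payload["result"]["data"], has_more


# Mocked response written by hand for this sketch (not real server output).
mock_body = json.dumps({
    "metadata": {"pagination": {"currentPage": 0, "pageSize": 2,
                                "totalCount": 3, "totalPages": 2}},
    "result": {"data": [{"germplasmDbId": "g1"}, {"germplasmDbId": "g2"}]},
})
records, has_more = parse_page(mock_body)
```

Because every compliant server returns the same envelope, the same two helpers could in principle be pointed at any BrAPI-enabled database, which is the interoperability benefit the abstract describes.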


Subject(s)
Plant Breeding , Software , User-Computer Interface , Genomics
13.
Brief Bioinform ; 20(2): 565-571, 2019 03 25.
Article in English | MEDLINE | ID: mdl-29659709

ABSTRACT

Improving productivity of the staple crops wheat and rice is essential to feed the growing global population, particularly in the context of a changing climate. However, current rates of yield gain are insufficient to support the predicted population growth. New approaches are required to accelerate the breeding process, and many of these are driven by the application of large-scale crop data. To leverage the substantial volumes and types of data that can be applied for precision breeding, the wheat and rice research communities are working towards the development of integrated systems to access and standardize the dispersed, heterogeneous available data. Here, we outline the initiatives of the International Wheat Information System (WheatIS) and the International Rice Informatics Consortium (IRIC) to establish Web-based single-access systems and data mining tools to make the available resources more accessible, drive discovery and accelerate the production of new crop varieties. We discuss the progress of WheatIS and IRIC towards unifying specialized wheat and rice databases and building custom software platforms to manage and interrogate these data. Single-access crop information systems will strengthen scientific collaboration, optimize the use of public research funds and help achieve the required yield gains in the two most important global food crops.


Subject(s)
Crops, Agricultural/growth & development , Information Systems , Oryza/growth & development , Triticum/growth & development
14.
PLoS One ; 13(11): e0198270, 2018.
Article in English | MEDLINE | ID: mdl-30500839

ABSTRACT

Recent advances in high-throughput technologies have resulted in a tremendous increase in the amount of omics data produced in plant science. This increase, in conjunction with the heterogeneity and variability of the data, presents a major challenge to adopting an integrative research approach. We are facing an urgent need to effectively integrate and assimilate complementary datasets to understand the biological system as a whole. The Semantic Web offers technologies for the integration of heterogeneous data and their transformation into explicit knowledge thanks to ontologies. We have developed Agronomic Linked Data (AgroLD, www.agrold.org), a knowledge-based system relying on Semantic Web technologies and exploiting standard domain ontologies, to integrate data about plant species of high interest for the plant science community, e.g., rice, wheat and arabidopsis. We present some integration results of the project, which initially focused on genomics, proteomics and phenomics. AgroLD is now an RDF (Resource Description Framework) knowledge base of 100M triples created by annotating and integrating more than 50 datasets coming from 10 data sources (such as Gramene.org and TropGeneDB) with 10 ontologies (such as the Gene Ontology and Plant Trait Ontology). Our evaluation results show that users appreciate the multiple query modes, which support different use cases. AgroLD's objective is to offer a domain-specific knowledge platform to solve complex biological and agronomical questions related to the implication of genes/proteins in, for instance, plant disease resistance or high yield traits. We expect the resolution of these questions to facilitate the formulation of new scientific hypotheses to be validated with a knowledge-oriented approach.
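The idea of a knowledge base made of triples can be sketched in a few lines. The example below is a toy in-memory store, not AgroLD or a real RDF engine: every identifier with the `ex:` prefix is invented for illustration, and the `match` function is a hypothetical stand-in for the kind of pattern matching a SPARQL query performs.

```python
# Each fact is a (subject, predicate, object) triple.
# All identifiers below are invented for illustration.
triples = {
    ("ex:gene1", "ex:encodes", "ex:protein1"),
    ("ex:gene1", "ex:hasTrait", "ex:diseaseResistance"),
    ("ex:gene2", "ex:hasTrait", "ex:highYield"),
    ("ex:protein1", "ex:foundIn", "ex:rice"),
}


def match(pattern, facts):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in facts
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


# "Which genes are linked to disease resistance?"
resistant = [s for s, _, _ in match((None, "ex:hasTrait", "ex:diseaseResistance"), triples)]
```

Because facts from different sources share the same three-part shape, integrating a new dataset amounts to adding more triples, and questions that span sources become pattern matches over the combined set.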


Subject(s)
Agriculture , Genomics , Knowledge Bases , Proteomics , Genome, Plant
15.
Database (Oxford) ; 2018, 2018 01 01.
Article in English | MEDLINE | ID: mdl-30239679

ABSTRACT

The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgebases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data. As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices.


Subject(s)
Agriculture , Databases, Genetic , Genomics , Breeding , Gene Ontology , Metadata , Surveys and Questionnaires
16.
Curr Biol ; 28(14): 2274-2282.e6, 2018 07 23.
Article in English | MEDLINE | ID: mdl-29983312

ABSTRACT

African rice (Oryza glaberrima) was domesticated independently from Asian rice. The geographical origin of its domestication remains elusive. Using 246 new whole-genome sequences, we inferred the cradle of its domestication to be in the Inner Niger Delta. Domestication was preceded by a sharp decline of most wild populations that started more than 10,000 years ago. The wild population collapse occurred during the drying of the Sahara. This finding supports the hypothesis that depletion of wild resources in the Sahara triggered African rice domestication. African rice cultivation strongly expanded 2,000 years ago. During the last 5 centuries, a sharp decline of its cultivation coincided with the introduction of Asian rice in Africa. A gene, PROG1, associated with an erect plant architecture phenotype, showed convergent selection in two cultivated rice species, Oryza glaberrima from Africa and Oryza sativa from Asia. In contrast, a shattering gene, SH5, showed a selection signature during African rice domestication, but not during Asian rice domestication. Overall, our genomic data revealed a complex history of African rice domestication influenced by important climatic changes in the Saharan area, by the expansion of African agricultural society, and by recent replacement by another domesticated species.


Subject(s)
Crops, Agricultural/genetics , Domestication , Genome, Plant , Oryza/genetics , Africa , Climate Change , Population Dynamics
17.
F1000Res ; 6: 1843, 2017.
Article in English | MEDLINE | ID: mdl-29333241

ABSTRACT

In this article, we present a joint effort of the wheat research community, along with data and ontology experts, to develop wheat data interoperability guidelines. Interoperability is the ability of two or more systems and devices to cooperate and exchange data, and interpret that shared information. Interoperability is a growing concern to the wheat scientific community, and agriculture in general, as the need to interpret the deluge of data obtained through high-throughput technologies grows. Agreeing on common data formats, metadata, and vocabulary standards is an important step to obtain the required data interoperability level in order to add value by encouraging data sharing, and subsequently facilitate the extraction of new information from existing and new datasets. During a period of more than 18 months, the RDA Wheat Data Interoperability Working Group (WDI-WG) surveyed the wheat research community about the use of data standards, then discussed and selected a set of recommendations based on consensual criteria. The recommendations promote standards for data types identified by the wheat research community as the most important for the coming years: nucleotide sequence variants, genome annotations, phenotypes, germplasm data, gene expression experiments, and physical maps. For each of these data types, the guidelines recommend best practices in terms of use of data formats, metadata standards and ontologies. In addition to the best practices, the guidelines provide examples of tools and implementations that are likely to facilitate the adoption of the recommendations. To maximize the adoption of the recommendations, the WDI-WG used a community-driven approach that involved the wheat research community from the start, took into account their needs and practices, and provided them with a framework to keep the recommendations up to date. We also report this approach's potential to be generalizable to other (agricultural) domains.

19.
Gigascience ; 5: 25, 2016 06 06.
Article in English | MEDLINE | ID: mdl-27267926

ABSTRACT

BACKGROUND: Exploring the structure of genomes and analyzing their evolution is essential to understanding the ecological adaptation of organisms. However, with the large amounts of data being produced by next-generation sequencing, computational challenges arise in terms of storage, search, sharing, analysis and visualization. This is particularly true with regard to studies of genomic variation, which are currently lacking scalable and user-friendly data exploration solutions. DESCRIPTION: Here we present Gigwa, a web-based tool that provides an easy and intuitive way to explore large amounts of genotyping data by filtering it not only on the basis of variant features, including functional annotations, but also on genotype patterns. The data storage relies on MongoDB, which offers good scalability properties. Gigwa can handle multiple databases and may be deployed in either single- or multi-user mode. In addition, it provides a wide range of popular export formats. CONCLUSIONS: The Gigwa application is suitable for managing large amounts of genomic variation data. Its user-friendly web interface makes such processing widely accessible. It can either be simply deployed on a workstation or be used to provide a shared data portal for a given community of researchers.
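Filtering on both variant features and genotype patterns can be sketched as follows. This is an in-memory toy, not Gigwa's implementation (which, per the abstract, stores data in MongoDB); the variant records, the thresholds and the function names `missing_rate`, `minor_allele_freq` and `select` are all assumptions made for illustration.

```python
# Toy data: variant id -> functional annotation and one genotype call per sample.
# Calls count copies of the alternate allele (0, 1 or 2); None is a missing call.
# All values are made up for illustration.
variants = {
    "v1": {"effect": "missense", "calls": [0, 1, 2, 1]},
    "v2": {"effect": "synonymous", "calls": [0, 0, None, None]},
    "v3": {"effect": "missense", "calls": [0, 0, 0, 0]},
}


def missing_rate(calls):
    """Fraction of samples without a genotype call."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_freq(calls):
    """Frequency of the rarer allele among the observed diploid calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt = sum(observed) / (2 * len(observed))
    return min(alt, 1 - alt)


def select(variants, effect=None, max_missing=1.0, min_maf=0.0):
    """Keep variants that pass the annotation filter and both genotype filters."""
    return [
        vid for vid, v in variants.items()
        if (effect is None or v["effect"] == effect)
        and missing_rate(v["calls"]) <= max_missing
        and minor_allele_freq(v["calls"]) >= min_maf
    ]


selected = select(variants, effect="missense", max_missing=0.25, min_maf=0.05)
```

Here `effect` plays the role of a variant-feature filter, while the missing-data and minor-allele-frequency thresholds are two simple examples of genotype-pattern criteria.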


Subject(s)
Computational Biology/methods , Genome-Wide Association Study/methods , Sequence Analysis, DNA/methods , Databases, Genetic , Genetic Variation , Genotype , Information Storage and Retrieval , Internet , Software , User-Computer Interface
20.
BMC Plant Biol ; 13: 122, 2013 Aug 29.
Article in English | MEDLINE | ID: mdl-23987653

ABSTRACT

BACKGROUND: In crops, inflorescence complexity and the shape and size of the seed are among the most important characters that influence yield. For example, rice panicles vary considerably in the number and order of branches, elongation of the axis, and the shape and size of the seed. Manual low-throughput phenotyping methods are time consuming, and the results are unreliable. However, high-throughput image analysis of the qualitative and quantitative traits of rice panicles is essential for understanding the diversity of the panicle as well as for breeding programs. RESULTS: This paper presents P-TRAP software (Panicle TRAit Phenotyping), a free open source application for high-throughput measurements of panicle architecture and seed-related traits. The software is written in Java and can be used with different platforms (the user-friendly Graphical User Interface (GUI) uses Netbeans Platform 7.3). The application offers three main tools: a tool for the analysis of panicle structure, a spikelet/grain counting tool, and a tool for the analysis of seed shape. The three tools can be used independently or simultaneously for analysis of the same image. Results are then reported in the Extensible Markup Language (XML) and Comma Separated Values (CSV) file formats. Images of rice panicles were used to evaluate the efficiency and robustness of the software. Compared to data obtained by manual processing, P-TRAP produced reliable results in a much shorter time. In addition, manual processing is not repeatable because dry panicles are vulnerable to damage. The software is very useful, practical and collects much more data than human operators. CONCLUSIONS: P-TRAP is new open source software that automatically recognizes the structure of a panicle and the seeds on the panicle in digital images. The software processes and quantifies several traits related to panicle structure, detects and counts the grains, and measures their shape parameters. In short, P-TRAP offers both efficient results and a user-friendly environment for experiments. The experimental results showed very good accuracy compared with field operators, expert verification and well-known academic methods.


Subject(s)
Oryza/anatomy & histology , Oryza/growth & development , Software , Inflorescence/anatomy & histology , Inflorescence/growth & development , Phenotype , Quantitative Trait Loci , Seeds/anatomy & histology , Seeds/growth & development