Search | BVS IEC

Core Hunter 3: flexible core subset selection.

De Beukelaer, Herman; Davenport, Guy F; Fack, Veerle.

BMC Bioinformatics ; 19(1): 203, 2018 May 31.

Article in English | MEDLINE | ID: mdl-29855322

ABSTRACT

BACKGROUND: Core collections provide genebank curators and plant breeders a way to reduce size of their collections and populations, while minimizing impact on genetic diversity and allele frequency. Many methods have been proposed to generate core collections, often using distance metrics to quantify the similarity of two accessions, based on genetic marker data or phenotypic traits. Core Hunter is a multi-purpose core subset selection tool that uses local search algorithms to generate subsets relying on one or more metrics, including several distance metrics and allelic richness. RESULTS: In version 3 of Core Hunter (CH3) we have incorporated two new, improved methods for summarizing distances to quantify diversity or representativeness of the core collection. A comparison of CH3 and Core Hunter 2 (CH2) showed that these new metrics can be effectively optimized with less complex algorithms, as compared to those used in CH2. CH3 is more effective at maximizing the improved diversity metric than CH2, still ensures a high average and minimum distance, and is faster for large datasets. Using CH3, a simple stochastic hill-climber is able to find highly diverse core collections, and the more advanced parallel tempering algorithm further increases the quality of the core and further reduces variability across independent samples. We also evaluate the ability of CH3 to simultaneously maximize diversity, and either representativeness or allelic richness, and compare the results with those of the GDOpt and SimEli methods. CH3 can sample equally representative cores as GDOpt, which was specifically designed for this purpose, and is able to construct cores that are simultaneously more diverse, and either are more representative or have higher allelic richness, than those obtained by SimEli. 
CONCLUSIONS: In version 3, Core Hunter has been updated to include two new core subset selection metrics that construct cores for representativeness or diversity, with improved performance. It combines and outperforms the strengths of other methods, as it (simultaneously) optimizes a variety of metrics. In addition, CH3 is an improvement over CH2, with the option to use genetic marker data or phenotypic traits, or both, and improved speed. Core Hunter 3 is freely available on http://www.corehunter.org.

Subjects

Genetic Variation/genetics; Algorithms; Humans
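
The abstract above says that a simple stochastic hill-climber can already find highly diverse cores. The following is a minimal sketch of that idea, not Core Hunter's implementation. It assumes the diversity score is the average distance from each selected entry to its nearest other selected entry (the abstract does not name its metrics), and the function names, the swap neighbourhood and the toy data are all hypothetical.

```python
import random

def entry_to_nearest_entry(core, dist):
    """Average distance from each selected item to its nearest other selected item.

    Assumed diversity score for this sketch; higher means a more spread-out core.
    """
    core = list(core)
    return sum(min(dist[i][j] for j in core if j != i) for i in core) / len(core)

def hill_climb(dist, core_size, steps=2000, seed=0):
    """Stochastic hill climbing over subsets of fixed size.

    Each step proposes swapping one random selected item for one random
    unselected item and keeps the swap only if the score strictly improves.
    Requires 2 <= core_size < len(dist).
    """
    rng = random.Random(seed)
    n = len(dist)
    core = set(rng.sample(range(n), core_size))
    best = entry_to_nearest_entry(core, dist)
    for _ in range(steps):
        out_item = rng.choice(sorted(core))
        in_item = rng.choice(sorted(set(range(n)) - core))
        candidate = (core - {out_item}) | {in_item}
        score = entry_to_nearest_entry(candidate, dist)
        if score > best:
            core, best = candidate, score
    return core, best

# Hypothetical toy data: six accessions placed on a line, distance = |a - b|.
positions = [0, 1, 2, 10, 11, 20]
dist = [[abs(a - b) for b in positions] for a in positions]
core, score = hill_climb(dist, core_size=3)
print(sorted(core), score)
```

Because only improving swaps are accepted, the search can stop in a local optimum. Escaping such optima is the usual motivation for more advanced schemes such as the parallel tempering algorithm mentioned in the abstract.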

Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search.

De Beukelaer, Herman; Smýkal, Petr; Davenport, Guy F; Fack, Veerle.

BMC Bioinformatics ; 13: 312, 2012 Nov 23.

Article in English | MEDLINE | ID: mdl-23174036

ABSTRACT

BACKGROUND: Sampling core subsets from genetic resources while maintaining as much as possible the genetic diversity of the original collection is an important but computationally complex task for gene bank managers. The Core Hunter computer program was developed as a tool to generate such subsets based on multiple genetic measures, including both distance measures and allelic diversity indices. At first we investigate the effect of minimum (instead of the default mean) distance measures on the performance of Core Hunter. Secondly, we try to gain more insight into the performance of the original Core Hunter search algorithm through comparison with several other heuristics working with several realistic datasets of varying size and allelic composition. Finally, we propose a new algorithm (Mixed Replica search) for Core Hunter II with the aim of improving the diversity of the constructed core sets and their corresponding generation times. RESULTS: Our results show that the introduction of minimum distance measures leads to core sets in which all accessions are sufficiently distant from each other, which was not always obtained when optimizing mean distance alone. Comparison of the original Core Hunter algorithm, Replica Exchange Monte Carlo (REMC), with simpler heuristics shows that the simpler algorithms often give very good results but with lower runtimes than REMC. However, the performance of the simpler algorithms is slightly worse than REMC under lower sampling intensities and some heuristics clearly struggle with minimum distance measures. In comparison the new advanced Mixed Replica search algorithm (MixRep), which uses heterogeneous replicas, was able to sample core sets with equal or higher diversity scores than REMC and the simpler heuristics, often using less computation time than REMC. 
CONCLUSION: The REMC search algorithm used in the original Core Hunter computer program performs well, sometimes leading to slightly better results than some of the simpler methods, although it doesn't always give the best results. By switching to the new Mixed Replica algorithm overall results and runtimes can be significantly improved. Finally we recommend including minimum distance measures in the objective function when looking for core sets in which all accessions are sufficiently distant from each other. Core Hunter II is freely available as an open source project at http://www.corehunter.org.

Subjects

Genetic Variation; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; Software; Algorithms; Alleles; Monte Carlo Method
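
The abstract above reports that optimizing mean distance alone does not always keep all accessions sufficiently distant from each other. A small sketch can illustrate why. It assumes a precomputed pairwise distance matrix; the helper names and the toy data are hypothetical and are not taken from Core Hunter II.

```python
from itertools import combinations

def mean_distance(core, dist):
    """Mean of all pairwise distances within the core."""
    pairs = list(combinations(sorted(core), 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)

def min_distance(core, dist):
    """Smallest pairwise distance within the core."""
    return min(dist[i][j] for i, j in combinations(sorted(core), 2))

# Hypothetical toy data: six accessions on a line, distance = |a - b|.
positions = [0.0, 0.1, 3.3, 6.7, 9.9, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]

# Exhaustive search over all cores of size 4 (feasible only for tiny inputs).
candidates = list(combinations(range(len(positions)), 4))
best_by_mean = max(candidates, key=lambda c: mean_distance(c, dist))
best_by_min = max(candidates, key=lambda c: min_distance(c, dist))

print(best_by_mean, min_distance(best_by_mean, dist))  # near-duplicate pairs at both ends
print(best_by_min, min_distance(best_by_min, dist))    # evenly spread accessions
```

In this toy case the core with the highest mean distance pairs up near-identical accessions at the two extremes, whereas the core with the highest minimum distance is evenly spread. That matches the abstract's recommendation to include a minimum distance measure in the objective when well-separated accessions are required.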

Core Hunter: an algorithm for sampling genetic resources based on multiple genetic measures.

Thachuk, Chris; Crossa, José; Franco, Jorge; Dreisigacker, Susanne; Warburton, Marilyn; Davenport, Guy F.

BMC Bioinformatics ; 10: 243, 2009 Aug 06.

Article in English | MEDLINE | ID: mdl-19660135

ABSTRACT

BACKGROUND: Existing algorithms and methods for forming diverse core subsets currently address either allele representativeness (breeder's preference) or allele richness (taxonomist's preference). The main objective of this paper is to propose a powerful yet flexible algorithm capable of selecting core subsets that have high average genetic distance between accessions, or rich genetic diversity overall, or a combination of both. RESULTS: We present Core Hunter, an advanced stochastic local search algorithm for selecting core subsets. Core Hunter is able to find core subsets having more genetic diversity and better average genetic distance than the current state-of-the-art algorithms for all genetic distance and diversity measures we evaluated. Furthermore, Core Hunter can attempt to optimize any number of genetic measures simultaneously, based on the preference of the user. Notably, Core Hunter is able to select significantly smaller core subsets, which retain all unique alleles from a reference collection, than state-of-the-art algorithms. CONCLUSION: Core Hunter is a highly effective and flexible tool for sampling genetic resources and establishing core subsets. Our implementation, documentation, and source code for Core Hunter is available at http://corehunter.org.

Subjects

Algorithms; Genetic Techniques; Alleles
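
The abstract above mentions core subsets that retain all unique alleles of a reference collection. The sketch below shows how such allele coverage could be measured, together with a plain greedy set-cover heuristic for reaching full coverage. The greedy heuristic is a stand-in for illustration only and is not the stochastic local search that Core Hunter uses. The accession names, the `locus:allele` labels and both function names are hypothetical.

```python
def allele_coverage(core, alleles):
    """Fraction of the collection's distinct alleles present in the core."""
    all_alleles = set().union(*alleles.values())
    kept = set().union(*(alleles[acc] for acc in core))
    return len(kept) / len(all_alleles)

def greedy_full_coverage(alleles):
    """Greedy set cover: repeatedly add the accession covering most missing alleles.

    Ties are broken by accession name so the result is deterministic.
    """
    remaining = set().union(*alleles.values())
    core = []
    while remaining:
        best = max(sorted(alleles), key=lambda acc: len(alleles[acc] & remaining))
        core.append(best)
        remaining -= alleles[best]
    return core

# Hypothetical toy data: each accession maps to the alleles it carries.
alleles = {
    "acc1": {"locA:1", "locB:1"},
    "acc2": {"locA:2", "locB:1"},
    "acc3": {"locA:1", "locB:2"},
    "acc4": {"locA:2", "locB:2"},
}

core = greedy_full_coverage(alleles)
print(core, allele_coverage(core, alleles))
```

Greedy set cover gives no guarantee of finding the smallest possible core, so it serves here only to make the coverage criterion concrete.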

Generation Challenge Programme (GCP): standards for crop data.

Bruskiewich, Richard; Davenport, Guy; Hazekamp, Tom; Metz, Thomas; Ruiz, Manuel; Simon, Reinhard; Takeya, Masaru; Lee, Jennifer; Senger, Martin; McLaren, Graham; Van Hintum, Theo.

OMICS ; 10(2): 215-9, 2006.

Article in English | MEDLINE | ID: mdl-16901229

ABSTRACT

The Generation Challenge Programme (GCP) is an international research consortium striving to apply molecular biological advances to crop improvement for developing countries. Central to its activities is the creation of a next generation global crop information platform and network to share genetic resources, genomics, and crop improvement information. This system is being designed based on a comprehensive scientific domain object model and associated shared ontology. This model covers germplasm, genotype, phenotype, functional genomics, and geographical information data types needed in GCP research. This paper provides an overview of this modeling effort.

Subjects

Crops, Agricultural/genetics; Genomics/standards; Molecular Biology/standards; Developing Countries; Software/standards

Multifunctional crop trait ontology for breeders' data: field book, annotation, data discovery and semantic enrichment of the literature.

Shrestha, Rosemary; Arnaud, Elizabeth; Mauleon, Ramil; Senger, Martin; Davenport, Guy F; Hancock, David; Morrison, Norman; Bruskiewich, Richard; McLaren, Graham.

AoB Plants ; 2010: plq008, 2010.

Article in English | MEDLINE | ID: mdl-22476066

ABSTRACT

BACKGROUND AND AIMS: Agricultural crop databases maintained in gene banks of the Consultative Group on International Agricultural Research (CGIAR) are valuable sources of information for breeders. These databases provide comparative phenotypic and genotypic information that can help elucidate functional aspects of plant and agricultural biology. To facilitate data sharing within and between these databases and the retrieval of information, the crop ontology (CO) database was designed to provide controlled vocabulary sets for several economically important plant species. METHODOLOGY: Existing public ontologies and equivalent catalogues of concepts covering the range of crop science information and descriptors for crops and crop-related traits were collected from breeders, physiologists, agronomists, and researchers in the CGIAR consortium. For each crop, relationships between terms were identified and crop-specific trait ontologies were constructed following the Open Biomedical Ontologies (OBO) format standard using the OBO-Edit tool. All terms within an ontology were assigned a globally unique CO term identifier. PRINCIPAL RESULTS: The CO currently comprises crop-specific traits for chickpea (Cicer arietinum), maize (Zea mays), potato (Solanum tuberosum), rice (Oryza sativa), sorghum (Sorghum spp.) and wheat (Triticum spp.). Several plant-structure and anatomy-related terms for banana (Musa spp.), wheat and maize are also included. In addition, multi-crop passport terms are included as controlled vocabularies for sharing information on germplasm. Two web-based online resources were built to make these COs available to the scientific community: the 'CO Lookup Service' for browsing the CO; and the 'Crops Terminizer', an ontology text mark-up tool. CONCLUSIONS: The controlled vocabularies of the CO are being used to curate several CGIAR centres' agronomic databases. 
The use of ontology terms to describe agronomic phenotypes and the accurate mapping of these descriptions into databases will be important steps in comparative phenotypic and genotypic studies across species and gene-discovery experiments.

The generation challenge programme platform: semantic standards and workbench for crop science.

Bruskiewich, Richard; Senger, Martin; Davenport, Guy; Ruiz, Manuel; Rouard, Mathieu; Hazekamp, Tom; Takeya, Masaru; Doi, Koji; Satoh, Kouji; Costa, Marcos; Simon, Reinhard; Balaji, Jayashree; Akintunde, Akinnola; Mauleon, Ramil; Wanchana, Samart; Shah, Trushar; Anacleto, Mylah; Portugal, Arllet; Ulat, Victor Jun; Thongjuea, Supat; Braak, Kyle; Ritter, Sebastian; Dereeper, Alexis; Skofic, Milko; Rojas, Edwin; Martins, Natalia; Pappas, Georgios; Alamban, Ryan; Almodiel, Roque; Barboza, Lord Hendrix; Detras, Jeffrey; Manansala, Kevin; Mendoza, Michael Jonathan; Morales, Jeffrey; Peralta, Barry; Valerio, Rowena; Zhang, Yi; Gregorio, Sergio; Hermocilla, Joseph; Echavez, Michael; Yap, Jan Michael; Farmer, Andrew; Schiltz, Gary; Lee, Jennifer; Casstevens, Terry; Jaiswal, Pankaj; Meintjes, Ayton; Wilkinson, Mark; Good, Benjamin; Wagner, James.

Int J Plant Genomics ; 2008: 369601, 2008.

Article in English | MEDLINE | ID: mdl-18483570

ABSTRACT

The Generation Challenge programme (GCP) is a global crop research consortium directed toward crop improvement through the application of comparative biology and genetic resources characterization to plant breeding. A key consortium research activity is the development of a GCP crop bioinformatics platform to support GCP research. This platform includes the following: (i) shared, public platform-independent domain models, ontology, and data formats to enable interoperability of data and analysis flows within the platform; (ii) web service and registry technologies to identify, share, and integrate information across diverse, globally dispersed data sources, as well as to access high-performance computational (HPC) facilities for computationally intensive, high-throughput analyses of project data; (iii) platform-specific middleware reference implementations of the domain model integrating a suite of public (largely open-access/-source) databases and software tools into a workbench to facilitate biodiversity analysis, comparative analysis of crop genomic data, and plant breeding decision making.

GERMINATE. A generic database for integrating genotypic and phenotypic information for plant genetic resource collections.

Lee, Jennifer M; Davenport, Guy F; Marshall, David; Ellis, T H Noel; Ambrose, Michael J; Dicks, Jo; van Hintum, Theo J L; Flavell, Andrew J.

Plant Physiol ; 139(2): 619-31, 2005 Oct.

Article in English | MEDLINE | ID: mdl-16219922

ABSTRACT

The extensive germplasm resource collections that are now available for major crop plants and their wild relatives will increasingly provide valuable biological and bioinformatics resources for plant physiologists and geneticists to dissect the molecular basis of key traits and to develop highly adapted plant material to sustain future breeding programs. A key to the efficient deployment of these resources is the development of information systems that will enable the collection and storage of biological information for these plant lines to be integrated with the molecular information that is now becoming available through the use of high-throughput genomics and post-genomics technologies. The GERMINATE database has been designed to hold a diverse variety of data types, ranging from molecular to phenotypic, and to allow querying between such data for any plant species. Data are stored in GERMINATE in a technology-independent manner, such that new technologies can be accommodated in the database as they emerge, without modification of the underlying schema. Users can access data in GERMINATE databases either via a lightweight Perl-CGI Web interface or by the more complex Genomic Diversity and Phenotype Connection software. GERMINATE is released under the GNU General Public License and is available at http://germinate.scri.sari.ac.uk/germinate/.

Subjects

Databases, Genetic; Plants/genetics; Computational Biology; Genotype; Phenotype; Software Design; User-Computer Interface
