Search | VHL Regional Portal

A joint learning approach for genomic prediction in polyploid grasses.

Aono, Alexandre Hild; Ferreira, Rebecca Caroline Ulbricht; Moraes, Aline da Costa Lima; Lara, Letícia Aparecida de Castro; Pimenta, Ricardo José Gonzaga; Costa, Estela Araujo; Pinto, Luciana Rossini; Landell, Marcos Guimarães de Andrade; Santos, Mateus Figueiredo; Jank, Liana; Barrios, Sanzio Carvalho Lima; do Valle, Cacilda Borges; Chiari, Lucimara; Garcia, Antonio Augusto Franco; Kuroshu, Reginaldo Massanobu; Lorena, Ana Carolina; Gorjanc, Gregor; de Souza, Anete Pereira.

Sci Rep ; 12(1): 12499, 2022 07 21.

Article in English | MEDLINE | ID: mdl-35864135

ABSTRACT

Poaceae, among the most abundant plant families, includes many economically important polyploid species, such as forage grasses and sugarcane (Saccharum spp.). These species have elevated genomic complexities and limited genetic resources, hindering the application of marker-assisted selection strategies. Currently, the most promising approach for increasing genetic gains in plant breeding is genomic selection. However, because of the polyploid nature of these species, more accurate models for incorporating genomic selection into breeding schemes are needed. This study aims to develop a machine learning method by using a joint learning approach to predict complex traits from genotypic data. Biparental populations of sugarcane and two species of forage grasses (Urochloa decumbens, Megathyrsus maximus) were genotyped, and several quantitative traits were measured. High-quality markers were used to predict several traits in different cross-validation scenarios. By combining classification and regression strategies, we developed a predictive system with promising results. Compared with traditional genomic prediction methods, the proposed strategy achieved accuracy improvements exceeding 50%. Our results suggest that the developed methodology could be implemented in breeding programs, helping reduce breeding cycles and increase genetic gains.

Subject(s)

Poaceae, Saccharum, Genomics/methods, Phenotype, Plant Breeding, Poaceae/genetics, Polyploidy, Saccharum/genetics
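
The abstract above does not describe how the classification and regression steps are combined, so the following is only an illustrative sketch of one possible two-stage arrangement, not the authors' method. It assumes markers are coded as numeric dosages, splits the trait at its median into a low and a high class, assigns a new individual to a class with a nearest-centroid rule, and then predicts the trait as the mean of the k nearest training individuals within that class. The function names `fit_joint` and `predict_joint` and the toy data are hypothetical.

```python
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two marker vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_joint(X, y):
    """Stage 1 setup: split individuals into low (0) and high (1) trait
    classes at the median and store one centroid per class.
    Assumption: this median split is an illustrative choice only."""
    cutoff = statistics.median(y)
    groups = {0: [], 1: []}
    for x, t in zip(X, y):
        groups[int(t > cutoff)].append((x, t))
    groups = {c: m for c, m in groups.items() if m}  # drop empty classes
    centroids = {
        c: [statistics.fmean(col) for col in zip(*[x for x, _ in m])]
        for c, m in groups.items()
    }
    return {"centroids": centroids, "members": groups}


def predict_joint(model, x, k=3):
    """Stage 1: pick the class with the nearest centroid.
    Stage 2: regress within that class using a k-nearest-neighbour mean."""
    cls = min(model["centroids"], key=lambda c: sq_dist(x, model["centroids"][c]))
    nearest = sorted(model["members"][cls], key=lambda m: sq_dist(x, m[0]))[:k]
    return cls, statistics.fmean([t for _, t in nearest])


# Toy data: four individuals, two markers, one quantitative trait.
X = [[0, 0], [0, 1], [2, 2], [2, 1]]
y = [1.0, 2.0, 9.0, 8.0]
model = fit_joint(X, y)
print(predict_joint(model, [0, 0], k=1))  # (0, 1.0)
print(predict_joint(model, [2, 2], k=2))  # (1, 8.5)
```

Restricting the regression step to the predicted class is what couples the two stages here; other couplings, such as passing the predicted class to the regressor as an extra feature, would fit the abstract's wording equally well.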

Relating instance hardness to classification performance in a dataset: a visual approach.

Paiva, Pedro Yuri Arbs; Moreno, Camila Castro; Smith-Miles, Kate; Valeriano, Maria Gabriela; Lorena, Ana Carolina.

Mach Learn ; 111(8): 3085-3123, 2022.

Article in English | MEDLINE | ID: mdl-35761958

ABSTRACT

Machine Learning studies often involve a series of computational experiments in which the predictive performance of multiple models is compared across one or more datasets. The results obtained are usually summarized through average statistics, either in numeric tables or simple plots. Such approaches fail to reveal interesting subtleties about algorithmic performance, including which observations an algorithm may find easy or hard to classify, and also which observations within a dataset may present unique challenges. Recently, a methodology known as Instance Space Analysis (ISA) was proposed for visualizing algorithm performance across different datasets. This methodology relates predictive performance to estimated instance hardness measures extracted from the datasets. However, the analysis considered an instance as being an entire classification dataset and the algorithm performance was reported for each dataset as an average error across all observations in the dataset. In this paper, we developed a more fine-grained analysis by adapting the ISA methodology. The adapted version of ISA allows the analysis of an individual classification dataset by a 2-D hardness embedding, which provides a visualization of the data according to the difficulty level of its individual observations. This allows deeper analyses of the relationships between instance hardness and predictive performance of classifiers. We also provide an open-access Python package named PyHard, which encapsulates the adapted ISA and provides an interactive visualization interface. We illustrate through case studies how our tool can provide insights about data quality and algorithm performance in the presence of challenges such as noisy and biased data.
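
The abstract does not list which instance hardness measures are used, and the sketch below does not call PyHard. As an assumed example of what a per-observation hardness score can look like, it computes k-Disagreeing Neighbors (kDN), a measure from the instance hardness literature: the fraction of an observation's k nearest neighbours that carry a different label. The function name `kdn` and the toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kdn(X, y, k=3):
    """Return one hardness score in [0, 1] per observation: the share of
    its k nearest neighbours (excluding itself) whose label differs."""
    if len(X) < 2:
        raise ValueError("kDN needs at least two observations")
    scores = []
    for i, xi in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        others.sort(key=lambda j: sq_dist(xi, X[j]))
        neighbours = others[:k]
        disagree = sum(y[j] != y[i] for j in neighbours)
        scores.append(disagree / len(neighbours))
    return scores


# Toy data: the last point is labelled "b" but sits inside the "a" cluster,
# as a mislabelled (noisy) observation might.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [0.5, 0.5]]
y = ["a", "a", "a", "b", "b", "b"]
scores = kdn(X, y, k=3)
hardest = max(range(len(X)), key=lambda i: scores[i])
print(hardest, scores[hardest])  # 5 1.0
```

Scores of this kind are computed per observation rather than per dataset, which is the shift in granularity the abstract describes.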

Protein cellular localization prediction with Support Vector Machines and Decision Trees.

Lorena, Ana Carolina; de Carvalho, André C P L F.

Comput Biol Med ; 37(2): 115-25, 2007 Feb.

Article in English | MEDLINE | ID: mdl-16574093

ABSTRACT

Many cellular functions are carried out in specific compartments of the cell. The prediction of the cellular localization of a protein is thus related to its function identification. This paper uses two Machine Learning techniques, Support Vector Machines (SVMs) and Decision Trees, in the prediction of the localization of proteins from three categories of organisms: gram-positive and gram-negative bacteria and fungi. For all categories considered, the localization task has multiple classes, which correspond to the possible protein locations. Since SVMs are originally designed for the solution of two-class problems, this paper also investigates and compares several strategies to extend this technique to perform multiclass predictions.

Subject(s)

Bacterial Proteins/metabolism, Decision Trees, Fungal Proteins/metabolism
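
The abstract does not name the multiclass strategies it compares. Two standard ways to build a multiclass predictor from two-class learners are one-vs-rest and one-vs-one decomposition, sketched below under stated assumptions. The binary learner here is a simple nearest-centroid scorer standing in for an SVM, so that the example needs only the standard library; the class labels and feature vectors are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def centroid(rows):
    return [sum(col) / len(col) for col in zip(*rows)]


def fit_binary(X, y):
    """Stand-in two-class learner (NOT an SVM): labels are +1 / -1 and the
    returned score is positive when x is closer to the +1 centroid."""
    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == -1])
    return lambda x: sq_dist(x, neg) - sq_dist(x, pos)


def fit_one_vs_rest(X, y):
    """One binary problem per class: that class against all others."""
    return {c: fit_binary(X, [1 if t == c else -1 for t in y])
            for c in sorted(set(y))}


def predict_one_vs_rest(models, x):
    """Choose the class whose binary scorer is most confident."""
    return max(models, key=lambda c: models[c](x))


def fit_one_vs_one(X, y):
    """One binary problem per pair of classes, trained on those two only."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        idx = [i for i, t in enumerate(y) if t in (a, b)]
        models[(a, b)] = fit_binary([X[i] for i in idx],
                                    [1 if y[i] == a else -1 for i in idx])
    return models


def predict_one_vs_one(models, x):
    """Each pairwise scorer casts one vote; the majority class wins."""
    votes = Counter(a if score(x) > 0 else b
                    for (a, b), score in models.items())
    return votes.most_common(1)[0][0]


# Toy data: three hypothetical localization classes, two features each.
X = [[0, 0], [0, 1], [5, 0], [5, 1], [0, 5], [1, 5]]
y = ["cyt", "cyt", "mem", "mem", "per", "per"]
ovr = fit_one_vs_rest(X, y)
ovo = fit_one_vs_one(X, y)
print(predict_one_vs_rest(ovr, [4.5, 0.5]))  # mem
print(predict_one_vs_one(ovo, [0.2, 4.8]))   # per
```

One-vs-rest trains one scorer per class, while one-vs-one trains one per pair of classes on smaller subsets; with three classes both schemes happen to need three binary problems.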
