Search | Virtual Health Library (Biblioteca Virtual em Saúde)

Omics Data Preprocessing for Machine Learning: A Case Study in Childhood Obesity.

Torres-Martos, Álvaro; Bustos-Aibar, Mireia; Ramírez-Mena, Alberto; Cámara-Sánchez, Sofía; Anguita-Ruiz, Augusto; Alcalá, Rafael; Aguilera, Concepción M; Alcalá-Fdez, Jesús.

Genes (Basel); 14(2), 2023 Jan 18.

Article in English | MEDLINE | ID: mdl-36833178

ABSTRACT

The use of machine learning techniques to build predictive models of disease outcomes (based on omics and other types of molecular data) has gained enormous relevance in the biomedical field in the last few years. Nonetheless, the value of omics studies and machine learning tools depends on the proper application of algorithms and on the appropriate preprocessing and management of the input omics and molecular data. Currently, many of the available approaches that use machine learning on omics data for predictive purposes make mistakes in several of the following key steps: experimental design, feature selection, data preprocessing, and algorithm selection. For this reason, we present this work as a guideline on how to confront the main challenges inherent to multi-omics human data, together with a series of best practices and recommendations for each of these steps. In particular, we describe the main particularities of each omics data layer, the most suitable preprocessing approaches for each source, and a compilation of best practices and tips for predicting disease development using machine learning. Using examples of real data, we show how to address the key problems in multi-omics research (e.g., biological heterogeneity, technical noise, high dimensionality, missing values, and class imbalance). Finally, we define proposals for model improvement based on the results found, which serve as the basis for future work.
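As a rough illustration of two of the problems the abstract lists (missing values and high dimensionality), the following Python sketch drops features with too many missing values, imputes the rest with the median, and standardizes them. It is a generic example, not the procedure used in the paper: the function names, the `None` convention for missing values, and the `max_missing_fraction` threshold are all assumptions made for illustration. The one design point it tries to show is that every parameter is learned from the training rows only and then reused on new rows.

```python
from statistics import mean, median, pstdev


def fit_preprocessor(train_rows, max_missing_fraction=0.2):
    """Learn per-feature settings from training rows only.

    Hypothetical helper: each row is a list of floats, with None marking
    a missing value. Returns {feature_index: (fill, center, scale)}.
    """
    if not train_rows:
        raise ValueError("train_rows must not be empty")
    params = {}
    for j in range(len(train_rows[0])):
        observed = [row[j] for row in train_rows if row[j] is not None]
        missing_fraction = 1 - len(observed) / len(train_rows)
        if not observed or missing_fraction > max_missing_fraction:
            continue  # drop features that are mostly missing
        # Median for imputation; mean and standard deviation (of the
        # observed values) for standardization. A constant feature gets
        # scale 1.0 to avoid dividing by zero.
        params[j] = (median(observed), mean(observed), pstdev(observed) or 1.0)
    return params


def transform(rows, params):
    """Impute and standardize rows with settings learned on training data."""
    transformed = []
    for row in rows:
        new_row = []
        for j, (fill, center, scale) in params.items():
            value = fill if row[j] is None else row[j]
            new_row.append((value - center) / scale)
        transformed.append(new_row)
    return transformed


# Toy data (made up): 5 training samples, 3 features.
train = [
    [1.0, None, 10.0],
    [2.0, None, 20.0],
    [3.0, None, 30.0],
    [None, 4.0, 40.0],
    [5.0, 6.0, 50.0],
]
params = fit_preprocessor(train, max_missing_fraction=0.4)
new_samples = transform([[None, 9.9, 30.0]], params)
```

Fitting on the training split and only applying the result to validation or test samples keeps information from held-out data out of the model, which is one common source of the preprocessing mistakes the abstract alludes to.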

Subjects

Pediatric Obesity, Child, Humans, Machine Learning, Algorithms

Explainable artificial intelligence to predict and identify prostate cancer tissue by gene expression.

Ramírez-Mena, Alberto; Andrés-León, Eduardo; Alvarez-Cubero, Maria Jesus; Anguita-Ruiz, Augusto; Martinez-Gonzalez, Luis Javier; Alcala-Fdez, Jesus.

Comput Methods Programs Biomed; 240: 107719, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37453366

ABSTRACT

BACKGROUND AND OBJECTIVE: Prostate cancer (PCa) is one of the most prevalent forms of cancer in men worldwide. Traditional strategies such as serum PSA levels, which are not necessarily cancer-specific, and digital rectal exams, which are often inconclusive, are still the screening methods used for the disease. Some studies have focused on identifying biomarkers of the disease, but none have been reported for diagnosis in routine clinical practice, and few studies have provided tools to assist the pathologist in the decision-making process when analyzing prostate tissue. Therefore, a classifier is proposed to predict the occurrence of PCa that provides physicians with accurate predictions and understandable explanations. METHODS: A set of 47 genes was selected based on differential expression between PCa and normal tissue, Gene Ontology (GO), and the literature, to be used as input predictors for different machine learning methods based on eXplainable Artificial Intelligence. These methods were trained using different class-balancing strategies to build accurate classifiers using gene expression data from 550 samples from 'The Cancer Genome Atlas'. Our model was validated in four external cohorts with different ancestries, totaling 463 samples. In addition, a set of SHapley Additive exPlanations was provided to help clinicians understand the underlying reasons for each decision. RESULTS: An in-depth analysis showed that the Random Forest algorithm combined with majority class downsampling was the best performing approach with robust statistical significance. Our method achieved an average sensitivity and specificity of 0.90 and 0.8, respectively, with an AUC of 0.84 across all databases. The relevance of DLX1, MYL9 and FGFR genes for PCa screening was demonstrated, in addition to the important role of novel genes such as CAV2 and MYLK. CONCLUSIONS: This model has shown good performance in four independent external cohorts of different ancestries, and the explanations provided are consistent with each other and with the literature, opening a horizon for its application in clinical practice. In the near future, these genes, in combination with our model, could be applied to liquid biopsy to improve PCa screening.
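The abstract reports majority class downsampling as the best class-balancing strategy and summarizes performance as sensitivity and specificity. The Python sketch below shows what those two pieces amount to in the simplest case. It does not reproduce the authors' pipeline (there is no Random Forest or SHAP step here), and the function names, the seed, and the toy labels are hypothetical.

```python
import random


def downsample_majority(samples, labels, seed=0):
    """Randomly discard samples so every class matches the smallest one.

    Hypothetical helper; intended for the training split only.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target_size = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Classes already at the target size are kept whole.
        kept = group if len(group) == target_size else rng.sample(group, target_size)
        balanced.extend((sample, label) for sample in kept)
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (made up): 8 tumor samples (label 1), 2 normal samples (label 0).
sample_ids = list(range(10))
sample_labels = [1] * 8 + [0] * 2
balanced_ids, balanced_labels = downsample_majority(sample_ids, sample_labels)

sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Downsampling is normally applied to the training data only; validation cohorts keep their natural class proportions so that the reported metrics reflect the population being screened.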

Subjects

Artificial Intelligence, Prostatic Neoplasms, Male, Humans, Prostatic Neoplasms/genetics, Sensitivity and Specificity, Gene Expression

Functional Enrichment Analysis of Regulatory Elements.

Garcia-Moreno, Adrian; López-Domínguez, Raul; Villatoro-García, Juan Antonio; Ramirez-Mena, Alberto; Aparicio-Puerta, Ernesto; Hackenberg, Michael; Pascual-Montano, Alberto; Carmona-Saez, Pedro.

Biomedicines; 10(3), 2022 Mar 03.

Article in English | MEDLINE | ID: mdl-35327392

ABSTRACT

Statistical methods for enrichment analysis are important tools to extract biological information from omics experiments. Although these methods have been widely used for the analysis of gene and protein lists, the development of high-throughput technologies for regulatory elements demands dedicated statistical and bioinformatics tools. Here, we present a set of enrichment analysis methods for regulatory elements, including CpG sites, miRNAs, and transcription factors. Selection bias is accounted for by applying a power weighting function to target genes and testing statistical significance with the Wallenius noncentral hypergeometric distribution. These new methodologies have been applied to the analysis of a set of miRNAs associated with arrhythmia, showing the potential of this tool to extract biological information from a list of regulatory elements. These new methods are available in GeneCodis 4, a web tool able to perform singular and modular enrichment analysis that allows the integration of heterogeneous information.
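To make the statistical test named in the abstract more concrete, the Python sketch below computes a one-sided enrichment p-value under the Wallenius noncentral hypergeometric distribution for the two-group case. It assumes a single odds parameter, `odds`, describing how much more likely an annotated gene is to be selected than any other gene. How GeneCodis 4 derives its weights is not stated here, so the odds value, the function names, and the counts in the example are all hypothetical. The distribution is built exactly by following the biased draws one at a time.

```python
def wallenius_pmf(n_annotated, n_other, n_drawn, odds):
    """PMF of the number of annotated genes among n_drawn biased draws.

    Genes are drawn one by one without replacement; at each draw an
    annotated gene carries weight `odds` and any other gene weight 1.
    Returns a list where entry k is P(exactly k annotated genes drawn).
    """
    if odds <= 0:
        raise ValueError("odds must be positive")
    if not 0 <= n_drawn <= n_annotated + n_other:
        raise ValueError("n_drawn must be between 0 and the population size")
    probs = [1.0] + [0.0] * n_drawn
    for drawn in range(n_drawn):
        nxt = [0.0] * (n_drawn + 1)
        for k in range(drawn + 1):
            p = probs[k]
            if p == 0.0:
                continue  # state cannot be reached
            left_annotated = n_annotated - k
            left_other = n_other - (drawn - k)
            weight_annotated = odds * left_annotated
            total = weight_annotated + left_other
            nxt[k + 1] += p * weight_annotated / total  # next gene is annotated
            nxt[k] += p * left_other / total            # next gene is not
        probs = nxt
    return probs


def enrichment_p_value(n_annotated, n_other, n_drawn, observed, odds):
    """One-sided p-value: P(at least `observed` annotated genes drawn)."""
    return sum(wallenius_pmf(n_annotated, n_other, n_drawn, odds)[observed:])


# Hypothetical numbers: 200 annotated genes out of 2000, a list of 50
# target genes, 12 of which carry the annotation.
p_unbiased = enrichment_p_value(200, 1800, 50, 12, odds=1.0)
p_biased = enrichment_p_value(200, 1800, 50, 12, odds=1.5)
print(p_unbiased, p_biased)
```

With `odds=1.0` the distribution reduces to the ordinary hypergeometric test. With `odds` above 1, part of the observed overlap is attributed to the selection bias, so the same overlap yields a larger (less significant) p-value.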
