Search | BVS - MINISTÉRIO DA SAÚDE

Machine learning applications on intratumoral heterogeneity in glioblastoma using single-cell RNA sequencing data.

Arteaga-Arteaga, Harold Brayan; Candamil-Cortés, Mariana S; Breaux, Brian; Guillen-Rondon, Pablo; Orozco-Arias, Simon; Tabares-Soto, Reinel.

Brief Funct Genomics ; 22(5): 428-441, 2023 11 10.

Article in English | MEDLINE | ID: mdl-37119295

ABSTRACT

Artificial intelligence is revolutionizing fields that affect people's lives and health. One of its most critical applications is the study of tumors such as glioblastoma (GBM), whose behavior must be understood in order to develop effective therapies. Advances in single-cell RNA sequencing (scRNA-seq) make it possible to study the cellular and molecular heterogeneity of GBM. Because these tumors contain different cell groups, Machine Learning (ML) algorithms are needed to extract information that helps explain how the cancer changes and that broadens the search for effective treatments. We propose multiple comparisons of ML algorithms to classify cell groups based on GBM scRNA-seq data. This broad comparison can show the scientific and medical community which models achieve the best performance in this task. The following cell groups are classified: Tumor Core (TC), Tumor Periphery (TP) and Normal Periphery (NP), in binary and multi-class scenarios. This work also presents the biomarker candidates found by the best-performing models. The analyses presented here allow us to verify the biomarker candidates to understand the genetic characteristics of GBM, which may be affected by a suitable identification of GBM heterogeneity. For the four scenarios covered, this work obtained cross-validation results of $93.03\% \pm 5.37\%$, $97.42\% \pm 3.94\%$, $98.27\% \pm 1.81\%$ and $93.04\% \pm 6.88\%$ for the classification of TP versus TC, TP versus NP, NP versus TP and TC (TPC) and NP versus TP versus TC, respectively.
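The cross-validation figures above have the form "mean ± spread" over folds. The following is a minimal sketch of how such a figure can be produced, not the authors' pipeline: the nearest-centroid classifier, the 5-fold split, the use of the sample standard deviation as the spread, and the synthetic two-feature data are all assumptions made for illustration.

```python
import random
import statistics

def nearest_centroid_fit(X, y):
    """Return one mean vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )

def cross_validate(X, y, k=5, seed=0):
    """Shuffle once, split into k folds, and return the accuracy of each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        test = set(held_out)
        train = [i for i in idx if i not in test]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores

# Synthetic stand-in for expression data (hypothetical, not from the paper):
# two well-separated groups labelled "TP" and "TC".
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
X += [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(50)]
y = ["TP"] * 50 + ["TC"] * 50

scores = cross_validate(X, y, k=5)
mean_pct = 100 * statistics.mean(scores)
std_pct = 100 * statistics.stdev(scores)
print(f"accuracy: {mean_pct:.2f}% ± {std_pct:.2f}%")
```

Reporting the spread alongside the mean indicates how much the score depends on which cells fall into the held-out fold.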

Subjects

Glioblastoma , Humans , Glioblastoma/genetics , Glioblastoma/pathology , Artificial Intelligence , Biomarkers , Machine Learning , RNA Sequence Analysis/methods , Single-Cell Analysis/methods

Inpactor2: a software based on deep learning to identify and classify LTR-retrotransposons in plant genomes.

Orozco-Arias, Simon; Humberto Lopez-Murillo, Luis; Candamil-Cortés, Mariana S; Arias, Maradey; Jaimes, Paula A; Rossi Paschoal, Alexandre; Tabares-Soto, Reinel; Isaza, Gustavo; Guyot, Romain.

Brief Bioinform ; 24(1), 2023 01 19.

Article in English | MEDLINE | ID: mdl-36502372

ABSTRACT

LTR-retrotransposons are the most abundant repeat sequences in plant genomes and play an important role in evolution and biodiversity. Their characterization is of great importance to understand their dynamics. However, the identification and classification of these elements remain a challenge. Moreover, current software can be relatively slow (from hours to days), sometimes involves a lot of manual work and does not reach satisfactory levels of precision and sensitivity. Here we present Inpactor2, an accurate and fast application that creates LTR-retrotransposon reference libraries in a very short time. Inpactor2 takes an assembled genome as input and follows a hybrid approach (deep learning and structure-based) to detect elements, filter partial sequences and finally classify intact sequences into superfamilies and, as very few tools do, into lineages. This tool takes advantage of multi-core and GPU architectures to decrease execution times. Using the rice genome, Inpactor2 showed a run time of 5 minutes (faster than other tools) and has the best accuracy and F1-Score of the tools tested here, also having the second best accuracy and specificity only surpassed by EDTA, but achieving 28% higher sensitivity. For large genomes, Inpactor2 is up to seven times faster than other available bioinformatics tools.
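The abstract compares tools on accuracy, precision, sensitivity, specificity and F1-Score. As a reminder of how these relate, the sketch below computes them from the four cells of a binary confusion matrix using the standard definitions. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from true/false positive/negative counts."""
    precision = tp / (tp + fp)        # fraction of predicted elements that are real
    sensitivity = tp / (tp + fn)      # fraction of real elements that are found (recall)
    specificity = tn / (tn + fp)      # fraction of non-elements correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }

# Hypothetical counts, chosen only to illustrate the formulas.
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A tool can rank well on specificity while trailing on sensitivity, which is why the abstract reports several metrics rather than a single score.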

Subjects

Deep Learning , Retroelements , Retroelements/genetics , Terminal Repeat Sequences/genetics , Plant Genome , Software , Molecular Evolution , Phylogeny

Analysis of fruit ripening in Theobroma cacao pod husk based on untargeted metabolomics.

Gallego, Adriana M; Zambrano, Romer A; Zuluaga, Martha; Camargo Rodríguez, Anyela V; Candamil Cortés, Mariana S; Romero Vergel, Angela P; Arboleda Valencia, Jorge W.

Phytochemistry ; 203: 113412, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36055428

ABSTRACT

The pod husk of Theobroma cacao (CPH) plays an important agronomic role: its appearance is used as an indicator of ripening, guiding farmers during harvest. Cacao harvesting is not a standardized practice because farmers harvest between six and eight months after flowering, guided by the pod's color and shape. Mixing cacao beans from different ripening stages (RS) negatively affects the quality and price of the grain. Chemical markers and visual indicators of CPH ripening could help farmers standardize the harvest. This study analyses the metabolic distribution of CPH in two cacao clones, ICS95 and CCN51, at six, seven, and eight months of ripening. Untargeted metabolomics was performed using HPLC-MS/MS for biomarker discovery and association with cacao ripening. The results indicated a strong metabolic differentiation between the sixth month and the other months, independent of the variety. Metabolic differences were also found between cacao clones in the seventh and eighth months. We annotated five potential biochemical markers: 3-caffeoylpelargonidin 5-glucoside, indoleacetaldehyde, procyanidin A dimer, procyanidin C1, and kaempferol. We further looked for correlations between the progression patterns of our markers and quantitative indicators of CPH appearance and texture at the same ripening stages. We also performed a functional analysis that identified three possible metabolic pathways (flavone and flavonol biosynthesis, flavonoid biosynthesis, and tryptophan metabolism) associated with stress sensing, plant development and defense, respectively. We found significant positive correlations between green color density and all metabolites. For texture, the correlations with all metabolites were significantly negative. Our results suggest that about the sixth month is appropriate for harvesting cacao in the region of Caldas, Colombia, in order to avoid the metabolic variations that occur at later stages of ripening and affect cacao bean quality. Therefore, studying the cacao ripening process can help estimate the best harvest time and contribute to the standardization of harvest practices.

Subjects

Cacao , Flavones , Proanthocyanidins , Cacao/metabolism , Flavones/metabolism , Flavonoids/metabolism , Fruit/metabolism , Glucosides/metabolism , Kaempferols/metabolism , Metabolomics , Tandem Mass Spectrometry , Tryptophan

Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning.

Orozco-Arias, Simon; Candamil-Cortes, Mariana S; Jaimes, Paula A; Valencia-Castrillon, Estiven; Tabares-Soto, Reinel; Isaza, Gustavo; Guyot, Romain.

J Integr Bioinform ; 19(3), 2022 Sep 01.

Article in English | MEDLINE | ID: mdl-35822734

ABSTRACT

Transposable elements are mobile sequences that can move and insert themselves into chromosomes. They are activated by internal or external stimuli and give the organism the ability to adapt to its environment. Annotating transposable elements in genomic data is currently considered a crucial task for understanding key aspects of organisms such as phenotype variability, species evolution, and genome size, among others. Because of the way they replicate, LTR retrotransposons are the most common transposable elements in plants, accounting in some cases for up to 80% of all DNA information. To annotate these elements, a reference library is usually created and curated to eliminate TE fragments and false positives, and the elements are then annotated in the genome using a homology-based method. However, the curation process can take weeks and requires extensive manual work and the execution of multiple time-consuming bioinformatics tools. Here, we propose a machine learning-based approach to perform this process automatically on plant genomes, obtaining up to 91.18% F1-score. This approach was tested with four plant species, obtaining up to 93.6% F1-score (Oryza granulata) in only 22.61 s, where bioinformatics methods took approximately 6 h. This acceleration demonstrates that the ML-based approach is efficient and could be used in massive sequencing projects.
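To put the reported run times side by side, the short calculation below converts both to seconds and takes their ratio. Because the abstract gives the conventional run time only as "approximately 6 h", the result is an order-of-magnitude estimate rather than a measured figure; the variable names are illustrative.

```python
ml_seconds = 22.61               # reported run time of the ML-based approach
baseline_seconds = 6 * 3600      # "approximately 6 h" for the bioinformatics methods

# Ratio of the two run times; approximate because the baseline is approximate.
speedup = baseline_seconds / ml_seconds
print(f"approximate speed-up: {speedup:.0f}x")
```

Under these figures the ML-based curation is on the order of a thousand times faster for that species.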

Subjects

Retroelements , Terminal Repeat Sequences , DNA Transposable Elements , Molecular Evolution , Plant Genome , Machine Learning , Plants/genetics , Retroelements/genetics

K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes.

Orozco-Arias, Simon; Candamil-Cortés, Mariana S; Jaimes, Paula A; Piña, Johan S; Tabares-Soto, Reinel; Guyot, Romain; Isaza, Gustavo.

PeerJ ; 9: e11456, 2021.

Article in English | MEDLINE | ID: mdl-34055489

ABSTRACT

Every day more plant genomes become available in public databases, and additional massive sequencing projects (i.e., projects that aim to sequence thousands of individuals) are formulated and released. Nevertheless, there are not enough automatic tools to analyze this large amount of genomic information. LTR retrotransposons are the most frequent repetitive sequences in plant genomes; however, their detection and classification are commonly performed using semi-automatic and time-consuming programs. Despite the availability of several bioinformatic tools that follow different approaches to detect and classify them, none of these tools can individually obtain accurate results. Here, we used Machine Learning algorithms based on k-mer counts to distinguish LTR retrotransposons from other genomic sequences and to classify them into lineages/families with an F1-Score of 95%, contributing to the development of an alignment-free, automatic method to analyze these sequences.
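The k-mer counts mentioned above turn a DNA sequence of any length into a fixed-length numeric vector that a classifier can consume without alignment. The sketch below shows one way to build such a vector. The choice of k = 1 to 3, the skipping of windows that contain non-ACGT characters, and the function names are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter
from itertools import product

def kmer_counts(sequence, k):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def kmer_feature_vector(sequence, k_values=(1, 2, 3)):
    """Concatenate counts of every possible k-mer, in a fixed order, for each k."""
    features = []
    for k in k_values:
        counts = kmer_counts(sequence, k)
        for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
            features.append(counts[kmer])  # Counter returns 0 for absent k-mers
    return features

# Hypothetical toy sequence; real inputs would be full element sequences.
vector = kmer_feature_vector("ACGTACGTNNACG")
print(len(vector))   # 4 + 16 + 64 features for k = 1, 2, 3
print(vector[:4])    # counts of A, C, G, T
```

Because every sequence maps to the same feature positions, vectors from different sequences can be stacked into a matrix and passed to any standard classifier.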
