Search | BVS - MINISTÉRIO DA SAÚDE

Genomic object detection: An improved approach for transposable elements detection and classification using convolutional neural networks.

Orozco-Arias, Simon; Lopez-Murillo, Luis Humberto; Piña, Johan S; Valencia-Castrillon, Estiven; Tabares-Soto, Reinel; Castillo-Ossa, Luis; Isaza, Gustavo; Guyot, Romain.

PLoS One; 18(9): e0291925, 2023.

Article in English | MEDLINE | ID: mdl-37733731

ABSTRACT

Analysis of eukaryotic genomes requires the detection and classification of transposable elements (TEs), a crucial but complex and time-consuming task. To improve the performance of tools that accomplish these tasks, Machine Learning (ML) approaches that leverage computing resources, such as GPUs (Graphics Processing Units) and multiple CPU (Central Processing Unit) cores, have been adopted. However, until now, the use of ML techniques has mostly been limited to the classification of TEs. Herein, a detection-classification strategy (named YORO) based on convolutional neural networks is adapted from computer vision (YOLO) to genomics. This approach enables the detection of genomic objects by predicting their position, length, and classification in large DNA sequences such as fully sequenced genomes. As a proof of concept, the internal protein-coding domains of LTR-retrotransposons are used to train the proposed neural network. Precision, recall, accuracy, F1-score, execution times and time ratios, as well as several graphical representations, were used to measure performance. These promising results open the door to a new generation of Deep Learning tools for genomics. The YORO architecture is available at https://github.com/simonorozcoarias/YORO.
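The abstract lists precision, recall, accuracy, and F1-score as performance metrics. As a reading aid, the following minimal sketch shows how these four values are conventionally derived from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the paper or from the YORO repository.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (illustrative helper)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

How a predicted genomic object is counted as a true or false positive (for example, by overlap with an annotated domain) is defined in the paper itself and is not modeled here.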

Subjects

DNA Transposable Elements; Genomics; DNA Transposable Elements/genetics; Benchmarking; Eukaryota; Neural Networks, Computer

Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning.

Orozco-Arias, Simon; Candamil-Cortes, Mariana S; Jaimes, Paula A; Valencia-Castrillon, Estiven; Tabares-Soto, Reinel; Isaza, Gustavo; Guyot, Romain.

J Integr Bioinform; 19(3), 2022 Sep 01.

Article in English | MEDLINE | ID: mdl-35822734

ABSTRACT

Transposable elements are mobile sequences that can move and insert themselves into chromosomes, activating under internal or external stimuli and giving the organism the ability to adapt to its environment. Annotating transposable elements in genomic data is currently considered a crucial task for understanding key aspects of organisms such as phenotype variability, species evolution, and genome size, among others. Because of the way they replicate, LTR retrotransposons are the most common transposable elements in plants, accounting in some cases for up to 80% of all DNA information. To annotate these elements, a reference library is usually created and then curated to eliminate TE fragments and false positives, after which the elements are annotated in the genome using the homology method. However, the curation process can take weeks and requires extensive manual work and the execution of multiple time-consuming bioinformatics tools. Here, we propose a machine learning-based approach to perform this process automatically on plant genomes, obtaining up to 91.18% F1-score. This approach was tested with four plant species, obtaining up to 93.6% F1-score (Oryza granulata) in only 22.61 s, where bioinformatics methods took approximately 6 h. This acceleration demonstrates that the ML-based approach is efficient and could be used in massive sequencing projects.
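To put the reported runtimes in perspective, the short sketch below computes the speed-up implied by 22.61 s versus approximately 6 h. Because the abstract gives the 6 h figure only approximately, the result is an order-of-magnitude estimate; the function name is hypothetical.

```python
def speedup(baseline_seconds, new_seconds):
    """Ratio of the baseline runtime to the new runtime."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return baseline_seconds / new_seconds


baseline_seconds = 6 * 3600  # approximately 6 h, as stated in the abstract
ml_seconds = 22.61           # ML-based approach, as stated in the abstract

ratio = speedup(baseline_seconds, ml_seconds)
print(f"Approximate speed-up: {ratio:.0f}x")
```

Under these figures the ML-based curation runs roughly three orders of magnitude faster than the conventional bioinformatics pipeline.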

Subjects

Retroelements; Terminal Repeat Sequences; DNA Transposable Elements; Evolution, Molecular; Genome, Plant; Machine Learning; Plants/genetics; Retroelements/genetics
