Search | BVS Colombia Search Portal

Graph-based self-supervised learning for repeat detection in metagenomic assembly.

Azizpour, Ali; Balaji, Advait; Treangen, Todd J; Segarra, Santiago.

Genome Res ; 2024 Jul 19.

Article in English | MEDLINE | ID: mdl-39029947

ABSTRACT

Repetitive DNA (repeats) poses significant challenges for accurate and efficient genome assembly and sequence alignment. This is particularly true for metagenomic data, where genome dynamics such as horizontal gene transfer, gene duplication, and gene loss/gain complicate accurate genome assembly from metagenomic communities. Detecting repeats is a crucial first step in overcoming these challenges. To address this issue, we propose GraSSRep, a novel approach that leverages the structure of the assembly graph through graph neural networks (GNNs) within a self-supervised learning framework to classify DNA sequences as repetitive or non-repetitive. Specifically, we frame this problem as a node classification task within a metagenomic assembly graph. In a self-supervised fashion, we rely on a high-precision (but low-recall) heuristic to generate pseudo-labels for a small proportion of the nodes. We then use those pseudo-labels to train a GNN embedding and a random forest classifier that propagate the labels to the remaining nodes. In this way, GraSSRep combines sequencing features with predefined and learned graph features to achieve state-of-the-art performance in repeat detection. We evaluate our method on simulated and synthetic metagenomic datasets. The results on the simulated data highlight GraSSRep's robustness to repeat attributes, demonstrating its effectiveness in handling the complexity of repeated sequences. Additionally, our experiments with synthetic metagenomic datasets reveal that incorporating the graph structure and the GNN improves detection performance. Finally, in comparative analyses, GraSSRep outperforms existing repeat detection tools in terms of precision and recall.
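The abstract describes a self-supervised pattern: a strict, high-precision heuristic labels only a few nodes of the assembly graph, and a model trained on those pseudo-labels then labels the rest. The sketch below illustrates that pattern on a hypothetical toy assembly graph; it is not GraSSRep's code or heuristic. The graph, the coverage values, the two features (coverage relative to the median, and node degree), the thresholds, and the function names are all assumptions for illustration, and the GNN embedding plus random forest is swapped for a plain nearest-centroid classifier to keep the example dependency-free.

```python
from statistics import median

# Hypothetical toy assembly graph: contig -> neighbouring contigs (undirected).
ADJ = {
    "r1": ["a", "b", "x", "c"],
    "r2": ["c", "d", "e", "x"],
    "x": ["r1", "r2", "a"],
    "a": ["r1", "x"],
    "b": ["r1"],
    "c": ["r1", "r2"],
    "d": ["r2"],
    "e": ["r2"],
}
# Hypothetical read coverage per contig.
COVERAGE = {"r1": 40, "r2": 42, "x": 25, "a": 10, "b": 9, "c": 11, "d": 10, "e": 12}


def features(adj, coverage):
    """Per-node features: coverage / median coverage (sequencing) and degree (graph)."""
    med = median(coverage.values())
    return {n: (coverage[n] / med, float(len(adj[n]))) for n in adj}


def pseudo_labels(feats, rep_cov=3.0, rep_deg=3, uniq_cov=1.0, uniq_deg=2):
    """Hypothetical strict thresholds: label only confident nodes, leave the rest None."""
    labels = {}
    for n, (cov, deg) in feats.items():
        if cov >= rep_cov and deg >= rep_deg:
            labels[n] = 1  # confident repeat
        elif cov <= uniq_cov and deg <= uniq_deg:
            labels[n] = 0  # confident non-repeat
        else:
            labels[n] = None  # left for the classifier
    return labels


def propagate(feats, labels):
    """Fit one centroid per class on pseudo-labeled nodes; assign unlabeled nodes to the nearest."""
    centroids = {}
    for cls in (0, 1):
        members = [feats[n] for n, y in labels.items() if y == cls]
        if not members:
            raise ValueError(f"no pseudo-labeled nodes for class {cls}")
        centroids[cls] = tuple(sum(col) / len(members) for col in zip(*members))

    def nearest(vec):
        # Squared Euclidean distance to each class centroid.
        return min(centroids, key=lambda c: sum((v - m) ** 2 for v, m in zip(vec, centroids[c])))

    # Pseudo-labels are kept as-is; only unlabeled nodes are predicted.
    return {n: (y if y is not None else nearest(feats[n])) for n, y in labels.items()}


feats = features(ADJ, COVERAGE)
seeds = pseudo_labels(feats)  # x and e are left unlabeled by the strict thresholds
final = propagate(feats, seeds)
print(sorted(n for n, y in final.items() if y == 1))  # ['r1', 'r2', 'x']
```

With these toy numbers, the strict thresholds leave x and e unlabeled; the centroid step then assigns x (elevated coverage, degree 3) to the repeat class and e to the non-repeat class. In GraSSRep itself the second stage is a GNN embedding plus a random forest, which can exploit neighborhood structure rather than degree alone.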

Simulation of time-series groundwater parameters using a hybrid metaheuristic neuro-fuzzy model.

Azizpour, Ali; Izadbakhsh, Mohammad Ali; Shabanlou, Saeid; Yosefvand, Fariborz; Rajabi, Ahmad.

Environ Sci Pollut Res Int ; 29(19): 28414-28430, 2022 Apr.

Article in English | MEDLINE | ID: mdl-34988802

ABSTRACT

The estimation of qualitative and quantitative groundwater parameters is an essential task, and artificial intelligence (AI) techniques are widely used for it as accurate, trustworthy, and cost-effective tools. In the present paper, two hybrid neuro-fuzzy models are implemented to predict groundwater level (GWL) fluctuations, as well as variations of Cl⁻ and HCO₃⁻, in the Karnachi well, Kermanshah, Iran, at monthly intervals over a 13-year period from 2005 to 2018. To develop the AI models, the adaptive neuro-fuzzy inference system (ANFIS), the firefly algorithm (FA), and the wavelet transform (WT) are used. Specifically, two hybrid models, ANFIS-FA (adaptive neuro-fuzzy inference system-firefly algorithm) and WANFIS-FA (wavelet-adaptive neuro-fuzzy inference system-firefly algorithm), are used to estimate the quantitative and qualitative parameters. First, the influential lags of the quantitative and qualitative time series are identified using the autocorrelation function. Then, four and eight separate models are developed to approximate the GWLs and the qualitative parameters (i.e., Cl⁻ and HCO₃⁻), respectively. About 75% of the observed values are used to train the hybrid AI models and the remaining 25% to test them. Sensitivity analysis results reveal that the WANFIS-FA models perform better than the ANFIS-FA models. For the simulation of HCO₃⁻, the best WANFIS-FA model yields MAE, NSC, and SI values of 0.040, 0.988, and 0.022, respectively. In addition, the lags (t-1), (t-2), (t-3), and (t-4) are identified as the most effective time-series lags for estimating Cl⁻.

Subject(s)

Fuzzy Logic, Groundwater, Algorithms, Artificial Intelligence, Neural Networks, Computer
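The groundwater abstract does not detail the ANFIS or firefly components, but the data preparation and evaluation steps it names can be sketched. The example below is a minimal sketch under several assumptions: significant lags are screened with the common approximate 95% band |r_k| > 1.96/√n on the sample autocorrelation (the abstract does not state its selection rule), the 75/25 split is taken chronologically, NSC is read as the Nash–Sutcliffe coefficient, and SI as the scatter index (RMSE divided by the observed mean). The function names and the synthetic monthly series are hypothetical, and no neuro-fuzzy model is trained here.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation r_k for k = 1..max_lag."""
    n = len(series)
    mu = sum(series) / n
    dev = [x - mu for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom for k in range(1, max_lag + 1)]


def significant_lags(series, max_lag):
    """Lags whose |r_k| exceeds the approximate 95% band 1.96 / sqrt(n) (assumed rule)."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag), start=1) if abs(r) > bound]


def lagged_dataset(series, lags):
    """Inputs are the values at t - k for each selected lag k; the target is the value at t."""
    start = max(lags)
    X = [[series[t - k] for k in lags] for t in range(start, len(series))]
    y = series[start:]
    return X, y


def chronological_split(X, y, train_frac=0.75):
    """First train_frac of the rows for training, the rest for testing (no shuffling)."""
    cut = int(len(y) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mu = sum(obs) / len(obs)
    return 1 - sum((o - p) ** 2 for o, p in zip(obs, pred)) / sum((o - mu) ** 2 for o in obs)


def scatter_index(obs, pred):
    """Scatter index: RMSE divided by the mean of the observations."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return rmse / (sum(obs) / len(obs))


# Hypothetical 13-year monthly series (156 values) with an annual cycle.
series = [10 + math.sin(2 * math.pi * t / 12) for t in range(156)]
lags = significant_lags(series, max_lag=6)
X, y = lagged_dataset(series, lags)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
print(lags, len(train_y), len(test_y))
```

A chronological split, rather than a random one, keeps test months strictly after training months, which avoids leaking future values into the lagged inputs of the training set.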
