Results 1 - 20 of 171
1.
Sci Rep ; 14(1): 18244, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39107557

ABSTRACT

Accurately predicting the Modulus of Resilience (MR) of subgrade soils, which exhibit non-linear stress-strain behavior, is crucial for effective soil assessment. Traditional laboratory techniques for determining MR are often costly and time-consuming. This study explores the efficacy of Gene Expression Programming (GEP), Multi-Expression Programming (MEP), and Artificial Neural Networks (ANN) in forecasting MR using 2813 data records and six key input parameters. Several statistical assessments were used to evaluate model accuracy. The results indicate that the GEP model consistently outperforms the MEP and ANN models, demonstrating the lowest error metrics and the highest correlation indices (R²). During training, the GEP model achieved an R² value of 0.996, surpassing the MEP (R² = 0.97) and ANN (R² = 0.95) models. Sensitivity and SHAP (SHapley Additive exPlanations) analyses were also performed to gain insight into the significance of the input parameters. Sensitivity analysis revealed that confining stress (21.6%) and dry density (26.89%) are the most influential parameters in predicting MR. SHAP analysis corroborated these findings, highlighting the critical impact of these parameters on model predictions. This study underscores the reliability of GEP as a robust tool for precise MR prediction in subgrade soil applications, providing valuable insights into model performance and parameter significance across various machine-learning (ML) approaches.

2.
Sci Rep ; 14(1): 18145, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39103567

ABSTRACT

Bentonite plastic concrete (BPC) is extensively used in the construction of water-tight structures such as cut-off walls in dams because it offers high plasticity, improved workability, and homogeneity. Bentonite is also added to concrete mixes for the adsorption of toxic metals. The modified design of BPC, as compared to normal concrete, requires a reliable tool to predict its strength. Thus, this study presents a novel attempt at applying two evolutionary techniques, multi-expression programming (MEP) and gene expression programming (GEP), and a boosting-based algorithm, AdaBoost, to predict the 28-day compressive strength of BPC based on its mixture composition. The MEP and GEP algorithms expressed their outputs in the form of an empirical equation, while AdaBoost did not. The algorithms were trained using a dataset of 246 points gathered from published literature, with six important input factors for predicting the strength. The developed models were subjected to error evaluation, and the results revealed that all algorithms satisfied the suggested criteria and had a correlation coefficient (R) greater than 0.9 for both the training and testing phases. However, AdaBoost surpassed both MEP and GEP in terms of accuracy and demonstrated a lower testing RMSE of 1.66 compared to 2.02 for MEP and 2.38 for GEP. Similarly, the objective function value for AdaBoost was 0.10 compared to 0.176 for GEP and 0.16 for MEP, which indicated the overall good performance of AdaBoost compared to the two evolutionary techniques. In addition, Shapley additive analysis was performed on the AdaBoost model to gain further insight into the prediction process; it revealed that cement, coarse aggregate, and fine aggregate are the most important factors in predicting the strength of BPC. Moreover, an interactive graphical user interface (GUI) has been developed for practical use in the civil engineering industry for predicting BPC strength.

3.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39129360

ABSTRACT

The genetic blueprint for the essential functions of life is encoded in DNA, which is translated into proteins, the engines driving most of our metabolic processes. Recent advancements in genome sequencing have unveiled a vast diversity of protein families, but compared with the massive search space of all possible amino acid sequences, the set of known functional families is minimal. One could say nature has a limited protein "vocabulary." A major question for computational biologists, therefore, is whether this vocabulary can be expanded to include useful proteins that went extinct long ago or have never evolved (yet). By merging evolutionary algorithms, machine learning, and bioinformatics, we can develop highly customized "designer proteins." We dub the new subfield of computational evolution, which employs evolutionary algorithms with DNA string representations, biologically accurate molecular evolution, and bioinformatics-informed fitness functions, Evolutionary Algorithms Simulating Molecular Evolution.


Subjects
Algorithms , Computational Biology , Evolution, Molecular , Computational Biology/methods , Proteins/genetics , Proteins/chemistry , Proteins/metabolism , Computer Simulation
4.
Sci Rep ; 14(1): 17293, 2024 Jul 27.
Article in English | MEDLINE | ID: mdl-39068262

ABSTRACT

The use of Self-compacting Concrete (SCC) has increased worldwide because of its superior properties compared with normal concrete, such as compaction without vibration, increased flowability, and segregation resistance. Other desirable properties, such as ductile behaviour, increased strain capacity, and tensile strength, can be imparted to SCC by incorporating fibres. Thus, this study presents a novel approach to predicting the 28-day compressive strength (C-S) of fibre-reinforced SCC (FR-SCC) using Gene Expression Programming (GEP) and Multi Expression Programming (MEP) to foster its widespread use in the industry. For this purpose, a dataset was compiled from internationally published literature with six input parameters: water-to-cement ratio, silica fume, fine aggregate, coarse aggregate, fibre, and superplasticizer. The predictive abilities of the developed algorithms were assessed using error metrics such as mean absolute error (MAE), a20-index, and objective function (OF). The comparison of the MEP and GEP models indicated that GEP gave a simple equation with smaller errors than MEP. The OF value of GEP was 0.029 compared to 0.031 for MEP. Thus, sensitivity analysis was performed on the GEP model. The models were also subjected to external validation checks, which verified that the MEP and GEP equations can be used to forecast the strength of FR-SCC for practical uses.
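
The error metrics named in the abstract above can be illustrated with a minimal Python sketch. It assumes the commonly used definition of the a20-index (the fraction of samples whose predicted-to-measured ratio falls within 0.80 to 1.20); the abstract does not state the exact formula the authors used, and the function names and sample values below are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def a20_index(actual, predicted):
    """Fraction of samples whose predicted/actual ratio lies in [0.8, 1.2].

    Assumed definition; the cited study may use a slightly different form.
    """
    within = sum(1 for a, p in zip(actual, predicted) if 0.8 <= p / a <= 1.2)
    return within / len(actual)


# Hypothetical strengths in MPa, for illustration only.
measured = [10.0, 20.0, 40.0]
estimated = [11.0, 30.0, 36.0]
mae = mean_absolute_error(measured, estimated)
a20 = a20_index(measured, estimated)
```

An a20-index close to 1 means that nearly all predictions fall within 20% of the measured values, which is why it is often reported alongside MAE.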

5.
Genome Biol ; 25(1): 201, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39080715

ABSTRACT

BACKGROUND: North African human populations present a complex demographic scenario due to the presence of an autochthonous genetic component and population substructure, plus extensive gene flow from the Middle East, Europe, and sub-Saharan Africa. RESULTS: We conducted a comprehensive analysis of 364 genomes to construct detailed demographic models for the North African region, encompassing its two primary ethnic groups, the Arab and Amazigh populations. This was achieved through an Approximate Bayesian Computation with Deep Learning (ABC-DL) framework and a novel algorithm called Genetic Programming for Population Genetics (GP4PG). This innovative approach enabled us to effectively model intricate demographic scenarios, utilizing a subset of 16 whole genomes at > 30X coverage. The demographic model suggested by GP4PG exhibited a closer alignment with the observed data compared to the ABC-DL model. Both point to a back-to-Africa origin of North African individuals and a close relationship with Eurasian populations. Results support different origins for Amazigh and Arab populations, with Amazigh populations originating back in Epipaleolithic times, while GP4PG supports Arabization as the main source of Middle Eastern ancestry. The GP4PG model includes population substructure in surrounding populations (sub-Saharan Africa and Middle East) with continuous decaying gene flow after population split. Contrary to ABC-DL, the best GP4PG model does not require pulses of admixture from surrounding populations into North Africa pointing to soft splits as drivers of divergence in North Africa. CONCLUSIONS: We have built a demographic model on North Africa that points to a back-to-Africa expansion and a differential origin between Arab and Amazigh populations.


Subjects
Genetics, Population , Genome, Human , Humans , Africa, Northern , Black People/genetics , Models, Genetic , Gene Flow , Bayes Theorem , Middle East , Arabs/genetics , Algorithms , North African People
6.
Sci Rep ; 14(1): 12666, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38831089

ABSTRACT

In the paper, a new evolutionary technique called Linear Matrix Genetic Programming (LMGP) is proposed. It is a matrix extension of Linear Genetic Programming and its application is data-driven black-box control-oriented modeling in conditions of limited access to training data. In LMGP, the model is in the form of an evolutionarily-shaped program which is a sequence of matrix operations. Since the program has a hidden state, running it for a sequence of input data has a similar effect to using well-known recurrent neural networks such as Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU). To verify the effectiveness of the LMGP, it was compared with different types of neural networks. The task of all the compared techniques was to reproduce the behavior of a nonlinear model of an underwater vehicle. The results of the comparative tests are reported in the paper and they show that the LMGP can quickly find an effective and very simple solution to the given problem. Moreover, a detailed comparison of models, generated by LMGP and LSTM/GRU, revealed that the former are up to four times more accurate than the latter in reproducing vehicle behavior.

7.
Sci Rep ; 14(1): 13254, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38858366

ABSTRACT

Bitumen, aggregate, and air void (VA) are the three primary ingredients of asphalt concrete. VA changes over time as a function of four factors: traffic loads and repetitions, environmental regimes, compaction, and asphalt mix composition. Due to the high as-constructed VA content of the material, it is expected that VA will reduce over time, causing rutting during initial traffic periods. Eventually, the material will undergo shear flow when it reaches its densest state with optimum aggregate interlock or refusal VA content. Therefore, to ensure the quality of construction, VA in asphalt mixture needs to be modeled throughout the service life. This study aims to implement a hybrid evolutionary polynomial regression (EPR) combined with a teaching-learning based optimization (TLBO) algorithm and multi-gene genetic programming (MGGP) to predict the VA percentage of asphalt mixture during the service life. For this purpose, 324 data records of VA were collected from the literature. The variables selected as inputs were original as-constructed VA, $VA_{orig}$ (%); mean annual air temperature, MAAT (°F); original viscosity at 77 °F, $\eta_{orig,77}$ (Mega-Poises); and time (months). EPR-TLBO was found to be superior to MGGP and existing empirical models, with interquartile ranges of the absolute error boxes equal to 0.67%. EPR-TLBO had an R² value of more than 0.90 in both the training and testing phases, and fewer than 20% of the records were predicted by this model with more than 20% deviation from the observed values. As determined by the sensitivity analysis, $\eta_{orig,77}$ is the most significant of the four input variables, while time is the least significant. A parametric study showed that, regardless of MAAT, an $\eta_{orig,77}$ of 0.3 Mega-Poises and a $VA_{orig}$ above 6% can be ideal for improving the pavement service life. It was also observed that with an increase of MAAT from 37 to 75 °F, the serviceability of asphalt concrete takes 15 months less on average.


Subjects
Construction Materials , Hydrocarbons , Algorithms
8.
J Comput Aided Mol Des ; 38(1): 17, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38570405

ABSTRACT

The development of peptides for therapeutic targets or biomarkers for disease diagnosis is a challenging task in protein engineering. Current approaches are tedious, often time-consuming and require complex laboratory data due to the vast search spaces that need to be considered. In silico methods can accelerate research and substantially reduce costs. Evolutionary algorithms are a promising approach for exploring large search spaces and can facilitate the discovery of new peptides. This study presents the development and use of a new variant of the genetic-programming-based POET algorithm, called POETRegex, where individuals are represented by a list of regular expressions. This algorithm was trained on a small curated dataset and employed to generate new peptides improving the sensitivity of peptides in magnetic resonance imaging with chemical exchange saturation transfer (CEST). The resulting model achieves a performance gain of 20% over the initial POET models and is able to predict a candidate peptide with a 58% performance increase compared to the gold-standard peptide. By combining the power of genetic programming with the flexibility of regular expressions, new peptide targets were identified that improve the sensitivity of detection by CEST. This approach provides a promising research direction for the efficient identification of peptides with therapeutic or diagnostic potential.


Subjects
Algorithms , Magnetic Resonance Imaging , Humans , Phantoms, Imaging , Magnetic Resonance Imaging/methods , Peptides
9.
J Environ Manage ; 356: 120510, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38490009

ABSTRACT

Continuous effluent quality prediction in wastewater treatment processes is crucial to proactively reduce the risks to the environment and human health. However, wastewater treatment is an extremely complex process controlled by several uncertain, interdependent, and sometimes poorly characterized physico-chemical-biological process parameters. In addition, there are substantial spatiotemporal variations, uncertainties, and high non-linear interactions among the water quality parameters and process variables involved in the treatment process. Such complexities hinder efficient monitoring, operation, and management of wastewater treatment plants under normal and abnormal conditions. Typical mathematical and statistical tools most often fail to capture such complex interrelationships, and therefore data-driven techniques offer an attractive solution to effectively quantify the performance of wastewater treatment plants. Although several previous studies focused on applying regression-based data-driven models (e.g., artificial neural network) to predict some wastewater treatment effluent parameters, most of these studies employed a limited number of input variables to predict only one or two parameters characterizing the effluent quality (e.g., chemical oxygen demand (COD) and/or suspended solids (SS)). Harnessing the power of Artificial Intelligence (AI), the current study proposes multi-gene genetic programming (MGGP)-based models, using a dataset obtained from an operational wastewater treatment plant, deploying membrane aerated biofilm reactor, to predict the filtrated COD, ammonia (NH4), and SS concentrations along with the carbon-to-nitrogen ratio (C/N) within the effluent. Input features included a set of process variables characterizing the influent quality (e.g., filtered COD, NH4, and SS concentrations), water physics and chemistry parameters (e.g., temperature and pH), and operation conditions (e.g., applied air pressure). 
The developed MGGP-based models accurately reproduced the observations of the four output variables with correlation coefficient values that ranged between 0.98 and 0.99 during training and between 0.96 and 0.99 during testing, reflecting the power of the developed models in predicting the quality of the effluent from the treatment system. Interpretability analyses were subsequently deployed to confirm the intuitive understanding of input-output interrelations and to identify the governing parameters of the treatment process. The developed MGGP-based models can facilitate the AI-driven monitoring and management of wastewater treatment plants through devising optimal rapid operation and control schemes and assisting the plants' operators in maintaining proper performance of the plants under various normal and disruptive operational conditions.


Subjects
Artificial Intelligence , Water Purification , Humans , Waste Disposal, Fluid/methods , Water Purification/methods , Neural Networks, Computer , Biological Oxygen Demand Analysis
10.
Sensors (Basel) ; 24(5)2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38475208

ABSTRACT

The escalating reliance of modern society on information and communication technology has rendered it vulnerable to an array of cyber-attacks, with distributed denial-of-service (DDoS) attacks emerging as one of the most prevalent threats. This paper delves into the intricacies of DDoS attacks, which exploit compromised machines numbering in the thousands to disrupt data services and online commercial platforms, resulting in significant downtime and financial losses. Recognizing the gravity of this issue, various detection techniques have been explored, yet the quantity and prior detection of DDoS attacks has seen a decline in recent methods. This research introduces an innovative approach by integrating evolutionary optimization algorithms and machine learning techniques. Specifically, the study proposes XGB-GA Optimization, RF-GA Optimization, and SVM-GA Optimization methods, employing Evolutionary Algorithms (EAs) Optimization with Tree-based Pipelines Optimization Tool (TPOT)-Genetic Programming. Datasets pertaining to DDoS attacks were utilized to train machine learning models based on XGB, RF, and SVM algorithms, and 10-fold cross-validation was employed. The models were further optimized using EAs, achieving remarkable accuracy scores: 99.99% with the XGB-GA method, 99.50% with RF-GA, and 99.99% with SVM-GA. Furthermore, the study employed TPOT to identify the optimal algorithm for constructing a machine learning model, with the genetic algorithm pinpointing XGB-GA as the most effective choice. This research significantly advances the field of DDoS attack detection by presenting a robust and accurate methodology, thereby enhancing the cybersecurity landscape and fortifying digital infrastructures against these pervasive threats.

11.
Biosystems ; 236: 105126, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38278505

ABSTRACT

The inference of gene regulatory networks (GRNs) is a widely addressed problem in Systems Biology. GRNs can be modeled as Boolean networks, which is the simplest approach for this task. However, Boolean models need binarized data. Several approaches have been developed for the discretization of gene expression data (GED). Also, the advance of data extraction technologies, such as single-cell RNA-Sequencing (scRNA-Seq), provides a new vision of gene expression and brings new challenges for dealing with its specificities, such as a large occurrence of zero data. This work proposes a new discretization approach for dealing with scRNA-Seq time-series data, named Distribution and Successive Spline Points Discretization (DSSPD), which considers the data distribution and a proper preprocessing step. Here, Cartesian Genetic Programming (CGP) is used to infer GRNs using the results of DSSPD. The proposal is compared with CGP with the standard data handling and five state-of-the-art algorithms on curated models and experimental data. The results show that the proposal improves the results of CGP in all tested cases and outperforms the state-of-the-art algorithms in most cases.


Subjects
Gene Regulatory Networks , Single-Cell Gene Expression Analysis , Tosyl Compounds , Gene Regulatory Networks/genetics , Algorithms , Systems Biology , Gene Expression Profiling/methods
12.
Evol Comput ; : 1-32, 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38271633

ABSTRACT

Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection. Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases allowing for more individuals to be explored with the same amount of program executions. However, sampling randomly can exclude important cases from the down-sample for a number of generations, while cases that measure the same behavior (synonymous cases) may be overused. In this work, we introduce Informed Down-Sampled Lexicase Selection. This method leverages population statistics to build down-samples that contain more distinct and therefore informative training cases. Through an empirical investigation across two different GP systems (PushGP and Grammar-Guided GP), we find that informed down-sampling significantly outperforms random down-sampling on a set of contemporary program synthesis benchmark problems. Through an analysis of the created down-samples, we find that important training cases are included in the down-sample consistently across independent evolutionary runs and systems. We hypothesize that this improvement can be attributed to the ability of Informed Down-Sampled Lexicase Selection to maintain more specialist individuals over the course of evolution, while still benefiting from reduced per-evaluation costs.
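
The baseline that the abstract above compares against, lexicase selection on a random down-sample of the training cases, can be sketched as follows. This is a minimal illustration and not the authors' implementation; the function names, the error-matrix layout (`errors[i][c]` is the error of individual `i` on case `c`), and the sample data are assumptions. The informed variant described in the abstract would replace `random_down_sample` with a procedure that prefers distinct cases, which is not shown here.

```python
import random


def random_down_sample(n_cases, rate, rng):
    """Choose a random subset of training-case indices (at least one case)."""
    k = max(1, int(n_cases * rate))
    return rng.sample(range(n_cases), k)


def lexicase_select(population, errors, case_ids, rng):
    """Select one parent by filtering candidates case by case in random order."""
    candidates = list(range(len(population)))
    cases = list(case_ids)
    rng.shuffle(cases)
    for c in cases:
        # Keep only the candidates that are best on the current case.
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    # Break any remaining ties at random.
    return population[rng.choice(candidates)]


rng = random.Random(0)
population = ["a", "b", "c"]            # hypothetical individuals
errors = [[0, 1, 1], [1, 0, 1], [0, 0, 0]]
sample = random_down_sample(n_cases=3, rate=0.67, rng=rng)
parent = lexicase_select(population, errors, sample, rng)
```

Because only the sampled cases are evaluated in each generation, the same budget of program executions covers more individuals, which is the trade-off the abstract describes.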

13.
Evol Comput ; : 1-30, 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38271634

ABSTRACT

Genetic variation operators in grammar-guided genetic programming are fundamental to guide the evolutionary process in search and optimization problems. However, they show some limitations, mainly derived from an unbalanced exploration and local-search trade-off. This article presents an estimation of distribution algorithm for grammar-guided genetic programming to overcome this difficulty and thus increase the performance of the evolutionary algorithm. Our proposal employs an extended dynamic stochastic context-free grammar to encode and calculate the estimation of the distribution of the search space from some promising individuals in the population. Unlike traditional estimation of distribution algorithms, the proposed approach improves exploratory behavior by smoothing the estimated distribution model. Therefore, this algorithm is referred to as SEDA, smoothed estimation of distribution algorithm. Experiments have been conducted to compare overall performance using a typical genetic programming crossover operator, an incremental estimation of distribution algorithm, and the proposed approach after tuning their hyperparameters. These experiments involve challenging problems to test the local search and exploration features of the three evolutionary systems. The results show that grammar-guided genetic programming with SEDA achieves the most accurate solutions with an intermediate convergence speed.

14.
Evol Comput ; 32(1): 49-68, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-36893327

ABSTRACT

Reproducibility is important for having confidence in evolutionary machine learning algorithms. Although the focus of reproducibility is usually to recreate an aggregate prediction error score using fixed random seeds, this is not sufficient. Firstly, multiple runs of an algorithm, without a fixed random seed, should ideally return statistically equivalent results. Secondly, it should be confirmed whether the expected behaviour of an algorithm matches its actual behaviour, in terms of how an algorithm targets a reduction in prediction error. Confirming the behaviour of an algorithm is not possible when using a total error aggregate score. Using an error decomposition framework as a methodology for improving the reproducibility of results in evolutionary computation addresses both of these factors. By estimating decomposed error using multiple runs of an algorithm and multiple training sets, the framework provides a greater degree of certainty about the prediction error. Also, decomposing error into bias, variance due to the algorithm (internal variance), and variance due to the training data (external variance) more fully characterises evolutionary algorithms. This allows the behaviour of an algorithm to be confirmed. Applying the framework to a number of evolutionary algorithms shows that their expected behaviour can be different to their actual behaviour. Identifying a behaviour mismatch is important in terms of understanding how to further refine an algorithm as well as how to effectively apply an algorithm to a problem.


Subjects
Algorithms , Machine Learning , Reproducibility of Results
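
The decomposition into bias, internal variance, and external variance described in the abstract of entry 14 can be sketched for a single test point. This is an illustrative sketch under stated assumptions, not the framework from the paper: it assumes a noise-free target, the same number of runs per training set, and population (not sample) variances, so that the three terms sum exactly to the mean squared error. The function name and data are hypothetical.

```python
from statistics import fmean, pvariance


def decompose_error(preds, y_true):
    """Split squared error at one test point into three parts.

    preds[d][r] is the prediction of run r (different random seed)
    trained on training set d. Returns
    (bias squared, internal variance, external variance).
    """
    set_means = [fmean(runs) for runs in preds]
    grand_mean = fmean(set_means)  # valid because every set has equal runs
    bias_sq = (grand_mean - y_true) ** 2
    # Variance caused by the algorithm's own randomness, averaged over sets.
    internal_var = fmean([pvariance(runs) for runs in preds])
    # Variance caused by changing the training data.
    external_var = pvariance(set_means)
    return bias_sq, internal_var, external_var


# Hypothetical predictions: two training sets, two seeds each.
preds = [[1.0, 3.0], [5.0, 7.0]]
bias_sq, internal_var, external_var = decompose_error(preds, y_true=3.0)
```

Reporting the three terms separately shows whether a change to an algorithm reduces error by lowering bias or by lowering one of the variances, which a single aggregate score cannot reveal.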
15.
Methods Mol Biol ; 2745: 121-134, 2024.
Article in English | MEDLINE | ID: mdl-38060183

ABSTRACT

Not unlike the climate or what holds the galaxies and planetary motions together, cancer biology has an intrinsic nonlinear dynamic. In this overview we will outline how to connect temporal measurements of a nonlinear dynamical and unstable complex system, such as cancer, with well-established engineering methods, old and new, that are applied in linear dynamical systems. This proof-of-concept is therapeutically relevant in the development of new means to treat or control human cancer by either adding an appropriate external "damping" or "forcing" term, or by a "control" actuator, such that its nonlinear dynamic is steered to spiral stably into zero forever, as a sink attractor.


Subjects
Neoplasms , Nonlinear Dynamics , Humans
16.
PeerJ Comput Sci ; 9: e1710, 2023.
Article in English | MEDLINE | ID: mdl-38077536

ABSTRACT

Topic-based search systems retrieve items by contextualizing the information seeking process on a topic of interest to the user. A key issue in topic-based search of text resources is how to automatically generate multiple queries that reflect the topic of interest in such a way that precision, recall, and diversity are achieved. The problem of generating topic-based queries can be effectively addressed by Multi-Objective Evolutionary Algorithms, which have shown promising results. However, two common problems with such an approach are loss of diversity and low global recall when combining results from multiple queries. This work proposes a family of Multi-Objective Genetic Programming strategies based on objective functions that attempt to maximize precision and recall while minimizing the similarity among the retrieved results. To this end, we define three novel objective functions based on result set similarity and on the information theoretic notion of entropy. Extensive experiments allow us to conclude that while the proposed strategies significantly improve precision after a few generations, only some of them are able to maintain or improve global recall. A comparative analysis against previous strategies based on Multi-Objective Evolutionary Algorithms, indicates that the proposed approach is superior in terms of precision and global recall. Furthermore, when compared to query-term-selection methods based on existing state-of-the-art term-weighting schemes, the presented Multi-Objective Genetic Programming strategies demonstrate significantly higher levels of precision, recall, and F1-score, while maintaining competitive global recall. Finally, we identify the strengths and limitations of the strategies and conclude that the choice of objectives to be maximized or minimized should be guided by the application at hand.

17.
Sensors (Basel) ; 23(24)2023 Dec 09.
Article in English | MEDLINE | ID: mdl-38139576

ABSTRACT

This paper introduces the application of a genetic programming (GP)-based method for the automated design and tuning of process controllers, representing a noteworthy advancement in artificial intelligence (AI) within the realm of control engineering. In contrast to already existing work, our GP-based approach operates exclusively in the time domain, incorporating differential operations such as derivatives and integrals without necessitating intermediate inverse Laplace transformations. This unique feature not only simplifies the design process but also ensures the practical implementability of the generated controllers within physical systems. Notably, the GP's functional set extends beyond basic arithmetic operators to include a rich repertoire of mathematical operations, encompassing trigonometric, exponential, and logarithmic functions. This broad set of operations enhances the flexibility and adaptability of the GP-based approach in controller design. To rigorously assess the efficacy of our GP-based approach, we conducted an extensive series of tests to determine its limits and capabilities. In summary, our research establishes the GP-based approach as a promising solution for automating the controller design process, offering a transformative tool to address a spectrum of control problems across various engineering applications.

18.
Comput Toxicol ; 252023 Feb.
Article in English | MEDLINE | ID: mdl-37829618

ABSTRACT

Adverse outcome pathways (AOPs) provide a powerful tool for understanding the biological signaling cascades that lead to disease outcomes following toxicity. The framework outlines downstream responses known as key events, culminating in a clinically significant adverse outcome as a final result of the toxic exposure. Here we use the AOP framework combined with artificial intelligence methods to gain novel insights into genetic mechanisms that underlie toxicity-mediated adverse health outcomes. Specifically, we focus on liver cancer as a case study with diverse underlying mechanisms that are clinically significant. Our approach uses two complementary AI techniques: generative modeling via automated machine learning and genetic algorithms, and graph machine learning. We used data from the US Environmental Protection Agency's Adverse Outcome Pathway Database (AOP-DB; aopdb.epa.gov) and the UK Biobank's genetic data repository. We use the AOP-DB to extract disease-specific AOPs and build graph neural networks used in our final analyses. We use the UK Biobank to retrieve real-world genotype and phenotype data, where genotypes are based on single nucleotide polymorphism data extracted from the AOP-DB, and phenotypes are case/control cohorts for the disease of interest (liver cancer) corresponding to those adverse outcome pathways. We also use propensity score matching to appropriately sample based on important covariates (demographics, comorbidities, and social deprivation indices) and to balance the case and control populations in our machine learning training/testing datasets. Finally, we describe a novel putative risk factor for liver cancer that depends on genetic variation in both the aryl-hydrocarbon receptor (AHR) and ATP binding cassette subfamily B member 11 (ABCB11) genes.

19.
Res Sq ; 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37693481

ABSTRACT

Background: The development of peptides for therapeutic targets or biomarkers for disease diagnosis is a challenging task in protein engineering. Current approaches are tedious, often time-consuming and require complex laboratory data due to the vast search space. In silico methods can accelerate research and substantially reduce costs. Evolutionary algorithms are a promising approach for exploring large search spaces and facilitating the discovery of new peptides. Results: This study presents the development and use of a variant of the initial POET algorithm, called POETRegex, which is based on genetic programming, where individuals are represented by a list of regular expressions. The program was trained on a small curated dataset and employed to predict new peptides that can improve the problem of sensitivity in detecting peptides through magnetic resonance imaging using chemical exchange saturation transfer (CEST). The resulting model achieves a performance gain of 20% over the initial POET variant and is able to predict a candidate peptide with a 58% performance increase compared to the gold-standard peptide. Conclusions: By combining the power of genetic programming with the flexibility of regular expressions, new potential peptide targets were identified to improve the sensitivity of detection by CEST. This approach provides a promising research direction for the efficient identification of peptides with therapeutic or diagnostic potential.

20.
J Hazard Mater ; 460: 132430, 2023 Oct 15.
Article in English | MEDLINE | ID: mdl-37659239

ABSTRACT

Soil electrokinetic remediation is an emerging and efficient in-situ remediation technology for reducing environmental risks. Promoting the dissolution and migration of Cr in soil under the electric field is crucial to decrease soil toxicity and ecological influences. However, it is difficult to establish strong relationships between soil treatment and impact factors and to quantify their contributions. Machine learning can help establish pollutant migration models, but it is challenging to derive predictive formulas to improve remediation efficiency, describe the predictive model construction process, and reflect the importance of the predictors for better regulation. Therefore, this paper established a predictive model for the electrokinetic remediation of Cr-contaminated soil using genetic programming (GP) after determining the characteristic parameters which influenced the remediation effect, described the model's adaptive optimization process through the algorithm's function, and identified the sensitivity factors affecting the Cr removal effect. Results showed that the Cr(VI) and total Cr concentrations predicted by GP were in satisfactory agreement with the experimental values, 92% of the training data and 90% of the validation data achieved errors within 1%, and could fully reflect the target ions' content variation in different soil layers. By substituting the above prediction formulas into Sobol sensitivity analysis, it was determined that conductivity, pH, current, and moisture content dramatically affected the Cr content variation in distinguished regions. For overall contaminated area, the system current and soil pH were the most sensitive factors for Cr(VI) and total Cr contents. Remediation efforts throughout the contaminated area should focus on the role of current versus soil pH. 
GP and sensitivity analysis can provide decision support and operational guidance for in-situ soil electrokinetic treatment by establishing a remediation effect prediction model, expediting the development and innovation of electrokinetic technology.
