Results 1 - 20 of 92
1.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37555812

ABSTRACT

MOTIVATION: The investigation of DNA methylation can shed light on the processes underlying human well-being and help determine overall human health. However, insufficient coverage makes it challenging to implement single-stranded DNA methylation sequencing technologies, highlighting the need for an efficient prediction model. Models are required to create an understanding of the underlying biological systems and to project single-cell (methylated) data accurately. RESULTS: In this study, we developed positional features for predicting CpG sites. Positional characteristics of the sequence are derived using data from CpG regions and the separation between nearby CpG sites. Multiple optimized classifiers and different ensemble learning approaches are evaluated. The OPTUNA framework is used to optimize the algorithms. The CatBoost algorithm followed by the stacking algorithm outperformed existing DNA methylation identifiers. AVAILABILITY AND IMPLEMENTATION: The data and methodologies used in this study are openly accessible to the research community. Researchers can access the positional features and algorithms used for predicting CpG site methylation patterns. To achieve superior performance, we employed the CatBoost algorithm followed by the stacking algorithm, which outperformed existing DNA methylation identifiers. The proposed iCpG-Pos approach utilizes only positional features, resulting in a substantial reduction in computational complexity compared to other known approaches for detecting CpG site methylation patterns. In conclusion, our study introduces a novel approach, iCpG-Pos, for predicting CpG site methylation patterns. By focusing on positional features, our model offers both accuracy and efficiency, making it a promising tool for advancing DNA methylation research and its applications in human health and well-being.
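As a rough illustration of what positional features of this kind might look like, the sketch below locates CpG dinucleotides in a sequence and records, for each site, the distance to its nearest upstream and downstream CpG neighbour. The function names and the choice of features are assumptions made for illustration only; the abstract does not give the exact feature definitions used by iCpG-Pos.

```python
from typing import List, Optional, Tuple


def find_cpg_sites(sequence: str) -> List[int]:
    """Return the 0-based position of the C in every CG dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def neighbour_distances(
    sites: List[int],
) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """For each CpG site, return (position, upstream gap, downstream gap).

    A gap is None when the site has no neighbour on that side.
    This feature layout is hypothetical, not the published one.
    """
    features = []
    for idx, pos in enumerate(sites):
        upstream = pos - sites[idx - 1] if idx > 0 else None
        downstream = sites[idx + 1] - pos if idx < len(sites) - 1 else None
        features.append((pos, upstream, downstream))
    return features


if __name__ == "__main__":
    sites = find_cpg_sites("ACGTTCGCGA")
    for row in neighbour_distances(sites):
        print(row)
```

Features like these depend only on site coordinates, which is consistent with the abstract's claim that a purely positional model is computationally cheap.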


Subject(s)
Computational Biology , Computational Biology/methods , Single-Cell Analysis , Whole Genome Sequencing , DNA Methylation
2.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37929975

ABSTRACT

MOTIVATION: The origins of replication sites (ORIs) are precise regions inside the DNA sequence where the replication process begins. These locations are critical for preserving the genome's integrity during cell division and guaranteeing the faithful transfer of genetic data from generation to generation. The advent of experimental techniques has aided in the discovery of ORIs in many species. Experimentation, on the other hand, is often more time-consuming and pricey than computational approaches, and it necessitates specific equipment and knowledge. Recently, ORI sites have been predicted using computational techniques like motif-based searches and artificial intelligence algorithms based on sequence characteristics and chromatin states. RESULTS: In this article, we developed ORI-Explorer, a unique artificial intelligence-based technique that combines multiple feature engineering techniques to train CatBoost Classifier for recognizing ORIs from four distinct eukaryotic species. ORI-Explorer was created by utilizing a unique combination of three traditional feature-encoding techniques and a feature set obtained from a deep-learning neural network model. The ORI-Explorer has significantly outperformed current predictors on the testing dataset. Furthermore, by employing the sophisticated SHapley Additive exPlanation method, we give crucial insights that aid in comprehending model success, highlighting the most relevant features vital for forecasting cell-specific ORIs. ORI-Explorer is also intended to aid community-wide attempts in discovering potential ORIs and developing innovative verifiable biological hypotheses. AVAILABILITY AND IMPLEMENTATION: The used datasets along with the source code are made available through https://github.com/Z-Abbas/ORI-Explorer and https://zenodo.org/record/8358679.


Subject(s)
Artificial Intelligence , Replication Origin , DNA Replication , Chromatin , Base Sequence
3.
J Chem Inf Model ; 64(13): 4941-4957, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38874445

ABSTRACT

Anticancer peptides (ACPs) play a vital role in selectively targeting and eliminating cancer cells. Evaluating and comparing predictions from various machine learning (ML) and deep learning (DL) techniques is challenging but crucial for anticancer drug research. We conducted a comprehensive analysis of 15 ML and 10 DL models, including the models released after 2022, and found that support vector machines (SVMs) with feature combination and selection significantly enhance overall performance. DL models, especially convolutional neural networks (CNNs) with light gradient boosting machine (LGBM) based feature selection approaches, demonstrate improved characterization. Assessment using a new test data set (ACP10) identifies ACPred, MLACP 2.0, AI4ACP, mACPred, and AntiCP2.0_AAC as successive optimal predictors, showcasing robust performance. Our review underscores current prediction tool limitations and advocates for an omnidirectional ACP prediction framework to propel ongoing research.


Subject(s)
Antineoplastic Agents , Neoplasms , Peptides , Neoplasms/drug therapy , Peptides/chemistry , Humans , Antineoplastic Agents/chemistry , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Deep Learning , Machine Learning , Neural Networks, Computer , Artificial Intelligence , Support Vector Machine
4.
Methods ; 217: 49-56, 2023 09.
Article in English | MEDLINE | ID: mdl-37454743

ABSTRACT

The cytokine interleukin-4 (IL-4) plays an important role in our immune system. IL-4 leads the way in the differentiation of naïve T-helper 0 cells (Th0) to T-helper 2 cells (Th2). The Th2 responses are characterized by the release of IL-4. CD4+ T cells produce the cytokine IL-4 in response to exogenous parasites. IL-4 has a critical role in the growth of CD8+ cells, inflammation, and responses of T-cells. We propose an ensemble model for the prediction of IL-4 inducing peptides. Four feature encodings were extracted to build an efficient predictor: pseudo-amino acid composition, amphiphilic pseudo-amino acid composition, quasi-sequence-order, and Shannon entropy. We developed an ensemble learning model fusion of random forest, extreme gradient boost, light gradient boosting machine, and extra tree classifier in the first layer, and a Gaussian process classifier as a meta classifier in the second layer. The outcome of the benchmarking testing dataset, with a Matthews correlation coefficient of 0.793, showed that the meta-model (Meta-IL4) outperformed individual classifiers. The highest accuracy achieved by the Meta-IL4 model is 90.70%. These findings suggest that peptides that induce IL-4 can be predicted with reasonable accuracy. These models could aid in the development of peptides that trigger the appropriate Th2 response.
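Of the four feature encodings listed, Shannon entropy is the simplest to show directly. The sketch below computes the entropy of the residue distribution of a peptide, in bits. It is a generic definition offered as an assumption; the abstract does not state whether Meta-IL4 computes entropy over the whole peptide or over windows, and the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(peptide: str) -> float:
    """Shannon entropy (bits) of the amino-acid distribution in a peptide.

    H = -sum(p_i * log2(p_i)) over the residues that occur in the sequence.
    """
    if not peptide:
        return 0.0
    counts = Counter(peptide.upper())
    total = len(peptide)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    for seq in ("AAAA", "AACC", "ACDE"):
        print(seq, shannon_entropy(seq))
```

A peptide made of a single repeated residue scores zero, and the score grows as the composition becomes more varied.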


Subject(s)
Interleukin-4 , Peptides , Cytokines , Amino Acids , Machine Learning
5.
Methods ; 218: 14-24, 2023 10.
Article in English | MEDLINE | ID: mdl-37385419

ABSTRACT

Healthy sleep is vital to all functions in the body. It improves physical and mental health, strengthens resistance against diseases, and builds strong immunity against metabolic and chronic diseases. However, a sleep disorder can cause the inability to sleep well. Sleep apnea syndrome is a critical breathing disorder in which breathing stops suddenly during sleep and resumes on waking, causing sleep disturbance. If it is not treated in time, it can produce loud snoring and drowsiness or cause more acute health problems such as high blood pressure or heart attack. The accepted standard for diagnosing sleep apnea syndrome is full-night polysomnography. However, its limitations include high cost and inconvenience. This article aims to develop an intelligent monitoring framework for detecting breathing events based on Software Defined Radio Frequency (SDRF) sensing and verify its feasibility for diagnosing sleep apnea syndrome. We extract the wireless channel state information (WCSI) for breathing motion using the channel frequency response (CFR) recorded at every time instant at the receiver. The proposed approach simplifies the receiver structure while combining communication and sensing. Initially, simulations are conducted to test the feasibility of the SDRF sensing design for the simulated wireless channel. Then, a real-time experimental setup is developed in a lab environment to address the challenges of the wireless channel. We conducted 100 experiments to collect a dataset of 25 subjects for four breathing patterns. The SDRF sensing system accurately detected breathing events during sleep without subject contact. The developed intelligent framework uses machine learning classifiers to classify sleep apnea syndrome and other breathing patterns with an acceptable accuracy of 95.9%. The developed framework aims to build a non-invasive sensing system to conveniently diagnose patients suffering from sleep apnea syndrome. Furthermore, this framework can easily be extended to E-health applications.


Subject(s)
Sleep Apnea Syndromes , Humans , Sleep Apnea Syndromes/diagnosis , Polysomnography , Software
6.
Mol Ther ; 31(8): 2543-2551, 2023 08 02.
Article in English | MEDLINE | ID: mdl-37271991

ABSTRACT

5-methylcytosine (m5C) is a critical post-transcriptional alteration that is widely present in various kinds of RNAs and is crucial to fundamental biological processes. By correctly identifying the m5C-methylation sites on RNA, clinicians can more clearly comprehend the precise function of these m5C sites in different biological processes. Due to their effectiveness and affordability, computational methods have received greater attention over the last few years for the identification of methylation sites in various species. To precisely identify RNA m5C locations in five different species, namely Homo sapiens, Arabidopsis thaliana, Mus musculus, Drosophila melanogaster, and Danio rerio, we proposed a more effective and accurate model named m5C-pred. To create m5C-pred, five distinct feature encoding techniques were combined to extract features from the RNA sequence, and then we used SHapley Additive exPlanations to choose the best features among them, followed by XGBoost as a classifier. We applied the Optuna optimization framework to quickly and efficiently determine the best hyperparameters. Finally, the proposed model was evaluated using independent test datasets, and we compared the results with the previous methods. Our approach, m5C-pred, is anticipated to be useful for accurately identifying m5C sites, outperforming the currently available state-of-the-art techniques.


Subject(s)
Drosophila melanogaster , RNA , Animals , Mice , RNA/genetics , Drosophila melanogaster/genetics , Base Sequence
7.
Arch Toxicol ; 98(8): 2647-2658, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38619593

ABSTRACT

Cytochrome P450 enzymes are a superfamily of enzymes responsible for the metabolism of a variety of medicines and xenobiotics. Among the Cytochrome P450 family, five isozymes, 1A2, 2C9, 2C19, 2D6, and 3A4, are most important for the metabolism of xenobiotics. Inhibition of any of these five CYP isozymes causes drug-drug interactions with high pharmacological and toxicological effects. The prediction of inhibition or non-inhibition of these isozymes is therefore of great importance. Many techniques based on machine learning and deep learning algorithms are currently being used to predict whether these isozymes will be inhibited or not. In this study, three different molecular or substructural representations of the molecules, namely Morgan, MACCS and Morgan (combined), and RDKit, are used to train a distinct SVM model for each isozyme (1A2, 2C9, 2C19, 2D6, and 3A4). On the independent dataset, Morgan fingerprints provided the best results, while MACCS and Morgan (combined) achieved comparable results in terms of balanced accuracy (BA), sensitivity (Sn), and Matthews correlation coefficient (MCC). For the Morgan fingerprints, the BA, MCC, and Sn for the CYP isozymes 1A2, 2C9, 2C19, 2D6, and 3A4 on the independent dataset ranged between 0.81 and 0.85, 0.61 and 0.70, and 0.72 and 0.83, respectively. Similarly, on the independent dataset, MACCS and Morgan (combined) fingerprints achieved competitive results, with BA, MCC, and Sn for the same isozymes ranging between 0.79 and 0.85, 0.59 and 0.69, and 0.69 and 0.82, respectively.
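The three metrics reported above can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, balanced accuracy, and Matthews correlation coefficient.

    tp/fp/tn/fn are the confusion-matrix counts for a binary classifier
    (here: inhibitor vs. non-inhibitor). Undefined ratios fall back to 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "BA": balanced_accuracy, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    print(binary_metrics(tp=80, fp=10, tn=90, fn=20))
```

Balanced accuracy and MCC are commonly preferred over plain accuracy when the inhibitor and non-inhibitor classes are of unequal size.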


Subject(s)
Cytochrome P-450 Enzyme Inhibitors , Cytochrome P-450 Enzyme System , Machine Learning , Cytochrome P-450 Enzyme Inhibitors/pharmacology , Cytochrome P-450 Enzyme System/metabolism , Humans , Isoenzymes/metabolism , Drug Interactions , Xenobiotics/toxicity , Xenobiotics/metabolism , Support Vector Machine
8.
Int J Mol Sci ; 25(11)2024 May 29.
Article in English | MEDLINE | ID: mdl-38892144

ABSTRACT

In this study, we present an innovative approach to improve the prediction of protein-protein interactions (PPIs) through the utilization of an ensemble classifier, specifically focusing on distinguishing between native and non-native interactions. Leveraging the strengths of various base models, including random forest, gradient boosting, extreme gradient boosting, and light gradient boosting, our ensemble classifier integrates these diverse predictions using a logistic regression meta-classifier. Our model was evaluated using a comprehensive dataset generated from molecular dynamics simulations. While the gains in AUC and other metrics might seem modest, they contribute to a model that is more robust, consistent, and adaptable. To assess the effectiveness of various approaches, we compared the performance of logistic regression to four baseline models. Our results indicate that logistic regression consistently underperforms across all evaluated metrics. This suggests that it may not be well-suited to capture the complex relationships within this dataset. Tree-based models, on the other hand, appear to be more effective for problems involving molecular dynamics simulations. Extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) are optimized for performance and speed, handling datasets effectively and incorporating regularizations to avoid over-fitting. Our findings indicate that the ensemble method enhances the predictive capability of PPIs, offering a promising tool for computational biology and drug discovery by accurately identifying potential interaction sites and facilitating the understanding of complex protein functions within biological systems.


Subject(s)
Molecular Dynamics Simulation , Protein Interaction Mapping , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Algorithms , Protein Binding , Logistic Models
9.
Int J Mol Sci ; 25(2)2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38255790

ABSTRACT

Computational methods play a pivotal role in the pursuit of efficient drug discovery, enabling the rapid assessment of compound properties before costly and time-consuming laboratory experiments. With the advent of technology and large data availability, machine and deep learning methods have proven efficient in predicting molecular solubility. High-precision in silico solubility prediction has revolutionized drug development by enhancing formulation design, guiding lead optimization, and predicting pharmacokinetic parameters. These benefits yield considerable cost and time savings, resulting in a more efficient and shortened drug development process. The proposed SolPredictor is designed with the aim of developing a computational model for solubility prediction. The model is based on residual graph neural network convolution (RGNN). The RGNNs were designed to capture long-range dependencies in graph-structured data. Residual connections enable information to be utilized over various layers, allowing the model to capture and preserve essential features and patterns scattered throughout the network. The two largest datasets available to date are compiled, and the model uses a simplified molecular-input line-entry system (SMILES) representation. In ten-fold cross-validation, SolPredictor achieves a Pearson correlation coefficient R² of 0.79±0.02 and a root mean square error (RMSE) of 1.03±0.04. The proposed model was evaluated using five independent datasets. Error analysis, hyperparameter optimization analysis, and model explainability were used to determine the molecular features that were most valuable for prediction.
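The two regression metrics quoted for cross-validation can be computed as follows. This is a minimal sketch under one assumption: the abstract's "Pearson correlation coefficient R2" is read here as the square of the Pearson correlation between measured and predicted solubility. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0]    # made-up values
    predicted = [2.0, 4.0, 6.0, 8.0]   # made-up values
    r = pearson_r(measured, predicted)
    print("r^2 =", r ** 2, "RMSE =", rmse(measured, predicted))
```

The made-up example shows why both numbers are reported: predictions that are perfectly correlated with the measurements can still carry a large absolute error.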


Subject(s)
Drug Development , Drug Discovery , Solubility , Correlation of Data , Neural Networks, Computer
10.
Int J Mol Sci ; 25(7)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38612558

ABSTRACT

Cruzipain inhibitors are sought as medications to treat Chagas disease because of the need for safer, more effective treatments. Trypanosoma cruzi is the source of cruzipain, a crucial cysteine protease that has driven interest in using computational methods to create more effective inhibitors. We employed a 3D-QSAR model, using a dataset of 36 known inhibitors, and a pharmacophore model to identify potential inhibitors for cruzipain. We also built a deep learning model using the DeepPurpose library, trained on 204 active compounds, and validated it with a specific test set. During a comprehensive screening of 8533 molecules from the DrugBank database, the pharmacophore and deep learning models identified 1012 and 340 drug-like molecules, respectively. These molecules were further evaluated through molecular docking, followed by induced-fit docking. Ultimately, molecular dynamics simulation was performed for the final potent inhibitors that exhibited strong binding interactions. These results present four novel cruzipain inhibitors that can inhibit the cruzipain protein of T. cruzi.


Subject(s)
Chagas Disease , Cysteine Endopeptidases , Humans , Molecular Docking Simulation , Protozoan Proteins , Chagas Disease/drug therapy , Drug Design
11.
Bioinformatics ; 38(16): 3885-3891, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35771648

ABSTRACT

MOTIVATION: Recent research has demonstrated that DNA N6-methyladenine (6mA) has an essential function in epigenetic modification in eukaryotic species. 6mA has been linked to various biological processes. It is critical to create a new algorithm that can rapidly and reliably detect 6mA sites in genomes to investigate their biological roles. The identification of 6mA marks in the genome is the first and most important step in understanding the underlying molecular processes, as well as their regulatory functions. RESULTS: In this article, we proposed a novel computational tool called i6mA-Caps, a CapsuleNet-based framework for identifying DNA N6-methyladenine sites. The proposed framework uses a single encoding scheme for numerical representation of the DNA sequence. The numerical data is then used by a set of convolution layers to extract low-level features. These features are then used by the capsule network to extract intermediate-level and later high-level features to classify the 6mA sites. The proposed network is evaluated on three datasets belonging to three genomes: Rosaceae, rice, and Arabidopsis thaliana. The proposed method attained an accuracy of 96.71%, 94%, and 86.83% on the independent Rosaceae, rice, and A. thaliana datasets, respectively. The proposed framework has exhibited improved results when compared with the existing top-of-the-line methods. AVAILABILITY AND IMPLEMENTATION: A user-friendly web server is made available for biological experts and can be accessed at: http://nsclbio.jbnu.ac.kr/tools/i6mA-Caps/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
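Capsule networks typically pass each capsule's vector through a "squash" non-linearity, which preserves its direction while mapping its length into the range [0, 1) so that the length can be read as a probability. The sketch below implements that function as it is usually defined for capsule networks; whether i6mA-Caps uses exactly this form is an assumption, since the abstract does not say.

```python
import math
from typing import List


def squash(s: List[float]) -> List[float]:
    """Capsule squash: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).

    Short vectors shrink towards zero; long vectors approach unit length.
    """
    norm_sq = sum(v * v for v in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * v for v in s]


if __name__ == "__main__":
    out = squash([3.0, 4.0])
    print(out, math.sqrt(sum(v * v for v in out)))
```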


Subject(s)
DNA , Oryza , DNA/genetics , Epigenesis, Genetic , Genome , DNA Methylation , Oryza/genetics
12.
J Chem Inf Model ; 63(9): 2628-2643, 2023 05 08.
Article in English | MEDLINE | ID: mdl-37125780

ABSTRACT

Toxicity prediction is a critical step in the drug discovery process that helps identify and prioritize compounds with the greatest potential for safe and effective use in humans, while also reducing the risk of costly late-stage failures. It is estimated that over 30% of drug candidates are discarded owing to toxicity. Recently, artificial intelligence (AI) has been used to improve drug toxicity prediction as it provides more accurate and efficient methods for identifying the potentially toxic effects of new compounds before they are tested in human clinical trials, thus saving time and money. In this review, we present an overview of recent advances in AI-based drug toxicity prediction, including the use of various machine learning algorithms and deep learning architectures, of six major toxicity properties and Tox21 assay end points. Additionally, we provide a list of public data sources and useful toxicity prediction tools for the research community and highlight the challenges that must be addressed to enhance model performance. Finally, we discuss future perspectives for AI-based drug toxicity prediction. This review can aid researchers in understanding toxicity prediction and pave the way for new methods of drug discovery.


Subject(s)
Algorithms , Artificial Intelligence , Humans , Machine Learning , Biological Assay , Drug Discovery
13.
J Chem Inf Model ; 63(20): 6198-6211, 2023 10 23.
Article in English | MEDLINE | ID: mdl-37819031

ABSTRACT

Absorption is an important area of research in pharmacochemistry and drug development, because the drug has to be absorbed before any drug effects can occur. Furthermore, the ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profile of drugs can be directly and considerably altered by modulating factors affecting absorption. Many drugs in development fail because of poor absorption. The research and continuous efforts of researchers in recent years have brought many successes and promises in drug absorption property prediction, especially in silico, which helps to reduce the time and cost significantly for screening undesirable drug candidates. In this report, we explicitly provide an overview of recent in silico studies on predicting absorption properties, especially from 2019 to the present, using artificial intelligence. Additionally, we have collected and investigated public databases that support absorption prediction research. On those grounds, we also proposed the challenges and development directions of absorption prediction in the future. We hope this review can provide researchers with valuable guidelines on absorption prediction to facilitate the development of newer approaches in drug discovery.


Subject(s)
Artificial Intelligence , Drug Discovery , Chemical Phenomena , Databases, Factual
14.
Int J Mol Sci ; 24(3)2023 Jan 17.
Article in English | MEDLINE | ID: mdl-36768139

ABSTRACT

Drug distribution is an important process in pharmacokinetics because it has the potential to influence both the amount of medicine reaching the active sites and the effectiveness as well as safety of the drug. The main causes of 90% of drug failures in clinical development are lack of efficacy and uncontrolled toxicity. In recent years, several advances and promising developments in drug distribution property prediction have been achieved, especially in silico, which helped to drastically reduce the time and expense of screening undesired drug candidates. In this study, we provide comprehensive knowledge of drug distribution background, influencing factors, and artificial intelligence-based distribution property prediction models from 2019 to the present. Additionally, we gathered and analyzed public databases and datasets commonly utilized by the scientific community for distribution prediction. The distribution property prediction performance of five large ADMET prediction tools is mentioned as a benchmark for future research. On this basis, we also offer future challenges in drug distribution prediction and research directions. We hope that this review will provide researchers with helpful insight into distribution prediction, thus facilitating the development of innovative approaches for drug discovery.


Subject(s)
Artificial Intelligence , Drug Discovery , Drug Design
15.
Genomics ; 113(1 Pt 2): 582-592, 2021 01.
Article in English | MEDLINE | ID: mdl-33010390

ABSTRACT

DNA N6-methyladenine (6mA) is an epigenetic modification that plays a vital role in a variety of cellular processes in both eukaryotes and prokaryotes. Accurate information on 6mA sites in the Rosaceae genome may assist in understanding genomic 6mA distributions and various biological functions such as epigenetic inheritance. Various studies have shown the possibility of identifying 6mA sites through experiments, but the procedures are time-consuming and costly. To overcome the drawbacks of experimental methods, we propose an accurate computational paradigm based on a machine learning (ML) technique to identify 6mA sites in Rosa chinensis (R. chinensis) and Fragaria vesca (F. vesca). To improve the performance of the proposed model and to avoid overfitting, a recursive feature elimination with cross-validation (RFECV) strategy is used to extract the optimal number of features (ONF) subset from five different DNA sequence encoding schemes, i.e., Binary Encoding (BE), Ring-Function-Hydrogen-Chemical Properties (RFHC), Electron-Ion-Interaction Pseudo Potentials of Nucleotides (EIIP), Dinucleotide Physicochemical Properties (DPCP), and Trinucleotide Physicochemical Properties (TPCP). Subsequently, we use the ONF subset to train a two-layer ML-based stacking model to create a bioinformatics tool named 'i6mA-stack'. This tool outperforms its peer tool in general and is currently available at http://nsclbio.jbnu.ac.kr/tools/i6mA-stack/.


Subject(s)
Adenine/analogs & derivatives , DNA Methylation , Rosaceae/genetics , Sequence Analysis, DNA/methods , Software , Adenine/metabolism
16.
Genomics ; 113(5): 3030-3038, 2021 09.
Article in English | MEDLINE | ID: mdl-34242708

ABSTRACT

With the rapidly growing importance of biological research, non-coding RNAs (ncRNA) attract more attention in biology and bioinformatics. They play vital roles in biological processes such as transcription and translation. Classification of ncRNAs is essential to our understanding of disease mechanisms and treatment design. Many approaches to ncRNA classification have been developed, several of which use machine learning and deep learning. In this paper, we construct a novel deep learning-based architecture, ncRDense, to effectively classify and distinguish ncRNA families. In a comparative study, our model produces comparable results with existing state-of-the-art methods. Finally, we built a freely accessible web server for the ncRDense tool, which is available at http://nsclbio.jbnu.ac.kr/tools/ncRDense/.


Subject(s)
Deep Learning , Computational Biology/methods , Humans , Machine Learning , RNA, Untranslated/genetics
17.
Int J Mol Sci ; 23(9)2022 May 09.
Article in English | MEDLINE | ID: mdl-35563648

ABSTRACT

Identification of ionic liquids with low toxicity is paramount for applications in various domains. Traditional approaches used for determining the toxicity of ionic liquids are often expensive, and can be labor intensive and time consuming. In order to mitigate these limitations, researchers have resorted to using computational models. This work presents a probabilistic model built from deep kernel learning with the aim of predicting the toxicity of ionic liquids in the leukemia rat cell line (IPC-81). Only open source tools, namely RDKit and Mol2vec, are required to generate predictors for this model; as such, its predictions are based solely on the chemical structure of the ionic liquids and no manual extraction of features is needed. The model recorded an RMSE of 0.228 and an R² of 0.943. These results indicate that the model is both reliable and accurate. Furthermore, this model provides an accompanying uncertainty level for every prediction it makes. This is important because discrepancies in the experimental measurements that generated the dataset used herein are inevitable, and ought to be modeled. A user-friendly web server was developed as well, enabling researchers and practitioners to make predictions using this model.


Subject(s)
Ionic Liquids , Animals , Cell Line , Ionic Liquids/chemistry , Ionic Liquids/toxicity , Models, Statistical , Quantitative Structure-Activity Relationship , Rats
18.
Int J Mol Sci ; 23(15)2022 Jul 27.
Article in English | MEDLINE | ID: mdl-35955447

ABSTRACT

N6-methyladenine (6mA) has been recognized as a key epigenetic alteration that affects a variety of biological activities. Precise prediction of 6mA modification sites is essential for understanding the logical consistency of biological activity. There are various experimental methods for identifying 6mA modification sites, but in silico prediction has emerged as a potential option due to the very high cost and labor-intensive nature of experimental procedures. Taking this into consideration, developing an efficient and accurate model for identifying N6-methyladenine is one of the top objectives in the field of bioinformatics. Therefore, we have created an in silico model for the classification of 6mA modifications in plant genomes. ENet-6mA uses three encoding methods, including one-hot, nucleotide chemical properties (NCP), and electron-ion interaction potential (EIIP), which are concatenated and fed as input to ElasticNet for feature reduction, and then the optimized features are given directly to the neural network to get classified. We used a benchmark dataset of rice for five-fold cross-validation testing and three other datasets from plant genomes for cross-species testing purposes. The results show that the model can predict the N6-methyladenine sites very well, even cross-species. Additionally, we separated the datasets into different ratios and calculated the performance using the area under the precision-recall curve (AUPRC), achieving 0.81, 0.79, and 0.50 with 1:10 (positive:negative) samples for F. vesca, R. chinensis, and A. thaliana, respectively.
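The three encodings named above can be sketched in a few lines. The NCP triplets (ring structure, functional group, hydrogen bonding) and the EIIP values below are the ones commonly quoted for DNA nucleotides. The one-hot column order, the per-position layout of the concatenation, and the function name are assumptions made for illustration; the abstract does not specify them.

```python
from typing import Dict, List, Tuple

# One-hot encoding; the A, C, G, T column order is an arbitrary choice here.
ONE_HOT: Dict[str, Tuple[int, ...]] = {
    "A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1),
}
# Nucleotide chemical properties: (ring structure, functional group, H-bond).
NCP: Dict[str, Tuple[int, ...]] = {
    "A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1),
}
# Electron-ion interaction potential per nucleotide.
EIIP: Dict[str, float] = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def encode_sequence(sequence: str) -> List[float]:
    """Concatenate one-hot (4), NCP (3), and EIIP (1) values per nucleotide.

    Returns a flat vector of length 8 * len(sequence).
    """
    features: List[float] = []
    for base in sequence.upper():
        if base not in ONE_HOT:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        features.extend(ONE_HOT[base])
        features.extend(NCP[base])
        features.append(EIIP[base])
    return features


if __name__ == "__main__":
    print(encode_sequence("ACGT"))
```

A flat vector of this kind is the sort of input a linear feature selector such as ElasticNet can prune before classification.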


Subject(s)
DNA Methylation , Oryza , Computational Biology , Genome, Plant , Neural Networks, Computer , Oryza/genetics
19.
Int J Mol Sci ; 23(15)2022 Jul 30.
Article in English | MEDLINE | ID: mdl-35955587

ABSTRACT

Drug discovery, which aims to identify potential novel treatments, spans a broad range of scientific fields, including chemistry, pharmacology, and biology. In the early stages of drug development, predicting drug-target affinity is crucial. The proposed model, the prediction of drug-target affinity using a convolution model with self-attention (CSatDTA), applies convolution-based self-attention mechanisms to the molecular drug and target sequences to predict drug-target affinity (DTA) effectively, unlike previous convolution methods, which exhibit significant limitations in this respect. A convolutional neural network (CNN) operates only on a local region of its input and so misses global context. Self-attention, on the other hand, is a relatively recent technique for capturing long-range interactions that has been used primarily in sequence modeling tasks. The results of comparative experiments show that CSatDTA surpasses previous sequence-based or other approaches and has outstanding retention abilities.
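The contrast drawn above between local convolution and self-attention can be made concrete with a minimal scaled dot-product self-attention in plain Python. This is a generic sketch, not the CSatDTA architecture: it uses the input itself as queries, keys, and values (no learned projections), and all names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention with Q = K = V = x.

    x holds one embedding (row) per sequence position. Every output row
    is a weighted mix of all positions, so distant positions can interact.
    Returns (output, attention_weights).
    """
    dim = len(x[0])
    scale = math.sqrt(dim)
    scores = [[sum(a * b for a, b in zip(q, k)) / scale for k in x] for q in x]
    weights = [softmax(row) for row in scores]
    output = [
        [sum(w * v[j] for w, v in zip(w_row, x)) for j in range(dim)]
        for w_row in weights
    ]
    return output, weights


if __name__ == "__main__":
    out, attn = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    for row in attn:
        print([round(w, 3) for w in row])
```

Each row of the attention matrix sums to one and spans every position, whereas a convolution kernel only sees a fixed-width window.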


Subject(s)
Drug Discovery , Neural Networks, Computer , Drug Development , Drug Discovery/methods
20.
Int J Mol Sci ; 23(24)2022 Dec 09.
Article in English | MEDLINE | ID: mdl-36555297

ABSTRACT

Organ toxicity caused by chemicals is a serious problem in the creation and usage of chemicals such as medications, insecticides, chemical products, and cosmetics. In recent decades, the initiation and development of chemical-induced organ damage have been related to mitochondrial dysfunction, among several adverse effects. Recently, many drugs, for example, troglitazone, have been removed from the marketplace because of significant mitochondrial toxicity. As a result, it is an urgent requirement to develop in silico models that can reliably anticipate chemical-induced mitochondrial toxicity. In this paper, we have proposed an explainable machine-learning model to classify mitochondrially toxic and non-toxic compounds. After several experiments, the Mordred feature descriptor was shortlisted to be used after feature selection. The selected features used with the CatBoost learning algorithm achieved a prediction accuracy of 85% in 10-fold cross-validation and 87.1% in independent testing. The proposed model has illustrated improved prediction accuracy when compared with the existing state-of-the-art method available in the literature. The proposed tree-based ensemble model, along with the global model explanation, will aid pharmaceutical chemists in better understanding the prediction of mitochondrial toxicity.


Subject(s)
Algorithms , Drug-Related Side Effects and Adverse Reactions , Humans , Cognition , Machine Learning , Mitochondria