Results 1 - 20 of 43

1.
Nature ; 566(7743): 254-258, 2019 02.
Article in English | MEDLINE | ID: mdl-30728500

ABSTRACT

Osteoarthritis, the most common form of age-related degenerative whole-joint disease [1], is primarily characterized by cartilage destruction, as well as by synovial inflammation, osteophyte formation and subchondral bone remodelling [2,3]. However, the molecular mechanisms that underlie the pathogenesis of osteoarthritis are largely unknown. Although osteoarthritis is currently considered to be associated with metabolic disorders, direct evidence for this is lacking, and the role of cholesterol metabolism in the pathogenesis of osteoarthritis has not been fully investigated [4-6]. Various types of cholesterol hydroxylases contribute to cholesterol metabolism in extrahepatic tissues by converting cellular cholesterol to circulating oxysterols, which regulate diverse biological processes [7,8]. Here we show that the CH25H-CYP7B1-RORα axis of cholesterol metabolism in chondrocytes is a crucial catabolic regulator of the pathogenesis of osteoarthritis. Osteoarthritic chondrocytes had increased levels of cholesterol because of enhanced uptake, upregulation of cholesterol hydroxylases (CH25H and CYP7B1) and increased production of oxysterol metabolites. Adenoviral overexpression of CH25H or CYP7B1 in mouse joint tissues caused experimental osteoarthritis, whereas knockout or knockdown of these hydroxylases abrogated the pathogenesis of osteoarthritis. Moreover, retinoic acid-related orphan receptor alpha (RORα) was found to mediate the induction of osteoarthritis by alterations in cholesterol metabolism. These results indicate that osteoarthritis is a disease associated with metabolic disorders and suggest that targeting the CH25H-CYP7B1-RORα axis of cholesterol metabolism may provide a therapeutic avenue for treating osteoarthritis.


Subject(s)
Cholesterol/metabolism , Cytochrome P450 Family 7/metabolism , Nuclear Receptor Subfamily 1, Group F, Member 1/metabolism , Osteoarthritis/metabolism , Steroid Hydroxylases/metabolism , Animals , Biological Transport , Chondrocytes/enzymology , Chondrocytes/metabolism , Male , Mice , Nuclear Receptor Subfamily 1, Group F, Member 1/genetics , Osteoarthritis/enzymology , Osteoarthritis/pathology , Oxysterols/metabolism , Steroid Hydroxylases/deficiency , Up-Regulation
2.
Brief Bioinform ; 23(4), 2022 07 18.
Article in English | MEDLINE | ID: mdl-35709752

ABSTRACT

Unintended inhibition of the human ether-à-go-go-related gene (hERG) ion channel by small molecules leads to severe cardiotoxicity. Thus, hERG channel blockage is a significant concern in the development of new drugs. Several computational models have been developed to predict hERG channel blockage, including deep learning models; however, they lack robustness, reliability and interpretability. Here, we developed a graph-based Bayesian deep learning model for hERG channel blocker prediction, named BayeshERG, which has robust predictive power, high reliability and high-resolution interpretability. First, we applied transfer learning with a large dataset of 300 000 data points in initial pre-training to increase the predictive performance. Second, we implemented a Bayesian neural network with Monte Carlo dropout to calibrate the uncertainty of the prediction. Third, we utilized global multihead attentive pooling to provide high-resolution structural interpretability for hERG channel blockers and nonblockers. We conducted both internal and external validations for stringent evaluation; in particular, we benchmarked most of the publicly available hERG channel blocker prediction models. We showed that our proposed model outperformed them in both predictive performance and uncertainty calibration. Furthermore, we found that our model learned to focus on the essential substructures of hERG channel blockers via an attention mechanism. Finally, we validated the prediction results of our model by conducting in vitro experiments and confirmed its high validity. In summary, BayeshERG could serve as a versatile tool for discovering hERG channel blockers and helping maximize the possibility of successful drug discovery. The data and source code are available at our GitHub repository (https://github.com/GIST-CSBL/BayeshERG).
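
The Monte Carlo dropout step mentioned in this abstract can be illustrated with a minimal sketch. This is not the BayeshERG implementation: the tiny network, its hand-picked weights (`W_HIDDEN`, `W_OUT`), the input vector and all function names are hypothetical. It only shows the general idea of leaving dropout switched on at prediction time and summarizing many stochastic passes by their mean (the prediction) and standard deviation (a rough uncertainty estimate).

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def stochastic_forward(features, w_hidden, w_out, drop_rate, rng):
    """One forward pass with dropout left ON in the hidden layer."""
    keep = 1.0 - drop_rate
    hidden = []
    for weights in w_hidden:
        # ReLU activation of one hidden unit.
        activation = max(0.0, sum(w * x for w, x in zip(weights, features)))
        # Inverted dropout: drop the unit with probability drop_rate, rescale otherwise.
        mask = 1.0 if rng.random() < keep else 0.0
        hidden.append(activation * mask / keep)
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


def mc_dropout_predict(features, w_hidden, w_out, drop_rate=0.2, n_samples=200, seed=0):
    """Return (mean, standard deviation) over n_samples stochastic passes."""
    rng = random.Random(seed)
    samples = [
        stochastic_forward(features, w_hidden, w_out, drop_rate, rng)
        for _ in range(n_samples)
    ]
    mean = sum(samples) / n_samples
    variance = sum((s - mean) ** 2 for s in samples) / n_samples
    return mean, math.sqrt(variance)


# Toy weights chosen by hand (hypothetical; a real model would learn them).
W_HIDDEN = [[0.8, -0.4, 0.3], [-0.5, 0.9, 0.2], [0.1, 0.7, -0.6], [0.6, 0.2, 0.5]]
W_OUT = [1.2, -0.7, 0.4, 0.9]

mean_prob, uncertainty = mc_dropout_predict([1.0, 0.5, 2.0], W_HIDDEN, W_OUT)
print(f"predicted probability: {mean_prob:.3f} +/- {uncertainty:.3f}")
```

With `drop_rate=0.0` every pass is identical and the reported spread collapses to zero, which is a quick way to check that the uncertainty comes from the dropout masks alone.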


Subject(s)
Deep Learning , Ether-A-Go-Go Potassium Channels , Bayes Theorem , Ether-A-Go-Go Potassium Channels/chemistry , Ether-A-Go-Go Potassium Channels/genetics , Humans , Potassium Channel Blockers/chemistry , Potassium Channel Blockers/pharmacology , Reproducibility of Results
3.
BMC Genomics ; 24(1): 613, 2023 Oct 13.
Article in English | MEDLINE | ID: mdl-37828501

ABSTRACT

BACKGROUND: The domestic dog, Canis lupus familiaris, is a companion animal for humans as well as an animal model in cancer research, because cancers arise spontaneously in dogs much as they do in humans. Despite the social and biological importance of dogs, the catalogue of genomic variations and transcripts for dogs is relatively incomplete. RESULTS: We developed CanISO, a new database to hold a large collection of transcriptome profiles and genomic variations for domestic dogs. CanISO provides 87,692 novel transcript isoforms and 60,992 known isoforms from whole transcriptome sequencing of canine tumors (N = 157) and their matched normal tissues (N = 64). CanISO also provides genomic variation information for 210,444 unique germline single nucleotide polymorphisms (SNPs) from the whole exome sequencing of 183 dogs, with a query system that searches gene- and transcript-level information as well as covered SNPs. Transcriptome profiles can be compared with corresponding human transcript isoforms at a tissue level, or between sample groups to identify tumor-specific gene expression and alternative splicing patterns. CONCLUSIONS: CanISO is expected to increase understanding of the dog genome and transcriptome, as well as its functional associations with humans, such as shared/distinct mechanisms of cancer. CanISO is publicly available at https://www.kobic.re.kr/caniso/.


Subject(s)
Neoplasms , Wolves , Dogs , Animals , Humans , Transcriptome , Wolves/genetics , Genome , Genomics , Neoplasms/genetics , Neoplasms/veterinary , Protein Isoforms/genetics
4.
J Chem Inf Model ; 61(8): 3858-3867, 2021 08 23.
Article in English | MEDLINE | ID: mdl-34342985

ABSTRACT

Understanding differences in drug responses between patients is crucial for delivering effective cancer treatment. We describe an interpretable AI model for use in predicting drug responses in cancer cells at the gene, molecular pathway, and drug level, which we have called the hierarchical network for drug response prediction with attention. We found that the model shows better accuracy in predicting drugs having efficacy against a given cell line than other state-of-the-art methods, with a root mean squared error of 1.0064, a Pearson's correlation coefficient of 0.9307, and an R² value of 0.8647. We also confirmed that the model gives high attention to drug-target genes and cancer-related pathways when predicting a response. The validity of the predicted results was confirmed by an in vitro cytotoxicity assay. Overall, we propose that our hierarchical and interpretable AI-based model is capable of interpreting intrinsic characteristics of cancer cells and drugs for accurate prediction of cancer-drug responses.
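
For readers unfamiliar with the three figures of merit quoted in this abstract, the following sketch shows how root mean squared error, Pearson's correlation coefficient and the coefficient of determination are conventionally computed from observed and predicted values. The function name and the toy numbers are illustrative assumptions and are unrelated to the paper's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, Pearson r, R^2) for two equally long lists of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    # Pearson r: covariance divided by the product of the standard deviations.
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    pearson = cov / math.sqrt(ss_tot * var_pred)
    # Coefficient of determination.
    r2 = 1.0 - ss_res / ss_tot
    return rmse, pearson, r2


# Toy example (made-up values): predictions shifted by a constant offset.
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
rmse, pearson, r2 = regression_metrics(observed, predicted)
print(rmse, pearson, r2)
```

In this toy case the correlation is perfect while R² is negative, which shows why the two quantities are usually reported side by side: correlation ignores a systematic offset, whereas RMSE and R² penalize it.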


Subject(s)
Antineoplastic Agents , Neoplasms , Pharmaceutical Preparations , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Humans , Neoplasms/drug therapy
5.
BMC Bioinformatics ; 21(1): 175, 2020 May 04.
Article in English | MEDLINE | ID: mdl-32366211

ABSTRACT

BACKGROUND: Genome-wide studies of DNA methylation across the epigenetic landscape provide insights into the heterogeneity of pluripotent embryonic stem cells (ESCs). Differentiating into embryonic somatic and germ cells, ESCs exhibit varying degrees of pluripotency, and epigenetic changes occurring in this process have emerged as important factors explaining stem cell pluripotency. RESULTS: Here, using paired scBS-seq and scRNA-seq data of mice, we constructed a machine learning model that predicts degrees of pluripotency for mouse ESCs. Since the biological activities of non-CpG markers have yet to be clarified, we tested the predictive power of CpG and non-CpG markers, as well as a combination thereof, in the model. Through rigorous performance evaluation with both internal and external validation, we discovered that a model using both CpG and non-CpG markers predicted the pluripotency of ESCs with the highest prediction performance (0.956 AUC, external test). The prediction model consisted of 16 CpG and 33 non-CpG markers. The CpG and most of the non-CpG markers targeted depletions of methylation and were indicative of cell pluripotency, whereas only a few non-CpG markers reflected accumulations of methylation. Additionally, we confirmed that pluripotency differs between individual developmental stages, such as E3.5 and E6.5, as well as between mouse induced pluripotent stem cells (iPSCs) and somatic cells. CONCLUSIONS: In this study, we investigated CpG and non-CpG methylation in relation to mouse stem cell pluripotency and developed a model based on these markers that successfully predicts the pluripotency of mouse ESCs.
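
Several abstracts in this listing, including this one, report performance as AUC (area under the ROC curve). As a reminder of what that number measures, here is a small sketch that computes AUC directly from its probabilistic definition: the chance that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the toy labels and scores are assumptions made for illustration only.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data (made up): one positive example is scored below one negative.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; rank-based formulas give the same value more efficiently on large test sets.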


Subject(s)
CpG Islands , DNA Methylation , Pluripotent Stem Cells/metabolism , Animals , Epigenesis, Genetic , Epigenomics , Mice , Mouse Embryonic Stem Cells/metabolism
6.
PLoS Comput Biol ; 15(6): e1007129, 2019 06.
Article in English | MEDLINE | ID: mdl-31199797

ABSTRACT

Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico-based DTI prediction approaches. In several computational models, conventional protein descriptors have been shown to be insufficiently informative for accurate DTI prediction. Thus, in this study, we propose a deep learning-based DTI prediction model capturing local residue patterns of proteins participating in DTIs. When we employ a convolutional neural network (CNN) on raw protein sequences, we perform convolution on various lengths of amino acid subsequences to capture local residue patterns of generalized protein classes. We train our model with large-scale DTI information and demonstrate the performance of the proposed model using an independent dataset that is not seen during the training phase. As a result, our model performs better than previous protein descriptor-based models. Also, our model performs better than the recently developed deep learning models for massive prediction of DTIs. By examining pooled convolution results, we confirmed that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches. Our code is available at https://github.com/GIST-CSBL/DeepConv-DTI.
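
The idea of convolving over amino acid subsequences of several lengths and then pooling, as described in this abstract, can be sketched in a few lines. This is a simplified illustration and not the DeepConv-DTI code: the filters are written by hand rather than learned, each filter is a list of per-position dictionaries (equivalent to a convolution over a one-hot encoded sequence), and the sequence, filter contents and function name are all hypothetical.

```python
def conv1d_max_pool(sequence, motif_filter):
    """Slide one filter along the sequence and apply global max pooling.

    Returns (highest activation, start index of the best-matching window).
    If the sequence is shorter than the filter, returns (-inf, -1).
    """
    width = len(motif_filter)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the per-position weights of the residues seen in this window.
        score = sum(weights.get(residue, 0.0)
                    for weights, residue in zip(motif_filter, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Hand-written toy filters of different widths (a trained CNN would learn these).
FILTERS = [
    [{"G": 1.0}, {"K": 1.0}],                # width 2, responds to "GK"
    [{"H": 1.0}, {"E": 1.0}, {"H": 1.0}],    # width 3, responds to "HEH"
]
SEQUENCE = "MKTAGKSHEHL"  # made-up protein fragment

# One pooled feature per filter; the index shows where each pattern was found.
pooled = [conv1d_max_pool(SEQUENCE, f) for f in FILTERS]
features = [score for score, _ in pooled]
print(pooled)
```

Keeping the position of the maximum alongside its value mirrors the abstract's remark that pooled convolution results can be inspected to locate candidate binding regions.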


Subject(s)
Deep Learning , Drug Discovery/methods , Proteins , Sequence Analysis, Protein/methods , Amino Acid Sequence , Binding Sites , Computational Biology , Computer Simulation , Ligands , Models, Molecular , Proteins/chemistry , Proteins/metabolism
7.
Nucleic Acids Res ; 46(6): 2901-2917, 2018 04 06.
Article in English | MEDLINE | ID: mdl-29394395

ABSTRACT

Two major transcriptional regulators of carbon metabolism in bacteria are Cra and CRP. CRP is considered to be the main mediator of catabolite repression. Unlike for CRP, in vivo DNA binding information of Cra is scarce. Here we generate and integrate ChIP-exo and RNA-seq data to identify 39 binding sites for Cra and 97 regulon genes that are regulated by Cra in Escherichia coli. An integrated metabolic-regulatory network was formed by including experimentally-derived regulatory information and a genome-scale metabolic network reconstruction. Applying analysis methods of systems biology to this integrated network showed that Cra enables optimal bacterial growth on poor carbon sources by redirecting and repressing glycolysis flux, by activating the glyoxylate shunt pathway, and by activating the respiratory pathway. In these regulatory mechanisms, the overriding regulatory activity of Cra over CRP is fundamental. Thus, elucidation of interacting transcriptional regulation of core carbon metabolism in bacteria by two key transcription factors was possible by combining genome-wide experimental measurement and simulation with a genome-scale metabolic model.


Subject(s)
Bacterial Proteins/genetics , Carbon/metabolism , Cyclic AMP Receptor Protein/genetics , Escherichia coli Proteins/genetics , Gene Expression Regulation, Bacterial , Repressor Proteins/genetics , Systems Biology/methods , Bacterial Proteins/metabolism , Binding Sites/genetics , Cyclic AMP Receptor Protein/metabolism , DNA, Bacterial/genetics , DNA, Bacterial/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/metabolism , Genome, Bacterial/genetics , Glycolysis/genetics , Metabolic Networks and Pathways/genetics , Protein Binding , Regulon/genetics , Repressor Proteins/metabolism , Transcription Factors/metabolism
8.
Biotechnol Bioprocess Eng ; 25(6): 895-930, 2020.
Article in English | MEDLINE | ID: mdl-33437151

ABSTRACT

As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Since artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered a viable solution for unstable drug research and development. Generally, AI is applied to fields with sufficient data such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of recent research trends in the AI-guided drug discovery process, including target identification, hit identification, ADMET prediction, lead optimization, and drug repositioning. The main data sources in each field are also summarized in this review. In addition, an in-depth analysis of the remaining challenges and limitations is provided, together with proposals for promising future directions in each of the aforementioned areas.

9.
BMC Bioinformatics ; 20(Suppl 10): 247, 2019 May 29.
Article in English | MEDLINE | ID: mdl-31138103

ABSTRACT

BACKGROUND: Drug repositioning, also known as drug repurposing, defines new indications for existing drugs and can be used as an alternative to drug development. In recent years, the accumulation of large volumes of information related to drugs and diseases has led to the development of various computational approaches for drug repositioning. Although herbal medicines have had a great impact on current drug discovery, there are still a large number of herbal compounds that have no definite indications. RESULTS: In the present study, we constructed a computational model to predict the unknown pharmacological effects of herbal compounds using machine learning techniques. Based on the assumption that similar diseases can be treated with similar drugs, we used four categories of drug-drug similarity (chemical structure, side-effects, gene ontology, and targets) and three categories of disease-disease similarity (phenotypes, human phenotype ontology, and gene ontology). Then, associations between drugs and diseases were predicted using these similarity features. The prediction models were constructed using classification algorithms, including logistic regression, random forest and support vector machine algorithms. Upon cross-validation, the random forest approach showed the best performance (AUC = 0.948) and also performed well in an external validation assessment using an unseen independent dataset (AUC = 0.828). Finally, the constructed model was applied to predict potential indications for existing drugs and herbal compounds. As a result, new indications for 20 existing drugs and 31 herbal compounds were predicted and validated using clinical trial data. CONCLUSIONS: The predicted results were validated manually, confirming the performance and underlying mechanisms; for example, irinotecan as a treatment for neuroblastoma. From the prediction, herbal compounds were identified as drug candidates for related diseases and merit further development. The proposed prediction model can contribute to drug discovery by suggesting drug candidates from herbal compounds that have potential but have been little studied.


Subject(s)
Drug Repositioning , Machine Learning , Phytochemicals/pharmacology , Algorithms , Gene Ontology , Humans , Logistic Models , Models, Biological , Pharmaceutical Preparations , Phenotype , Reproducibility of Results
10.
BMC Bioinformatics ; 19(Suppl 8): 208, 2018 06 13.
Article in English | MEDLINE | ID: mdl-29897326

ABSTRACT

BACKGROUND: Identification of drug-target interactions plays a key role in drug discovery. However, identifying drug-target interactions via in vitro and in vivo experiments is laborious and time-consuming. Thus, predicting drug-target interactions by using computational approaches is a good alternative. In recent studies, many feature-based and similarity-based machine learning approaches have shown promising results in drug-target interaction prediction. A previous study showed that accounting for connectivity information of drug-drug and protein-protein interactions increases prediction performance through the concept of 'guilt-by-association'. However, an approach that only considers directly connected nodes often misses information that could be derived from distant nodes. Therefore, in this study, we obtain global network topology information by using a random walk with restart algorithm and apply the global topology information to the prediction model. RESULTS: Our prediction model demonstrates increased prediction performance compared with the 'guilt-by-association' approach (AUC 0.89 and 0.67 in the training and independent test, respectively). In addition, we show how features weighted by a random walk with restart yield better performance than the original features. Also, we confirmed that drugs and proteins that have a high degree of connectivity on the interactome network yield better performance in our model. CONCLUSIONS: The prediction models with features weighted by global network topology increased prediction performance in both training and testing compared with non-weighted models and the previous 'guilt-by-association' method. In conclusion, global network topology information on protein-protein interactions and drug-drug interactions affects the prediction performance of drug-target interactions.
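
The random walk with restart algorithm named in this abstract can be sketched on a toy network. This is a generic textbook formulation and not the authors' code: the four-node path graph, the restart probability of 0.3 and the function name are assumptions chosen for illustration. The walker repeatedly either follows a random edge or jumps back to the seed node, and the stationary visiting probabilities give every node a proximity score to the seed, including nodes that are not directly connected to it.

```python
def random_walk_with_restart(adjacency, seed_index, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence."""
    n = len(adjacency)
    # Column-normalise the adjacency matrix into transition probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 if i == seed_index else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        # Stop when the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 (hypothetical interaction network).
PATH_GRAPH = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
proximity = random_walk_with_restart(PATH_GRAPH, seed_index=0)
print([round(x, 4) for x in proximity])
```

Node 3 shares no edge with the seed, yet it still receives a non-zero score. That is the kind of information from distant nodes that a method restricted to direct neighbours would miss.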


Subject(s)
Algorithms , Drug Interactions , Databases as Topic , Humans , Machine Learning , Probability
11.
BMC Bioinformatics ; 18(Suppl 7): 227, 2017 May 31.
Article in English | MEDLINE | ID: mdl-28617228

ABSTRACT

BACKGROUND: Drug-induced liver injury (DILI) is a critical issue in drug development because DILI causes failures in clinical trials and the withdrawal of approved drugs from the market. There have been many attempts to predict the risk of DILI based on in vivo and in silico identification of hepatotoxic compounds. In the current study, we propose an in silico model that predicts DILI using weighted molecular fingerprints. RESULTS: In this study, we used an 881-bit molecular fingerprint as features describing the presence or absence of each substructure of a compound. Then, the Bayesian probability of each substructure was calculated and labeled (positive or negative for DILI), and a weighted fingerprint was determined from the ratio of DILI-positive to DILI-negative probability values. Using weighted fingerprint features, the prediction models were trained and evaluated with the Random Forest (RF) and Support Vector Machine (SVM) algorithms. The constructed models yielded accuracies of 73.8% and 72.6% and AUCs of 0.791 and 0.768 in cross-validation. In independent tests, the models achieved accuracies of 60.1% and 61.1% for RF and SVM, respectively. The results validated that weighted features helped increase the overall performance of the prediction models. The constructed models were further applied to natural compounds in herbs to identify DILI potential, and 13,996 unique herbal compounds were predicted as DILI-positive with the SVM model. CONCLUSIONS: The prediction models with weighted features showed increased performance compared to non-weighted models. Moreover, we predicted the DILI potential of herbs with the best-performing model, and the prediction results suggest that many herbal compounds could potentially cause DILI. We can thus infer that taking natural products without detailed references about the relevant pathways may be dangerous. Considering the frequency of use of compounds in natural herbs and their increased application in drug development, DILI labeling would be very important.
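
One plausible reading of the weighting scheme described in this abstract is sketched below. The abstract does not give the exact formula, so everything here is an assumption: each bit is weighted by the ratio of its smoothed frequency among DILI-positive compounds to its smoothed frequency among DILI-negative compounds, with add-one (Laplace) smoothing to avoid division by zero. The three-bit fingerprints, labels and function names are toy values, not the 881-bit fingerprints used in the study.

```python
def substructure_weights(fingerprints, labels, pseudo=1.0):
    """Per-bit weight = P(bit present | positive) / P(bit present | negative).

    Assumed formula for illustration; probabilities use add-`pseudo` smoothing.
    """
    n_bits = len(fingerprints[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    weights = []
    for bit in range(n_bits):
        pos_hits = sum(fp[bit] for fp, y in zip(fingerprints, labels) if y == 1)
        neg_hits = sum(fp[bit] for fp, y in zip(fingerprints, labels) if y == 0)
        p_pos = (pos_hits + pseudo) / (n_pos + 2 * pseudo)
        p_neg = (neg_hits + pseudo) / (n_neg + 2 * pseudo)
        weights.append(p_pos / p_neg)
    return weights


def apply_weights(fingerprint, weights):
    """Turn a binary fingerprint into a weighted one (absent bits stay 0)."""
    return [bit * w for bit, w in zip(fingerprint, weights)]


# Toy training set: four compounds, three substructure bits (all made up).
TOY_FINGERPRINTS = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
TOY_LABELS = [1, 1, 0, 0]  # 1 = DILI-positive, 0 = DILI-negative

bit_weights = substructure_weights(TOY_FINGERPRINTS, TOY_LABELS)
print(bit_weights)
print(apply_weights([1, 0, 1], bit_weights))
```

Bits seen mostly in positive compounds get weights above 1 and bits seen mostly in negative compounds get weights below 1, so a downstream classifier receives features that already encode how informative each substructure was in the training data.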


Subject(s)
Chemical and Drug Induced Liver Injury/etiology , Models, Theoretical , Bayes Theorem , Biological Products/chemistry , Biological Products/toxicity , Databases, Factual , Humans , Support Vector Machine
12.
BMC Bioinformatics ; 17 Suppl 6: 219, 2016 Jul 28.
Article in English | MEDLINE | ID: mdl-27490208

ABSTRACT

BACKGROUND: Verifying the proteins that are targeted by compounds of natural herbs will be helpful in selecting natural herb-based drug candidates. However, clarifying these interactions through in vitro or in vivo experiments entails a great deal of effort. In this light, in silico prediction of the interactions between compounds and target proteins can help ease the effort. RESULTS: In this study, we performed in silico predictions of herbal compound target identification. First, data related to compounds, target proteins, and interactions between them were taken from the DrugBank database. Then we characterized six classes of compound-target interaction in humans, including G-protein-coupled receptors (GPCRs), ion channels, enzymes, receptors, transporters, and other proteins. Also, classification-prediction models that predict the interactions between compounds and target proteins through a machine learning method were constructed using these matrices. As a result, the AUC values of the six classes were 0.94, 0.93, 0.90, 0.89, 0.91, and 0.76, respectively. Finally, the interactions of compounds from natural products were predicted using the constructed classification models. Furthermore, from our predicted results, we confirmed that several important disease-related proteins were predicted as targets of natural herbal compounds. CONCLUSIONS: We constructed classification-prediction models that predict the interactions between compounds and target proteins. The constructed models showed good prediction performance, and a number of potential target proteins of natural compounds were predicted from our results.


Subject(s)
Biological Products/analysis , Computer Simulation , Drug Discovery , Plants, Medicinal/chemistry , Models, Chemical , Protein Binding , Support Vector Machine
13.
Bioinformatics ; 31(19): 3105-13, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26071141

ABSTRACT

MOTIVATION: Finding somatic mutations from massively parallel sequencing data is becoming a standard process in genome-based biomedical studies. A number of robust methods have been developed for detecting somatic single nucleotide variations. However, detection of somatic copy number alteration has been substantially less explored and remains vulnerable to frequently raised sampling issues: low frequency in the cell population and absence of matched control samples. RESULTS: We developed a novel computational method, SoloDel, that accurately distinguishes low-frequency somatic deletions from germline ones with or without matched control samples. We first constructed a probabilistic somatic mutation progression model that describes the occurrence and propagation of the event in the cellular lineage of the sample. We then built a Gaussian mixture model to represent the mixed population of somatic and germline deletions. Parameters of the mixture model could be estimated using the expectation-maximization algorithm with the observed distribution of read-depth ratios at the points of discordant-read based initial deletion calls. Combined with a conventional structural variation caller, SoloDel greatly increased the accuracy in classifying somatic mutations. Even without control, SoloDel maintained a comparable performance across a wide range of mutated subpopulation sizes (10-70%). SoloDel could also successfully recall experimentally validated somatic deletions from previously reported neuropsychiatric whole-genome sequencing data. AVAILABILITY AND IMPLEMENTATION: Java-based implementation of the method is available at http://sourceforge.net/projects/solodel/ CONTACT: swkim@yuhs.ac or dhlee@biosoft.kaist.ac.kr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
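
The combination of a Gaussian mixture model and the expectation-maximization (EM) algorithm mentioned in this abstract can be illustrated generically. The sketch below fits a two-component one-dimensional mixture to simulated ratios. It is not the SoloDel model, which also includes a mutation progression model; the component centres (0.5 and 0.85), spreads, sample sizes, starting values and function names are made-up assumptions used only to generate and fit toy data.

```python
import math
import random


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def fit_two_component_gmm(values, means=(0.4, 0.9), stds=(0.1, 0.1),
                          weights=(0.5, 0.5), n_iter=50):
    """Plain EM for a 1-D mixture of two Gaussians; returns (means, stds, weights)."""
    means, stds, weights = list(means), list(stds), list(weights)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in values:
            dens = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the weighted observations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # floor keeps a component from collapsing
            weights[k] = nk / len(values)
    return means, stds, weights


# Simulated ratios from two made-up populations (purely illustrative).
rng = random.Random(1)
toy_ratios = ([rng.gauss(0.50, 0.05) for _ in range(200)]
              + [rng.gauss(0.85, 0.05) for _ in range(200)])

fitted_means, fitted_stds, fitted_weights = fit_two_component_gmm(toy_ratios)
print([round(m, 3) for m in fitted_means], [round(w, 3) for w in fitted_weights])
```

Once the parameters are fitted, the responsibilities from the E-step act as posterior probabilities, so each observation can be assigned to whichever component explains it better.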


Subject(s)
Models, Statistical , Sequence Analysis, DNA/methods , Sequence Deletion/genetics , Software , Computer Simulation , Databases, Genetic , Humans , Mental Disorders/genetics , Reproducibility of Results
14.
BMC Med Inform Decis Mak ; 16 Suppl 1: 56, 2016 07 18.
Article in English | MEDLINE | ID: mdl-27454576

ABSTRACT

BACKGROUND: The survival of patients with breast cancer is highly variable, ranging from a few months to more than 15 years. In recent studies, the gene expression profiling of tumors has been used as a promising means of identifying prognostic factors. METHODS: In this study, we used gene expression datasets of tumors to identify prognostic factors in breast cancer. We conducted log-rank tests and used unsupervised clustering methods to find reciprocally expressed gene sets associated with worse survival rates. Prognosis prediction scores were determined as the ratio of gene expressions. RESULTS: Four prognosis prediction gene set modules were constructed. The four prognostic gene sets predicted worse survival rates in three independent gene expression data sets. In addition, we found that patients with poor-prognosis cancers, i.e., triple-negative, HER2-enriched, TP53-mutated and high-grade tumors, had higher prognosis prediction scores than those with other types of breast cancer. CONCLUSIONS: Based on a gene expression analysis, we suggest that our well-defined scoring method for predicting survival outcome may be useful for developing prognostic factors in breast cancer.


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/diagnosis , Gene Expression Profiling/methods , Genetic Markers/genetics , Survival Analysis , Breast Neoplasms/genetics , Breast Neoplasms/mortality , Female , Humans , Prognosis
15.
PLoS Comput Biol ; 10(9): e1003837, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25232952

ABSTRACT

Altered metabolism in cancer cells has been viewed as a passive response required for a malignant transformation. However, this view has changed through the recently described metabolic oncogenic factors: mutated isocitrate dehydrogenases (IDH), succinate dehydrogenase (SDH), and fumarate hydratase (FH) that produce oncometabolites that competitively inhibit epigenetic regulation. In this study, we demonstrate in silico predictions of oncometabolites that have the potential to dysregulate epigenetic controls in nine types of cancer by incorporating massive scale genetic mutation information (collected from more than 1,700 cancer genomes), expression profiling data, and deploying Recon 2 to reconstruct context-specific genome-scale metabolic models. Our analysis predicted 15 compounds and 24 substructures of potential oncometabolites that could result from the loss-of-function and gain-of-function mutations of metabolic enzymes, respectively. These results suggest a substantial potential for discovering unidentified oncometabolites in various forms of cancers.


Subject(s)
Metabolic Networks and Pathways/genetics , Metabolome/genetics , Neoplasms/genetics , Neoplasms/metabolism , Systems Biology/methods , Cell Line, Tumor , Cluster Analysis , Computer Simulation , Gene Expression Profiling , Humans , Models, Biological , Mutation/genetics
16.
BMC Med Inform Decis Mak ; 15 Suppl 1: S6, 2015.
Article in English | MEDLINE | ID: mdl-26043747

ABSTRACT

BACKGROUND: Alterations of a genome can lead to changes in protein functions. Through these genetic mutations, a protein can lose its native function (loss-of-function, LoF), or it can confer a new function (gain-of-function, GoF). However, when a mutation occurs, it is difficult to determine whether it will result in a LoF or a GoF. Therefore, in this paper, we propose a study that analyzes the genomic features of LoF and GoF instances to find features that can be used to classify LoF and GoF mutations. METHODS: In order to collect experimentally verified LoF and GoF mutational information, we obtained 816 LoF mutations and 474 GoF mutations from a literature text-mining process. Next, with data-preprocessing steps, 258 LoF and 129 GoF mutations remained for a further analysis. We analyzed the properties of these LoF and GoF mutations. Among the properties, we selected features which show different tendencies between the two groups and implemented classifications using support vector machine, random forest, and linear logistic regression methods to confirm whether or not these features can identify LoF and GoF mutations. RESULTS: We analyzed the properties of the LoF and GoF mutations and identified six features which have discriminative power between LoF and GoF conditions: the reference allele, the substituted allele, mutation type, mutation impact, subcellular location, and protein domain. When using the six selected features with the random forest, support vector machine, and linear logistic regression classifiers, the result showed accuracy levels of 72.23%, 71.28%, and 70.19%, respectively. CONCLUSIONS: We analyzed LoF and GoF mutations and selected several properties which were different between the two classes. By implementing classifications with the selected features, it is demonstrated that the selected features have good discriminative power.


Subject(s)
Data Mining/methods , Genomics/methods , Machine Learning , Mutation/genetics , Humans
17.
J Cheminform ; 15(1): 77, 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37674239

ABSTRACT

In recent years, the field of computational drug design has made significant strides in the development of artificial intelligence (AI) models for the generation of de novo chemical compounds with desired properties and biological activities, such as enhanced binding affinity to target proteins. These high-affinity compounds have the potential to be developed into more potent therapeutics for a broad spectrum of diseases. Due to the lack of data required for the training of deep generative models, however, some of these approaches have fine-tuned their molecular generators using data obtained from a separate predictor. While these studies show that generative models can produce structures with the desired target properties, it remains unclear whether the diversity of the generated structures and the span of their chemical space align with the distribution of the intended target molecules. In this study, we present LOGICS, a novel framework for Learning Optimal Generative distribution Iteratively for designing target-focused Chemical Structures. We address the exploration-exploitation dilemma, which weighs the choice between exploring new options and exploiting current knowledge. To tackle this issue, we incorporate experience memory and employ a layered tournament selection approach to refine the fine-tuning process. The proposed method was applied to the binding affinity optimization of two target proteins of different protein classes, the κ-opioid receptor and PIK3CA, and the quality and the distribution of the generated molecules were evaluated. The results showed that LOGICS outperforms competing state-of-the-art models and generates more diverse de novo chemical structures with optimized properties. The source code is available at the GitHub repository ( https://github.com/GIST-CSBL/LOGICS ).
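
Tournament selection, which this abstract uses in a layered form, can be shown in its plainest version. The sketch below is the ordinary single-layer algorithm applied to a hypothetical experience memory of scored candidates. It does not reproduce the layered scheme used in LOGICS, whose details are not given in the abstract, and the memory contents, scores and function name are invented for illustration.

```python
import random


def tournament_select(memory, n_select, tournament_size, rng):
    """Pick n_select winners; each is the best of a random subset of the memory.

    memory is a list of (item, score) pairs. Larger tournaments favour
    exploitation of high scorers; smaller ones leave more room for exploration.
    """
    winners = []
    for _ in range(n_select):
        contestants = rng.sample(memory, tournament_size)
        winners.append(max(contestants, key=lambda pair: pair[1]))
    return winners


# Hypothetical experience memory: candidate identifiers with predicted scores.
TOY_MEMORY = [("mol_a", 5.1), ("mol_b", 7.4), ("mol_c", 6.2),
              ("mol_d", 8.9), ("mol_e", 4.3), ("mol_f", 7.9)]

selected = tournament_select(TOY_MEMORY, n_select=4, tournament_size=3,
                             rng=random.Random(0))
print(selected)
```

The tournament size is the knob that trades exploration against exploitation: a size of 1 is uniform random sampling, while a size equal to the memory always returns the single best candidate.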

18.
Protein Sci ; 32(1): e4529, 2023 01.
Article in English | MEDLINE | ID: mdl-36461699

ABSTRACT

Antimicrobial resistance is a growing health concern. Antimicrobial peptides (AMPs) disrupt harmful microorganisms by nonspecific mechanisms, making it difficult for microbes to develop resistance. Accordingly, they are promising alternatives to traditional antimicrobial drugs. In this study, we developed an improved AMP classification model, called AMP-BERT. We propose a deep learning model with a fine-tuned bidirectional encoder representations from transformers (BERT) architecture designed to extract structural/functional information from input peptides and identify each input as AMP or non-AMP. We compared the performance of our proposed model with that of other machine/deep learning-based methods. Our model, AMP-BERT, yielded the best prediction results among all models evaluated with our curated external dataset. In addition, we utilized the attention mechanism in BERT to implement an interpretable feature analysis and determine the specific residues in known AMPs that contribute to peptide structure and antimicrobial function. The results show that AMP-BERT can capture the structural properties of peptides for model learning, enabling the prediction of AMPs or non-AMPs from input sequences. AMP-BERT is expected to contribute to the identification of candidate AMPs for functional validation and drug development. The code and dataset for the fine-tuning of AMP-BERT are publicly available at https://github.com/GIST-CSBL/AMP-BERT.


Subject(s)
Antimicrobial Peptides , Machine Learning
19.
Mol Syst Biol ; 7: 535, 2011 Oct 11.
Article in English | MEDLINE | ID: mdl-21988831

ABSTRACT

The initial genome-scale reconstruction of the metabolic network of Escherichia coli K-12 MG1655 was assembled in 2000. It has been updated and periodically released since then based on new and curated genomic and biochemical knowledge. An update has now been built, named iJO1366, which accounts for 1366 genes, 2251 metabolic reactions, and 1136 unique metabolites. iJO1366 was (1) updated in part using a new experimental screen of 1075 gene knockout strains, illuminating cases where alternative pathways and isozymes are yet to be discovered, (2) compared with its predecessor and to experimental data sets to confirm that it continues to make accurate phenotypic predictions of growth on different substrates and for gene knockout strains, and (3) mapped to the genomes of all available sequenced E. coli strains, including pathogens, leading to the identification of hundreds of unannotated genes in these organisms. Like its predecessors, the iJO1366 reconstruction is expected to be widely deployed for studying the systems biology of E. coli and for metabolic engineering applications.


Subject(s)
Computational Biology/methods , Escherichia coli K12 , Genes, Bacterial , Genome, Bacterial , Genomics/methods , Systems Biology/methods , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Gene Knockout Techniques , Metabolic Engineering , Metabolic Networks and Pathways , Models, Biological
20.
J Cheminform ; 14(1): 9, 2022 Mar 04.
Article in English | MEDLINE | ID: mdl-35246258

ABSTRACT

Adverse drug-drug interaction (DDI) is a major concern in polypharmacy because of its unexpected adverse side effects, and it must be identified at an early stage of drug discovery and development. Many computational methods have been proposed for this purpose, but most require specific types of information or pay little attention to interpretation in terms of the underlying genes. We propose a deep learning-based framework for DDI prediction with drug-induced gene expression signatures so that the model can provide expression-level interpretability for DDIs. The model engineers dynamic drug features using a gating mechanism that mimics the co-administration effects by imposing attention on genes. Also, each side effect is projected into a latent space through translating embedding. As a result, the model achieved an AUC of 0.889 and an AUPR of 0.915 in unseen interaction prediction, which is highly accurate and outperforms other state-of-the-art methods. Furthermore, it can predict potential DDIs with new compounds not used in training. In conclusion, using drug-induced gene expression signatures followed by gating and translating embedding can increase DDI prediction accuracy while providing model interpretability. The source code is available on GitHub ( https://github.com/GIST-CSBL/DeSIDE-DDI ).
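
The abstract does not say how translating embedding is applied to drugs and side effects. In the general TransE-style formulation, a relation is a vector that shifts a head embedding toward a tail embedding, and a triple is scored by the distance between head + relation and tail. The sketch below shows only that generic scoring rule; the function name and the toy vectors are hypothetical, and reading "relation" as a side effect is an assumption rather than the DeSIDE-DDI implementation.

```python
import math


def translation_distance(head, relation, tail):
    """L2 distance ||head + relation - tail||.

    Generic TransE-style score: a smaller distance means the triple is
    considered more plausible. Illustrative only; not the DeSIDE-DDI code.
    """
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


if __name__ == "__main__":
    drug_a = [1.0, 0.0]       # toy embedding (assumed)
    side_effect = [0.0, 1.0]  # toy relation vector (assumed)
    drug_b = [1.0, 1.0]       # toy embedding (assumed)
    print(translation_distance(drug_a, side_effect, drug_b))
```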
