Results 1 - 9 of 9
1.
Bioinformatics ; 40(1), 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38175787

ABSTRACT

MOTIVATION: Understanding metal-protein interaction can provide structural and functional insights into cellular processes. As the number of protein sequences increases, developing fast yet precise computational approaches to predict and annotate metal-binding sites becomes imperative. Quick and resource-efficient pre-trained protein language model (pLM) embeddings have successfully predicted binding sites from protein sequences despite not using structural or evolutionary features (multiple sequence alignments). Using residue-level embeddings from the pLMs, we have developed a sequence-based method (M-Ionic) to identify metal-binding proteins and predict residues involved in metal binding. RESULTS: On independent validation of recent proteins, M-Ionic reports an area under the receiver operating characteristic curve (AUROC) of 0.83 (recall = 84.6%) in distinguishing metal-binding from non-binding proteins, compared to an AUROC of 0.74 (recall = 61.8%) for the next best method. In addition to comparable performance to the state-of-the-art method for identifying metal-binding residues (Ca2+, Mg2+, Mn2+, Zn2+), M-Ionic provides binding probabilities for six additional ions (i.e. Cu2+, PO43-, SO42-, Fe2+, Fe3+, Co2+). We show that the pLM embedding of a single residue contains sufficient information about its neighbours to predict its binding properties. AVAILABILITY AND IMPLEMENTATION: M-Ionic can be used on your protein of interest using a Google Colab Notebook (https://bit.ly/40FrRbK). The GitHub repository (https://github.com/TeamSundar/m-ionic) contains all code and data.


Subject(s)
Metals, Proteins, Proteins/chemistry, Amino Acid Sequence, Binding Sites, Ions, Protein Domains, Metals/chemistry, Metals/metabolism
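
The AUROC and recall figures quoted in record 1 can be illustrated with a minimal sketch. It is not the M-Ionic code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUROC is computed here through its rank interpretation, the probability that a randomly chosen positive receives a higher score than a randomly chosen negative.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall(labels, scores, threshold=0.5):
    """Fraction of true positives recovered at a hypothetical score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp + fn == 0:
        raise ValueError("no positive examples")
    return tp / (tp + fn)


# Toy data (hypothetical): 1 = metal-binding protein, 0 = non-binding.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.8, 0.6, 0.4, 0.2]
print(auroc(toy_labels, toy_scores), recall(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of examples, which is acceptable for a sketch; a sort-based rank computation would be the usual choice for large validation sets.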
2.
BMC Genomics ; 22(1): 214, 2021 Mar 24.
Article in English | MEDLINE | ID: mdl-33761889

ABSTRACT

BACKGROUND: Survival and drug response are two highly emphasized clinical outcomes in cancer research that direct the prognosis of a cancer patient. Here, we propose a late multi-omics integration framework that robustly quantifies survival and drug response for breast cancer patients, with a focus on the relative predictive ability of the available omics data types. Neighborhood component analysis (NCA), a supervised feature selection algorithm, selected relevant features from multi-omics datasets retrieved from The Cancer Genome Atlas (TCGA) and Genomics of Drug Sensitivity in Cancer (GDSC) databases. A neural network framework, fed with NCA-selected features, was used to develop survival and drug response prediction models for breast cancer patients. The drug response framework used regression and unsupervised clustering (K-means) to segregate samples into responders and non-responders based on their predicted IC50 values (Z-score). RESULTS: The survival prediction framework was highly effective in categorizing patients into risk subtypes, with an accuracy of 94%. Compared to single-omics and early integration approaches, our drug response prediction models performed significantly better and were able to predict IC50 values (Z-score) with a mean squared error (MSE) of 1.154 and an overall regression value of 0.92, showing a linear relationship between predicted and actual IC50 values. CONCLUSION: The proposed omics integration strategy provides an effective way of extracting critical information from diverse omics data types, enabling estimation of prognostic indicators. Such integrative models with high predictive power would have a significant impact and utility in precision oncology.


Subject(s)
Breast Neoplasms, Deep Learning, Pharmaceutical Preparations, Breast Neoplasms/drug therapy, Breast Neoplasms/genetics, Genomics, Humans, Precision Medicine
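
The responder/non-responder split described in record 2 (Z-scored IC50 values followed by K-means) can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the one-dimensional two-cluster simplification, the toy IC50 values, and the reading of the low-IC50 cluster as "responders" are all hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("values must not all be identical")
    return [(v - mu) / sigma for v in values]


def two_means_1d(values, max_iter=100):
    """K-means with k=2 on one-dimensional data.

    Returns a label per value: 0 for the lower cluster, 1 for the higher.
    Centroids start at the minimum and maximum, so neither cluster can
    become empty.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    for _ in range(max_iter):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        new_lo = mean(v for v, c in zip(values, labels) if c == 0)
        new_hi = mean(v for v, c in zip(values, labels) if c == 1)
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments are stable
        lo, hi = new_lo, new_hi
    return labels


# Hypothetical predicted IC50 values for six samples.
predicted_ic50 = [0.5, 0.8, 0.6, 9.0, 10.0, 8.5]
clusters = two_means_1d(z_scores(predicted_ic50))
# Assumption: a lower IC50 means higher sensitivity, so cluster 0 = responder.
groups = ["responder" if c == 0 else "non-responder" for c in clusters]
print(groups)
```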
3.
ACS Omega ; 9(28): 30645-30653, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39035912

ABSTRACT

Cancer is a lethal disease that affects numerous people worldwide. Chemotherapy stands as one of the most effective treatment regimens to combat cancer. Nevertheless, anticancer drugs face a high failure rate due to safety and efficacy issues. Drug failure could be reduced by prioritizing drug leads with lower toxicity and enhanced efficacy. Computer-aided drug discovery supports the identification of such leads by working with protein and ligand structures or representations. The simplified molecular input line entry system (SMILES) is a linear notation that represents the structure of a molecule using symbols and alphanumeric characters. A SMILES representation encodes ring and scaffold structures. Mining ring and scaffold patterns from molecular SMILES would assist in ascertaining biological properties based on molecular patterns. Moreover, the emergence of artificial intelligence (AI) technologies would accelerate the identification of efficient anticancer drug leads. AI algorithms, known for their pattern recognition ability, could be employed to identify molecular patterns from SMILES representations, thereby enabling property prediction. Consequently, we developed a multilayer perceptron (MLP) model for the prediction of anticancer activity using SMILES of NCI-60 cancer growth inhibition data. Furthermore, the top 8 most frequent scaffolds were identified in a preliminary analysis of cancer growth inhibition data and ChEMBL drugs. The developed MLP model classified anticancer and nonanticancer compounds with a classification accuracy of 0.92. In addition, benchmarking against other machine learning algorithms showed better performance for the MLP model.
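
As a rough illustration of how ring information can be read directly from a SMILES string, the sketch below counts ring-closure bonds by pairing ring-bond labels. It is not the method used in record 3 and it is not a full SMILES parser: the function name is hypothetical, scaffold extraction is out of scope, and the input is assumed to be a syntactically valid SMILES string. Digits inside square brackets (isotopes, hydrogen counts, charges) are skipped because they are not ring labels.

```python
def count_ring_closures(smiles):
    """Count ring-closure bonds in a SMILES string.

    A ring bond is written as a label (a single digit, or '%' followed by
    two digits) that appears once to open and once to close the bond.
    """
    open_labels = set()
    closures = 0
    in_bracket = False
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket:
            label = None
            if ch == "%":
                label = smiles[i + 1:i + 3]  # two-digit ring label
                i += 2
            elif ch.isdigit():
                label = ch
            if label is not None:
                if label in open_labels:
                    open_labels.remove(label)  # second occurrence closes the bond
                    closures += 1
                else:
                    open_labels.add(label)  # first occurrence opens it
        i += 1
    return closures


# Benzene, naphthalene, and ethanol as small examples.
for s in ("c1ccccc1", "c1ccc2ccccc2c1", "CCO"):
    print(s, count_ring_closures(s))
```

For real datasets a cheminformatics toolkit is the safer choice, since it also validates the input and handles aromaticity and scaffold decomposition.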

4.
Methods Mol Biol ; 2553: 285-323, 2023.
Article in English | MEDLINE | ID: mdl-36227550

ABSTRACT

Protein interactions play a critical role in all biological processes, but experimental identification of protein interactions is a time- and resource-intensive process. The advances in next-generation sequencing and multi-omics technologies have greatly benefited large-scale predictions of protein interactions using machine learning methods. A wide range of tools have been developed to predict protein-protein, protein-nucleic acid, and protein-drug interactions. Here, we discuss the applications, methods, and challenges faced when employing the various prediction methods. We also briefly describe ways to overcome the challenges and prospective future developments in the field of protein interaction biology.


Subject(s)
Deep Learning, Nucleic Acids, Computational Biology/methods, Machine Learning, Nucleic Acids/metabolism, Protein Interaction Maps, Proteins/metabolism
5.
J Mol Biol ; 435(13): 168121, 2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37100167

ABSTRACT

Transcription factors (TFs) recognize specific motifs in the genome, typically 6-12 bp long, to regulate various aspects of the cellular machinery. The presence of binding motifs and favorable genome accessibility are key drivers of a consistent TF-DNA interaction. Although these prerequisites may occur thousands of times in the genome, there seems to be a high degree of selectivity for the sites that are actually bound. Here, we present a deep-learning framework that identifies and characterizes the genetic elements upstream and downstream of the binding motif for their role in enforcing this selectivity. The proposed framework is based on an interpretable recurrent neural network architecture that enables relative analysis of sequence context features. We apply the framework to model twenty-six transcription factors and score TF-DNA binding at base-pair resolution. We find significant differences in activations of DNA context features between bound and unbound sequences. In addition to standardized evaluation protocols, we offer outstanding interpretability that enables us to identify and annotate DNA sequences with possible elements that modulate TF-DNA binding. Differences in data processing also have a large influence on overall model performance. Overall, the proposed framework allows for novel insights into the non-coding genetic elements and their role in facilitating a stable TF-DNA interaction.


Subject(s)
DNA, Deep Learning, Transcription Factors, Binding Sites/genetics, DNA/metabolism, Protein Binding, Transcription Factors/metabolism
6.
ACS Omega ; 7(14): 12138-12146, 2022 Apr 12.
Article in English | MEDLINE | ID: mdl-35449922

ABSTRACT

In silico methods to identify novel drug-target interactions (DTIs) have gained significant importance over conventional techniques, which are labor-intensive and low-throughput. Here, we present a machine learning-based multiclass classification workflow that segregates drug-target pairs into active, inactive, and intermediate interactions. Drug molecules, protein sequences, and molecular descriptors were transformed into machine-interpretable embeddings to extract critical features from standard datasets. Tools such as the CHEMBL web resource, iFeature, and an in-house developed deep neural network-assisted drug recommendation (dNNDR)-featx were employed for data retrieval and processing. The models were trained with large-scale DTI datasets and showed an improvement in performance over baseline methods. External validation results showed that models based on att-biLSTM and gCNN could help predict novel DTIs. When tested with a completely different dataset, the proposed models significantly outperformed competing methods. The validity of novel interactions predicted by dNNDR was backed by experimental and computational evidence in the literature. The proposed methodology could elucidate critical features that govern the relationship between a drug and its target.

7.
Microbiol Resour Announc ; 11(8): e0041922, 2022 Aug 18.
Article in English | MEDLINE | ID: mdl-35862912

ABSTRACT

Here, we report the whole-genome sequence of Franconibacter sp. strain IITDAS19, a potent biosurfactant-producing bacterium that was isolated from oil-contaminated soil. The sequence provided information on the genes and enzymes responsible for the biosynthesis of the biosurfactant.

8.
ACS Omega ; 7(3): 2706-2717, 2022 Jan 25.
Article in English | MEDLINE | ID: mdl-35097268

ABSTRACT

The identification of novel drug-target interactions is a labor-intensive and low-throughput process. In silico alternatives have proved to be of immense importance in assisting the drug discovery process. Here, we present TransDTI, a multiclass classification and regression workflow employing transformer-based language models to segregate interactions between drug-target pairs as active, inactive, and intermediate. The models were trained with large-scale drug-target interaction (DTI) data sets and showed an improvement in performance over baseline methods in terms of the area under the receiver operating characteristic curve (auROC), the area under the precision-recall curve (auPR), the Matthews correlation coefficient (MCC), and R2. The results showed that models based on transformer-based language models effectively predict novel drug-target interactions from sequence data. The proposed models significantly outperformed existing methods such as DeepConvDTI, DeepDTA, and DeepDTI on a test data set. Further, the validity of novel interactions predicted by TransDTI was found to be backed by molecular docking and simulation analysis, where the model prediction had similar or better interaction potential for MAP2k and transforming growth factor-β (TGFβ) and their known inhibitors. The proposed approaches can have a significant impact on the development of personalized therapy and clinical decision making.
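
Of the metrics listed in record 8, the Matthews correlation coefficient is the least self-explanatory, so a minimal sketch of its binary form follows. The record describes a multiclass setting; the binary formula, the function name, and the convention of returning 0.0 when the denominator vanishes are simplifying assumptions made here, not details taken from TransDTI.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when any marginal total is zero,
    a common convention for the undefined case.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts for an active vs. inactive split.
print(mcc(tp=40, tn=45, fp=5, fn=10))
```

Unlike accuracy, MCC uses all four confusion-matrix cells, which is why it is often reported for imbalanced interaction data sets.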

9.
Cancers (Basel) ; 13(13), 2021 Jun 22.
Article in English | MEDLINE | ID: mdl-34206288

ABSTRACT

The utility of multi-omics in personalized therapy and cancer survival analysis has been debated and demonstrated extensively in the recent past. Most current methods still suffer from data constraints such as high dimensionality, unexplained interdependence, and subpar integration methods. Here, we propose SurvCNN, an alternative approach that processes multi-omics data with robust computer vision architectures to predict cancer prognosis for lung adenocarcinoma patients. Numerical multi-omics data were transformed into image representations and fed into a convolutional neural network with a discrete-time model to predict survival probabilities. The framework also dichotomized patients into risk subgroups based on their survival probabilities over time. SurvCNN was evaluated on multiple performance metrics and outperformed existing methods with a high degree of confidence. Moreover, the relative performance of various combinations of omics datasets was examined in detail. Critical biological processes, pathways, and cell types identified from downstream processing of differentially expressed genes suggested that the framework could elucidate elements detrimental to a patient's survival. Such integrative models with high predictive power would have a significant impact and utility in precision oncology.
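
The discrete-time survival step mentioned in record 9 can be illustrated with a short sketch. It assumes, without confirmation from the abstract, that the model emits one conditional hazard per time interval; the survival probability at the end of interval k is then the product of (1 - hazard) over intervals 1 to k. The function names and the 0.5 cut-off used for the risk split are hypothetical.

```python
def survival_curve(hazards):
    """Turn per-interval conditional hazards into cumulative survival
    probabilities: S(k) = (1 - h_1) * (1 - h_2) * ... * (1 - h_k)."""
    curve = []
    surviving = 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        surviving *= 1.0 - h
        curve.append(surviving)
    return curve


def risk_group(hazards, cutoff=0.5):
    """Label a patient 'high' risk if survival at the last interval falls
    below a hypothetical cut-off, otherwise 'low'."""
    return "high" if survival_curve(hazards)[-1] < cutoff else "low"


# Hypothetical hazards for three consecutive follow-up intervals.
patient_hazards = [0.1, 0.2, 0.5]
print(survival_curve(patient_hazards), risk_group(patient_hazards))
```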
