Results 1 - 8 of 8
1.
ISME Commun ; 3(1): 128, 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-38049632

ABSTRACT

Local microbiome shifts are implicated in the development and progression of gastrointestinal cancers, in particular esophageal carcinoma (ESCA), which is among the most aggressive malignancies. Short-read RNA sequencing (RNAseq) is currently the leading technology for studying gene expression changes in cancer. However, using RNAseq to study microbial gene expression is challenging. Here, we establish a new tool to efficiently detect viral and bacterial expression in human tissues from RNAseq data. This approach employs a neural network to predict reads of likely microbial origin, which are then targeted for assembly into longer contigs, improving the identification of microbial species and genes. We apply this approach to systematically compare bacterial expression in ESCA and healthy esophagi. We uncover bacterial genera that are over- or underabundant in ESCA relative to healthy esophagi, both before and after correction for possible covariates, including patient metadata. However, we find that bacterial taxonomies are not significantly associated with clinical outcomes. In striking contrast, dozens of microbial proteins were significantly associated with poor patient outcomes, in particular proteins that perform mitochondrial functions and iron-sulfur coordination. We further demonstrate associations between these microbial proteins and dysregulated host pathways in ESCA patients. Overall, these results suggest possible influences of bacteria on the development of ESCA and uncover new prognostic biomarkers based on microbial genes. In addition, this study provides a framework for analyzing other human malignancies whose development may be driven by pathogens.
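The abstract does not say how the flagged reads are assembled. As a toy illustration only, the Python sketch below keeps reads whose classifier score passes a threshold and merges them greedily by suffix-prefix overlap. The scores stand in for the neural network's output, and `threshold`, `min_overlap` and the function names are hypothetical; a real pipeline would use a dedicated assembler rather than this quadratic loop, and this is not the authors' method.

```python
from typing import List


def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if below min_overlap."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if length > 0 and a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], scores: List[float],
                    threshold: float = 0.5, min_overlap: int = 5) -> List[str]:
    """Toy greedy assembly of reads flagged as likely microbial (hypothetical parameters)."""
    # Keep only reads whose score (a stand-in for the classifier output) passes the threshold.
    contigs = [read for read, score in zip(reads, scores) if score >= threshold]
    while len(contigs) > 1:
        # Find the pair (i, j) with the longest overlap of contig i's end onto contig j's start.
        best_len, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    olap = suffix_prefix_overlap(left, right, min_overlap)
                    if olap > best_len:
                        best_len, best_i, best_j = olap, i, j
        if best_len == 0:
            break  # no remaining overlaps: the contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGGCGTACG", "GTACGTTAGC", "TTAGCCATGA", "CCCCCCCCCC"]
scores = [0.9, 0.8, 0.95, 0.1]  # hypothetical classifier outputs
print(greedy_assemble(reads, scores))  # ['ATGGCGTACGTTAGCCATGA']
```

The low-scoring read is discarded before assembly, which is the point of targeting assembly at likely microbial reads: the overlap search runs only on the small candidate set.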

2.
Cancers (Basel) ; 15(7), 2023 Mar 24.
Article in English | MEDLINE | ID: mdl-37046619

ABSTRACT

Since the rise of next-generation sequencing technologies, the catalogue of mutations in cancer has been continuously expanding. To address the complexity of the cancer-genomic landscape and extract meaningful insights, numerous computational approaches have been developed over the last two decades. In this review, we survey the current leading computational methods to derive intricate mutational patterns in the context of clinical relevance. We begin with mutation signatures, explaining first how mutation signatures were developed and then examining the utility of studies using mutation signatures to correlate environmental effects on the cancer genome. Next, we examine current clinical research that employs mutation signatures and discuss the potential use cases and challenges of mutation signatures in clinical decision-making. We then examine computational studies developing tools to investigate complex patterns of mutations beyond the context of mutational signatures. We survey methods to identify cancer-driver genes, from single-driver studies to pathway and network analyses. In addition, we review methods inferring complex combinations of mutations for clinical tasks and using mutations integrated with multi-omics data to better predict cancer phenotypes. We examine the use of these tools for either discovery or prediction, including prediction of tumor origin, treatment outcomes, prognosis, and cancer typing. We further discuss the main limitations preventing widespread clinical integration of computational tools for the diagnosis and treatment of cancer. We end by proposing solutions to address these challenges using recent advances in machine learning.
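Mutation-signature analyses commonly start from a 96-channel catalogue of single-base substitutions: the six substitution classes C>A, C>G, C>T, T>A, T>C and T>G (reported on the pyrimidine strand), each split by the 16 possible combinations of 5' and 3' flanking bases. The Python sketch below builds such a catalogue from (trinucleotide context, alternate base) pairs. The input format and function names are assumptions for illustration, and signature extraction itself (for example by non-negative matrix factorization) is not shown.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channels() -> List[str]:
    """The 96 channels: 6 pyrimidine-centred substitutions x 16 flanking-base contexts."""
    channels = []
    for ref in "CT":
        for alt in "ACGT":
            if alt == ref:
                continue
            for left, right in product("ACGT", repeat=2):
                channels.append(f"{left}[{ref}>{alt}]{right}")
    return channels


def classify(trinucleotide: str, alt: str) -> str:
    """Map a substitution at the middle base of a trinucleotide to its SBS-96 channel."""
    if trinucleotide[1] in "AG":
        # Purine reference bases are reported on the opposite strand (pyrimidine convention).
        trinucleotide = reverse_complement(trinucleotide)
        alt = alt.translate(COMPLEMENT)
    return f"{trinucleotide[0]}[{trinucleotide[1]}>{alt}]{trinucleotide[2]}"


def catalogue(mutations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count mutations per channel, including zero counts for unobserved channels."""
    counts = Counter(classify(context, alt) for context, alt in mutations)
    return {channel: counts.get(channel, 0) for channel in sbs96_channels()}


profile = catalogue([("ACA", "T"), ("TGT", "A"), ("CTG", "C")])
print(profile["A[C>T]A"], profile["C[T>C]G"])  # 2 1
```

The G>A change in TGT lands in the same channel as the C>T change in ACA because it is the same event read from the other strand.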

3.
Nat Commun ; 14(1): 785, 2023 Feb 11.
Article in English | MEDLINE | ID: mdl-36774364

ABSTRACT

About 15% of human cancer cases are attributed to viral infections. To date, virus expression in tumor tissues has been mostly studied by aligning tumor RNA sequencing reads to databases of known viruses. To allow identification of divergent viruses and rapid characterization of the tumor virome, we develop viRNAtrap, an alignment-free pipeline to identify viral reads and assemble viral contigs. We utilize viRNAtrap, which is based on a deep learning model trained to discriminate viral RNAseq reads, to explore viral expression in cancers and apply it to 14 cancer types from The Cancer Genome Atlas (TCGA). Using viRNAtrap, we uncover expression of unexpected and divergent viruses that have not previously been implicated in cancer and disclose human endogenous viruses whose expression is associated with poor overall survival. The viRNAtrap pipeline provides a way forward to study viral infections associated with different clinical conditions.


Subjects
Deep Learning , Neoplasms , Viruses , Humans , Neoplasms/genetics , Viruses/genetics , Genome, Viral , High-Throughput Nucleotide Sequencing
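viRNAtrap scores reads with a trained deep learning model before assembling the candidates into viral contigs. The abstract does not describe the model's input encoding; one-hot encoding is a common representation for nucleotide sequence models, so the Python sketch below assumes it. The 150-base read length, the 0.5 threshold and the `predict` callable (a stand-in for the trained network) are all hypothetical.

```python
from typing import Callable, List, Sequence

BASES = "ACGT"
Matrix = List[List[int]]


def one_hot_read(read: str, length: int = 150) -> Matrix:
    """Encode a read as a length x 4 matrix (A, C, G, T); unknown bases and padding are all-zero rows."""
    matrix = []
    for base in read.upper()[:length]:  # truncate reads longer than `length`
        row = [0, 0, 0, 0]
        index = BASES.find(base)
        if index >= 0:
            row[index] = 1
        matrix.append(row)
    # Pad short reads so every input has the same shape.
    matrix.extend([0, 0, 0, 0] for _ in range(length - len(matrix)))
    return matrix


def select_candidate_reads(reads: Sequence[str], predict: Callable[[Matrix], float],
                           length: int = 150, threshold: float = 0.5) -> List[str]:
    """Keep reads whose predicted probability of viral origin meets the threshold."""
    return [read for read in reads if predict(one_hot_read(read, length)) >= threshold]


def toy_predict(matrix: Matrix) -> float:
    """Hypothetical stand-in for a trained model: 'viral' if the read starts with A."""
    return 1.0 if matrix[0][0] == 1 else 0.0


print(select_candidate_reads(["ACGT", "TTTT"], toy_predict, length=4))  # ['ACGT']
```

Because selection needs only a forward pass per read, no alignment to a reference database is involved, which is what lets such a pipeline surface divergent viruses.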
4.
Nat Commun ; 14(1): 724, 2023 Feb 09.
Article in English | MEDLINE | ID: mdl-36759620

ABSTRACT

The PML::RARA fusion protein is the hallmark driver of acute promyelocytic leukemia (APL); it disrupts retinoic acid signaling, leading to wide-scale gene expression changes and uncontrolled proliferation of myeloid precursor cells. Although PML::RARA is known to be recruited to binding sites across the genome, its impact on gene regulation and expression is underexplored. Using integrated multi-omics datasets, we characterize the influence of PML::RARA binding on gene expression and regulation in an inducible PML::RARA cell line model and in ex vivo samples from APL patients. We find that genes whose regulatory elements recruit PML::RARA are not uniformly transcriptionally repressed, as commonly suggested, but may also be upregulated or remain unchanged. We develop a machine learning implementation called Regulatory Element Behavior Extraction Learning to deconvolute the complex local transcription factor binding site environment at PML::RARA-bound positions, revealing distinct signatures that modulate how PML::RARA directs the transcriptional response.


Subjects
Leukemia, Promyelocytic, Acute , Humans , Cell Line , Gene Expression Regulation , Leukemia, Promyelocytic, Acute/genetics , Leukemia, Promyelocytic, Acute/metabolism , Multiomics , Oncogene Proteins, Fusion/genetics , Oncogene Proteins, Fusion/metabolism , Tretinoin/pharmacology
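The finding that PML::RARA-bound genes are not uniformly repressed amounts to splitting bound genes into three response classes. The Python sketch below shows only that bookkeeping step. It assumes per-gene differential expression results are already available; the |log2 fold change| >= 1 and adjusted p < 0.05 cutoffs and the gene names are hypothetical, and it does not reproduce the Regulatory Element Behavior Extraction Learning model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class GeneResult:
    gene: str
    log2_fold_change: float  # e.g. induced vs uninduced PML::RARA (assumed comparison)
    padj: float  # multiple-testing-adjusted p-value


def classify_response(result: GeneResult, lfc_cutoff: float = 1.0, alpha: float = 0.05) -> str:
    """Label a bound gene as up, down or unchanged (hypothetical cutoffs)."""
    if result.padj < alpha and result.log2_fold_change >= lfc_cutoff:
        return "up"
    if result.padj < alpha and result.log2_fold_change <= -lfc_cutoff:
        return "down"
    return "unchanged"


def summarize(results: Iterable[GeneResult]) -> Dict[str, int]:
    """Count bound genes per response class."""
    return dict(Counter(classify_response(r) for r in results))


bound_genes = [GeneResult("GENE_A", -2.0, 0.001), GeneResult("GENE_B", 1.5, 0.01),
               GeneResult("GENE_C", 0.2, 0.60), GeneResult("GENE_D", -3.0, 0.20)]
print(summarize(bound_genes))  # {'down': 1, 'up': 1, 'unchanged': 2}
```

GENE_D shows a large fold change but is still labeled unchanged because it fails the significance cutoff; requiring both criteria keeps noisy genes out of the up and down classes.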
5.
Bioinformatics ; 37(17): 2544-2555, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33638345

ABSTRACT

MOTIVATION: A global effort is underway to identify compounds for the treatment of COVID-19. Because de novo compound design is an extremely lengthy and expensive process, efforts are also underway to discover existing compounds that can be repurposed for COVID-19 and new viral diseases. We propose a machine learning representation framework that uses deep learning-induced vector embeddings of compounds and viral proteins as features to predict compound-viral protein activity. The prediction model in turn uses a consensus framework to rank approved compounds against viral proteins of interest. RESULTS: Our consensus framework achieves a high mean Pearson correlation of 0.916, a mean R² of 0.840 and a low mean squared error of 0.313 for the task of compound-viral protein activity prediction on an independent test set. As a use case, we identify a ranked list of 47 compounds common to three main proteins of the SARS-CoV-2 virus (PL-PRO, 3CL-PRO and Spike protein) as potential candidates, including 21 antivirals, 15 anticancer compounds, 5 antibiotics and 6 other investigational human compounds. We perform additional molecular docking simulations to demonstrate that the majority of these compounds have low binding energies and thus high binding affinity, with the potential to be effective against the SARS-CoV-2 virus. AVAILABILITY AND IMPLEMENTATION: All source code and data are available at: https://github.com/raghvendra5688/Drug-Repurposing and https://dx.doi.org/10.17632/8rrwnbcgmx.3. We also implemented a web server at: https://machinelearning-protein.qcri.org/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
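The three reported metrics have standard definitions, and the Python sketch below computes them. The abstract does not say how the consensus framework combines models; mean-rank aggregation is used here purely as an assumption, and the model and compound names are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def consensus_rank(predictions: Dict[str, Dict[str, float]]) -> List[str]:
    """Order compounds by mean rank across models (rank 1 = highest predicted activity)."""
    rank_sums: Dict[str, int] = {}
    for scores in predictions.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, compound in enumerate(ordered, start=1):
            rank_sums[compound] = rank_sums.get(compound, 0) + rank
    return sorted(rank_sums, key=lambda c: rank_sums[c] / len(predictions))


predictions = {  # hypothetical per-model activity scores for three compounds
    "model_1": {"cmpd_a": 0.9, "cmpd_b": 0.5, "cmpd_c": 0.1},
    "model_2": {"cmpd_a": 0.7, "cmpd_b": 0.8, "cmpd_c": 0.2},
    "model_3": {"cmpd_a": 0.6, "cmpd_b": 0.4, "cmpd_c": 0.5},
}
print(consensus_rank(predictions))  # ['cmpd_a', 'cmpd_b', 'cmpd_c']
```

Rank aggregation ignores each model's score scale, so a model with systematically inflated outputs cannot dominate the consensus; this sketch assumes every model scores the same set of compounds.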

6.
Biol Direct ; 16(1): 6, 2021 Jan 18.
Article in English | MEDLINE | ID: mdl-33461600

ABSTRACT

BACKGROUND: Drug-induced liver injury (DILI) is a major safety concern with a complex and diverse pathogenesis. To identify DILI early in drug development, a better understanding of the injury and more predictive models are urgently needed. One approach is in silico modeling, which aims to predict the risk of DILI from compound structure. However, such models do not yet show sufficient predictive performance or interpretability to support decision making on their own; the limited performance stems partly from the underlying difficulty of labeling the in vivo DILI risk of compounds in a way that is meaningful for training machine learning models. RESULTS: As part of the Critical Assessment of Massive Data Analysis (CAMDA) "CMap Drug Safety Challenge" 2019 (http://camda2019.bioinf.jku.at), chemical structure-based models were generated using the binarized DILIrank annotations. Support Vector Machine (SVM) and Random Forest (RF) classifiers showed performance comparable to previously published models, with a mean balanced accuracy of 0.759 ± 0.027 on an external test set, averaged over models generated using 5-fold LOCO-CV inside a 10-fold training scheme. In the models that used predicted protein targets as compound descriptors, we identified the most information-rich proteins, which agreed with known mechanisms: the mechanism of action and toxicity of nonsteroidal anti-inflammatory drugs (NSAIDs), one of the most important drug classes causing DILI; stress response via TP53; and biotransformation. In addition, we identified multiple proteins involved in xenobiotic metabolism that could be novel DILI-related off-targets, such as CLK1 and DYRK2. Moreover, we derived potential structural alerts for DILI with high precision, including furan and hydrazine derivatives; however, all derived alerts were present in approved drugs and were overly specific, indicating the need to consider quantitative variables such as dose.
CONCLUSION: Using chemical structure-based descriptors such as structural fingerprints and predicted protein targets, DILI prediction models were built with a predictive performance comparable to previous literature. In addition, we derived insights on proteins and pathways statistically (and potentially causally) linked to DILI from these models and inferred new structural alerts related to this adverse endpoint.


Subjects
Chemical and Drug Induced Liver Injury , Computer Simulation , Machine Learning , Humans , Models, Biological
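Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity, which keeps a classifier from scoring well simply by predicting the majority class. The Python sketch below computes it for one model and summarizes scores across models. The label encoding (1 = DILI concern) and the reading of "±" as a sample standard deviation across models are assumptions, since the abstract does not state them.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (assumed: 1 = DILI concern)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def summarize_scores(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-model scores."""
    return mean(scores), stdev(scores)


print(balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))  # 0.625
```

In the usage line, plain accuracy would be 4/6 (about 0.667), while balanced accuracy is 0.625 because the minority negative class is predicted correctly only half the time.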
7.
Bioinformatics ; 36(5): 1429-1438, 2020 Mar 01.
Article in English | MEDLINE | ID: mdl-31603511

ABSTRACT

MOTIVATION: X-ray crystallography has been used to determine the majority of protein structures solved to date. Sequence-based predictors that accurately estimate protein crystallization propensity would be highly beneficial for overcoming the high cost and large attrition rate of crystallization and for reducing the trial-and-error experimentation it requires. RESULTS: In this study, we present a novel model, BCrystal, which uses an optimized gradient boosting machine (XGBoost) on sequence, structural and physicochemical features extracted from the proteins of interest. BCrystal also provides explanations, highlighting the most important features for the predicted crystallization propensity of an individual protein using the SHAP algorithm. On three independent test sets, BCrystal outperforms state-of-the-art sequence-based methods by more than 12.5% in accuracy, 18% in recall and 0.253 in Matthews correlation coefficient, with an average accuracy of 93.7%, recall of 96.63% and Matthews correlation coefficient of 0.868. We observed that higher relative solvent accessibility of exposed residues is positively associated with protein crystallizability, whereas the number of disordered regions, the fraction of coils and tripeptide stretches that contain multiple histidines are negatively associated with crystallizability. The higher accuracy of BCrystal enables it to screen accurately for sequence variants with enhanced crystallizability. AVAILABILITY AND IMPLEMENTATION: The BCrystal web server is at https://machinelearning-protein.qcri.org/ and source code is available at https://github.com/raghvendra5688/BCrystal. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Proteins , Crystallization , Crystallography, X-Ray , Software
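The Matthews correlation coefficient (MCC) reported above uses all four confusion-matrix counts and ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction): MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The Python sketch below computes it; returning 0 when a confusion-matrix row or column is empty is a common convention rather than part of the definition, and the counts in the usage line are hypothetical.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when any row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denominator


print(round(mcc(tp=45, tn=40, fp=10, fn=5), 3))  # 0.704
```

Unlike accuracy, MCC stays low unless both crystallizable and non-crystallizable proteins are predicted well, which makes it informative on imbalanced test sets.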
8.
Bioinformatics ; 35(13): 2216-2225, 2019 Jul 01.
Article in English | MEDLINE | ID: mdl-30462171

ABSTRACT

MOTIVATION: Protein structure determination has primarily been performed using X-ray crystallography. To overcome its high cost, high attrition rate and series of trial-and-error settings, many in silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, the majority of these methods build their predictors by extracting features from protein sequences, which is computationally expensive and can explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from sequence. Our model is based on convolutional neural networks, which can exploit frequently occurring k-mers and sets of k-mers from protein sequences to distinguish proteins that will yield diffraction-quality crystals from those that will not. RESULTS: Our model surpasses previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and Matthews correlation coefficient (MCC) on three independent test sets. DeepCrystal achieves average improvements in recall of 1.4% and 12.1% compared with its closest competitors, Crysalis II and Crysf, respectively. In addition, on the independent test sets, DeepCrystal attains average improvements of 2.1% and 6.0% in F-score, 1.9% and 3.9% in accuracy, and 3.8% and 7.0% in MCC relative to Crysalis II and Crysf, respectively. AVAILABILITY AND IMPLEMENTATION: The standalone source code and models are available at https://github.com/elbasir/DeepCrystal, and a web server is available at https://deeplearning-protein.qcri.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Amino Acid Sequence , Computational Biology , Crystallization , Proteins
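The abstract states that the convolutional layers can exploit frequently occurring k-mers. The toy Python example below illustrates why. A width-k filter slid over a one-hot encoded protein scores each window, and global max pooling keeps the best match, so each filter acts as a soft k-mer detector. Here the filter weights are set by hand to the hypothetical motif "HHH"; in a trained network they are learned, and DeepCrystal's actual architecture (filter counts, widths, layers) is not reproduced.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
Matrix = List[List[float]]


def one_hot_protein(seq: str) -> Matrix:
    """Encode a protein as a len(seq) x 20 matrix; non-standard residues become all-zero rows."""
    rows = []
    for residue in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            row[index] = 1.0
        rows.append(row)
    return rows


def conv_max_activation(x: Matrix, kernel: Matrix) -> float:
    """Valid 1D convolution of a single filter (no bias), ReLU, then global max pooling."""
    width = len(kernel)
    best = 0.0  # ReLU floor: negative activations are clipped to zero
    for start in range(len(x) - width + 1):
        activation = sum(x[start + i][j] * kernel[i][j]
                         for i in range(width) for j in range(len(AMINO_ACIDS)))
        best = max(best, activation)
    return best


# A hand-set width-3 filter that responds most strongly to the tripeptide "HHH".
hhh_filter = one_hot_protein("HHH")
print(conv_max_activation(one_hot_protein("MKHHHLV"), hhh_filter))  # 3.0
print(conv_max_activation(one_hot_protein("MKHHLLV"), hhh_filter))  # 2.0
```

A partial match still produces a graded response (2.0 for two of three residues), which is how a learned filter can capture families of related k-mers rather than a single exact string.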