RESUMO
Carcinomas are complex ecosystems composed of cancer, stromal and immune cells. Communication between these cells and their microenvironments induces cancer progression and causes therapy resistance. In order to improve the treatment of cancers, it is essential to quantify crosstalk between and within various cell types in a tumour microenvironment. Focusing on the coordinated expression patterns of ligands and cognate receptors, cell-cell communication can be inferred through ligand-receptor interactions (LRIs). In this manuscript, we carry out the following work: (i) introduce pipeline for ligand-receptor-mediated intercellular communication estimation from single-cell transcriptomics and list a few available LRI-related databases and visualization tools; (ii) demonstrate seven classical intercellular communication scoring strategies, highlight four types of representative intercellular communication inference methods, including network-based approaches, machine learning-based approaches, spatial information-based approaches and other approaches; (iii) summarize the evaluation and validation avenues for intercellular communication inference and analyze the advantages and limitations for the above four types of cell-cell communication methods; (iv) comment several major challenges while provide further research directions for intercellular communication analysis in the tumour microenvironments. We anticipate that this work helps to better understand intercellular crosstalk and to further develop powerful cell-cell communication estimation tools for tumor-targeted therapy.
Assuntos
Neoplasias , Microambiente Tumoral , Comunicação Celular , Ecossistema , Humanos , Ligantes , Neoplasias/metabolismo , TranscriptomaRESUMO
BACKGROUND: Long noncoding RNAs (lncRNAs) play important roles in various biological and pathological processes. Discovery of lncRNA-protein interactions (LPIs) contributes to understand the biological functions and mechanisms of lncRNAs. Although wet experiments find a few interactions between lncRNAs and proteins, experimental techniques are costly and time-consuming. Therefore, computational methods are increasingly exploited to uncover the possible associations. However, existing computational methods have several limitations. First, majority of them were measured based on one simple dataset, which may result in the prediction bias. Second, few of them are applied to identify relevant data for new lncRNAs (or proteins). Finally, they failed to utilize diverse biological information of lncRNAs and proteins. RESULTS: Under the feed-forward deep architecture based on gradient boosting decision trees (LPI-deepGBDT), this work focuses on classify unobserved LPIs. First, three human LPI datasets and two plant LPI datasets are arranged. Second, the biological features of lncRNAs and proteins are extracted by Pyfeat and BioProt, respectively. Thirdly, the features are dimensionally reduced and concatenated as a vector to represent an lncRNA-protein pair. Finally, a deep architecture composed of forward mappings and inverse mappings is developed to predict underlying linkages between lncRNAs and proteins. LPI-deepGBDT is compared with five classical LPI prediction models (LPI-BLS, LPI-CatBoost, PLIPCOM, LPI-SKF, and LPI-HNM) under three cross validations on lncRNAs, proteins, lncRNA-protein pairs, respectively. It obtains the best average AUC and AUPR values under the majority of situations, significantly outperforming other five LPI identification methods. That is, AUCs computed by LPI-deepGBDT are 0.8321, 0.6815, and 0.9073, respectively and AUPRs are 0.8095, 0.6771, and 0.8849, respectively. The results demonstrate the powerful classification ability of LPI-deepGBDT. Case study analyses show that there may be interactions between GAS5 and Q15717, RAB30-AS1 and O00425, and LINC-01572 and P35637. CONCLUSIONS: Integrating ensemble learning and hierarchical distributed representations and building a multiple-layered deep architecture, this work improves LPI prediction performance as well as effectively probes interaction data for new lncRNAs/proteins.
Assuntos
RNA Longo não Codificante , Biologia Computacional , Árvores de Decisões , Humanos , Plantas , RNA Longo não Codificante/genéticaRESUMO
BACKGROUND: Long noncoding RNAs (lncRNAs) have dense linkages with a plethora of important cellular activities. lncRNAs exert functions by linking with corresponding RNA-binding proteins. Since experimental techniques to detect lncRNA-protein interactions (LPIs) are laborious and time-consuming, a few computational methods have been reported for LPI prediction. However, computation-based LPI identification methods have the following limitations: (1) Most methods were evaluated on a single dataset, and researchers may thus fail to measure their generalization ability. (2) The majority of methods were validated under cross validation on lncRNA-protein pairs, did not investigate the performance under other cross validations, especially for cross validation on independent lncRNAs and independent proteins. (3) lncRNAs and proteins have abundant biological information, how to select informative features need to further investigate. RESULTS: Under a hybrid framework (LPI-HyADBS) integrating feature selection based on AdaBoost, and classification models including deep neural network (DNN), extreme gradient Boost (XGBoost), and SVM with a penalty Coefficient of misclassification (C-SVM), this work focuses on finding new LPIs. First, five datasets are arranged. Each dataset contains lncRNA sequences, protein sequences, and an LPI network. Second, biological features of lncRNAs and proteins are acquired based on Pyfeat. Third, the obtained features of lncRNAs and proteins are selected based on AdaBoost and concatenated to depict each LPI sample. Fourth, DNN, XGBoost, and C-SVM are used to classify lncRNA-protein pairs based on the concatenated features. Finally, a hybrid framework is developed to integrate the classification results from the above three classifiers. LPI-HyADBS is compared to six classical LPI prediction approaches (LPI-SKF, LPI-NRLMF, Capsule-LPI, LPI-CNNCP, LPLNP, and LPBNI) on five datasets under 5-fold cross validations on lncRNAs, proteins, lncRNA-protein pairs, and independent lncRNAs and independent proteins. The results show LPI-HyADBS has the best LPI prediction performance under four different cross validations. In particular, LPI-HyADBS obtains better classification ability than other six approaches under the constructed independent dataset. Case analyses suggest that there is relevance between ZNF667-AS1 and Q15717. CONCLUSIONS: Integrating feature selection approach based on AdaBoost, three classification techniques including DNN, XGBoost, and C-SVM, this work develops a hybrid framework to identify new linkages between lncRNAs and proteins.
Assuntos
RNA Longo não Codificante , Sequência de Aminoácidos , Biologia Computacional , Redes Neurais de Computação , RNA Longo não Codificante/genética , Proteínas de Ligação a RNA/metabolismoRESUMO
Single-cell RNA sequencing (scRNA-seq) technologies allow numerous opportunities for revealing novel and potentially unexpected biological discoveries. scRNA-seq clustering helps elucidate cell-to-cell heterogeneity and uncover cell subgroups and cell dynamics at the group level. Two important aspects of scRNA-seq data analysis were introduced and discussed in the present review: relevant datasets and analytical tools. In particular, we reviewed popular scRNA-seq datasets and discussed scRNA-seq clustering models including K-means clustering, hierarchical clustering, consensus clustering, and so on. Seven state-of-the-art scRNA clustering methods were compared on five public available datasets. Two primary evaluation metrics, the Adjusted Rand Index (ARI) and the Normalized Mutual Information (NMI), were used to evaluate these methods. Although unsupervised models can effectively cluster scRNA-seq data, these methods also have challenges. Some suggestions were provided for future research directions.
Assuntos
Análise por Conglomerados , Biologia Computacional , Sequenciamento de Nucleotídeos em Larga Escala , Análise de Sequência de RNA , Análise de Célula Única , Algoritmos , Biologia Computacional/métodos , Bases de Dados de Ácidos Nucleicos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Humanos , Análise de Sequência de RNA/métodos , Análise de Célula Única/métodos , Software , NavegadorRESUMO
lncRNA-protein interactions (LPIs) prediction can deepen the understanding of many important biological processes. Artificial intelligence methods have reported many possible LPIs. However, most computational techniques were evaluated mainly on one dataset, which may produce prediction bias. More importantly, they were validated only under cross validation on lncRNA-protein pairs, and did not consider the performance under cross validations on lncRNAs and proteins, thus fail to search related proteins/lncRNAs for a new lncRNA/protein. Under an ensemble learning framework (EnANNDeep) composed of adaptive k-nearest neighbor classifier and Deep models, this study focuses on systematically finding underlying linkages between lncRNAs and proteins. First, five LPI-related datasets are arranged. Second, multiple source features are integrated to depict an lncRNA-protein pair. Third, adaptive k-nearest neighbor classifier, deep neural network, and deep forest are designed to score unknown lncRNA-protein pairs, respectively. Finally, interaction probabilities from the three predictors are integrated based on a soft voting technique. In comparing to five classical LPI identification models (SFPEL, PMDKN, CatBoost, PLIPCOM, and LPI-SKF) under fivefold cross validations on lncRNAs, proteins, and LPIs, EnANNDeep computes the best average AUCs of 0.8660, 0.8775, and 0.9166, respectively, and the best average AUPRs of 0.8545, 0.8595, and 0.9054, respectively, indicating its superior LPI prediction ability. Case study analyses indicate that SNHG10 may have dense linkage with Q15717. In the ensemble framework, adaptive k-nearest neighbor classifier can separately pick the most appropriate k for each query lncRNA-protein pair. More importantly, deep models including deep neural network and deep forest can effectively learn the representative features of lncRNAs and proteins.
Assuntos
RNA Longo não Codificante , Inteligência Artificial , Biologia Computacional/métodos , Aprendizado de Máquina , Redes Neurais de Computação , RNA Longo não Codificante/genética , RNA Longo não Codificante/metabolismoRESUMO
The identification of lncRNA-protein interactions (LPIs) is important to understand the biological functions and molecular mechanisms of lncRNAs. However, most computational models are evaluated on a unique dataset, thereby resulting in prediction bias. Furthermore, previous models have not uncovered potential proteins (or lncRNAs) interacting with a new lncRNA (or protein). Finally, the performance of these models can be improved. In this study, we develop a Deep Learning framework with Dual-net Neural architecture to find potential LPIs (LPI-DLDN). First, five LPI datasets are collected. Second, the features of lncRNAs and proteins are extracted by Pyfeat and BioTriangle, respectively. Third, these features are concatenated as a vector after dimension reduction. Finally, a deep learning model with dual-net neural architecture is designed to classify lncRNA-protein pairs. LPI-DLDN is compared with six state-of-the-art LPI prediction methods (LPI-XGBoost, LPI-HeteSim, LPI-NRLMF, PLIPCOM, LPI-CNNCP, and Capsule-LPI) under four cross validations. The results demonstrate the powerful LPI classification performance of LPI-DLDN. Case study analyses show that there may be interactions between RP11-439E19.10 and Q15717, and between RP11-196G18.22 and Q9NUL5. The novelty of LPI-DLDN remains, integrating various biological features, designing a novel deep learning-based LPI identification framework, and selecting the optimal LPI feature subset based on feature importance ranking.
Assuntos
Aprendizado Profundo , RNA Longo não Codificante , RNA Longo não Codificante/genética , RNA Longo não Codificante/metabolismo , Biologia Computacional/métodosRESUMO
Coronavirus disease 2019 (COVID-19) is rapidly spreading. Researchers around the world are dedicated to finding the treatment clues for COVID-19. Drug repositioning, as a rapid and cost-effective way for finding therapeutic options from available FDA-approved drugs, has been applied to drug discovery for COVID-19. In this study, we develop a novel drug repositioning method (VDA-KLMF) to prioritize possible anti-SARS-CoV-2 drugs integrating virus sequences, drug chemical structures, known Virus-Drug Associations, and Logistic Matrix Factorization with Kernel diffusion. First, Gaussian kernels of viruses and drugs are built based on known VDAs and nearest neighbors. Second, sequence similarity kernel of viruses and chemical structure similarity kernel of drugs are constructed based on biological features and an identity matrix. Third, Gaussian kernel and similarity kernel are diffused. Forth, a logistic matrix factorization model with kernel diffusion is proposed to identify potential anti-SARS-CoV-2 drugs. Finally, molecular dockings between the inferred antiviral drugs and the junction of SARS-CoV-2 spike protein-ACE2 interface are implemented to investigate the binding abilities between them. VDA-KLMF is compared with two state-of-the-art VDA prediction models (VDA-KATZ and VDA-RWR) and three classical association prediction methods (NGRHMDA, LRLSHMDA, and NRLMF) based on 5-fold cross validations on viruses, drugs, and VDAs on three datasets. It obtains the best recalls, AUCs, and AUPRs, significantly outperforming other five methods under the three different cross validations. We observe that four chemical agents coming together on any two datasets, that is, remdesivir, ribavirin, nitazoxanide, and emetine, may be the clues of treatment for COVID-19. The docking results suggest that the key residues K353 and G496 may affect the binding energies and dynamics between the inferred anti-SARS-CoV-2 chemical agents and the junction of the spike protein-ACE2 interface. Integrating various biological data, Gaussian kernel, similarity kernel, and logistic matrix factorization with kernel diffusion, this work demonstrates that a few chemical agents may assist in drug discovery for COVID-19.
RESUMO
Long noncoding RNAs (lncRNAs) regulate many biological processes by interacting with corresponding RNA-binding proteins. The identification of lncRNA-protein Interactions (LPIs) is significantly important to well characterize the biological functions and mechanisms of lncRNAs. Existing computational methods have been effectively applied to LPI prediction. However, the majority of them were evaluated only on one LPI dataset, thereby resulting in prediction bias. More importantly, part of models did not discover possible LPIs for new lncRNAs (or proteins). In addition, the prediction performance remains limited. To solve with the above problems, in this study, we develop a Deep Forest-based LPI prediction method (LPIDF). First, five LPI datasets are obtained and the corresponding sequence information of lncRNAs and proteins are collected. Second, features of lncRNAs and proteins are constructed based on four-nucleotide composition and BioSeq2vec with encoder-decoder structure, respectively. Finally, a deep forest model with cascade forest structure is developed to find new LPIs. We compare LPIDF with four classical association prediction models based on three fivefold cross validations on lncRNAs, proteins, and LPIs. LPIDF obtains better average AUCs of 0.9012, 0.6937 and 0.9457, and the best average AUPRs of 0.9022, 0.6860, and 0.9382, respectively, for the three CVs, significantly outperforming other methods. The results show that the lncRNA FTX may interact with the protein P35637 and needs further validation.
Assuntos
Aprendizado de Máquina , RNA Longo não Codificante/metabolismo , Proteínas de Ligação a RNA/metabolismo , Arabidopsis , Biologia Computacional/métodos , Conjuntos de Dados como Assunto , Humanos , Zea maysRESUMO
The outbreak of a novel febrile respiratory disease called COVID-19, caused by a newfound coronavirus SARS-CoV-2, has brought a worldwide attention. Prioritizing approved drugs is critical for quick clinical trials against COVID-19. In this study, we first manually curated three Virus-Drug Association (VDA) datasets. By incorporating VDAs with the similarity between drugs and that between viruses, we constructed a heterogeneous Virus-Drug network. A novel Random Walk with Restart method (VDA-RWR) was then developed to identify possible VDAs related to SARS-CoV-2. We compared VDA-RWR with three state-of-the-art association prediction models based on fivefold cross-validations (CVs) on viruses, drugs and virus-drug associations on three datasets. VDA-RWR obtained the best AUCs for the three fivefold CVs, significantly outperforming other methods. We found two small molecules coming together on the three datasets, that is, remdesivir and ribavirin. These two chemical agents have higher molecular binding energies of - 7.0 kcal/mol and - 6.59 kcal/mol with the domain bound structure of the human receptor angiotensin converting enzyme 2 (ACE2) and the SARS-CoV-2 spike protein, respectively. Interestingly, for the first time, experimental results suggested that navitoclax could be potentially applied to stop SARS-CoV-2 and remains to further validation.
Assuntos
Monofosfato de Adenosina/análogos & derivados , Alanina/análogos & derivados , Enzima de Conversão de Angiotensina 2/química , Antivirais/química , Ribavirina/química , Glicoproteína da Espícula de Coronavírus/química , Monofosfato de Adenosina/química , Alanina/química , Compostos de Anilina/química , Avaliação Pré-Clínica de Medicamentos , Genoma Viral , Simulação de Acoplamento Molecular , SARS-CoV-2/genética , Sulfonamidas/químicaRESUMO
A new coronavirus called SARS-CoV-2 is rapidly spreading around the world. Over 16,558,289 infected cases with 656,093 deaths have been reported by July 29th, 2020, and it is urgent to identify effective antiviral treatment. In this study, potential antiviral drugs against SARS-CoV-2 were identified by drug repositioning through Virus-Drug Association (VDA) prediction. 96 VDAs between 11 types of viruses similar to SARS-CoV-2 and 78 small molecular drugs were extracted and a novel VDA identification model (VDA-RLSBN) was developed to find potential VDAs related to SARS-CoV-2. The model integrated the complete genome sequences of the viruses, the chemical structures of drugs, a regularized least squared classifier (RLS), a bipartite local model, and the neighbor association information. Compared with five state-of-the-art association prediction methods, VDA-RLSBN obtained the best AUC of 0.9085 and AUPR of 0.6630. Ribavirin was predicted to be the best small molecular drug, with a higher molecular binding energy of -6.39 kcal/mol with human angiotensin-converting enzyme 2 (ACE2), followed by remdesivir (-7.4 kcal/mol), mycophenolic acid (-5.35 kcal/mol), and chloroquine (-6.29 kcal/mol). Ribavirin, remdesivir, and chloroquine have been under clinical trials or supported by recent works. In addition, for the first time, our results suggested several antiviral drugs, such as FK506, with molecular binding energies of -11.06 and -10.1 kcal/mol with ACE2 and the spike protein, respectively, could be potentially used to prevent SARS-CoV-2 and remains to further validation. Drug repositioning through virus-drug association prediction can effectively find potential antiviral drugs against SARS-CoV-2.