ABSTRACT
Cervical cancer (CC) is one of the most common and severe cancers worldwide. It includes two histological types: squamous cell carcinoma (SCC) and adenocarcinoma (ADC). The current study aims to identify novel candidate mRNA and miRNA biomarkers for SCC based on protein-protein interaction (PPI) and miRNA-mRNA network analysis. The study used a transcriptome profile of normal and SCC samples. First, a PPI network was constructed for the 1335 differentially expressed genes (DEGs), and a significant gene module was extracted from it. Next, a list of miRNAs targeting the module's genes was collected from experimentally validated databases, and a miRNA-mRNA regulatory network was built. After network analysis, four driver genes, MCM2, MCM10, POLA1, and TONSL, were selected from the module's genes and introduced as potential candidate biomarkers for SCC. In addition, two hub miRNAs, miR-193b-3p and miR-615-3p, were selected from the miRNA-mRNA regulatory network and reported as possible candidate biomarkers. In summary, six RNA-based candidates, comprising four genes (MCM2, MCM10, POLA1, and TONSL) and two miRNAs (miR-193b-3p and miR-615-3p), are proposed as potential biomarkers for CC.
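The hub-selection step described above can be illustrated with a minimal sketch. It assumes that hubs are ranked by plain node degree in the miRNA-mRNA network; the abstract does not state which centrality measure was used. The edge list, the node names (miR-A, GENE1, ...), and the helper functions are hypothetical placeholders, not data or code from the study.

```python
from collections import defaultdict

# Hypothetical miRNA -> target gene edges (placeholder names, not study data).
edges = [
    ("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-A", "GENE3"),
    ("miR-B", "GENE1"), ("miR-B", "GENE2"),
    ("miR-C", "GENE3"),
]

def degree_by_node(edges):
    """Count how many edges touch each node of an undirected network."""
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)

def top_hubs(edges, candidates, k):
    """Return the k candidate nodes with the highest degree (ties broken by name)."""
    degree = degree_by_node(edges)
    ranked = sorted(candidates, key=lambda node: (-degree.get(node, 0), node))
    return ranked[:k]

# Rank only the miRNA side of the bipartite network (assumed selection rule).
mirnas = {source for source, _ in edges}
hub_mirnas = top_hubs(edges, mirnas, k=2)
print(hub_mirnas)
```

Degree is only one possible choice; betweenness or a module-based score would slot into `top_hubs` in the same way.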
Subjects
MicroRNAs , Uterine Cervical Neoplasms , Female , Humans , Uterine Cervical Neoplasms/diagnosis , Uterine Cervical Neoplasms/genetics , Protein Interaction Maps/genetics , Biomarkers , MicroRNAs/genetics , RNA, Messenger/genetics , NF-kappa B
ABSTRACT
Aptamers can be regarded as efficient substitutes for monoclonal antibodies in many diagnostic and therapeutic applications. Because SELEX (systematic evolution of ligands by exponential enrichment) is tedious and prohibitive, in silico methods have been developed to speed up the enrichment process. However, most of these methods make no attempt to design novel aptamers. Moreover, some target proteins may not have any RNA binding candidates in nature, so a reduction mechanism is needed to generate novel aptamer pools from the enormous number of possible nucleotide combinations to be examined in vitro. We applied a genetic algorithm (GA) with an embedded binding-predictor fitness function to the in silico design of RNA aptamers. As a case study, all steps were carried out to generate an aptamer pool against the aminopeptidase N (CD13) biomarker. First, the prediction model was developed based on sequence and structural features of known RNA-protein complexes. Then, using RNA sequences from complexes with positive prediction results as the first generation, novel aptamers were designed and the top-ranked sequences were selected. A 76-mer aptamer with the highest fitness value was identified, with a score 3 to 6 times higher than those of the parent oligonucleotides. The reliability of the obtained sequences was confirmed using docking and molecular dynamics simulation. The proposed method offers a simplified contribution to the oligonucleotide aptamer design process and can serve as a basis for designing novel aptamers against a wide range of biomarkers.
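A minimal sketch of a genetic algorithm over RNA sequences may help make the design loop concrete. It is not the authors' implementation: the fitness function here is a toy GC-fraction score standing in for the trained binding predictor, and the selection scheme (truncation selection with elitism), the mutation rate, the sequence length, and all function names are assumptions chosen for illustration.

```python
import random

ALPHABET = "ACGU"

def toy_fitness(seq):
    """Placeholder score (GC fraction); stands in for a trained binding predictor."""
    return sum(base in "GC" for base in seq) / len(seq)

def crossover(parent_a, parent_b, rng):
    """Single-point crossover of two equal-length sequences."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(seq, rate, rng):
    """Replace each base with a randomly drawn base with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base for base in seq)

def evolve(population, fitness, generations=30, mutation_rate=0.02, seed=0):
    """Evolve the population and return it sorted from best to worst."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, size // 2)]   # truncation selection
        children = [ranked[0]]                  # elitism: keep the current best
        while len(children) < size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = children
    return sorted(population, key=fitness, reverse=True)

# Hypothetical first generation of random 20-mers (the study used known binders).
init_rng = random.Random(1)
first_generation = ["".join(init_rng.choice(ALPHABET) for _ in range(20)) for _ in range(12)]
final_population = evolve(first_generation, toy_fitness)
best = final_population[0]
print(best, toy_fitness(best))
```

Because the best sequence is copied unchanged into each new generation, the top fitness can never decrease; swapping `toy_fitness` for a real predictor leaves the rest of the loop untouched.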
Subjects
Algorithms , Aptamers, Nucleotide/chemistry , Drug Design/methods , Machine Learning , Molecular Docking Simulation , Molecular Dynamics Simulation , Aptamers, Nucleotide/genetics , Biomarkers , CD13 Antigens/chemistry , CD13 Antigens/metabolism , Ligands , Molecular Conformation , Proteins/chemistry , Proteins/genetics , RNA/chemistry , RNA/genetics , RNA/metabolism
ABSTRACT
Feature extraction is one of the most important preprocessing steps in predicting RNA-protein interactions with machine learning approaches. Despite many efforts in this area, no suitable structural feature extraction tool has yet been designed. Therefore, this paper introduces RPINBASE, an online toolbox that can be applied to a range of biological applications. The toolbox employs efficient nested queries that speed up requests, and it produces the desired features in the form of positive and negative samples. To demonstrate its capabilities, the toolbox was applied to the aptamer design problem, and the results are discussed. RPINBASE is accessible at http://rpinbase.com.
Subjects
RNA-Binding Proteins/chemistry , RNA/chemistry , Software , Databases, Protein , Internet , Machine Learning , Nucleic Acid Conformation , RNA/metabolism , RNA-Binding Proteins/metabolism
ABSTRACT
The performance of a model in machine learning problems depends strongly on the dataset and the training algorithm. Choosing the right training algorithm can substantially change a model's outcome. While some algorithms perform very well on some datasets, they may struggle on others. Moreover, performance can be improved by adjusting an algorithm's hyperparameters, which control the training process. This study contributes a method to tune the hyperparameters of machine learning algorithms using the Grey Wolf Optimization (GWO) and Genetic Algorithm (GA) metaheuristics. In addition, 11 different algorithms, including Averaged Perceptron, FastTree, FastForest, Light Gradient Boost Machine (LGBM), Limited memory Broyden Fletcher Goldfarb Shanno algorithm Maximum Entropy (LbfgsMxEnt), Linear Support Vector Machine (LinearSVM), and a Deep Neural Network (DNN) with four architectures, are applied to 11 datasets in biological, biomedical, and nature categories such as molecular interactions, cancer, clinical diagnosis, behavior-related predictions, RGB images of human skin, and X-ray images of COVID-19 and cardiomegaly patients. Our results show that performance in the training phase improved in all trials. GWO also demonstrates better performance, with a p-value of 2.6E-5. Moreover, in most experiments in this study, the metaheuristic methods demonstrate better performance and faster convergence than Exhaustive Grid Search (EGS). The proposed method receives only a dataset as input and suggests the best algorithm found, together with its arguments. It is therefore appropriate for datasets with unknown distribution, machine learning algorithms with complex behavior, or users who are not experts in analytical statistics and data science algorithms.
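As a rough illustration of how GWO can search a hyperparameter space, the sketch below implements the basic Grey Wolf Optimizer update (alpha, beta, and delta leaders, with the coefficient `a` decreasing linearly from 2 toward 0) and applies it to a toy objective. The objective, the two hyperparameters (a log10 learning rate and a leaf count), their bounds, and every name are assumptions; in practice the objective would be a validation loss returned by training a model, and this is not the study's implementation.

```python
import random

def grey_wolf_optimize(objective, bounds, n_wolves=12, n_iter=60, seed=0):
    """Minimize `objective` inside box `bounds` with a basic Grey Wolf Optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_score = None, float("inf")
    for t in range(n_iter):
        ranked = sorted(wolves, key=objective)
        if objective(ranked[0]) < best_score:          # remember the best point seen
            best_pos, best_score = list(ranked[0]), objective(ranked[0])
        leaders = [list(w) for w in ranked[:3]]        # alpha, beta, delta
        a = 2.0 * (1 - t / n_iter)                     # linearly decreasing coefficient
        new_wolves = []
        for wolf in wolves:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                estimates = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    estimates.append(leader[d] - A * distance)
                value = sum(estimates) / len(estimates)  # average of the three pulls
                position.append(min(max(value, lo), hi)) # clip to the search box
            new_wolves.append(position)
        wolves = new_wolves
    final_best = min(wolves, key=objective)
    if objective(final_best) < best_score:
        best_pos, best_score = list(final_best), objective(final_best)
    return best_pos, best_score

def toy_validation_loss(params):
    """Hypothetical stand-in for a validation loss; minimum at (-2, 31)."""
    log_lr, num_leaves = params
    return (log_lr + 2.0) ** 2 + ((num_leaves - 31.0) / 10.0) ** 2

search_bounds = [(-5.0, 0.0), (2.0, 128.0)]   # assumed ranges: log10(lr), leaf count
best_params, best_loss = grey_wolf_optimize(toy_validation_loss, search_bounds)
print(best_params, best_loss)
```

An exhaustive grid search would evaluate every grid point instead; the metaheuristic spends its evaluations near the current leaders, which is the kind of behavior the convergence comparison above refers to.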
Subjects
COVID-19 , Computational Biology , Algorithms , Humans , Machine Learning , Neural Networks, Computer , SARS-CoV-2
ABSTRACT
BACKGROUND: Aberrant levels of 5-hydroxymethylcytosine (5-hmC) can lead to cancer progression. Identifying 5-hmC-related biological pathways can provide a better understanding of gastrointestinal (GI) cancers. We conducted a network-based analysis of 5-hmC levels extracted from circulating cell-free DNA (cfDNA) in GI cancers, including colon, gastric, and pancreatic cancers, and from healthy donors. The co-5-hmC network was reconstructed using the weighted gene co-expression network method. The cancer-related modules/subnetworks were detected. Preservation of the three detected 5-hmC-related modules was assessed in an external dataset. The 5-hmC-related modules were functionally enriched, and biological pathways were identified. The relationship between modules was assessed using the Pearson correlation coefficient (p-value < 0.05). An elastic network classifier was used to assess the potential of the 5-hmC modules to distinguish cancer patients from healthy individuals. To assess the efficiency of the model, the Area Under the Curve (AUC) was computed using five-fold cross-validation in an external dataset. RESULTS: The main biological pathways were the cell cycle, apoptosis, and extracellular matrix (ECM) organization. A direct association between the cell cycle and apoptosis, an inverse association between apoptosis and ECM organization, and an inverse association between the cell cycle and ECM organization were detected for the 5-hmC modules in GI cancers. An AUC of 92% (0.73-1.00) was observed for the predictive model including 11 genes. CONCLUSION: The intricate association between the biological pathways of the identified modules may reveal the hidden significance of 5-hmC in GI cancers. The identified predictive model and new biomarkers may be beneficial for early-stage cancer detection and precision medicine using liquid biopsy.
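For readers unfamiliar with the AUC metric reported above, the following sketch computes it directly from its rank interpretation: the probability that a randomly chosen cancer sample receives a higher score than a randomly chosen healthy sample, with ties counted as one half. The labels and scores are made-up illustrative values, not results from the study, and the function name is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores: 1 = cancer, 0 = healthy donor.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 8 of 9 pairs are ordered correctly -> 0.889
```

Under cross-validation, this quantity would be computed on the held-out fold of each split and then summarized across folds.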
Subjects
Cell-Free Nucleic Acids , Gastrointestinal Neoplasms , Apoptosis/genetics , Cell Cycle/genetics , Cell-Free Nucleic Acids/genetics , Extracellular Matrix/genetics , Gastrointestinal Neoplasms/genetics , Humans
ABSTRACT
Lung cancer is the most common cancer in men and women. It is divided into two main types, non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). Around 85 to 90 percent of lung cancers are NSCLC. Repositioning potent candidate drugs for NSCLC treatment is an important topic in cancer studies. Drug repositioning (DR), or drug repurposing, is a method for identifying new therapeutic uses of existing drugs. The current study applies a computational drug repositioning method to identify candidate drugs for treating NSCLC patients. To this end, the transcriptomic profile of NSCLC and healthy (control) samples was first obtained from the GEO database under accession number GSE21933. Then, the gene co-expression network was reconstructed for NSCLC samples using WGCNA, and two significant gene modules, purple and magenta, were extracted. Next, a list of transcription factor genes that regulate the genes of the purple and magenta modules was extracted from the TRRUST V2.0 online database, and the TF-TG (transcription factor-target gene) network was drawn. Afterward, a list of drugs targeting TF-TG genes was obtained from the DGIdb V4.0 database, and two drug-gene interaction networks, drug-TG and drug-TF, were drawn. After analyzing the gene co-expression, TF-TG, and drug-gene interaction networks, 16 drugs were selected as potent candidates for NSCLC treatment. Of these, nine drugs, namely Methotrexate, Olanzapine, Haloperidol, Fluorouracil, Nifedipine, Paclitaxel, Verapamil, Dexamethasone, and Docetaxel, were chosen from the drug-TG sub-network. In addition, nine drugs, namely Cisplatin, Daunorubicin, Dexamethasone, Methotrexate, Hydrocortisone, Doxorubicin, Azacitidine, Vorinostat, and Doxorubicin Hydrochloride, were selected from the drug-TF sub-network. Methotrexate and Dexamethasone are common to the drug-TG and drug-TF sub-networks. In conclusion, this study proposed 16 drugs as potent candidates for NSCLC treatment through analyzing gene co-expression, TF-TG, and drug-gene interaction networks.
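The drug counts reported above can be checked with simple set arithmetic: nine drugs from each sub-network with two in common gives 9 + 9 - 2 = 16 distinct candidates. The short sketch below uses the drug names exactly as listed in the abstract; the variable names are illustrative only.

```python
# Drugs selected from each sub-network, as listed in the abstract.
drug_tg = {
    "Methotrexate", "Olanzapine", "Haloperidol", "Fluorouracil", "Nifedipine",
    "Paclitaxel", "Verapamil", "Dexamethasone", "Docetaxel",
}
drug_tf = {
    "Cisplatin", "Daunorubicin", "Dexamethasone", "Methotrexate", "Hydrocortisone",
    "Doxorubicin", "Azacitidine", "Vorinostat", "Doxorubicin Hydrochloride",
}

shared = drug_tg & drug_tf        # drugs present in both sub-networks
candidates = drug_tg | drug_tf    # all distinct candidate drugs

print(sorted(shared))   # ['Dexamethasone', 'Methotrexate']
print(len(candidates))  # 16
```

Note that the total of 16 counts Doxorubicin and Doxorubicin Hydrochloride as separate entries, as the abstract does.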
Subjects
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Carcinoma, Non-Small-Cell Lung/drug therapy , Carcinoma, Non-Small-Cell Lung/genetics , Dexamethasone , Doxorubicin , Drug Repositioning , Female , Gene Expression Profiling/methods , Gene Regulatory Networks , Humans , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Methotrexate , Rosaniline Dyes
ABSTRACT
RNA-protein interactions play a major role in the replication of RNA viruses. The replication and transcription of these viruses take place in the cytoplasm of the host cell; hence, host RNA-viral protein and viral RNA-host protein interactions are both possible. The current study applies a high-throughput computational approach, including feature extraction and machine learning methods, to predict the affinity of the protein sequences of ten viruses for three categories of RNA sequences. These categories are RNAs involved in the protein-RNA complexes stored in the RCSB database, the human miRNAs deposited in the miRBase database, and the lncRNAs deposited in the LNCipedia database. The results show that evolution not only tries to conserve key viral proteins involved in replication and transcription but also prunes their interaction capability. These proteins, having specific interactions, do not perturb the host cell through undesired interactions. On the other hand, the hypermutation rate of NSP3 is related to its affinity for host cell RNAs. Gene Ontology (GO) analysis of the miRNAs with affinity for NSP3 shows significantly enriched GO terms related to the known symptoms of COVID-19. Docking and MD simulation of the miRNA obtained through the high-throughput analysis suggest a non-coding RNA (an RNA antitoxin, ToxI) as a natural aptamer drug candidate for NSP5 inhibition. Finally, a significant interplay between host RNA and viral protein can disrupt the host cell's system by influencing its RNA-dependent processes, for example through differential RNA expression. Furthermore, our results are useful for identifying the side effects of mRNA-based vaccines, many of which are caused by off-target interactions with human lncRNAs.
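The abstract does not specify which features were extracted, so the following is only a sketch of one common sequence-based option: normalized k-mer composition for an RNA and a protein sequence, concatenated into a single feature vector for an RNA-protein pair. The choice of k, the example sequences, and all function names are assumptions made for illustration.

```python
from itertools import product

RNA_ALPHABET = "ACGU"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def kmer_frequencies(sequence, k, alphabet):
    """Return normalized k-mer frequencies in a fixed, alphabet-ordered layout."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:            # skip windows with unknown symbols
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

def pair_features(rna, protein):
    """Concatenate RNA 2-mer and protein 1-mer composition for one pair."""
    return (kmer_frequencies(rna, 2, RNA_ALPHABET)
            + kmer_frequencies(protein, 1, PROTEIN_ALPHABET))

# Hypothetical toy sequences, not taken from any database.
rna_features = kmer_frequencies("AUGGCUAUG", 2, RNA_ALPHABET)
features = pair_features("AUGGCUAUG", "MKVLA")
print(len(rna_features), len(features))  # 16 RNA features, 36 in total
```

A vector of this kind could then be passed to any standard classifier; structural features, which the related toolbox abstracts emphasize, would need additional inputs beyond raw sequence.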