Results 1 - 20 of 102
1.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37185897

ABSTRACT

Single-cell RNA-seq analysis has become a powerful tool to analyse the transcriptomes of individual cells. In turn, it has fostered the possibility of screening thousands of single cells in parallel. Thus, contrary to the traditional bulk measurements that only paint a macroscopic picture, gene measurements at the cell level aid researchers in studying different tissues and organs at various stages. However, accurate clustering methods for such high-dimensional data remain scarce and a persistent challenge in this domain. Of late, several methods and techniques have been promulgated to address this issue. In this article, we propose a novel framework for clustering large-scale single-cell data and subsequently identifying the rare-cell sub-populations. To handle such sparse, high-dimensional data, we leverage PaCMAP (Pairwise Controlled Manifold Approximation), a feature extraction algorithm that preserves both the local and the global structures of the data, and a Gaussian Mixture Model to cluster single-cell data. Subsequently, we exploit Edited Nearest Neighbours sampling and Isolation Forest/One-class Support Vector Machine to identify rare-cell sub-populations. The performance of the proposed method is validated using the publicly available datasets with varying degrees of cell types and rare-cell sub-populations. On several benchmark datasets, the proposed method outperforms the existing state-of-the-art methods. The proposed method successfully identifies cell types that constitute populations ranging from 0.1 to 8% with F1-scores of 0.91 ± 0.09. The source code is available at https://github.com/scrab017/RarPG.


Subject(s)
Single-Cell Gene Expression Analysis , Unsupervised Machine Learning , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling/methods
2.
Mol Syst Biol ; 20(5): 549-572, 2024 May.
Article in English | MEDLINE | ID: mdl-38499674

ABSTRACT

Biological systems can gain complexity over time. While some of these transitions are likely driven by natural selection, the extent to which they occur without providing an adaptive benefit is unknown. At the molecular level, one example is heteromeric complexes replacing homomeric ones following gene duplication. Here, we build a biophysical model and simulate the evolution of homodimers and heterodimers following gene duplication using distributions of mutational effects inferred from available protein structures. We keep the specific activity of each dimer identical, so their concentrations drift neutrally without new functions. We show that for more than 60% of tested dimer structures, the relative concentration of the heteromer increases over time due to mutational biases that favor the heterodimer. However, allowing mutational effects on synthesis rates and differences in the specific activity of homo- and heterodimers can limit or reverse the observed bias toward heterodimers. Our results show that the accumulation of more complex protein quaternary structures is likely under neutral evolution, and that natural selection would be needed to reverse this tendency.


Subject(s)
Evolution, Molecular , Gene Duplication , Mutation , Protein Interaction Maps , Selection, Genetic , Protein Interaction Maps/genetics , Protein Multimerization , Models, Genetic , Proteins/genetics , Proteins/metabolism , Proteins/chemistry , Computer Simulation
3.
BMC Genomics ; 25(1): 756, 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39095710

ABSTRACT

BACKGROUND: Long non-coding RNAs (lncRNAs) are RNA transcripts of more than 200 nucleotides that do not encode canonical proteins. Their biological structure is similar to messenger RNAs (mRNAs). To distinguish between lncRNA and mRNA transcripts quickly and accurately, we upgraded the PLEK alignment-free tool to its next version, PLEKv2, and constructed models tailored for both animals and plants. RESULTS: PLEKv2 can achieve 98.7% prediction accuracy for human datasets. Compared with classical tools and deep learning-based models, this is 8.1%, 3.7%, 16.6%, 1.4%, 4.9%, and 48.9% higher than CPC2, CNCI, Wen et al.'s CNN, LncADeep, PLEK, and NcResNet, respectively. The accuracy of PLEKv2 was > 90% for cross-species prediction. PLEKv2 is more effective and robust than CPC2, CNCI, LncADeep, PLEK, and NcResNet for primate datasets (including chimpanzees, macaques, and gorillas). Moreover, PLEKv2 is not only suitable for non-human primates that are closely related to humans, but can also predict the coding ability of RNA sequences in plants such as Arabidopsis. CONCLUSIONS: The experimental results illustrate that the model constructed by PLEKv2 can distinguish lncRNAs and mRNAs better than PLEK. The PLEKv2 software is freely available at https://sourceforge.net/projects/plek2/ .
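
As a rough illustration of what "alignment-free" can mean in practice, the sketch below turns an RNA sequence into k-mer frequency features that a classifier could consume. It is not PLEKv2's actual feature scheme or model; the function name `kmer_frequencies`, the toy sequence and the choice k = 2 are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return relative frequencies of all k-mers over `alphabet`, in a fixed order.

    Hypothetical helper for illustration; not part of PLEK or PLEKv2.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    # Count every overlapping window of length k in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows containing ambiguous bases (e.g. N) are ignored in the total.
    total = sum(counts[km] for km in kmers)
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


# Toy sequence, not taken from any dataset in the abstract.
features = kmer_frequencies("AUGGCUAUGG", k=2)
```

No alignment against a reference is needed: the feature vector has a fixed length (16 entries for k = 2) regardless of transcript length, which is what makes this family of approaches fast.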


Subject(s)
RNA, Long Noncoding , RNA, Messenger , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , Humans , Animals , Software , Computational Biology/methods
4.
BMC Med Imaging ; 24(1): 156, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38910241

ABSTRACT

Parkinson's disease (PD) is challenging for clinicians to accurately diagnose in the early stages. Quantitative measures of brain health can be obtained safely and non-invasively using medical imaging techniques like magnetic resonance imaging (MRI) and single photon emission computed tomography (SPECT). Accurate diagnosis of PD requires powerful machine learning and deep learning models as well as effective medical imaging tools for assessing neurological health. This study proposes four deep learning models with a hybrid model for the early detection of PD. For the simulation study, two standard datasets are chosen. Further, to improve the performance of the models, grey wolf optimization (GWO) is used to automatically fine-tune the hyperparameters of the models. The GWO-VGG16, GWO-DenseNet, GWO-DenseNet + LSTM, GWO-InceptionV3 and GWO-VGG16 + InceptionV3 are applied to the T1, T2-weighted and SPECT DaTscan datasets. All the models performed well and obtained near or above 99% accuracy. The highest accuracy of 99.94% and AUC of 99.99% are achieved by the hybrid model (GWO-VGG16 + InceptionV3) for the T1, T2-weighted dataset, and 100% accuracy and 99.92% AUC are recorded for the GWO-VGG16 + InceptionV3 model using the SPECT DaTscan dataset.


Subject(s)
Algorithms , Deep Learning , Magnetic Resonance Imaging , Parkinson Disease , Tomography, Emission-Computed, Single-Photon , Humans , Parkinson Disease/diagnostic imaging , Tomography, Emission-Computed, Single-Photon/methods , Magnetic Resonance Imaging/methods , Male , Female
5.
BMC Med Imaging ; 24(1): 120, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38789925

ABSTRACT

BACKGROUND: Lung cancer is the second most common cancer worldwide, with over two million new cases per year. Early identification would allow healthcare practitioners to handle it more effectively. The advancement of computer-aided detection systems has significantly impacted clinical analysis and decision-making on human disease. Towards this, machine learning and deep learning techniques are successfully being applied. Due to several advantages, transfer learning has become popular for disease detection based on image data. METHODS: In this work, we build a novel transfer learning model (VER-Net) by stacking three different transfer learning models to detect lung cancer using lung CT scan images. The model is trained to map the CT scan images to four lung cancer classes. Various measures, such as image preprocessing, data augmentation, and hyperparameter tuning, are taken to improve the efficacy of VER-Net. All the models are trained and evaluated using multiclass chest CT images. RESULTS: The experimental results confirm that VER-Net outperformed the eight other transfer learning models it was compared with. VER-Net scored 91%, 92%, 91%, and 91.3% when tested for accuracy, precision, recall, and F1-score, respectively. Compared to the state-of-the-art, VER-Net has better accuracy. CONCLUSION: VER-Net is not only effective for lung cancer detection but may also be useful for other diseases for which CT scan images are available.


Subject(s)
Lung Neoplasms , Tomography, X-Ray Computed , Humans , Lung Neoplasms/diagnostic imaging , Tomography, X-Ray Computed/methods , Machine Learning , Deep Learning , Radiographic Image Interpretation, Computer-Assisted/methods
6.
Proc Natl Acad Sci U S A ; 118(21)2021 05 25.
Article in English | MEDLINE | ID: mdl-34001607

ABSTRACT

Across the Tree of Life (ToL), the complexity of proteomes varies widely. Our systematic analysis depicts that from the simplest archaea to mammals, the total number of proteins per proteome expanded ∼200-fold. Individual proteins also became larger, and multidomain proteins expanded ∼50-fold. Apart from duplication and divergence of existing proteins, completely new proteins were born. Along the ToL, the number of different folds expanded ∼5-fold and fold combinations ∼20-fold. Proteins prone to misfolding and aggregation, such as repeat and beta-rich proteins, proliferated ∼600-fold and, accordingly, proteins predicted as aggregation-prone became 6-fold more frequent in mammalian compared with bacterial proteomes. To control the quality of these expanding proteomes, core chaperones, ranging from heat shock proteins 20 (HSP20s) that prevent aggregation to HSP60, HSP70, HSP90, and HSP100 acting as adenosine triphosphate (ATP)-fueled unfolding and refolding machines, also evolved. However, these core chaperones were already available in prokaryotes, and they comprise ∼0.3% of all genes from archaea to mammals. This challenge (roughly the same number of core chaperones supporting a massive expansion of proteomes) was met by 1) elevation of messenger RNA (mRNA) and protein abundances of the ancient generalist core chaperones in the cell, and 2) continuous emergence of new substrate-binding and nucleotide-exchange factor cochaperones that function cooperatively with core chaperones as a network.


Subject(s)
Evolution, Molecular , HSP70 Heat-Shock Proteins/genetics , Protein Aggregates/genetics , Proteome/genetics , Adenosine Triphosphate/metabolism , Animals , Archaea/genetics , Archaea/metabolism , Bacteria/genetics , Bacteria/metabolism , Fungi/genetics , Fungi/metabolism , Gene Expression , Gene Ontology , HSP70 Heat-Shock Proteins/metabolism , Mammals , Molecular Sequence Annotation , Phylogeny , Plants/genetics , Plants/metabolism , Protein Folding , Protein Isoforms/genetics , Protein Isoforms/metabolism , Proteome/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism
7.
BMC Oral Health ; 24(1): 715, 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38907185

ABSTRACT

BACKGROUND: Dental pathogens play a crucial role in oral health issues, including tooth decay, gum disease, and oral infections, and recent research suggests a link between these pathogens and oral cancer initiation and progression. Innovative therapeutic approaches are needed due to antibiotic resistance concerns and treatment limitations. METHODS: We synthesized and analyzed piperine-coated zinc oxide nanoparticles (ZnO-PIP NPs) using UV spectroscopy, SEM, XRD, FTIR, and EDAX. Antioxidant and antimicrobial effectiveness were evaluated through DPPH, ABTS, and MIC assays, while the anticancer properties were assessed on KB oral squamous carcinoma cells. RESULTS: ZnO-PIP NPs exhibited significant antioxidant activity and a MIC of 50 µg/mL against dental pathogens, indicating strong antimicrobial properties. Interaction analysis revealed high binding affinity with dental pathogens. ZnO-PIP NPs showed dose-dependent anticancer activity on KB cells, upregulating apoptotic genes BCL2, BAX, and P53. CONCLUSIONS: This approach offers a multifaceted solution for combating both oral infections and cancer, showing the potential of ZnO-PIP NPs for significant advancement in oral healthcare. It is essential to acknowledge potential limitations and challenges associated with the use of ZnO NPs in clinical applications. These may include concerns regarding nanoparticle toxicity, biocompatibility, and long-term safety. Further research and rigorous testing are warranted to address these issues and ensure the safe and effective translation of ZnO-PIP NPs into clinical practice.


Subject(s)
Alkaloids , Apoptosis , Benzodioxoles , Biofilms , Mouth Neoplasms , Piperidines , Polyunsaturated Alkamides , Zinc Oxide , bcl-2-Associated X Protein , Humans , Alkaloids/pharmacology , Antineoplastic Agents/pharmacology , Antioxidants/pharmacology , Apoptosis/drug effects , bcl-2-Associated X Protein/metabolism , bcl-2-Associated X Protein/drug effects , Benzodioxoles/pharmacology , Biofilms/drug effects , Cell Line, Tumor , KB Cells , Metal Nanoparticles/therapeutic use , Microbial Sensitivity Tests , Microscopy, Electron, Scanning , Mouth Neoplasms/drug therapy , Mouth Neoplasms/pathology , Nanoparticles , Piperidines/pharmacology , Polyunsaturated Alkamides/pharmacology , Proto-Oncogene Proteins c-bcl-2/metabolism , Tumor Suppressor Protein p53/metabolism , Tumor Suppressor Protein p53/drug effects , X-Ray Diffraction , Zinc Oxide/pharmacology
8.
BMC Bioinformatics ; 24(1): 382, 2023 Oct 10.
Article in English | MEDLINE | ID: mdl-37817066

ABSTRACT

An abnormal growth or fatty mass of cells in the brain is called a tumor. Tumors can be either benign (non-cancerous) or cancerous, depending on the structure of their cells. A tumor can result in increased pressure within the cranium, potentially causing damage to the brain or even death. As a result, diagnostic procedures such as computed tomography, magnetic resonance imaging, and positron emission tomography, as well as blood and urine tests, are used to identify brain tumors. However, these methods can be labor-intensive and sometimes yield inaccurate results. Instead of these time-consuming methods, deep learning models are employed because they are less time-consuming, require less expensive equipment, produce more accurate results, and are easy to set up. In this study, we propose a method based on transfer learning, utilizing the pre-trained VGG-19 model. This approach has been enhanced by applying a customized convolutional neural network framework and combining it with pre-processing methods, including normalization and data augmentation. For training and testing, our proposed model used 80% and 20% of the images from the dataset, respectively. Our proposed method achieved remarkable success, with an accuracy rate of 99.43%, a sensitivity of 98.73%, and a specificity of 97.21%. The dataset, sourced from Kaggle for training purposes, consists of 407 images, including 257 depicting brain tumors and 150 without tumors. These models could be utilized to develop clinically useful solutions for identifying brain tumors in CT images based on these outcomes.
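
To make the evaluation setup above concrete, the following sketch shows how an 80%/20% split of the 407 images translates into set sizes, and how accuracy, sensitivity and specificity are defined from a binary confusion matrix. The helper names `split_sizes` and `binary_metrics` are hypothetical, and the confusion-matrix counts are invented for illustration; they are not the study's results, and the class composition of its test split is not stated in the abstract.

```python
def split_sizes(n_total, train_fraction=0.8):
    """Return (n_train, n_test) for a simple fractional split."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train


def binary_metrics(tp, fp, tn, fn):
    """Standard definitions from a binary confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate (recall)
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# 407 images in total, as stated in the abstract.
n_train, n_test = split_sizes(407)

# Hypothetical test-set counts, for illustration only.
metrics = binary_metrics(tp=50, fp=1, tn=29, fn=1)
```

With these definitions, accuracy is a weighted average of sensitivity and specificity, weighted by the share of positive and negative cases in the evaluated set.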


Subject(s)
Brain Neoplasms , Neural Networks, Computer , Humans , Brain Neoplasms/diagnostic imaging , Tomography, X-Ray Computed , Magnetic Resonance Imaging , Brain
9.
BMC Bioinformatics ; 24(1): 458, 2023 Dec 06.
Article in English | MEDLINE | ID: mdl-38053030

ABSTRACT

Intense sun exposure is a major risk factor for the development of melanoma, an abnormal proliferation of skin cells. Yet, this more prevalent type of skin cancer can also develop in less-exposed areas, such as those that are shaded. Melanoma is the sixth most common type of skin cancer. In recent years, computer-based methods for imaging and analyzing biological systems have made considerable strides. This work investigates the use of advanced machine learning methods, specifically ensemble models with Auto Correlogram Methods, Binary Pyramid Pattern Filter, and Color Layout Filter, to enhance the detection accuracy of Melanoma skin cancer. These results suggest that the Color Layout Filter model of the Attribute Selection Classifier provides the best overall performance. Statistics for ROC, PRC, Kappa, F-Measure, and Matthews Correlation Coefficient were as follows: 90.96% accuracy, 0.91 precision, 0.91 recall, 0.95 ROC, 0.87 PRC, 0.87 Kappa, 0.91 F-Measure, and 0.82 Matthews Correlation Coefficient. In addition, its margins of error are the smallest. The research found that the Attribute Selection Classifier performed well when used in conjunction with the Color Layout Filter to improve image quality.


Subject(s)
Melanoma , Skin Neoplasms , Humans , Algorithms , Skin Neoplasms/diagnostic imaging , Melanoma/diagnostic imaging , Machine Learning , Melanoma, Cutaneous Malignant
10.
BMC Bioinformatics ; 24(1): 479, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38102551

ABSTRACT

Cancer prediction in the early stage is a topic of major interest in medicine since it allows accurate and efficient actions for successful medical treatments of cancer. Most cancer datasets contain many gene expression levels as features but few samples, so similar features must first be eliminated to permit a faster convergence rate of classification algorithms. These features (genes) enable us to identify cancer disease, choose the best prescription to prevent cancer and discover deviations amid different techniques. To resolve this problem, we propose a novel hybrid technique, CSSMO-based gene selection, for cancer classification. First, we alter the fitness of spider monkey optimization (SMO) with the cuckoo search algorithm (CSA), yielding CSSMO for feature selection, which combines the benefits of both metaheuristic algorithms to discover a subset of genes that helps to predict cancer at an early stage. Further, to enhance the accuracy of the CSSMO algorithm, we apply a cleaning process, minimum redundancy maximum relevance (mRMR), to reduce the gene expression data of the cancer datasets. Next, these subsets of genes are classified using deep learning (DL) to identify different groups or classes related to a particular cancer disease. Eight different benchmark microarray gene expression datasets of cancer have been utilized to analyze the performance of the proposed approach with different evaluation metrics such as recall, precision, F1-score, and confusion matrix. The proposed gene selection method with DL achieves much better classification accuracy than other existing DL and machine learning classification models on all the large cancer gene expression datasets.


Subject(s)
Algorithms , Neoplasms , Humans , Microarray Analysis , Neoplasms/genetics , Genetic Techniques , Machine Learning
11.
Biomarkers ; 28(2): 139-151, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36503350

ABSTRACT

Cancer stem cells (CSCs) are self-renewing and slow-multiplying micro subpopulations in tumour microenvironments. CSCs contribute to cancer's resistance to radiation and other treatments. CSCs control the heterogeneity of the tumour. They alter cellular signalling in the tumour's microenvironment and promote epithelial-to-mesenchymal transition (EMT). Current research decodes the role of the interlink between extracellular vesicles (EVs) and CSCs in radiation resistance. Exosomes are a subpopulation of EVs that originate from the plasma membrane. They are secreted by several active cells. They are involved in cellular communication and act as messengers in healthy conditions and in multiple pathological complications. Exosomal biologically active cargos (DNA, RNA, protein, lipid and glycan) are capable of transforming recipient cells' nature. The molecular signatures of CSCs and CSC-derived exosomes are a potential source for cancer theranostics development. This review discusses cancer stem cells, radiation-mediated CSC development, EMT associated with CSCs, the role of exosomes in radioresistance development, the current state of radiation therapy and the use of CSCs and CSC-derived exosome biomolecules as clinical screening biomarkers for cancer. This review gives new researchers a reason to keep an eye on the next phase of scientific research into cancer theranostics that will help mankind.


Subject(s)
Exosomes , Neoplasms , Humans , Clinical Relevance , Neoplasms/radiotherapy , Neoplasms/pathology , Epithelial-Mesenchymal Transition/genetics , Neoplastic Stem Cells/metabolism , Neoplastic Stem Cells/pathology , Tumor Microenvironment
12.
BMC Med Imaging ; 23(1): 146, 2023 10 02.
Article in English | MEDLINE | ID: mdl-37784025

ABSTRACT

COVID-19, the global pandemic of the twenty-first century, has caused major challenges and setbacks for researchers and medical infrastructure worldwide. The effects of COVID-19 on the patient's respiratory system cause flooding of the airways in the lungs. Multiple techniques have been proposed since the outbreak, each of which depends on features and larger training datasets. It is challenging to consolidate larger datasets for accurate and reliable decision support. This research article proposes a chest X-ray image classification approach based on feature thresholding for categorizing COVID-19 samples. The proposed approach uses the threshold value-based Feature Extraction (TVFx) technique and has been validated on 661-COVID-19 X-Ray datasets in providing decision support for medical experts. The model has three layers of training datasets to attain a sequential pattern based on various learning features. The aligned feature-set of the proposed technique has successfully categorized COVID-19 active samples into mild, serious, and extreme categories as per medical standards. The proposed technique has achieved an accuracy of 97.42% in categorizing and classifying the given sample sets.


Subject(s)
COVID-19 , Humans , COVID-19/diagnostic imaging , X-Rays , Neural Networks, Computer , Pandemics , Thorax
13.
Chem Biodivers ; 20(1): e202200684, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36480442

ABSTRACT

Globally, Alzheimer's disease (AD) is a highly complex, heterogeneous, and multifactorial neurological disease. AD is characterized clinically by a steady loss of memory and progressive decline of cognitive function. So far, no effective cure is available for the treatment of AD. Here, we identified plant-based compounds (PBCs) from seven therapeutic plants through pharmacophore and pharmacokinetics approaches. Subsequently, we retrieved 65 AD-associated proteins by a text mining approach. We observed the interactions of 39 PBCs with 65 AD-associated targets using molecular docking. Further, we carried out molecular dynamics (MD) simulation analysis to predict the steady binding of top drug-target complexes. The MD simulation results showed that seven drug-target complexes consistently interacted during the in silico experiment. The top complexes were the target CHLE interacting with 2 PBCs (Pseudojujubogenin and Anahygrine), target VDAC1 interacting with Withanolide R, target THOP1 interacting with Withanolide R, target AOFB interacting with 2 PBCs (Nardostachysin and Viscosalactone B), and target ACHE interacting with the drug 12-Deoxywithastramonolide. These PBCs interacted stably and flexibly at the protein's active site region. Our results suggest that these PBCs and targets are potential therapeutic candidates for molecular development in AD.


Subject(s)
Alzheimer Disease , Molecular Dynamics Simulation , Humans , Molecular Docking Simulation , Alzheimer Disease/drug therapy , Cholinesterase Inhibitors/chemistry , Catalytic Domain , Acetylcholinesterase/metabolism
14.
Chem Biodivers ; 20(8): e202201123, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37394680

ABSTRACT

Fish form the most significant group of cold-blooded creatures. It is crucial to recognize and categorize the most significant species of fish since various species of seafood diseases and decay exhibit different symptoms. Systems based on enhanced deep learning can replace the area's currently cumbersome and sluggish traditional approaches. Although it seems straightforward, classifying fish images is a complex procedure. In addition, the scientific study of population distribution and geographic patterns is important for advancing the field. The goal of the proposed work is to identify the best performing strategy using cutting-edge computer vision, the Chaotic Oppositional Based Whale Optimization Algorithm (CO-WOA), and data mining techniques. Performance comparisons with leading models, such as Convolutional Neural Networks (CNN) and VGG-19, are made to confirm the applicability of the suggested method. The suggested feature extraction approach with the Proposed Deep Learning Model was used in the research, yielding accuracy rates of 100 %. The performance was also compared to cutting-edge image processing models (Convolutional Neural Networks, ResNet150V2, DenseNet, Visual Geometry Group-19, Inception V3 and Xception), which reached accuracies of 98.48 %, 98.58 %, 99.04 %, 98.44 %, 99.18 % and 99.63 %. Using an empirical method leveraging artificial neural networks, the Proposed Deep Learning model was shown to be the best model.


Subject(s)
Deep Learning , Animals , Whales , Algorithms , Neural Networks, Computer , Image Processing, Computer-Assisted/methods
15.
BMC Bioinformatics ; 23(Suppl 3): 153, 2022 Apr 28.
Article in English | MEDLINE | ID: mdl-35484501

ABSTRACT

BACKGROUND: As many complex omics data have been generated during the last two decades, the dimensionality reduction problem has been a challenging issue in better mining such data. Omics data typically consist of many features. Accordingly, many feature selection algorithms have been developed. The performance of those feature selection methods often varies by specific data, making the discovery and interpretation of results challenging. METHODS AND RESULTS: In this study, we performed a comprehensive comparative study of five widely used supervised feature selection methods (mRMR, INMIFS, DFS, SVM-RFE-CBR and VWMRmR) for multi-omics datasets. Specifically, we used five representative datasets: gene expression (Exp), exon expression (ExpExon), DNA methylation (hMethyl27), copy number variation (Gistic2), and pathway activity dataset (Paradigm IPLs) from a multi-omics study of acute myeloid leukemia (LAML) from The Cancer Genome Atlas (TCGA). The different feature subsets selected by the aforesaid five different feature selection algorithms are assessed using three evaluation criteria: (1) classification accuracy (Acc), (2) representation entropy (RE) and (3) redundancy rate (RR). Four different classifiers, viz., C4.5, NaiveBayes, KNN, and AdaBoost, were used to measure the classification accuracy (Acc) for each selected feature subset. The VWMRmR algorithm obtains the best Acc for three datasets (ExpExon, hMethyl27 and Paradigm IPLs). The VWMRmR algorithm offers the best RR (obtained using normalized mutual information) for three datasets (Exp, Gistic2 and Paradigm IPLs), while it gives the best RR (obtained using Pearson correlation coefficient) for two datasets (Gistic2 and Paradigm IPLs). It also obtains the best RE for three datasets (Exp, Gistic2 and Paradigm IPLs). Overall, the VWMRmR algorithm yields the best performance for all three evaluation criteria for the majority of the datasets. 
In addition, we identified signature genes using supervised learning collected from the overlapped top feature set among five feature selection methods. We obtained a 7-gene signature (ZMIZ1, ENG, FGFR1, PAWR, KRT17, MPO and LAT2) for EXP, a 9-gene signature for ExpExon, a 7-gene signature for hMethyl27, one single-gene signature (PIK3CG) for Gistic2 and a 3-gene signature for Paradigm IPLs. CONCLUSION: We performed a comprehensive comparison of the performance evaluation of five well-known feature selection methods for mining features from various high-dimensional datasets. We identified signature genes using supervised learning for the specific omic data for the disease. The study will help incorporate higher order dependencies among features.
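
As an illustration of the redundancy rate (RR) criterion mentioned above, one common formulation averages the absolute pairwise Pearson correlation over all pairs of selected features, so that lower values indicate a less redundant subset. Whether the study uses exactly this form is an assumption; the function names and the toy feature values below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def redundancy_rate(features):
    """Mean absolute Pearson correlation over all pairs of selected features."""
    pairs = list(combinations(features, 2))
    return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs)


# Toy feature subset (one list per feature, one value per sample); not real omics data.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with the first feature
    [1.0, -1.0, -1.0, 1.0],  # uncorrelated with both of the others
]
rr = redundancy_rate(toy)
```

In this toy case only one of the three feature pairs is redundant, so the rate comes out at about one third.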


Subject(s)
DNA Copy Number Variations , Neoplasms , Algorithms , DNA Methylation , Humans , Neoplasms/genetics , Neoplasms/metabolism
16.
Ann Surg ; 275(1): e229-e237, 2022 01 01.
Article in English | MEDLINE | ID: mdl-32398486

ABSTRACT

OBJECTIVE: The aim of the study was to perform mRNA-miRNA regulatory network analyses to identify a miRNA panel for molecular subtype identification and stratification of high-risk patients with pancreatic ductal adenocarcinoma (PDAC). BACKGROUND: Recent transcriptional profiling effort in PDAC has led to the identification of molecular subtypes that associate with poor survival; however, their clinical significance for risk stratification in patients with PDAC has been challenging. METHODS: By performing a systematic analysis in The Cancer Genome Atlas and International Cancer Genome Consortium cohorts, we discovered a panel of miRNAs that associated with squamous and other poor molecular subtypes in PDAC. Subsequently, we used logistic regression analysis to develop models for risk stratification and Cox proportional hazard analysis to determine survival prediction probability of this signature in multiple cohorts of 433 patients with PDAC, including a tissue cohort (n = 199) and a preoperative serum cohort (n = 51). RESULTS: We identified a panel of 9 miRNAs that were significantly upregulated (miR-205-5p and -934) or downregulated (miR-192-5p, 194-5p, 194-3p, 215-5p, 375-3p, 552-3p, and 1251-5p) in PDAC molecular subtypes with poor survival [squamous, area under the receiver operating characteristic curve (AUC) = 0.90; basal, AUC = 0.89; and quasimesenchymal, AUC = 0.83]. The validation of this miRNA panel in a tissue clinical cohort was a significant predictor of overall survival (hazard ratio = 2.48, P < 0.0001), and this predictive accuracy improved further in a risk nomogram which included key clinicopathological factors. Finally, we were able to successfully translate this miRNA predictive signature into a liquid biopsy-based assay in preoperative serum specimens from PDAC patients (hazard ratio: 2.85, P = 0.02). 
CONCLUSION: We report a novel miRNA risk-stratification signature that can be used as a noninvasive assay for the identification of high-risk patients and potential disease monitoring in patients with PDAC.


Subject(s)
Carcinoma, Pancreatic Ductal/genetics , Gene Expression Regulation, Neoplastic , MicroRNAs/genetics , Pancreatic Neoplasms/genetics , Risk Assessment/methods , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/biosynthesis , Biomarkers, Tumor/blood , Biomarkers, Tumor/genetics , Carcinoma, Pancreatic Ductal/blood , Carcinoma, Pancreatic Ductal/diagnosis , Female , Gene Expression Profiling , Humans , Male , MicroRNAs/biosynthesis , MicroRNAs/blood , Middle Aged , Neoplasm Staging , Pancreatic Neoplasms/blood , Pancreatic Neoplasms/diagnosis , Retrospective Studies
17.
Brief Bioinform ; 21(2): 368-394, 2020 03 23.
Article in English | MEDLINE | ID: mdl-30649169

ABSTRACT

Cancer is well recognized as a complex disease with dysregulated molecular networks or modules. Graph- and rule-based analytics have been applied extensively for cancer classification as well as prognosis using large genomic and other data over the past decade. This article provides a comprehensive review of various graph- and rule-based machine learning algorithms that have been applied to numerous genomics data to determine the cancer-specific gene modules, identify gene signature-based classifiers and carry out other related objectives of potential therapeutic value. This review focuses mainly on the methodological design and features of these algorithms to facilitate the application of these graph- and rule-based analytical approaches for cancer classification and prognosis. Based on the type of data integration, we divided all the algorithms into three categories: model-based integration, pre-processing integration and post-processing integration. Each category is further divided into four sub-categories (supervised, unsupervised, semi-supervised and survival-driven learning analyses) based on learning style. Therefore, a total of 11 categories of methods are summarized with their inputs, objectives and description, advantages and potential limitations. Next, we briefly demonstrate well-known and most recently developed algorithms for each sub-category along with salient information, such as data profiles, statistical or feature selection methods and outputs. Finally, we summarize the appropriate use and efficiency of all categories of graph- and rule mining-based learning methods when input data and specific objective are given. This review aims to help readers to select and use the appropriate algorithms for cancer classification and prognosis study.


Subject(s)
Algorithms , Genomics , Machine Learning , Neoplasms/classification , Data Mining , Datasets as Topic , Humans , Neoplasms/genetics , Neoplasms/pathology , Prognosis
18.
Brief Bioinform ; 21(4): 1465-1478, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31589286

ABSTRACT

Cleft palate (CP) is the second most common congenital birth defect. The etiology of CP is complicated, with involvement of various genetic and environmental factors. To investigate the gene regulatory mechanisms, we designed a powerful regulatory analytical approach to identify the conserved regulatory networks in humans and mice, from which we identified critical microRNAs (miRNAs), target genes and regulatory motifs (miRNA-TF-gene) related to CP. Using our manually curated genes and miRNAs with evidence in CP in humans and mice, we constructed miRNA and transcription factor (TF) co-regulation networks for both humans and mice. A consensus regulatory loop (miR17/miR20a-FOXE1-PDGFRA) and eight miRNAs (miR-140, miR-17, miR-18a, miR-19a, miR-19b, miR-20a, miR-451a and miR-92a) were discovered in both humans and mice. The role of miR-140, which had the strongest association with CP, was investigated in both human and mouse palate cells. The overexpression of miR-140-5p, but not miR-140-3p, significantly inhibited cell proliferation. We further examined whether miR-140 overexpression could suppress the expression of its predicted target genes (BMP2, FGF9, PAX9 and PDGFRA). Our results indicated that miR-140-5p overexpression suppressed the expression of BMP2 and FGF9 in cultured human palate cells and Fgf9 and Pdgfra in cultured mouse palate cells. In summary, our conserved miRNA-TF-gene regulatory network approach is effective in detecting consensus miRNAs, motifs, and regulatory mechanisms in human and mouse CP.


Subject(s)
Cleft Palate/genetics , Conserved Sequence , Gene Regulatory Networks , MicroRNAs/genetics , Transcription Factors/genetics , Animals , Humans , Mice
19.
Brief Bioinform ; 20(6): 2224-2235, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30239597

ABSTRACT

Epigenome-wide association studies (EWASs) have become increasingly popular for studying DNA methylation (DNAm) variations in complex diseases. The Illumina methylation arrays provide an economical, high-throughput and comprehensive platform for measuring methylation status in EWASs. A number of software tools have been developed for identifying disease-associated differentially methylated regions (DMRs) in the epigenome. However, in practice, we found these tools typically had multiple parameter settings that needed to be specified and the performance of the software tools under different parameters was often unclear. To help users better understand and choose optimal parameter settings when using DNAm analysis tools, we conducted a comprehensive evaluation of 4 popular DMR analysis tools under 60 different parameter settings. In addition to evaluating power, precision, area under precision-recall curve, Matthews correlation coefficient, F1 score and type I error rate, we also compared several additional characteristics of the analysis results, including the size of the DMRs, overlap between the methods and execution time. The results showed that none of the software tools performed best under their default parameter settings, and power varied widely when parameters were changed. Overall, the precision of these software tools was good. In contrast, all methods lacked power when effect size was consistent but small. Across all simulation scenarios, comb-p consistently had the best sensitivity as well as good control of false-positive rate.


Subject(s)
DNA Methylation , CpG Islands , Humans , Protein Processing, Post-Translational , Software
20.
PLoS Comput Biol ; 16(8): e1008145, 2020 08.
Article in English | MEDLINE | ID: mdl-32853212

ABSTRACT

Oligomeric proteins are central to life. Duplication and divergence of their genes is a key evolutionary driver, also because duplications can yield very different outcomes. Given a homomeric ancestor, duplication can yield two paralogs that form two distinct homomeric complexes, or a heteromeric complex comprising both paralogs. Alternatively, one paralog remains a homomer while the other acquires a new partner. However, so far, conflicting trends have been noted with respect to which fate dominates, primarily because different methods and criteria are being used to assign the interaction status of paralogs. Here, we systematically analyzed all Saccharomyces cerevisiae and Escherichia coli oligomeric complexes that include paralogous proteins. We found that the proportions of homo-hetero duplication fates strongly depend on a variety of factors, yet that nonetheless, rigorous filtering gives a consistent picture. In E. coli, about 50% of the paralogous pairs appear to have retained the ancestral homomeric interaction, whereas in S. cerevisiae only ~10% retained a homomeric state. This difference was also observed when unique complexes were counted instead of paralogous gene pairs. We further show that this difference is accounted for by multiple cases of heteromeric yeast complexes that share common ancestry with homomeric bacterial complexes. Our analysis settles contradicting trends and conflicting previous analyses, and provides a systematic and rigorous pipeline for delineating the fate of duplicated oligomers in any organism for which protein-protein interaction data are available.


Subject(s)
Biological Evolution , Escherichia coli Proteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Escherichia coli Proteins/genetics , Gene Duplication , Saccharomyces cerevisiae Proteins/genetics