1.
Biomed Res Int ; 2021: 8171236, 2021.
Article En | MEDLINE | ID: mdl-34812409

OBJECTIVE: This study set out to identify differentially expressed miRs in Parkinson's disease (PD) using GEO data and to provide diagnostic indicators for clinical practice. METHODS: Differentially expressed miRs were screened in the Gene Expression Omnibus (GEO) database. Sixty-eight PD patients treated in our hospital from May 2017 to March 2018 were enrolled as the research group (RG), and 50 normal subjects who underwent physical examination in our hospital during the same period were enrolled as the control group (CG). Quantitative real-time polymerase chain reaction (qRT-PCR) was used to measure the expression of miR-374a-5p in patient serum and to assess its diagnostic value. The potential target genes of miR-374a-5p were predicted, and Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were carried out. RESULTS: GEO2R analysis revealed 193 differentially expressed miRs, of which 78 were highly expressed and 115 were poorly expressed. miR-374a-5p expression in the serum of the RG was markedly reduced and had diagnostic value. The TargetScan and miRDB online tools were used to predict its target genes, yielding 415 common target genes. miR-374a-5p may participate in 27 functional pathways and 8 signal pathways. CONCLUSION: miR-374a-5p has low expression in PD and is expected to be a potential diagnostic indicator.


MicroRNAs/genetics , Parkinson Disease/genetics , Case-Control Studies , Computational Biology , Databases, Nucleic Acid , Gene Ontology , Genetic Markers , Humans , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Parkinson Disease/diagnosis , Signal Transduction/genetics
2.
Comput Math Methods Med ; 2021: 7471516, 2021.
Article En | MEDLINE | ID: mdl-34394707

High-throughput data make it possible to study the expression levels of thousands of genes simultaneously under a particular condition. However, only a few of these genes are discriminatively expressed. Identifying these biomarkers precisely is important for disease diagnosis, prognosis, and therapy. Many studies have used pathway information to identify biomarkers, but most of them incorporate only the group information and ignore the structural information of the pathway. In this paper, we propose a Bayesian gene selection method with network-constrained regularization, which incorporates the pathway structural information as priors to perform gene selection. All the priors are conjugate; thus, the parameters can be estimated efficiently through Gibbs sampling. We apply our method to 6 microarray datasets and compare it with Bayesian Lasso, Bayesian Elastic Net, and Bayesian Fused Lasso. The results show that our method performs better than the other Bayesian methods and that pathway structural information can improve the result.


Bayes Theorem , Gene Regulatory Networks , Genetic Markers , Biomarkers, Tumor/genetics , Computational Biology , Computer Simulation , Databases, Genetic/statistics & numerical data , Female , Gene Expression Profiling , Genetic Predisposition to Disease , Humans , Male , Models, Genetic , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/statistics & numerical data
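
The abstract above does not spell out its network-constrained prior. As a rough illustration only, the sketch below computes a commonly used graph-based smoothness penalty in which coefficients of genes linked in a pathway are pulled toward each other after scaling by node degree. The function name `network_penalty`, the edge-list input, and the degree scaling are assumptions made here for illustration, not the authors' formulation.

```python
from math import sqrt


def network_penalty(beta, edges):
    """Graph smoothness penalty: sum over edges (u, v) of
    (beta[u] / sqrt(deg u) - beta[v] / sqrt(deg v)) ** 2.

    beta  : list of regression coefficients, one per gene (hypothetical input)
    edges : list of (u, v) index pairs describing an undirected pathway graph
    """
    # Degree of each gene in the pathway graph.
    degree = [0] * len(beta)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Every gene that appears in an edge has degree >= 1, so no division by zero.
    return sum(
        (beta[u] / sqrt(degree[u]) - beta[v] / sqrt(degree[v])) ** 2
        for u, v in edges
    )


# Two linked genes with equal coefficients incur no penalty;
# opposite-signed coefficients are penalised.
print(network_penalty([2.0, 2.0], [(0, 1)]))
print(network_penalty([1.0, -1.0], [(0, 1)]))
```

In a Bayesian setting, a penalty of this shape would typically enter as the exponent of a Gaussian-type prior on the coefficients; whether the paper does exactly this is not stated in the abstract.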
3.
Comput Math Methods Med ; 2021: 5584684, 2021.
Article En | MEDLINE | ID: mdl-34122617

In view of the challenges that group Lasso penalty methods face in multicancer microarray data analysis, e.g., the need to divide genes into groups in advance and limited biological interpretability, we propose a robust adaptive multinomial regression with sparse group Lasso penalty (RAMRSGL) model. By adopting an overlapping clustering strategy, affinity propagation clustering is employed to obtain the gene groups of each cancer subtype, which explores the group structure of each subtype and merges the groups of all subtypes. In addition, data-driven weights based on noise are added to the sparse group Lasso penalty, which is combined with the multinomial log-likelihood function to perform multiclassification and adaptive group gene selection simultaneously. The experimental results on acute leukemia data verify the effectiveness of the proposed method.


Algorithms , Neoplasms/classification , Neoplasms/genetics , Cluster Analysis , Computational Biology , Databases, Genetic/statistics & numerical data , Humans , Leukemia/classification , Leukemia/genetics , Likelihood Functions , Models, Genetic , Multigene Family , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Oncogenes , Regression Analysis
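
For readers unfamiliar with the sparse group Lasso penalty named in the abstract above, the following is a minimal sketch of its standard form, with optional per-group weights standing in for the data-driven weights the authors mention. The function name, the `alpha` mixing parameter, and the way the weights enter are assumptions for illustration; the abstract does not give the exact RAMRSGL objective.

```python
from math import sqrt


def sparse_group_lasso_penalty(beta, groups, lam, alpha, weights=None):
    """Standard sparse group Lasso penalty:

        lam * (alpha * ||beta||_1
               + (1 - alpha) * sum_g w_g * sqrt(|g|) * ||beta_g||_2)

    beta    : list of coefficients
    groups  : list of index lists; groups may overlap
    weights : optional per-group weights w_g (assumed form; default 1.0)
    """
    l1_term = sum(abs(b) for b in beta)
    group_term = 0.0
    for k, idx in enumerate(groups):
        w = 1.0 if weights is None else weights[k]
        group_norm = sqrt(sum(beta[j] ** 2 for j in idx))
        group_term += w * sqrt(len(idx)) * group_norm
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


# Hypothetical coefficients: one active group of two genes, one inactive gene.
beta = [3.0, 4.0, 0.0]
groups = [[0, 1], [2]]
print(sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5))
```

The L1 term encourages sparsity within groups, while the group term can remove whole groups at once, which is what allows gene selection and group selection to happen together.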
4.
Comput Math Methods Med ; 2021: 5556992, 2021.
Article En | MEDLINE | ID: mdl-33986823

Ensemble learning combines multiple learners to perform combinatorial learning, which has advantages of good flexibility and higher generalization performance. To achieve higher quality cancer classification, in this study, the fast correlation-based feature selection (FCBF) method was used to preprocess the data to eliminate irrelevant and redundant features. Then, the classification was carried out in the stacking ensemble learner. A library for support vector machine (LIBSVM), K-nearest neighbor (KNN), decision tree C4.5 (C4.5), and random forest (RF) were used as the primary learners of the stacking ensemble. Given the imbalanced characteristics of cancer gene expression data, the embedding cost-sensitive naive Bayes was used as the metalearner of the stacking ensemble, which was represented as CSNB stacking. The proposed CSNB stacking method was applied to nine cancer datasets to further verify the classification performance of the model. Compared with other classification methods, such as single classifier algorithms and ensemble algorithms, the experimental results showed the effectiveness and robustness of the proposed method in processing different types of cancer data. This method may therefore help guide cancer diagnosis and research.


Algorithms , Machine Learning , Neoplasms/classification , Bayes Theorem , Computational Biology , Databases, Genetic/statistics & numerical data , Decision Trees , Female , Gene Expression Regulation, Neoplastic , Humans , Male , Neoplasms/genetics , Neural Networks, Computer , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Oncogenes , ROC Curve , Support Vector Machine
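
The abstract above uses a cost-sensitive naive Bayes metalearner to cope with class imbalance but does not describe how costs enter the decision. One generic way to make a probabilistic classifier cost-sensitive is to predict the class with the lowest expected misclassification cost, as sketched below. The function name and the cost-matrix layout are assumptions for illustration and may differ from the authors' embedding.

```python
def min_expected_cost_class(posteriors, cost):
    """Return the class index with the lowest expected misclassification cost.

    posteriors : list of class posterior probabilities P(class i | x)
    cost       : cost[i][j] is the cost of predicting j when the truth is i
                 (hypothetical layout; diagonal entries are normally 0)
    """
    n_classes = len(posteriors)
    expected = [
        sum(posteriors[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    # Ties are broken in favour of the lowest class index.
    return min(range(n_classes), key=lambda j: expected[j])


# Class 1 is the rare class; missing it is assumed to cost five times more.
posteriors = [0.7, 0.3]
print(min_expected_cost_class(posteriors, [[0, 1], [1, 0]]))  # symmetric costs
print(min_expected_cost_class(posteriors, [[0, 1], [5, 0]]))  # asymmetric costs
```

With symmetric costs the rule reduces to picking the most probable class; raising the cost of missing the minority class shifts borderline cases toward it.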
5.
Nucleic Acids Res ; 49(D1): D1502-D1506, 2021 01 08.
Article En | MEDLINE | ID: mdl-33211879

ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI. It was established in 2002, initially as an archive for publication-related microarray data, and was later extended to accept sequencing-based data. Over the last decade, an increasing share of biological experiments has involved multiple technologies assaying different biological modalities, such as epigenetics and RNA and protein expression, and thus the BioStudies database (https://www.ebi.ac.uk/biostudies) was established to deal with such multimodal data. Its central concept is a study, which is typically associated with a publication. BioStudies stores metadata describing the study, provides links to the relevant databases, such as the European Nucleotide Archive (ENA), and hosts the types of data for which specialized databases do not exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In future, all functional genomics data will be archived at BioStudies. The process will be seamless for users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides users through these changes.


Databases, Genetic , Epigenesis, Genetic , Genomics/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Animals , Cell Line , DNA Methylation , Gene Expression Profiling , Humans , Internet , Metadata , Organ Specificity , Plants/genetics , Single-Cell Analysis , Software
6.
Clin Chem ; 66(7): 934-945, 2020 07 01.
Article En | MEDLINE | ID: mdl-32613237

BACKGROUND: We translated a multigene expression index to predict sensitivity to endocrine therapy for Stage II-III breast cancer (SET2,3) to hybridization-based expression assays of formalin-fixed paraffin-embedded (FFPE) tissue sections. Here we report the technical validity with FFPE samples, including preanalytical and analytical performance. METHODS: We calibrated SET2,3 from microarrays (Affymetrix U133A) of frozen samples to hybridization-based assays of FFPE tissue, using bead-based QuantiGene Plex (QGP) and slide-based NanoString (NS). The following preanalytical and analytical conditions were tested in controlled studies: replicates within and between frozen and fixed samples, age of paraffin blocks, homogenization of fixed sections versus extracted RNA, core biopsy versus surgically resected tumor, technical replicates, precision over 20 weeks, limiting dilution, linear range, and analytical sensitivity. Lin's concordance correlation coefficient (CCC) was used to measure concordance between measurements. RESULTS: SET2,3 index was calibrated to use with QGP (CCC 0.94) and NS (CCC 0.93) technical platforms, and was validated in two cohorts of older fixed samples using QGP (CCC 0.72, 0.85) and NS (CCC 0.78, 0.78). QGP assay was concordant using direct homogenization of fixed sections versus purified RNA (CCC 0.97) and between core and surgical sample types (CCC 0.90), with 100% accuracy in technical replicates, 1-9% coefficient of variation over 20 weekly tests, linear range 3.0-11.5 (log2 counts), and analytical sensitivity ≥2.0 (log2 counts). CONCLUSIONS: Measurement of the novel SET2,3 assay was technically valid from fixed tumor sections of biopsy or resection samples using simple, inexpensive, hybridization methods, without the need for RNA purification.


Breast Neoplasms/genetics , Gene Expression Profiling/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , RNA, Messenger/analysis , Aurora Kinase A/genetics , Breast Neoplasms/drug therapy , Breast Neoplasms/pathology , Cohort Studies , Estrogen Receptor alpha/genetics , Estrogens/therapeutic use , Humans , Paraffin Embedding , Receptor, ErbB-2/genetics , Receptors, Progesterone/genetics , Reproducibility of Results , Tissue Fixation
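
Lin's concordance correlation coefficient (CCC), the agreement measure reported throughout the abstract above, penalises both poor correlation and systematic shifts between two sets of paired measurements. The sketch below is a minimal pure-Python version using population (1/n) moments; the function name and the example values are illustrative and are not taken from the study.

```python
from statistics import fmean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    using population (1/n) variances and covariance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Raises ZeroDivisionError if both inputs are constant and equal.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical index values from two platforms: perfectly correlated,
# but the second platform reads one unit higher.
frozen = [1.0, 2.0, 3.0]
fixed = [2.0, 3.0, 4.0]
print(lin_ccc(frozen, frozen))
print(lin_ccc(frozen, fixed))
```

The second call returns a value well below 1 even though the Pearson correlation is exactly 1, which is why CCC is preferred over correlation when the question is whether two assays give the same number.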
7.
BMC Cancer ; 20(1): 490, 2020 Jun 02.
Article En | MEDLINE | ID: mdl-32487193

BACKGROUND: Stomach cancer (SC) is a cancer that arises from the stomach mucous membrane. Because early-stage SC produces non-specific or no noticeable symptoms, newly diagnosed cases have usually reached an advanced stage and are thus difficult to cure. Therefore, in this study, we aimed to develop an integrated database of SC. METHODS: SC-related genes were identified through literature mining and by analyzing publicly available microarray datasets. Using the RNA-seq, miRNA-seq and clinical data downloaded from The Cancer Genome Atlas (TCGA), Kaplan-Meier (KM) survival curves for all the SC-related genes were generated and analyzed. The miRNAs (miRanda, miRTarget2, PicTar, PITA and TargetScan databases), SC-related miRNAs (HMDD and miR2Disease databases), single nucleotide polymorphisms (SNPs, dbSNP database), and SC-related SNPs (ClinVar database) were also retrieved from the indicated databases. Moreover, gene-disease (OMIM and GAD databases), copy number variation (CNV, DGV database), methylation (PubMeth database), drug (WebGestalt database), and transcription factor (TF, TRANSFAC database) analyses were performed for the differentially expressed genes (DEGs). RESULTS: In total, 9990 SC-related genes (including 8347 up-regulated genes and 1643 down-regulated genes) were identified, among which 65 genes were further confirmed as SC-related genes by enrichment analysis. In addition, 457 miRNAs, 20 SC-related miRNAs, 1570 SNPs, 108 SC-related SNPs, 419 TFs, 44,605 CNVs, 3404 drug-associated genes, 63 genes with methylation, and KM survival curves of 20,264 genes were obtained. By integrating these datasets, an integrated database of stomach cancer, designated SCDb (available at http://www.stomachcancerdb.org/), was established. CONCLUSIONS: As a comprehensive resource for human SC, the SCDb database will be very useful for SC-related research in future, and will thus promote the understanding of the pathogenesis of SC.


Computational Biology/methods , Databases, Genetic/statistics & numerical data , Datasets as Topic , Gene Expression Regulation, Neoplastic , Stomach Neoplasms/genetics , Computational Biology/statistics & numerical data , Gene Regulatory Networks , Humans , Kaplan-Meier Estimate , MicroRNAs/metabolism , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Polymorphism, Single Nucleotide , RNA-Seq/statistics & numerical data , Stomach Neoplasms/mortality , Stomach Neoplasms/pathology
8.
J Bioinform Comput Biol ; 18(1): 2050002, 2020 02.
Article En | MEDLINE | ID: mdl-32336254

Gene set analysis aims to identify differentially expressed or co-expressed genes within a biological pathway between two experimental conditions, so that it can eventually reveal biological processes and pathways involved in disease development. In the last few decades, various statistical and computational methods have been proposed to improve the statistical power of gene set analysis. In recent years, much attention has been paid to differentially co-expressed genes, since they can be disease-related even without a significant difference in average expression levels between two conditions. In this paper, we propose a new statistical method to identify differentially co-expressed genes from microarray gene expression data. The proposed method first estimates the co-expression levels of paired genes using covariance regularization by thresholding, and then evaluates the significance of the difference in the covariance estimates between two conditions. Through extensive simulation studies, we demonstrate that the proposed method is more powerful than existing mainstream methods at detecting co-expressed genes. We also apply it to various microarray gene expression datasets related to mutant p53 transcriptional activity and to epithelium and stroma in breast cancer.


Breast Neoplasms/genetics , Computational Biology/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Breast Neoplasms/pathology , Computer Simulation , Female , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation, Neoplastic , Humans , Mutation , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Tumor Suppressor Protein p53/genetics
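
The two steps described in the abstract above (thresholded covariance estimation per condition, then a comparison between conditions) can be sketched as follows. This is a simplified illustration assuming hard thresholding and a plain sum-of-squares difference statistic; the function names, the threshold `tau`, and the statistic are assumptions, and the significance evaluation the authors describe is not reproduced here.

```python
from statistics import fmean


def sample_covariance(samples):
    """Sample covariance matrix. `samples` is a list of rows, one per
    sample, each holding one expression value per gene."""
    n, p = len(samples), len(samples[0])
    means = [fmean(row[j] for row in samples) for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            s = sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples)
            cov[i][j] = cov[j][i] = s / (n - 1)
    return cov


def hard_threshold(cov, tau):
    """Set off-diagonal entries with magnitude below tau to zero."""
    p = len(cov)
    return [
        [cov[i][j] if i == j or abs(cov[i][j]) >= tau else 0.0 for j in range(p)]
        for i in range(p)
    ]


def coexpression_difference(samples_a, samples_b, tau):
    """Sum of squared differences between the thresholded off-diagonal
    covariances of two conditions (illustrative statistic only)."""
    a = hard_threshold(sample_covariance(samples_a), tau)
    b = hard_threshold(sample_covariance(samples_b), tau)
    p = len(a)
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(p) for j in range(i + 1, p))


# Hypothetical data: genes 0 and 1 move together in condition A and in
# opposite directions in condition B, with identical per-gene means.
condition_a = [[1, 2, 0], [2, 4, 1], [3, 6, 0]]
condition_b = [[1, 6, 0], [2, 4, 1], [3, 2, 0]]
print(coexpression_difference(condition_a, condition_b, tau=0.5))
```

The example shows why differential co-expression can flag gene pairs that a mean-based test would miss: the per-gene averages are the same in both conditions, yet the covariance flips sign.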
9.
PLoS One ; 15(4): e0231000, 2020.
Article En | MEDLINE | ID: mdl-32287265

Myotonic dystrophy type 1 (DM1) is a rare genetic disorder, characterised by muscular dystrophy, myotonia, and other symptoms. DM1 is caused by the expansion of a CTG repeat in the 3'-untranslated region of DMPK. Longer CTG expansions are associated with greater symptom severity and earlier age at onset. The primary mechanism of pathogenesis is thought to be mediated by a gain of function of the CUG-containing RNA, which leads to trans-dysregulation of the RNA metabolism of many other genes. Specifically, the alternative splicing (AS) and alternative polyadenylation (APA) of many genes are known to be disrupted. In the context of clinical trials of emerging DM1 treatments, it is important to be able to objectively quantify treatment efficacy at the level of molecular biomarkers. We show how previously described candidate mRNA biomarkers can be used to model an effective reduction in CTG length, using modern high-dimensional statistics (machine learning) and a blood and muscle mRNA microarray dataset. We show how this model could be used to detect treatment effects in the context of a clinical trial.


Myotonic Dystrophy/genetics , Myotonic Dystrophy/therapy , RNA, Messenger/genetics , Alternative Splicing , Biostatistics , Clinical Trials as Topic/methods , Clinical Trials as Topic/statistics & numerical data , Databases, Nucleic Acid/statistics & numerical data , Genetic Markers , Humans , Least-Squares Analysis , Machine Learning , Models, Genetic , Muscles/metabolism , Myotonic Dystrophy/metabolism , Myotonin-Protein Kinase/genetics , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Polyadenylation , RNA, Messenger/metabolism , Treatment Outcome , Trinucleotide Repeat Expansion
10.
BMC Res Notes ; 13(1): 92, 2020 Feb 24.
Article En | MEDLINE | ID: mdl-32093752

OBJECTIVE: The biological interpretation of gene expression measurements is a challenging task. While ordination methods are routinely used to identify clusters of samples or co-expressed genes, these methods do not take sample or gene annotations into account. We aim to provide a tool that allows users of all backgrounds to assess and visualize the intrinsic correlation structure of complex annotated gene expression data and discover the covariates that jointly affect expression patterns. RESULTS: The Bioconductor package covRNA provides a convenient and fast interface for testing and visualizing complex relationships between sample and gene covariates mediated by gene expression data in an entirely unsupervised setting. The relationships between sample and gene covariates are tested by statistical permutation tests and visualized by ordination. The methods are inspired by the fourthcorner and RLQ analyses used in ecological research for the analysis of species abundance data, which we modified to suit the distributional characteristics of both RNA-Seq read counts and microarray intensities and to provide a high-performance parallelized implementation for the analysis of large-scale gene expression data on multi-core computational systems. covRNA provides additional modules for unsupervised gene filtering and plotting functions to ensure a smooth and coherent analysis workflow.


Computational Biology/methods , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Software , Humans , Multivariate Analysis , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Reproducibility of Results
11.
Med Sci Monit ; 26: e920261, 2020 Feb 14.
Article En | MEDLINE | ID: mdl-32058995

BACKGROUND: Gastric adenocarcinoma accounts for 95% of all gastric malignant tumors. The purpose of this research was to identify differentially expressed genes (DEGs) of gastric adenocarcinoma by use of bioinformatics methods. MATERIAL AND METHODS: The gene microarray datasets GSE103236, GSE79973, and GSE29998 were imported from the GEO database, containing 70 gastric adenocarcinoma samples and 68 matched normal samples. Gene ontology (GO) and KEGG analyses were applied to the screened DEGs; Cytoscape software was used to construct protein-protein interaction (PPI) networks and to perform module analysis of the DEGs. UALCAN was used for prognostic analysis. RESULTS: We identified 2909 upregulated DEGs (uDEGs) and 7106 downregulated DEGs (dDEGs) of gastric adenocarcinoma. The GO analysis showed uDEGs were enriched in skeletal system development, cell adhesion, and biological adhesion. KEGG pathway analysis showed uDEGs were enriched in ECM-receptor interaction, focal adhesion, and cytokine-cytokine receptor interaction. The top 10 hub genes (COL1A1, COL3A1, COL1A2, BGN, COL5A2, THBS2, TIMP1, SPP1, PDGFRB, and COL4A1) were identified from the PPI network. These 10 hub genes were shown in GEPIA to be significantly upregulated in gastric adenocarcinoma tissues. Prognostic analysis of the 10 hub genes via UALCAN showed that upregulated expression of COL3A1, COL1A2, BGN, and THBS2 was associated with significantly reduced survival time in gastric adenocarcinoma patients. Module analysis revealed that gastric adenocarcinoma was related to 2 pathways: focal adhesion signaling and ECM-receptor interaction. CONCLUSIONS: This research identified hub genes and relevant signal pathways, which contributes to our understanding of the molecular mechanisms, and these genes could be used as diagnostic indicators and therapeutic biomarkers for gastric adenocarcinoma.


Adenocarcinoma/genetics , Biomarkers, Tumor/genetics , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Stomach Neoplasms/genetics , Adenocarcinoma/mortality , Adenocarcinoma/pathology , Computational Biology , Datasets as Topic , Gastric Mucosa/pathology , Gene Expression Profiling , Humans , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Prognosis , Protein Interaction Mapping , Protein Interaction Maps/genetics , Signal Transduction/genetics , Stomach Neoplasms/mortality , Stomach Neoplasms/pathology , Survival Analysis , Time Factors
12.
J Comput Biol ; 27(9): 1384-1396, 2020 09.
Article En | MEDLINE | ID: mdl-32031874

One of the main methods for analyzing gene expression data is biclustering, an unsupervised technique that selects subgroups of genes that are co-expressed under subgroups of experimental conditions. A large number of biclustering algorithms have been developed to classify gene expression data. These algorithms can output a large number of overlapping biclusters, whose visualization still requires further study. We present VisBicluster, a web-based interactive visualization tool for displaying biclustering results. The visualization technique lays out the generated biclusters in a two-dimensional matrix in which each bicluster is represented as a column and each overlap between a set of biclusters is represented as a row. A search interface lets the user query the matrix of bicluster intersections and visualize the results matching the queries. Our tool supports many interactive features such as sorting, zooming, and details-on-demand. We demonstrated the usefulness of VisBicluster with biclustering results from real and synthetic datasets. In addition, we performed a user study with 14 participants to illustrate the clarity and simplicity of overlap representation with our tool.


Computational Biology , Gene Expression Profiling/statistics & numerical data , Gene Expression/genetics , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Algorithms , Cluster Analysis , Computer Graphics , Humans , User-Computer Interface
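
The matrix layout described in the abstract above (one column per bicluster, one row per overlap between a set of biclusters) can be approximated with a small grouping step. The sketch below is a simplification that considers gene membership only and ignores the condition dimension; the function name and data layout are assumptions and do not reflect VisBicluster's actual implementation.

```python
from collections import defaultdict


def overlap_rows(biclusters):
    """Group shared genes by the exact set of biclusters that contain them.

    biclusters : dict mapping a bicluster name to a set of gene identifiers
    Returns a dict mapping frozenset(bicluster names) -> set of genes,
    keeping only genes that occur in at least two biclusters.
    """
    # Record, for each gene, which biclusters it belongs to.
    membership = defaultdict(set)
    for name, genes in biclusters.items():
        for gene in genes:
            membership[gene].add(name)
    # Each distinct membership pattern becomes one row of the overlap matrix.
    rows = defaultdict(set)
    for gene, names in membership.items():
        if len(names) >= 2:
            rows[frozenset(names)].add(gene)
    return dict(rows)


# Hypothetical biclustering output.
biclusters = {
    "B1": {"g1", "g2", "g3"},
    "B2": {"g2", "g3", "g4"},
    "B3": {"g3", "g5"},
}
for names, genes in overlap_rows(biclusters).items():
    print(sorted(names), sorted(genes))
```

Each key of the result corresponds to one row, and the bicluster names in the key indicate which columns that row would mark.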
13.
Cancer Med ; 9(4): 1419-1429, 2020 02.
Article En | MEDLINE | ID: mdl-31893575

Early identification of metastatic or recurrent colorectal cancer (CRC) patients who will be sensitive to FOLFOX (5-FU, leucovorin and oxaliplatin) therapy is very important. We performed microarray meta-analysis to identify differentially expressed genes (DEGs) between FOLFOX responders and nonresponders in metastatic or recurrent CRC patients, and found that the expression levels of WASHC4, HELZ, ERN1, RPS6KB1, and APPBP2 were downregulated, while the expression levels of IRF7, EML3, LYPLA2, DRAP1, RNH1, PKP3, TSPAN17, LSS, MLKL, PPP1R7, GCDH, C19ORF24, and CCDC124 were upregulated in FOLFOX responders compared with nonresponders. Subsequent functional annotation showed that DEGs were significantly enriched in autophagy, ErbB signaling pathway, mitophagy, endocytosis, FoxO signaling pathway, apoptosis, and antifolate resistance pathways. Based on those candidate genes, several machine learning algorithms were applied to the training set, then performances of models were assessed via the cross validation method. Candidate models with the best tuning parameters were applied to the test set and the final model showed satisfactory performance. In addition, we also reported that MLKL and CCDC124 gene expression were independent prognostic factors for metastatic CRC patients undergoing FOLFOX therapy.


Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Biomarkers, Tumor/genetics , Colorectal Neoplasms/drug therapy , Machine Learning , Neoplasm Recurrence, Local/drug therapy , Cell Cycle Proteins/genetics , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Datasets as Topic , Fluorouracil/therapeutic use , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation, Neoplastic , Humans , Intracellular Signaling Peptides and Proteins/genetics , Leucovorin/therapeutic use , Neoplasm Recurrence, Local/genetics , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Organoplatinum Compounds/therapeutic use , Prognosis , Protein Kinases/genetics , Response Evaluation Criteria in Solid Tumors
14.
Future Oncol ; 16(3): 4461-4473, 2020 Jan.
Article En | MEDLINE | ID: mdl-31854204

Currently, the prognostic effects of leukemia inhibitory factor (LIF) and LIF receptor (LIFR) in pancreatic adenocarcinoma (PAAD) are not clear. In the present study, we utilized the large datasets from four public databases to investigate the expression of LIF and LIFR and their clinical significance in PAAD. Eight cohorts containing 1278 cases with PAAD were identified and the analysis results suggested that LIF was highly expressed while LIFR was lowly expressed in PAAD tissues compared with adjacent or normal tissues. Kaplan-Meier plot curves and univariate and multivariate Cox proportional hazards regression analyses indicated high LIF expression was associated with shorter overall survival (adjusted hazard ratio = 1.641, 95% CI: 1.399-1.925, p < 0.001) whereas high LIFR expression was associated with longer overall survival (adjusted hazard ratio = 0.653, 95% CI: 0.517-0.826, p < 0.001).


Adenocarcinoma/genetics , Biomarkers, Tumor/genetics , Leukemia Inhibitory Factor Receptor alpha Subunit/genetics , Leukemia Inhibitory Factor/genetics , Pancreatic Neoplasms/genetics , Adenocarcinoma/mortality , Adenocarcinoma/pathology , Aged , Cohort Studies , Datasets as Topic , Down-Regulation , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Kaplan-Meier Estimate , Male , Middle Aged , Neoplasm Staging , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Pancreas/pathology , Pancreatic Neoplasms/mortality , Pancreatic Neoplasms/pathology , Prognosis , Up-Regulation , Pancreatic Neoplasms
15.
Cancer Med ; 9(3): 1242-1253, 2020 02.
Article En | MEDLINE | ID: mdl-31856408

Most high-grade serous ovarian cancer (HGSOC) patients develop resistance to platinum-based chemotherapy and recur. Many biomarkers related to the survival and prognosis of drug-resistant patients have been identified by mining databases; however, the predictive power of single-gene biomarkers is not specific or sensitive enough. The present study aimed to develop a novel prognostic gene signature of platinum resistance for patients with HGSOC. The gene expression profiles were obtained from the Gene Expression Omnibus and The Cancer Genome Atlas databases. A total of 269 differentially expressed genes (DEGs) associated with platinum resistance were identified (P < .05, fold change >1.5). Functional analysis revealed that these DEGs were mainly involved in the apoptotic process and the PI3K-Akt pathway. Furthermore, we established a seven-gene signature that was significantly associated with overall survival (OS) in the test series. Compared with the low-risk score group, patients with a high-risk score had poorer OS (P < .001). The area under the curve (AUC) was 0.710, indicating that the risk score had a certain accuracy in predicting OS in HGSOC (AUC > 0.7). Surprisingly, the risk score was identified as an independent prognostic indicator for HGSOC (P < .001). Subgroup analyses suggested that the risk score had greater prognostic value for patients with grade 3-4, stage III-IV, venous invasion and objective response. In conclusion, we developed a seven-gene signature relating to platinum resistance, which can predict survival in HGSOC and provide novel insights into platinum resistance mechanisms and the identification of HGSOC patients with poor prognosis.


Antineoplastic Combined Chemotherapy Protocols/pharmacology , Biomarkers, Tumor/genetics , Cystadenocarcinoma, Serous/drug therapy , Drug Resistance, Neoplasm/genetics , Organoplatinum Compounds/pharmacology , Ovarian Neoplasms/drug therapy , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Computational Biology , Cystadenocarcinoma, Serous/genetics , Cystadenocarcinoma, Serous/mortality , Cystadenocarcinoma, Serous/pathology , Datasets as Topic , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Models, Genetic , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Organoplatinum Compounds/therapeutic use , Ovarian Neoplasms/genetics , Ovarian Neoplasms/mortality , Ovarian Neoplasms/pathology , Phosphatidylinositol 3-Kinases/metabolism , Prognosis , Progression-Free Survival , RNA, Messenger , ROC Curve , Transcriptome/genetics
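
The AUC quoted in the abstract above summarises how well a risk score separates patients with and without the outcome. As a simplified illustration, the sketch below computes the AUC for a binary outcome as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, counting ties as one half. Survival studies usually use time-dependent ROC analysis, which is not reproduced here; the function name and the example data are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores : risk scores, higher meaning higher predicted risk
    labels : 1 for patients with the event, 0 for patients without it
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores for four patients, two of whom had the event.
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to a score with no discriminative ability and 1.0 to perfect separation, which gives context for reported values around 0.7.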
16.
Clin Transl Sci ; 13(1): 169-178, 2020 01.
Article En | MEDLINE | ID: mdl-31794148

As an extremely prevalent disease worldwide, allergic rhinitis (AR) is a condition characterized by chronic inflammation of the nasal mucosa. To identify the finer molecular mechanisms associated with the AR susceptibility genes, differentially expressed genes (DEGs) in AR were investigated. The DEG expression and clinical data of the GSE19187 data set were used for weighted gene co-expression network analysis (WGCNA). After the modules related to AR had been screened, the genes in the module were extracted for Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, whereby the genes enriched in the KEGG pathway were regarded as the pathway-genes. The DEGs in patients with AR were subsequently screened out from GSE19187, and the sensitive genes were identified in GSE18574 in connection with the allergen challenge. Two kinds of genes were compared with the pathway-genes in order to screen the AR susceptibility genes. Receiver operating characteristic (ROC) curve was plotted to evaluate the capability of the susceptibility genes to distinguish the AR state. Based on the WGCNA in the GSE19187 data set, 10 co-expression network modules were identified. The correlation analyses revealed that the yellow module was positively correlated with the disease state of AR. A total of 89 genes were found to be involved in the enrichment of the yellow module pathway. Four genes (CST1, SH2D1B, DPP4, and SLC5A5) were upregulated in AR and sensitive to allergen challenge, whose potentials were further confirmed by ROC curve. Taken together, CST1, SH2D1B, DPP4, and SLC5A5 are susceptibility genes to AR.


Gene Regulatory Networks/immunology , Genetic Predisposition to Disease , Rhinitis, Allergic/genetics , Biomarkers/analysis , Computational Biology/methods , Datasets as Topic , Dipeptidyl Peptidase 4/analysis , Dipeptidyl Peptidase 4/genetics , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation/immunology , Humans , Nasal Mucosa/immunology , Nasal Mucosa/pathology , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Predictive Value of Tests , ROC Curve , Rhinitis, Allergic/epidemiology , Rhinitis, Allergic/immunology , Rhinitis, Allergic/pathology , Risk Assessment/methods , Salivary Cystatins/analysis , Salivary Cystatins/genetics , Symporters/analysis , Symporters/genetics , Transcription Factors/analysis , Transcription Factors/genetics
17.
Cancer Med ; 9(1): 335-349, 2020 01.
Article En | MEDLINE | ID: mdl-31743579

Gastric cancer (GC) remains an important malignancy worldwide with poor prognosis. Long noncoding RNAs (lncRNAs) can markedly affect cancer progression. Moreover, lncRNAs have been proposed as diagnostic or prognostic biomarkers of GC. Therefore, the current study aimed to explore lncRNA-based prognostic biomarkers for GC. LncRNA expression profiles from the Gene Expression Omnibus (GEO) database were first downloaded. After re-annotation of lncRNAs, a univariate Cox analysis identified 177 prognostic lncRNA probes in the training set GSE62254 (n = 225). Multivariate Cox analysis of each lncRNA with clinical characteristics as covariates identified a total of 46 prognostic lncRNA probes. Robust likelihood-based survival and least absolute shrinkage and selection operator (LASSO) models were used to establish a 6-lncRNA signature with prognostic value. Receiver operating characteristic (ROC) curve analyses were employed to compare survival prediction in terms of specificity and sensitivity. Patients with high-risk scores exhibited a significantly worse overall survival (OS) than patients with low-risk scores (log-rank test P-value <.0001), and the area under the ROC curve (AUC) for 5-year survival was 0.77. A nomogram and forest plot were constructed to compare the clinical characteristics and risk scores by a multivariable Cox regression analysis, which suggested that the 6-lncRNA signature can independently predict patient prognosis. Single-sample GSEA (ssGSEA) was used to determine the relationships between the 6-lncRNA signature and biological functions. The internal validation set GSE62254 (n = 75) and the external validation set GSE57303 (n = 70) were successfully used to validate the robustness of the 6-lncRNA signature. In conclusion, the 6-lncRNA signature can effectively predict the prognosis of GC patients.


Biomarkers, Tumor/metabolism , Nomograms , RNA, Long Noncoding/metabolism , Stomach Neoplasms/mortality , Datasets as Topic , Disease-Free Survival , Female , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation, Neoplastic , Humans , Kaplan-Meier Estimate , Likelihood Functions , Male , Middle Aged , Neoplasm Staging , Oligonucleotide Array Sequence Analysis/statistics & numerical data , ROC Curve , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology
18.
J Bioinform Comput Biol ; 17(5): 1940010, 2019 10.
Article En | MEDLINE | ID: mdl-31856670

Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity, but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, from both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. We observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and there was a bias toward reporting large gene sets with some methods. Furthermore, there was substantial disagreement between the 20 most statistically significant gene sets reported by the methods. This was also observed when expanding to the 100 most statistically significant reported gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement even when using the same method. GAGE, PAGE, and ORA were the only methods able to achieve relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in terms of the relevance of the top 20 and top 100 most significant gene sets to known biology of the disease, where GAGE predicted the most relevant gene sets, followed by GSEA, ORA, and PAGE.
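The agreement comparison described in this abstract can be illustrated with a minimal Python sketch. It assumes agreement is scored as the Jaccard overlap of two methods' top-k lists; the abstract does not state which agreement measure the authors used, and the function names and rankings below are hypothetical, not data from the study.

```python
from itertools import combinations


def top_k_overlap(ranked_a, ranked_b, k=20):
    """Jaccard overlap of the k top-ranked gene sets from two methods."""
    top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


def pairwise_agreement(rankings, k=20):
    """Map each pair of method names to the overlap of their top-k lists."""
    return {
        (m1, m2): top_k_overlap(rankings[m1], rankings[m2], k)
        for m1, m2 in combinations(sorted(rankings), 2)
    }


# Hypothetical rankings, most significant gene set first (illustrative only).
rankings = {
    "method_A": ["gs1", "gs2", "gs3", "gs4"],
    "method_B": ["gs2", "gs1", "gs5", "gs6"],
    "method_C": ["gs7", "gs8", "gs9", "gs1"],
}
agreement = pairwise_agreement(rankings, k=2)
```

A score of 1.0 means two methods report the same top-k gene sets (in any order), and 0.0 means they share none; the same function could be applied with k=20 or k=100 as in the abstract.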


Databases, Genetic , Gene Expression Profiling/statistics & numerical data , Arthritis, Juvenile/genetics , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Humans , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Phenotype , Psoriasis/genetics , Reproducibility of Results
19.
PLoS One ; 14(11): e0224446, 2019.
Article En | MEDLINE | ID: mdl-31730620

Cancer is one of the leading causes of death worldwide. Many believe that genomic data will enable us to better predict the survival time of these patients, which will lead to better, more personalized treatment options and patient care. As standard survival prediction models have a hard time coping with the high dimensionality of such gene expression data, many projects use dimensionality reduction techniques to overcome this hurdle. We introduce a novel methodology, inspired by topic modeling from the natural language domain, to derive expressive features from the high-dimensional gene expression data. There, a document is represented as a mixture over a relatively small number of topics, where each topic corresponds to a distribution over the words; here, to accommodate the heterogeneity of a patient's cancer, we represent each patient (≈ document) as a mixture over cancer-topics, where each cancer-topic is a mixture over gene expression values (≈ words). This required some extensions to the standard LDA model (e.g., to accommodate the real-valued expression values), leading to our novel discretized Latent Dirichlet Allocation (dLDA) procedure. After using this dLDA to learn these cancer-topics, we can then express each patient as a distribution over a small number of cancer-topics, then use this low-dimensional "distribution vector" as input to a learning algorithm; here, we ran the recent survival prediction algorithm, MTLR, on this representation of the cancer dataset. We initially focus on the METABRIC dataset, which describes each of n = 1,981 breast cancer patients using the r = 49,576 gene expression values, from microarrays. Our results show that our approach (dLDA followed by MTLR) provides survival estimates that are more accurate than standard models, in terms of the standard Concordance measure. 
We then validate this "dLDA+MTLR" approach by running it on the n = 883 Pan-kidney (KIPAN) dataset, over r = 15,529 gene expression values (here using the mRNAseq modality), and find that it again achieves excellent results. In both cases, we also show that the resulting model is calibrated, using the recent "D-calibrated" measure. These successes, in two different cancer types and expression modalities, demonstrate the generality and the effectiveness of this approach. The dLDA+MTLR source code is available at https://github.com/nitsanluke/GE-LDA-Survival.


Gene Expression Regulation, Neoplastic , Models, Biological , Natural Language Processing , Neoplasms/mortality , Datasets as Topic , Gene Expression Profiling/statistics & numerical data , Humans , Kaplan-Meier Estimate , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Prognosis
20.
PLoS One ; 14(11): e0224750, 2019.
Article En | MEDLINE | ID: mdl-31730674

Chronic obstructive pulmonary disease (COPD) was classified by the Centers for Disease Control and Prevention in 2014 as the 3rd leading cause of death in the United States (US). The main cause of COPD is exposure to tobacco smoke and air pollutants. Problems associated with COPD include under-diagnosis of the disease and an increase in the number of smokers worldwide. The goal of our study is to identify disease variability in the gene expression profiles of COPD subjects compared to controls, by reanalyzing pre-existing, publicly available microarray expression datasets. Our inclusion criteria selected microarray datasets that reported the smoking status, age and sex of blood donors. Our datasets used Affymetrix and Agilent microarray platforms (7 datasets, 1,262 samples). We re-analyzed the curated raw microarray expression data using R packages, and used Box-Cox power transformations to normalize datasets. To identify significant differentially expressed genes we used generalized least squares models with disease state, age, sex, smoking status and study as effects that also included binary interactions, followed by likelihood ratio tests (LRT). We found 3,315 statistically significant (Storey-adjusted q-value <0.05) differentially expressed genes with respect to disease state (COPD or control). We further filtered these genes for biological effect (using results from LRT q-value <0.05 and model estimates' 10% two-tailed quantiles of mean differences between COPD and control) to identify 679 genes. Through analysis of disease, sex, age, and also smoking status and disease interactions we identified differentially expressed genes involved in a variety of immune responses and cell processes in COPD. We also trained a logistic regression model using the common array genes as features, which enabled prediction of disease status with 81.7% accuracy. 
Our results show potential for improving the diagnosis of COPD through blood and highlight novel gene expression disease signatures.
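For readers unfamiliar with the Box-Cox power transformation mentioned in this abstract, the following is a minimal Python sketch of the standard formula: (x^λ − 1)/λ for λ ≠ 0 and log x for λ = 0. It is not the authors' code (the study used R packages), the function name `box_cox` is hypothetical, and how λ was chosen per dataset is not stated in the abstract, so λ is simply passed in here.

```python
import math


def box_cox(values, lam):
    """Box-Cox power transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        # The limit of (x**lam - 1) / lam as lam -> 0 is the natural log.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Illustrative intensities only (not data from the study).
transformed = box_cox([1.0, 4.0, 9.0], 0.5)
```

In practice λ is usually estimated from the data, for example by maximum likelihood, rather than fixed in advance.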


Data Mining , Pulmonary Disease, Chronic Obstructive/epidemiology , Transcriptome/genetics , Age Factors , Air Pollutants/adverse effects , Biomarkers/metabolism , Datasets as Topic , Down-Regulation , Female , Gene Expression Profiling/statistics & numerical data , Humans , Logistic Models , Machine Learning , Male , Models, Genetic , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Pulmonary Disease, Chronic Obstructive/diagnosis , Pulmonary Disease, Chronic Obstructive/etiology , Pulmonary Disease, Chronic Obstructive/genetics , Risk Assessment/methods , Risk Factors , Sex Factors , Smoking/adverse effects , Smoking/epidemiology , United States/epidemiology , Up-Regulation
...