RESUMO
BACKGROUND: We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. The conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets always have the same distribution for all different conditions as the class-specific gene signatures change with the condition. Therefore, the trained classifier would work well under one condition but not under another. METHODS: To address the problem of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. In contrast, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. RESULTS: We assessed the performance of CL for cross-condition predictions of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73 %, highest among other methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from Arabic population to assign a PAM50 classification to each tumor based on their gene expression profiles. CONCLUSIONS: A novel algorithm CrossLink for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.
Assuntos
Neoplasias da Mama/genética , Regulação Neoplásica da Expressão Gênica/genética , Transcriptoma/genética , Algoritmos , Neoplasias da Mama/classificação , Neoplasias da Mama/patologia , Análise por Conglomerados , Feminino , HumanosRESUMO
DNA methylation is a common epigenetic marker that regulates gene expression. A robust and cost-effective way for measuring whole genome methylation is Methyl-CpG binding domain-based capture followed by sequencing (MBDCap-seq). In this study, we proposed BIMMER, a Hidden Markov Model (HMM) for differential Methylation Regions (DMRs) identification, where HMMs were proposed to model the methylation status in normal and cancer samples in the first layer and another HMM was introduced to model the relationship between differential methylation and methylation statuses in normal and cancer samples. To carry out the prediction for BIMMER, an Expectation-Maximization algorithm was derived. BIMMER was validated on the simulated data and applied to real MBDCap-seq data of normal and cancer samples. BIMMER revealed that 8.83% of the breast cancer genome are differentially methylated and the majority are hypo-methylated in breast cancer.
Assuntos
Algoritmos , Metilação de DNA , Neoplasias da Mama/genética , Feminino , Genoma Humano , Humanos , Cadeias de Markov , Análise de Sequência de DNARESUMO
BACKGROUND: Connectivity map (cMap) is a recent developed dataset and algorithm for uncovering and understanding the treatment effect of small molecules on different cancer cell lines. It is widely used but there are still remaining challenges for accurate predictions. METHOD: Here, we propose BRCA-MoNet, a network of drug mode of action (MoA) specific to breast cancer, which is constructed based on the cMap dataset. A drug signature selection algorithm fitting the characteristic of cMap data, a quality control scheme as well as a novel query algorithm based on BRCA-MoNet are developed for more effective prediction of drug effects. RESULT: BRCA-MoNet was applied to three independent data sets obtained from the GEO database: Estrodial treated MCF7 cell line, BMS-754807 treated MCF7 cell line, and a breast cancer patient microarray dataset. In the first case, BRCA-MoNet could identify drug MoAs likely to share same and reverse treatment effect. In the second case, the result demonstrated the potential of BRCA-MoNet to reposition drugs and predict treatment effects for drugs not in cMap data. In the third case, a possible procedure of personalized drug selection is showcased. CONCLUSIONS: The results clearly demonstrated that the proposed BRCA-MoNet approach can provide increased prediction power to cMap and thus will be useful for identification of new therapeutic candidates.