ABSTRACT
BACKGROUND: Molecular heterogeneity of tumors suggests the presence of multiple different subclones that may limit response to targeted therapies and contribute to acquisition of drug resistance, but its quantification has remained challenging. RESULTS: We performed simulations to evaluate statistical measures that best capture the molecular diversity within a group of tumors for either continuous (gene expression) or discrete (mutations, copy number alterations) molecular data. Dispersion-based metrics in the principal component space best captured the underlying heterogeneity. To demonstrate the utility of these measures, we characterized the diversity in transcriptional and genomic profiles of different breast tumor subtypes, and showed that basal-like or triple-negative breast cancers (TNBC) are significantly more heterogeneous molecularly than other subtypes. Our analysis also suggests that transcriptional diversity is a global characteristic of the tumors, observed across the majority of molecular pathways. Among basal-like tumors, those that were resistant to multi-agent chemotherapy showed greater transcriptional diversity than chemotherapy-sensitive tumors, suggesting that multiple mechanisms may contribute to chemotherapy resistance. CONCLUSIONS: We proposed and validated measures of transcriptional and genomic diversity that can quantify the molecular diversity of tumors. We applied the new measures to genomic data from breast tumors and demonstrated that basal-like breast cancers are significantly more diverse than other breast cancers. The observation that chemo-resistant tumors are significantly more diverse molecularly than chemo-sensitive tumors implies that multiple resistance mechanisms may be active, thus limiting the sensitivity and accuracy of predictive markers of chemotherapy response.
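As a rough illustration of what a dispersion-based metric in principal component space could look like, the sketch below projects pooled samples onto the leading principal components and reports, per group, the mean Euclidean distance to the group centroid. The function name `group_dispersion`, the choice of mean centroid distance, the number of components, and the simulated data are all assumptions made for illustration; the abstract does not specify the exact metric the authors used.

```python
import numpy as np

def group_dispersion(X, labels, n_components=2):
    """Mean distance to the group centroid in PC space (illustrative metric only).

    X: (n_samples, n_features) matrix, e.g. expression values.
    labels: one group label per sample.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Xc = X - X.mean(axis=0)                       # center each feature on pooled data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T             # coordinates on the leading PCs
    result = {}
    for g in np.unique(labels):
        s = scores[labels == g]
        dist = np.linalg.norm(s - s.mean(axis=0), axis=1)
        result[g.item()] = float(dist.mean())
    return result

# Simulated data (not from the study): one tight group, one diffuse group.
rng = np.random.default_rng(0)
tight = rng.normal(0.0, 0.5, size=(30, 50))
diffuse = rng.normal(0.0, 2.0, size=(30, 50))
X_demo = np.vstack([tight, diffuse])
labels_demo = ["tight"] * 30 + ["diffuse"] * 30
dispersion = group_dispersion(X_demo, labels_demo)
print(dispersion)
```

Computing the principal components on the pooled samples, rather than separately per group, keeps the groups in a common coordinate system so their dispersions are comparable.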
Subjects
Breast Neoplasms/genetics , Neoplasm Genes , Antineoplastic Agents/therapeutic use , Breast Neoplasms/drug therapy , Breast Neoplasms/pathology , DNA Copy Number Variations , Genetic Databases/statistics & numerical data , Antineoplastic Drug Resistance/genetics , Female , Neoplastic Gene Expression Regulation/drug effects , Neoplastic Gene Expression Regulation/genetics , Humans , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Oligonucleotide Array Sequence Analysis
ABSTRACT
The purpose of this study was to compare a logistic regression model (LRM) and recursive partitioning (RP) for predicting pathologic complete response to preoperative chemotherapy in patients with breast cancer. The two models were built on the same training set of 496 patients and validated on the same validation set of 337 patients. Model performance was quantified with respect to discrimination (evaluated by the area under the receiver operating characteristic curve (AUC)) and calibration. In the training set, AUCs were similar for the LRM and RP models (0.77 (95% confidence interval (CI), 0.74-0.80) and 0.75 (95% CI, 0.74-0.79), respectively), while the LRM outperformed RP in the validation set (0.78 (95% CI, 0.74-0.82) versus 0.64 (95% CI, 0.60-0.67)). The LRM also outperformed the RP model in terms of calibration. On these real datasets, the LRM outperformed the RP model and is therefore more suitable for clinical use.
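The two performance criteria named above, discrimination and calibration, can be sketched in a few lines. The example below computes the AUC as the proportion of positive/negative pairs ranked correctly and builds a simple binned calibration table. The function names, the binning scheme, and the toy predictions are assumptions for illustration and are not taken from the study.

```python
def auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_table(y_true, probs, n_bins=2):
    """Per bin: (mean predicted probability, observed event rate)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)   # equal-width bins on [0, 1]
        bins[idx].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b))
        for b in bins if b
    ]

# Toy predictions (hypothetical, not study data).
y_demo = [0, 0, 1, 1]
p_demo = [0.1, 0.4, 0.35, 0.8]
auc_demo = auc(y_demo, p_demo)
table_demo = calibration_table(y_demo, p_demo, n_bins=2)
print(auc_demo)    # 0.75
print(table_demo)
```

A well-calibrated model shows mean predicted probabilities close to the observed event rates in every bin, which is a separate property from a high AUC.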
Subjects
Antineoplastic Agents/therapeutic use , Breast Neoplasms/drug therapy , Statistical Models , Area Under Curve , Breast Neoplasms/pathology , Adjuvant Chemotherapy , Female , Humans , Logistic Models , Middle Aged , Neoplasm Staging , ROC Curve , Treatment Outcome
ABSTRACT
Although the benefit of adjuvant chemotherapy can be determined at the population level, the actual chemosensitivity of a tumor cannot be determined at the individual level. Neoadjuvant chemotherapy in patients with localized breast cancer is of interest because it reveals the chemosensitivity of a tumor "in vivo". A single criterion can be used to predict the effectiveness of targeted therapies. Chemotherapy, however, is not a targeted therapy, and no biological marker predictive of response has been identified so far. The development of mathematical models and the use of molecular biology may help to predict chemosensitivity. Initial results are promising. Published work still needs to be validated, but the potential applications are numerous.
Subjects
Antineoplastic Agents/therapeutic use , Breast Neoplasms/drug therapy , Adjuvant Chemotherapy , Female , Humans
ABSTRACT
BACKGROUND: DNA microarray technology has emerged as a major tool for exploring cancer biology and solving clinical issues. Predicting a patient's response to chemotherapy is one such issue; successful prediction would make it possible to give patients the most appropriate chemotherapy regimen. Patient response can be classified as either a pathologic complete response (PCR) or residual disease (NoPCR), and these strongly correlate with patient outcome. Microarrays can be used as multigenic predictors of patient response, but probe selection remains problematic. In this study, each probe set was considered as an elementary predictor of the response and was ranked on its ability to predict a high number of PCR and NoPCR cases in a ratio similar to that seen in the learning set. We defined a valuation function that assigned high values to probe sets according to how different the expression of the genes was and how closely the relative proportions of PCR and NoPCR predictions matched the proportions observed in the learning set. Multigenic predictors were designed by selecting probe sets highly ranked in their predictions and were tested using several validation sets. RESULTS: Our method defined three types of probe sets: 71% were mono-informative probe sets (59% predicted only NoPCR, and 12% predicted only PCR), 25% were bi-informative, and 4% were non-informative. Using a valuation function to rank the probe sets allowed us to select those that correctly predicted the response of a high number of patient cases in the training set and that predicted a PCR/NoPCR ratio for validation sets that was similar to that of the whole learning set. Based on DLDA and the nearest centroid method, bi-informative probes proved more successful predictors than probes selected using a t test. CONCLUSION: Prediction of the response to breast cancer preoperative chemotherapy was significantly improved by selecting DNA probe sets that were successful in predicting outcomes for the entire learning set, both in terms of accurately predicting a high number of cases and in correctly predicting the ratio of PCR to NoPCR cases.
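For readers unfamiliar with the nearest centroid method mentioned above, the following is a minimal sketch: each class is summarized by the mean expression vector of its training samples, and a new sample is assigned to the class whose centroid is closest. The function names, the use of squared Euclidean distance, and the two-probe toy values are assumptions for illustration. The sketch does not reproduce the authors' valuation function or probe selection.

```python
def fit_centroids(X, y):
    """Return {label: mean vector} computed from the training samples."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict_nearest_centroid(centroids, row):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))

# Hypothetical expression values for two probe sets (not study data).
X_train = [[5.0, 1.0], [6.0, 1.5], [1.0, 4.0], [1.5, 5.0]]
y_train = ["PCR", "PCR", "NoPCR", "NoPCR"]
centroids = fit_centroids(X_train, y_train)
pred_a = predict_nearest_centroid(centroids, [5.2, 1.1])
pred_b = predict_nearest_centroid(centroids, [1.0, 4.8])
print(pred_a, pred_b)
```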
Subjects
Antineoplastic Agents/therapeutic use , Tumor Biomarkers/genetics , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , DNA Probes/genetics , Neoplasm Proteins/genetics , Oligonucleotide Array Sequence Analysis/methods , Outcome Assessment in Health Care/methods , Breast Neoplasms/diagnosis , Female , Gene Expression Profiling/methods , Humans , Preoperative Care/methods , Prognosis , Reproducibility of Results , Sensitivity and Specificity , Treatment Outcome
ABSTRACT
This work proposes a sequential methodology for selecting variables in classification problems in which the number of predictors is much larger than the sample size. The methodology includes a Monte Carlo permutation procedure that conditionally tests the null hypothesis of no association between the outcomes and the available predictors. To improve computational efficiency, we propose a new parametric distribution, the Truncated and Zero Inflated Gumbel Distribution. The final application is to find compact classification models with improved performance for genomic data. Results using real data sets show that the proposed methodology selects compact models with optimized classification performances.
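The general idea of a Monte Carlo permutation test can be sketched as follows: the outcome labels are shuffled many times, and the observed association statistic is compared with its distribution under shuffling. This is a plain, unconditional test for a single predictor, using an absolute difference in class means as the statistic. It does not implement the conditional, sequential procedure or the Truncated and Zero Inflated Gumbel approximation described above. All names and data values are hypothetical.

```python
import random

def mean_difference(x, y):
    """Absolute difference between the class means of predictor x."""
    g1 = [v for v, lab in zip(x, y) if lab == 1]
    g0 = [v for v, lab in zip(x, y) if lab == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Monte Carlo p-value for the null of no association between x and y."""
    rng = random.Random(seed)
    observed = mean_difference(x, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)                 # break any x/label association
        if mean_difference(x, labels) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical predictor values for five responders and five non-responders.
x_demo = [2.1, 2.5, 2.3, 2.8, 2.6, 0.9, 1.2, 1.0, 1.4, 1.1]
y_demo_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
p_value = permutation_pvalue(x_demo, y_demo_labels)
print(p_value)
```

Fitting a parametric distribution to the permutation statistics, as the abstract proposes, would presumably allow small p-values to be estimated from fewer permutations than this brute-force count requires.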
Subjects
Genomics/statistics & numerical data , Algorithms , Biostatistics/methods , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Computer Simulation , Statistical Data Interpretation , Factual Databases/statistics & numerical data , Female , Gene Expression Profiling/statistics & numerical data , Humans , Statistical Models , Monte Carlo Method , Multivariate Analysis , Sample Size
ABSTRACT
BACKGROUND: Filter feature selection methods compute molecular signatures by selecting subsets of genes in the ranking of a valuation function. The motivations for the choice of valuation function are almost always clearly stated, but those for selecting the genes according to their ranking are hardly ever explicit. METHOD: We addressed the computation of molecular signatures by searching the optima of a bi-objective function whose solution space was the set of all possible molecular signatures, i.e., the set of all subsets of genes. The two objectives were the size of the signature (to be minimized) and the interclass distance induced by the signature (to be maximized). RESULTS: We showed that: 1) the convex combination of the two objectives had exactly n optimal non-empty signatures, where n was the number of genes, 2) the n optimal signatures were nested, and 3) the optimal signature of size k was the subset of the k top-ranked genes that contributed the most to the interclass distance. We applied our feature selection method to five public datasets in oncology, and assessed the prediction performances of the optimal signatures as input to the diagonal linear discriminant analysis (DLDA) classifier. These performances were at the same level as, or better than, the best reported ones. The predictions were robust, and the signatures were almost always significantly smaller. We studied in more detail the performance of our predictive modeling on two breast cancer datasets to predict the response to preoperative chemotherapy: the performances were higher than those previously reported, the signatures were three times smaller (11 versus 30 gene signatures), and the member genes of the signature were known to be involved in the response to chemotherapy. CONCLUSIONS: Defining molecular signatures as the optima of a bi-objective function that combined the signature size and the interclass distance was well founded and efficient for prediction in oncogenomics. The complexity of the computation was very low because the optimal signatures were the sets of genes in the ranking of their valuation. Software can be freely downloaded from http://gardeux-vincent.eu/DeltaRanking.php.
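Results 2) and 3) above can be illustrated with a short sketch: if each gene contributes additively to the interclass distance, the best signature of size k is simply the k genes with the largest contributions, and signatures of increasing size are nested. The per-gene contribution used here (absolute difference of class means) is an assumption chosen for illustration and is not necessarily the authors' valuation function. The function names and expression values are hypothetical.

```python
def gene_contributions(class_a, class_b):
    """Per-gene contribution: absolute difference of class means (assumed form)."""
    n_genes = len(class_a[0])

    def mean(col, rows):
        return sum(r[col] for r in rows) / len(rows)

    return [abs(mean(g, class_a) - mean(g, class_b)) for g in range(n_genes)]

def optimal_signature(contrib, k):
    """Indices of the k genes with the largest contributions."""
    ranked = sorted(range(len(contrib)), key=lambda g: contrib[g], reverse=True)
    return ranked[:k]

# Hypothetical expression matrix: rows are samples, columns are 4 genes.
class_a = [[5.0, 1.0, 3.0, 2.0], [6.0, 1.2, 3.4, 2.5]]
class_b = [[1.0, 1.1, 3.0, 4.0], [2.0, 0.9, 2.8, 4.5]]
contrib = gene_contributions(class_a, class_b)
signatures = [optimal_signature(contrib, k) for k in range(1, 5)]
print(signatures)
```

Because a single sort yields every signature size at once, the cost is dominated by ranking the genes, which is consistent with the low computational complexity reported in the abstract.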
ABSTRACT
In this paper we propose an application of local statistical models to the problem of identifying patients with pathologic complete response (PCR) to neoadjuvant chemotherapy. The idea of using local models is to split the input space (with data from PCR and NoPCR patients) and build a model for each partition. After constructing the models, we used Bayesian classifiers and logistic regression to classify patients into the two classes.
Subjects
Breast Neoplasms/mortality , Breast Neoplasms/therapy , Adjuvant Chemotherapy/mortality , Neoadjuvant Therapy/mortality , Outcome Assessment in Health Care/methods , Proportional Hazards Models , Algorithms , Brazil/epidemiology , Prevalence , Prognosis , Reproducibility of Results , Risk Assessment/methods , Risk Factors , Sensitivity and Specificity , Survival Analysis , Survival Rate
ABSTRACT
INTRODUCTION: Function induction problems are frequently represented by affinity measures between the elements of the inductive sample set, and kernel matrices are a well-known example of affinity measures. METHODS: The objective of the present work is to obtain information about the relations between data from a calculated kernel matrix by initially assuming that those geometric relations are consistent with known labels. To assess the relation between the data structure and the labels, a classifier based on kernel density estimation (KDE) was used. The performance of the kernel width selected by the method presented in this paper was compared with that of a method described in the literature, which was based on minimizing error and balancing bias and variance. The main case study, predicting the response to neoadjuvant chemotherapy, evaluates whether training data consisting of genomic expression data from breast tumors, together with the genomic expression of one patient's tumor, can be used to determine whether that patient will have a pathological complete response. RESULTS: For the tested databases, the proposed method produced results statistically equivalent to those of the literature method; however, in some cases, the proposed method had better overall performance when considering both large and small classes. CONCLUSION: The results demonstrate the feasibility of selecting models by directly calculating densities and the geometry from the class separation.
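A KDE-based classifier of the kind mentioned above can be sketched as follows: each class density is estimated by averaging Gaussian kernels centered on that class's training samples, and a new sample is assigned to the class with the largest prior-weighted density. The Gaussian kernel, the single shared width, the function names, and the toy data are assumptions for illustration. The width selection method proposed in the abstract is not reproduced here; `width` is simply passed in.

```python
import math

def kernel_density(x, samples, width):
    """Average Gaussian kernel value between x and each sample.

    The normalizing constant is omitted: it is identical for every class
    when the width and dimension are shared, so it does not affect the argmax.
    """
    total = 0.0
    for s in samples:
        sq = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq / (2.0 * width ** 2))
    return total / len(samples)

def kde_classify(x, class_samples, width):
    """Return the label maximizing prior * estimated class density at x."""
    n_total = sum(len(v) for v in class_samples.values())

    def score(label):
        prior = len(class_samples[label]) / n_total
        return prior * kernel_density(x, class_samples[label], width)

    return max(class_samples, key=score)

# Hypothetical two-feature training data (not study data).
training = {
    "PCR": [[0.0, 0.0], [0.2, 0.1], [0.1, -0.1]],
    "NoPCR": [[2.0, 2.0], [2.2, 1.9], [1.8, 2.1]],
}
label_near_pcr = kde_classify([0.1, 0.0], training, width=0.5)
label_near_nopcr = kde_classify([2.0, 2.0], training, width=0.5)
print(label_near_pcr, label_near_nopcr)
```

The kernel width controls how smooth each density estimate is, which is why its selection is the central model choice in this setting.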
ABSTRACT
New concepts may prove necessary to profit from the avalanche of sequence data on the genome, transcriptome, proteome and interactome and to relate this information to cell physiology. Here, we focus on the concept of large activity-based structures, or hyperstructures, in which a variety of types of molecules are brought together to perform a function. We review the evidence for the existence of hyperstructures responsible for the initiation of DNA replication, the sequestration of newly replicated origins of replication, cell division and for metabolism. The processes responsible for hyperstructure formation include changes in enzyme affinities due to metabolite-induction, lipid-protein affinities, elevated local concentrations of proteins and their binding sites on DNA and RNA, and transertion. Experimental techniques exist that can be used to study hyperstructures and we review some of the ones less familiar to biologists. Finally, we speculate on how a variety of in silico approaches involving cellular automata and multi-agent systems could be combined to develop new concepts in the form of an Integrated cell (I-cell) which would undergo selection for growth and survival in a world of artificial microbiology.