ABSTRACT
We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity with which variable sets can be assessed for high predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, because of its inflated bias for moderate sample sizes and its sensitivity to noisy, useless variables. We demonstrate that the [Formula: see text]-score of the partition retention (PR) method of variable selection (VS) yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the [Formula: see text]-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the [Formula: see text]-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using the PR method and the [Formula: see text]-score can aid in finding variable sets with promising prediction rates; however, further research on sample-based measures of predictivity is much needed.
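The sensitivity of a naive, sample-based estimate to noisy variables can be illustrated with a small simulation. The sketch below is an assumption made for illustration and not the estimator as defined in the paper: it takes the naive estimate to be the plug-in rate obtained by predicting the majority class within each partition cell formed by the observed values of the variable set. The function names `naive_prediction_rate` and `simulate`, and all parameter values, are hypothetical.

```python
import random
from collections import Counter


def naive_prediction_rate(X, y):
    """Plug-in estimate (assumed form): predict the majority class in each
    partition cell and return the fraction of samples classified correctly."""
    cells = {}
    for row, label in zip(X, y):
        cells.setdefault(tuple(row), Counter())[label] += 1
    correct = sum(max(counts.values()) for counts in cells.values())
    return correct / len(y)


def simulate(n=200, n_noise=8, p_correct=0.7, seed=0):
    """One informative binary variable plus n_noise pure-noise binary variables.
    The label equals the informative variable with probability p_correct,
    so the theoretical correct prediction rate is p_correct."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1 = rng.randint(0, 1)
        label = x1 if rng.random() < p_correct else 1 - x1
        X.append([x1] + [rng.randint(0, 1) for _ in range(n_noise)])
        y.append(label)
    return X, y


X, y = simulate()
# Add noise columns one group at a time to the same sample.
for k in (0, 2, 4, 8):
    subset = [row[: 1 + k] for row in X]
    print(k, "noise variables ->", round(naive_prediction_rate(subset, y), 3))
```

Because every added noise variable refines the partition, the plug-in rate on a fixed sample can only stay the same or increase, even though the theoretical rate in this simulation stays at 0.7.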
ABSTRACT
Thus far, genome-wide association studies (GWAS) have been disappointing: investigators have been unable to use identified, statistically significant variants in complex diseases to make predictions useful for personalized medicine. Why are significant variables not leading to good prediction of outcomes? We point out that this problem is prevalent in simple as well as complex data, in the sciences as well as the social sciences. We offer a brief explanation and some statistical insights on why higher significance cannot automatically imply stronger predictivity, and we illustrate this through simulations and a real breast cancer example. We also demonstrate that highly predictive variables do not necessarily appear as highly significant, and thus evade the researcher who uses significance-based methods. We point out that what makes variables good for prediction, as opposed to significance, depends on different properties of the underlying distributions. If prediction is the goal, we must lay aside significance as the only selection standard. We suggest that progress in prediction requires a new research agenda: searching for a novel criterion that retrieves highly predictive variables rather than highly significant ones. We offer an alternative approach that was not designed for significance, the partition retention method, which was very effective in prediction on a long-studied breast cancer data set, reducing the classification error rate from 30% to 8%.
Subjects
Genome-Wide Association Study, Humans, Precision Medicine, Predictive Value of Tests
ABSTRACT
Analysis of a subset of case-control sporadic breast cancer data from the National Cancer Institute's Cancer Genetic Markers of Susceptibility (CGEMS) initiative, focusing on 18 breast cancer-related genes with 304 SNPs, indicates that there are many interesting interactions that form two- and three-way networks in which BRCA1 plays a dominant and central role. The apparent interactions of BRCA1 with many other genes suggest the conjecture that BRCA1 serves as a protective gene and that some mutations in it or in related genes may prevent it from carrying out this protective function, even if the patients are not carriers of known cancer-predisposing BRCA1 mutations. The method of analysis evaluates the effect of a gene by averaging the effects of the SNPs covered by that gene. Marginal methods that test one gene at a time fail to show any effect. This may be because each of these 18 genes adds very little to the risk of cancer. An analysis that relates the ratio of interactions to the maximum of the first-order effects discovers significant gene pairs and triplets.
Subjects
Breast Neoplasms/genetics, Gene Regulatory Networks/physiology, BRCA1 Genes/physiology, BRCA1 Protein/genetics, Case-Control Studies, Computational Biology, Estrogen Receptor alpha/genetics, Female, Genetic Predisposition to Disease, Humans, Single Nucleotide Polymorphism, Proto-Oncogene Proteins/genetics, Proto-Oncogene Proteins p21(ras), Tumor Suppressor Proteins/genetics, Ubiquitin-Protein Ligases/genetics, ras Proteins/genetics
ABSTRACT
Using a two-dimensional tissue culture plastic screening system and a fractional factorial design, we determined specific media formulations and growth factor combinations that support human bone marrow stromal cell (BMSC) differentiation toward fibroblast characteristics for use in tissue engineering, specifically cell morphology and alignment, metabolic activity, abundant expression of collagen types I and III, and negligible expression of other tissue-specific markers. BMSCs were cultured for up to 14 days on tissue culture plastic in Dulbecco's Modified Eagle Medium (DMEM)/10% FBS or Advanced DMEM (ADMEM)/5% FBS. Each medium base was supplemented with one of nine possible growth factor combinations and ascorbate-2-phosphate (Asc-2-P) for the duration of culture. ADMEM supported comparable cell viability with half the serum content of the DMEM formulation. Asc-2-P was potent in promoting BMSC proliferation in the absence of a mitogen, supporting significant increases in cell activity over 14 days of culture. DMEM promoted significant increases in cell viability for 7 of 9 growth factor groups when compared to their ADMEM counterparts. ADMEM, however, promoted increased transcript and protein expression, as 5 of 9 growth factor combinations induced a 200% increase in collagen type I versus equivalent DMEM cultures. Cell morphology and collagen type I immunostaining, when assessed in the context of MTT and RNA results, identified 3 growth factor and medium combinations that supported fibroblast differentiation for future development of ligament tissue in vitro.
Subjects
Bone Marrow Cells/cytology, Cell Differentiation/drug effects, Fibroblasts/cytology, Growth Substances/pharmacology, Stromal Cells/cytology, Collagen Type I/genetics, Collagen Type III/genetics, Culture Media, Humans
ABSTRACT
An improved understanding of cellular responses during normal anterior cruciate ligament (ACL) function or repair is essential for clinical assessments, understanding ligament biology, and the implementation of tissue engineering strategies. The present study used quantitative real-time RT-PCR combined with univariate and multivariate statistical analyses to establish a quantitative database of marker transcript expression that can provide a "blueprint" of ACL wound healing. Selected markers (collagen types I and III, biglycan, decorin, MMP-1, MMP-2, MMP-9, and TIMP-1) were assessed in 33 torn ACLs harvested during reconstructive surgery. Trends were observed between postinjury period and marker expression. Significant correlations existed between marker expression levels and were most prominent between collagen types I and III. Canonical correlation analysis established a relationship between patient demographics and a combination of all marker expressions. The observed trends and correlations may assist in identifying appropriate tissue samples and provide baseline information on marker expression levels that can support in vitro optimization of environmental cues for ligament tissue engineering applications.