Search | VHL Search Portal

1.

Gene-set distance analysis (GSDA): a powerful tool for gene-set association analysis.

Cao, Xueyuan; Pounds, Stan.

BMC Bioinformatics ; 22(1): 207, 2021 Apr 21.

Article in English | MEDLINE | ID: mdl-33882829

ABSTRACT

BACKGROUND: Identifying sets of related genes (gene sets) that are empirically associated with a treatment or phenotype often yields valuable biological insights. Several methods effectively identify gene sets in which individual genes have simple monotonic relationships with categorical, quantitative, or censored event-time variables. Some distance-based methods, such as distance correlations, may detect complex non-monotone associations of a gene-set with a quantitative variable that elude other methods. However, the distance correlations have yet to be generalized to associate gene-sets with categorical and censored event-time endpoints. Also, there is a need to determine which genes empirically drive the significance of an association of a gene set with an endpoint. RESULTS: We develop gene-set distance analysis (GSDA) by generalizing distance correlations to evaluate the association of a gene set with categorical and censored event-time variables. We also develop a backward elimination procedure to identify a subset of genes that empirically drive significant associations. In simulation studies, GSDA more effectively identified complex non-monotone gene-set associations than did six other published methods. In the analysis of a pediatric acute myeloid leukemia (AML) data set, GSDA was the only method to discover that event-free survival (EFS) was associated with the 56-gene AML pathway gene-set, narrow that result down to 5 genes, and confirm the association of those 5 genes with EFS in a separate validation cohort. These results indicate that GSDA effectively identifies and characterizes complex non-monotonic gene-set associations that are missed by other methods. CONCLUSION: GSDA is a powerful and flexible method to detect gene-set association with categorical, quantitative, or censored event-time variables, especially to detect complex non-monotonic gene-set associations. Available at https://CRAN.R-project.org/package=GSDA .

Subject(s)

Gene Expression Profiling , Genetic Testing , Child , Cohort Studies , Computer Simulation , Humans , Phenotype
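
The distance correlation that GSDA generalizes (entry 1) can be illustrated with a short sketch. The code below computes the standard sample distance correlation between a set of expression profiles and a quantitative endpoint. It is not the GSDA package's implementation and does not cover the categorical or censored event-time extensions; the function names and the toy data are hypothetical.

```python
import math


def centered_distances(points):
    """Return the double-centered matrix of pairwise Euclidean distances."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in dist]
    grand_mean = sum(row_mean) / n
    # Subtract row and column means (equal by symmetry), add back the grand mean.
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between paired observations x[i] and y[i].

    Each observation is a tuple, so x[i] can hold the expression of several genes.
    """
    a, b = centered_distances(x), centered_distances(y)
    n = len(x)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / n**2

    dvar_x, dvar_y = mean_product(a, a), mean_product(b, b)
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant variable carries no association
    dcov2 = max(mean_product(a, b), 0.0)  # guard against tiny negative rounding
    return math.sqrt(dcov2 / math.sqrt(dvar_x * dvar_y))


# Hypothetical data: one gene whose expression relates to the endpoint as y = x^2.
expression = [(-2.0,), (-1.0,), (0.0,), (1.0,), (2.0,)]
endpoint = [(g[0] ** 2,) for g in expression]
dcor = distance_correlation(expression, endpoint)
```

For this symmetric, non-monotone toy relationship the Pearson correlation is exactly zero, while the distance correlation is clearly positive, which is the kind of association the abstract says simpler methods miss.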

2.

Molecular heterogeneity and CXorf67 alterations in posterior fossa group A (PFA) ependymomas.

Pajtler, Kristian W; Wen, Ji; Sill, Martin; Lin, Tong; Orisme, Wilda; Tang, Bo; Hübner, Jens-Martin; Ramaswamy, Vijay; Jia, Sujuan; Dalton, James D; Haupfear, Kelly; Rogers, Hazel A; Punchihewa, Chandanamali; Lee, Ryan; Easton, John; Wu, Gang; Ritzmann, Timothy A; Chapman, Rebecca; Chavez, Lukas; Boop, Fredrick A; Klimo, Paul; Sabin, Noah D; Ogg, Robert; Mack, Stephen C; Freibaum, Brian D; Kim, Hong Joo; Witt, Hendrik; Jones, David T W; Vo, Baohan; Gajjar, Amar; Pounds, Stan; Onar-Thomas, Arzu; Roussel, Martine F; Zhang, Jinghui; Taylor, J Paul; Merchant, Thomas E; Grundy, Richard; Tatevossian, Ruth G; Taylor, Michael D; Pfister, Stefan M; Korshunov, Andrey; Kool, Marcel; Ellison, David W.

Acta Neuropathol ; 136(2): 211-226, 2018 08.

Article in English | MEDLINE | ID: mdl-29909548

ABSTRACT

Of nine ependymoma molecular groups detected by DNA methylation profiling, the posterior fossa type A (PFA) is most prevalent. We used DNA methylation profiling to look for further molecular heterogeneity among 675 PFA ependymomas. Two major subgroups, PFA-1 and PFA-2, and nine minor subtypes were discovered. Transcriptome profiling suggested a distinct histogenesis for PFA-1 and PFA-2, but their clinical parameters were similar. In contrast, PFA subtypes differed with respect to age at diagnosis, gender ratio, outcome, and frequencies of genetic alterations. One subtype, PFA-1c, was enriched for 1q gain and had a relatively poor outcome, while patients with PFA-2c ependymomas showed an overall survival at 5 years of > 90%. Unlike other ependymomas, PFA-2c tumors express high levels of OTX2, a potential biomarker for this ependymoma subtype with a good prognosis. We also discovered recurrent mutations among PFA ependymomas. H3 K27M mutations were present in 4.2%, occurring only in PFA-1 tumors, and missense mutations in an uncharacterized gene, CXorf67, were found in 9.4% of PFA ependymomas, but not in other groups. We detected high levels of wildtype or mutant CXorf67 expression in all PFA subtypes except PFA-1f, which is enriched for H3 K27M mutations. PFA ependymomas are characterized by lack of H3 K27 trimethylation (H3 K27-me3), and we tested the hypothesis that CXorf67 binds to PRC2 and can modulate levels of H3 K27-me3. Immunoprecipitation/mass spectrometry detected EZH2, SUZ12, and EED, core components of the PRC2 complex, bound to CXorf67 in the Daoy cell line, which shows high levels of CXorf67 and no expression of H3 K27-me3. Enforced reduction of CXorf67 in Daoy cells restored H3 K27-me3 levels, while enforced expression of CXorf67 in HEK293T and neural stem cells reduced H3 K27-me3 levels. Our data suggest that heterogeneity among PFA ependymomas could have clinicopathologic utility and that CXorf67 may have a functional role in these tumors.

Subject(s)

Ependymoma/genetics , Gene Expression Regulation, Neoplastic/genetics , Infratentorial Neoplasms/genetics , Mutation/genetics , Oncogene Proteins/genetics , DNA Methylation , Ependymoma/classification , Ependymoma/pathology , Female , Gene Expression Profiling , HEK293 Cells , Histones/genetics , Humans , Infratentorial Neoplasms/classification , Infratentorial Neoplasms/pathology , Male , Transfection

3.

Genetics of pleiotropic effects of dexamethasone.

Ramsey, Laura B; Pounds, Stan; Cheng, Cheng; Cao, Xueyuan; Yang, Wenjian; Smith, Colton; Karol, Seth E; Liu, Chengcheng; Panetta, John C; Inaba, Hiroto; Rubnitz, Jeffrey E; Metzger, Monika L; Ribeiro, Raul C; Sandlund, John T; Jeha, Sima; Pui, Ching-Hon; Evans, William E; Relling, Mary V.

Pharmacogenet Genomics ; 27(8): 294-302, 2017 08.

Article in English | MEDLINE | ID: mdl-28628558

ABSTRACT

OBJECTIVES: Glucocorticoids such as dexamethasone have pleiotropic effects, including desired antileukemic, anti-inflammatory, or immunosuppressive effects, and undesired metabolic or toxic effects. The most serious adverse effects of dexamethasone among patients with acute lymphoblastic leukemia are osteonecrosis and thrombosis. To identify inherited genomic variation involved in these severe adverse effects, we carried out genome-wide association studies (GWAS) by analyzing 14 pleiotropic glucocorticoid phenotypes in 391 patients with acute lymphoblastic leukemia. PATIENTS AND METHODS: We used the Projection Onto the Most Interesting Statistical Evidence integrative analysis technique to identify genetic variants associated with pleiotropic dexamethasone phenotypes, stratifying for age, sex, race, and treatment, and compared the results with conventional single-phenotype GWAS. The phenotypes were osteonecrosis, central nervous system toxicity, hyperglycemia, hypokalemia, thrombosis, dexamethasone exposure, BMI, growth trajectory, and levels of cortisol, albumin, and asparaginase antibodies, and changes in cholesterol, triglycerides, and low-density lipoproteins after dexamethasone. RESULTS: The integrative analysis identified more pleiotropic single nucleotide polymorphism variants (P=1.46×10^-215), and these variants were more likely to be in gene-regulatory regions (P=1.22×10^-6), than traditional single-phenotype GWAS. The integrative analysis yielded genomic variants (rs2243057 and rs6453253) in F2RL1, a receptor that functions in hemostasis, thrombosis, and inflammation, which were associated with pleiotropic effects, including osteonecrosis and thrombosis, and were in regulatory gene regions. CONCLUSION: The integrative pleiotropic analysis identified risk variants for osteonecrosis and thrombosis not identified by single-phenotype analysis that may have importance for patients with underlying sensitivity to multiple dexamethasone adverse effects.

Subject(s)

Computational Biology/methods , Dexamethasone/adverse effects , Drug-Related Side Effects and Adverse Reactions/genetics , Glucocorticoids/adverse effects , Polymorphism, Single Nucleotide , Precursor Cell Lymphoblastic Leukemia-Lymphoma/drug therapy , Female , Gene Regulatory Networks , Genome-Wide Association Study , Humans , Male , Phenotype , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Prospective Studies , Receptor, PAR-2 , Receptors, G-Protein-Coupled/genetics

4.

The most informative spacing test effectively discovers biologically relevant outliers or multiple modes in expression.

Pawlikowska, Iwona; Wu, Gang; Edmonson, Michael; Liu, Zhifa; Gruber, Tanja; Zhang, Jinghui; Pounds, Stan.

Bioinformatics ; 30(10): 1400-8, 2014 May 15.

Article in English | MEDLINE | ID: mdl-24458951

ABSTRACT

SUMMARY: Several outlier and subgroup identification statistics (OASIS) have been proposed to discover transcriptomic features with outliers or multiple modes in expression that are indicative of distinct biological processes or subgroups. Here, we borrow ideas from the OASIS methods in the bioinformatics and statistics literature to develop the 'most informative spacing test' (MIST) for unsupervised detection of such transcriptomic features. In an example application involving 14 cases of pediatric acute megakaryoblastic leukemia, MIST more robustly identified features that perfectly discriminate subjects according to gender or the presence of a prognostically relevant fusion-gene than did seven other OASIS methods in the analysis of RNA-seq exon expression, RNA-seq exon junction expression and microarray exon expression data. MIST was also effective at identifying features related to gender or molecular subtype in an example application involving 157 adult cases of acute myeloid leukemia. AVAILABILITY: MIST will be freely available in the OASIS R package at http://www.stjuderesearch.org/site/depts/biostats CONTACT: stanley.pounds@stjude.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Sequence Analysis, RNA/methods , Adult , Biometry , Child , Exons , Female , Gene Expression , Humans , Leukemia, Myeloid, Acute/genetics , Male , Models, Genetic

5.

An R package that automatically collects and archives details for reproducible computing.

Liu, Zhifa; Pounds, Stan.

BMC Bioinformatics ; 15: 138, 2014 May 10.

Article in English | MEDLINE | ID: mdl-24886202

ABSTRACT

BACKGROUND: It is scientifically and ethically imperative that the results of statistical analysis of biomedical research data be computationally reproducible in the sense that the reported results can be easily recapitulated from the study data. Some statistical analyses are computationally a function of many data files, program files, and other details that are updated or corrected over time. In many applications, it is infeasible to manually maintain an accurate and complete record of all these details about a particular analysis. RESULTS: Therefore, we developed the rctrack package that automatically collects and archives read-only copies of program files, data files, and other details needed to computationally reproduce an analysis. CONCLUSIONS: The rctrack package uses the trace function to temporarily embed detail collection procedures into functions that read files, write files, or generate random numbers so that no special modifications of the primary R program are necessary. At the conclusion of the analysis, rctrack uses these details to automatically generate a read-only archive of data files, program files, result files, and other details needed to recapitulate the analysis results. Information about this archive may be included as an appendix of a report generated by Sweave or knitr. Here, we describe the usage, implementation, and other features of the rctrack package. The rctrack package is freely available from http://www.stjuderesearch.org/site/depts/biostats/rctrack under the GPL license.

Subject(s)

Data Interpretation, Statistical , Software , Electronic Data Processing , Reproducibility of Results

6.

A genomic random interval model for statistical analysis of genomic lesion data.

Pounds, Stan; Cheng, Cheng; Li, Shaoyu; Liu, Zhifa; Zhang, Jinghui; Mullighan, Charles.

Bioinformatics ; 29(17): 2088-95, 2013 Sep 01.

Article in English | MEDLINE | ID: mdl-23842812

ABSTRACT

MOTIVATION: Tumors exhibit numerous genomic lesions such as copy number variations, structural variations and sequence variations. It is difficult to determine whether a specific constellation of lesions observed across a cohort of multiple tumors provides statistically significant evidence that the lesions target a set of genes that may be located on different chromosomes yet are all involved in a single specific biological process or function. RESULTS: We introduce the genomic random interval (GRIN) statistical model and analysis method that evaluates the statistical significance of the abundance of genomic lesions that overlap a specific locus or a pre-defined set of biologically related loci. The GRIN model retains certain biologically important properties of genomic lesions that are ignored by other methods. In a simulation study and two example analyses of leukemia genomic lesion data, GRIN more effectively identified important loci as significant than did three methods based on a permutation-of-markers model. GRIN also identified biologically relevant pathways with a significant abundance of lesions in both examples. AVAILABILITY: An R package will be freely available at CRAN and www.stjuderesearch.org/site/depts/biostats/software.

Subject(s)

Genetic Variation , Models, Statistical , Neoplasms/genetics , DNA Copy Number Variations , Genetic Loci , Genomics/methods , Humans , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics

7.

A procedure to statistically evaluate agreement of differential expression for cross-species genomics.

Pounds, Stan; Gao, Cuilan Lani; Johnson, Robert A; Wright, Karen D; Poppleton, Helen; Finkelstein, David; Leary, Sarah E S; Gilbertson, Richard J.

Bioinformatics ; 27(15): 2098-103, 2011 Aug 01.

Article in English | MEDLINE | ID: mdl-21697127

ABSTRACT

MOTIVATION: Animal models play a pivotal role in translational biomedical research. The scientific value of an animal model depends on how accurately it mimics the human disease. In principle, microarrays collect the necessary data to evaluate the transcriptomic fidelity of an animal model in terms of the similarity of expression with the human disease. However, statistical methods for this purpose are lacking. RESULTS: We develop the agreement of differential expression (AGDEX) procedure to measure and determine the statistical significance of the similarity of the results of two experiments that measure differential expression across two groups. AGDEX defines a metric of agreement and determines statistical significance by permutation of each experiment's group labels. Additionally, AGDEX performs a comprehensive permutation-based analysis of differential expression for each experiment, including gene-set analyses and meta-analytic integration of results across studies. As an example, we show how AGDEX was recently used to evaluate the similarity of the transcriptome of a novel model of the brain tumor ependymoma in mice to that of a subtype of the human disease. This result, combined with other observations, helped us to infer the cell of origin of this devastating human cancer. AVAILABILITY: An R package is currently available from www.stjuderesearch.org/site/depts/biostats/agdex and will shortly be available from www.bioconductor.org.

Subject(s)

Gene Expression Profiling/methods , Genomics/methods , Models, Statistical , Animals , Computational Biology/methods , Disease Models, Animal , Ependymoma/genetics , Humans , Meta-Analysis as Topic , Mice
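
The general idea behind AGDEX (entry 7), an agreement metric whose significance comes from permuting each experiment's group labels, might be sketched as follows. The abstract does not specify the metric, so the cosine similarity of per-gene mean differences used here is an assumption, as are all function names and data; this is not the AGDEX package's code.

```python
import math
import random


def mean_difference(expr, labels):
    """Per-gene difference in mean expression, group 1 minus group 0.

    expr is a list of genes, each a list with one value per sample.
    """
    diffs = []
    for gene in expr:
        g1 = [v for v, lab in zip(gene, labels) if lab == 1]
        g0 = [v for v, lab in zip(gene, labels) if lab == 0]
        diffs.append(sum(g1) / len(g1) - sum(g0) / len(g0))
    return diffs


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def agreement_test(expr_a, labels_a, expr_b, labels_b, n_perm=499, seed=0):
    """Return (observed agreement, permutation p-value) for two matched experiments."""
    rng = random.Random(seed)
    observed = cosine(mean_difference(expr_a, labels_a),
                      mean_difference(expr_b, labels_b))
    perm_a, perm_b = list(labels_a), list(labels_b)
    hits = 0
    for _ in range(n_perm):
        # Permute the group labels of each experiment independently.
        rng.shuffle(perm_a)
        rng.shuffle(perm_b)
        permuted = cosine(mean_difference(expr_a, perm_a),
                          mean_difference(expr_b, perm_b))
        if abs(permuted) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical matched genes (rows) measured in two small experiments.
expr_a = [[1, 2, 1, 5, 6, 5], [7, 8, 7, 2, 3, 2], [4, 5, 4, 4, 5, 4]]
labels_a = [0, 0, 0, 1, 1, 1]
expr_b = [[2, 3, 6, 7], [9, 8, 4, 3], [5, 5, 5, 6]]
labels_b = [0, 0, 1, 1]
agreement, p_value = agreement_test(expr_a, labels_a, expr_b, labels_b)
```

With samples this few, only a handful of distinct label permutations exist, so the p-value cannot become very small; the sketch is meant to show the mechanics rather than a realistic analysis.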

8.

Reference alignment of SNP microarray signals for copy number analysis of tumors.

Pounds, Stan; Cheng, Cheng; Mullighan, Charles; Raimondi, Susana C; Shurtleff, Sheila; Downing, James R.

Bioinformatics ; 25(3): 315-21, 2009 Feb 01.

Article in English | MEDLINE | ID: mdl-19052058

ABSTRACT

A new procedure to align single nucleotide polymorphism (SNP) microarray signals for copy number analysis is proposed. For each individual array, this reference alignment procedure (RAP) uses a set of selected markers as internal references to direct the signal alignment. RAP aligns the signals so that each array has a similar signal distribution among its reference markers. An accompanying reference selection algorithm (RSA) uses genotype calls and initial signal intensities to choose two-copy markers as the internal references for each array. After RSA and RAP are applied, each array has a similar distribution of signals of two-copy markers so that across-array signal comparisons are biologically meaningful. An upper bound for a statistical metric of signal misalignment is derived and provides a theoretical basis to choose RSA-RAP over other alignment procedures for copy number analysis of cancers. In our study of acute lymphoblastic leukemia, RSA-RAP gives copy number analysis results that show substantially better concordance with cytogenetics than do two other alignment procedures. AVAILABILITY: Documented R code is freely available from www.stjuderesearch.org/depts/biostats/refnorm.

Subject(s)

Chromosome Aberrations , Computational Biology/methods , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide/genetics , Algorithms , Biomarkers, Tumor/analysis , Biomarkers, Tumor/genetics , Cluster Analysis , Gene Dosage , Humans , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Sequence Alignment/methods

9.

PROMISE: a tool to identify genomic features with a specific biologically interesting pattern of associations with multiple endpoint variables.

Pounds, Stan; Cheng, Cheng; Cao, Xueyuan; Crews, Kristine R; Plunkett, William; Gandhi, Varsha; Rubnitz, Jeffrey; Ribeiro, Raul C; Downing, James R; Lamba, Jatinder.

Bioinformatics ; 25(16): 2013-9, 2009 Aug 15.

Article in English | MEDLINE | ID: mdl-19528086

ABSTRACT

MOTIVATION: In some applications, prior biological knowledge can be used to define a specific pattern of association of multiple endpoint variables with a genomic variable that is biologically most interesting. However, to our knowledge, there is no statistical procedure designed to detect specific patterns of association with multiple endpoint variables. RESULTS: Projection onto the most interesting statistical evidence (PROMISE) is proposed as a general procedure to identify genomic variables that exhibit a specific biologically interesting pattern of association with multiple endpoint variables. Biological knowledge of the endpoint variables is used to define a vector that represents the biologically most interesting values for statistics that characterize the associations of the endpoint variables with a genomic variable. A test statistic is defined as the dot-product of the vector of the observed association statistics and the vector of the most interesting values of the association statistics. By definition, this test statistic is proportional to the length of the projection of the observed vector of correlations onto the vector of most interesting associations. Statistical significance is determined via permutation. In simulation studies and an example application, PROMISE shows greater statistical power to identify genes with the interesting pattern of associations than classical multivariate procedures, individual endpoint analyses or listing genes that have the pattern of interest and are significant in more than one individual endpoint analysis. AVAILABILITY: Documented R routines are freely available from www.stjuderesearch.org/depts/biostats and will soon be available as a Bioconductor package from www.bioconductor.org.

Subject(s)

Genomics/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Biometry , Gene Expression Profiling/methods
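
The PROMISE test statistic (entry 9), a dot product between the observed association statistics and the vector of most interesting values with significance determined by permutation, can be sketched as below. Using Pearson correlation as the association statistic, permuting the genomic variable across subjects, and the one-sided p-value are assumptions made for illustration; the function names and data are hypothetical and this is not the published R code.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv) if suu > 0 and svv > 0 else 0.0


def promise_statistic(genomic, endpoints, interesting):
    """Dot product of observed correlations with the 'most interesting' vector."""
    return sum(w * pearson(genomic, e) for w, e in zip(interesting, endpoints))


def promise_test(genomic, endpoints, interesting, n_perm=999, seed=0):
    """Return (observed statistic, one-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = promise_statistic(genomic, endpoints, interesting)
    shuffled = list(genomic)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the genomic variable breaks its link to every endpoint
        # while leaving the relationships among the endpoints intact.
        rng.shuffle(shuffled)
        if promise_statistic(shuffled, endpoints, interesting) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: interest lies in genes positively associated with the
# first endpoint and negatively associated with the second.
genomic = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
endpoints = [[2 * g for g in genomic], [-g for g in genomic]]
interesting = [1.0, -1.0]
statistic, p_value = promise_test(genomic, endpoints, interesting)
```

The vector `interesting` is where prior biological knowledge enters: its signs encode the pattern of associations that would be most meaningful for the endpoints at hand.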

10.

Assumption Adequacy Averaging as a Concept to Develop More Robust Methods for Differential Gene Expression Analysis.

Pounds, Stan; Rai, Shesh N.

Comput Stat Data Anal ; 53(5): 1604-1612, 2009 Mar 15.

Article in English | MEDLINE | ID: mdl-20161327

ABSTRACT

The concept of assumption adequacy averaging is introduced as a technique to develop more robust methods that incorporate assessments of assumption adequacy into the analysis. The concept is illustrated by using it to develop a method that averages results from the t-test and nonparametric rank-sum test with weights obtained from using the Shapiro-Wilk test to test the assumption of normality. Through this averaging process, the proposed method is able to rely more heavily on the statistical test that the data suggests is superior for each individual gene. Subsequently, this method developed by assumption adequacy averaging outperforms its two component methods (the t-test and rank-sum test) in a series of traditional and bootstrap-based simulation studies. The proposed method showed greater concordance in gene selection across two studies of gene expression in acute myeloid leukemia than did the t-test or rank-sum test. An R routine to implement the method is available upon request.

11.

Statistical analysis of data from retroviral clonal experiments in the developing retina.

Pounds, Stan; Dyer, Michael A.

Brain Res ; 1192: 178-85, 2008 Feb 04.

Article in English | MEDLINE | ID: mdl-17950704

ABSTRACT

Retroviral lineage studies have been widely used over the past decade to study retinal development in vivo and in explant culture [Donovan, S.L., Dyer, M.A., 2006. Preparation and Square Wave Electroporation of Retinal Explant Cultures, Nature Protocols 1, 2710-2718; Donovan, S.L., Schweers, B., Martins, R., Johnson, D., Dyer, M.A., 2001. Compensation by tumor suppressor genes during retinal development in mice and humans, BMC Biol 4, 14; Dyer, M.A., Cepko, C.L., 2001. p27Kip1 and p57Kip2 regulate proliferation in distinct retinal progenitor cell populations, J. of Neurosci 21, 4259-4271; Dyer, M.A., Cepko, C.L., 2000. p57(Kip2) regulates progenitor cell proliferation and amacrine interneuron development in the mouse retina, Development 127, 3593-3605; Dyer, M.A., Livesey, F.J., Cepko, C.L., Oliver, G., 2003. Prox1 function controls progenitor cell proliferation and horizontal cell genesis in the mammalian retina, Nat Genet 34, 53-58]. These approaches can provide important data on the proliferation, cell fate specification, differentiation and survival of individual neurons and glia derived from single infected retinal progenitor cells. In some experiments, these parameters are compared in retinae from animals with different targeted deletions or transgenes. Alternatively, the effect of ectopic expression of virally encoded transgenes may be studied at the level of individual retinal progenitor cells in vivo and in explant culture. One of the challenges with interpreting retroviral lineage studies is determining the statistical significance of differences in the proliferation, cell fate specification, differentiation or survival of retinal progenitor cells between experimental and control samples. In this study, we provide a clear step-by-step guide to the application of statistical methods to retroviral lineage analyses using actual data sets. We anticipate that this will serve as a guide for future statistical analyses of retroviral lineage studies and will help to provide a uniform standard in the field.

Subject(s)

Cell Differentiation/genetics , Cell Lineage/genetics , Data Interpretation, Statistical , Retina/embryology , Retina/metabolism , Retroviridae/genetics , Stem Cells/metabolism , Animals , Cell Count , Gene Expression Regulation, Developmental/genetics , Genetic Vectors/genetics , Humans , Retina/cytology , Sample Size , Stem Cells/cytology , Transgenes/genetics

12.

Robust estimation of the false discovery rate.

Pounds, Stan; Cheng, Cheng.

Bioinformatics ; 22(16): 1979-87, 2006 Aug 15.

Article in English | MEDLINE | ID: mdl-16777905

ABSTRACT

MOTIVATION: Presently available methods that use p-values to estimate or control the false discovery rate (FDR) implicitly assume that p-values are continuously distributed and based on two-sided tests. Therefore, it is difficult to reliably estimate the FDR when p-values are discrete or based on one-sided tests. RESULTS: A simple and robust method to estimate the FDR is proposed. The proposed method does not rely on implicit assumptions that tests are two-sided or yield continuously distributed p-values. The proposed method is proven to be conservative and have desirable large-sample properties. In addition, the proposed method was among the best performers across a series of 'real data simulations' comparing the performance of five currently available methods. AVAILABILITY: Libraries of S-plus and R routines to implement the method are freely available from www.stjuderesearch.org/depts/biostats.

Subject(s)

Computational Biology/methods , Gene Expression Profiling/methods , Algorithms , Computer Simulation , Data Interpretation, Statistical , False Positive Reactions , Microarray Analysis , Models, Genetic , Models, Statistical , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated , Reproducibility of Results , Software

13.

Statistical development and evaluation of microarray gene expression data filters.

Pounds, Stan; Cheng, Cheng.

J Comput Biol ; 12(4): 482-95, 2005 May.

Article in English | MEDLINE | ID: mdl-15882143

ABSTRACT

Filtering is a common practice used to simplify the analysis of microarray data by removing from subsequent consideration probe sets believed to be unexpressed. The m/n filter, which is widely used in the analysis of Affymetrix data, removes all probe sets having fewer than m present calls among a set of n chips. The m/n filter has been widely used without considering its statistical properties. The level and power of the m/n filter are derived. Two alternative filters, the pooled p-value filter and the error-minimizing pooled p-value filter, are proposed. The pooled p-value filter combines information from the present-absent p-values into a single summary p-value which is subsequently compared to a selected significance threshold. We show that the pooled p-value filter is the uniformly most powerful statistical test under a reasonable beta model and that it exhibits greater power than the m/n filter in all scenarios considered in a simulation study. The error-minimizing pooled p-value filter compares the summary p-value with a threshold determined to minimize a total-error criterion based on a partition of the distribution of all probes' summary p-values. The pooled p-value and error-minimizing pooled p-value filters clearly perform better than the m/n filter in a case-study analysis. The case-study analysis also demonstrates a proposed method for estimating the number of differentially expressed probe sets excluded by filtering and subsequent impact on the final analysis. The filter impact analysis shows that the use of even the best filter may hinder, rather than enhance, the ability to discover interesting probe sets or genes. S-plus and R routines to implement the pooled p-value and error-minimizing pooled p-value filters have been developed and are available from www.stjuderesearch.org/depts/biostats/index.html.

Subject(s)

Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Microarray Analysis/methods , Microarray Analysis/statistics & numerical data , Computational Biology/methods , Computational Biology/statistics & numerical data , Humans
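
The m/n filter described in entry 13 is simple enough to state in a few lines. The sketch below keeps a probe set only if it has at least m present calls among its n chips; the data layout, probe-set IDs, and function name are hypothetical, and the pooled p-value filters are not shown because the abstract does not give their formulas.

```python
def mn_filter(present_calls, m):
    """Keep probe sets with at least m present calls among their n chips.

    present_calls maps a probe-set ID to a list of n booleans (True = present call).
    """
    return {pid: calls for pid, calls in present_calls.items() if sum(calls) >= m}


# Hypothetical present/absent calls for three probe sets on n = 4 chips.
calls = {
    "probe_A": [True, True, True, False],
    "probe_B": [False, True, False, False],
    "probe_C": [False, False, False, False],
}
kept = mn_filter(calls, m=2)  # only probe_A has two or more present calls
```

Raising m makes the filter stricter, which is the trade-off the abstract examines when it derives the level and power of this rule.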

14.

Erratum: sample size determination for the false discovery rate.

Pounds, Stan; Cheng, Cheng.

Bioinformatics ; 25(5): 698-9, 2009 Mar 01.

Article in English | MEDLINE | ID: mdl-19255981

Subject(s)

Computational Biology/methods , Oligonucleotide Array Sequence Analysis , False Positive Reactions , Gene Expression Profiling , Sample Size

15.

Integrated analysis of pharmacologic, clinical and SNP microarray data using Projection Onto the Most Interesting Statistical Evidence with Adaptive Permutation Testing.

Pounds, Stan; Cao, Xueyuan; Cheng, Cheng; Yang, Jun J; Campana, Dario; Pui, Ching-Hon; Evans, William E; Relling, Mary V.

Int J Data Min Bioinform ; 5(2): 143-57, 2011.

Article in English | MEDLINE | ID: mdl-21516175

ABSTRACT

We recently developed the Projection Onto the Most Interesting Statistical Evidence (PROMISE) procedure that uses prior biological knowledge to guide an integrated analysis of gene expression data with multiple biological and clinical endpoints. Here, PROMISE is adapted to the integrated analysis of pharmacologic, clinical and genome-wide genotype data. An efficient permutation-testing algorithm is introduced so that PROMISE is computationally feasible in this higher-dimension setting. In the analysis of a paediatric leukaemia data set, PROMISE effectively identifies genomic features that exhibit a biologically meaningful pattern of association with multiple endpoint variables.

Subject(s)

Algorithms , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Polymorphism, Single Nucleotide , Computational Biology , Data Interpretation, Statistical , Data Mining , Databases, Genetic , Genome-Wide Association Study/statistics & numerical data , Humans , Precursor Cell Lymphoblastic Leukemia-Lymphoma/drug therapy , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Randomized Controlled Trials as Topic/statistics & numerical data , Software

16.

False discovery rate paradigms for statistical analyses of microarray gene expression data.

Cheng, Cheng; Pounds, Stan.

Bioinformation ; 1(10): 436-46, 2007 Apr 10.

Article in English | MEDLINE | ID: mdl-17597936

ABSTRACT

Microarray gene expression applications have greatly stimulated statistical research on the problem of massive multiple hypothesis testing. There is now a large body of literature in this area and essentially five paradigms of massive multiple testing: control of the false discovery rate (FDR), estimation of FDR, significance threshold criteria, control of family-wise error rate (FWER) or generalized FWER (gFWER), and empirical Bayes approaches. This paper contains a technical survey of the developments of the FDR-related paradigms, emphasizing precise formulation of the problem, concepts of error measurements, and considerations in applications. The goal is not to do an exhaustive literature survey, but rather to review the current state of the field.

17.

Sample size determination for the false discovery rate.

Pounds, Stan; Cheng, Cheng.

Bioinformatics ; 21(23): 4263-71, 2005 Dec 01.

Article in English | MEDLINE | ID: mdl-16204346

ABSTRACT

MOTIVATION: There is no widely applicable method to determine the sample size for experiments basing statistical significance on the false discovery rate (FDR). RESULTS: We propose and develop the anticipated FDR (aFDR) as a conceptual tool for determining sample size. We derive mathematical expressions for the aFDR and anticipated average statistical power. These expressions are used to develop a general algorithm to determine sample size. We provide specific details on how to implement the algorithm for k-group (k ≥ 2) comparisons. The algorithm performs well for k-group comparisons in a series of traditional simulations and in a real-data simulation conducted by resampling from a large, publicly available dataset. AVAILABILITY: Documented S-plus and R code libraries are freely available from www.stjuderesearch.org/depts/biostats.

Subject(s)

Computational Biology/methods ; Algorithms ; Computer Simulation ; Data Interpretation, Statistical ; Databases, Protein ; False Positive Reactions ; Gene Expression Profiling ; Models, Genetic ; Models, Statistical ; Oligonucleotide Array Sequence Analysis ; Reproducibility of Results ; Sample Size ; Software

18.

Severe cardiopulmonary complications consistent with systemic inflammatory response syndrome caused by leukemia cell lysis in childhood acute myelomonocytic or monocytic leukemia.

Hijiya, Nobuko; Metzger, Monika L; Pounds, Stan; Schmidt, Jeffrey E; Razzouk, Bassem I; Rubnitz, Jeffrey E; Howard, Scott C; Nunez, Cesar A; Pui, Ching-Hon; Ribeiro, Raul C.

Pediatr Blood Cancer ; 44(1): 63-9, 2005 Jan.

Article in English | MEDLINE | ID: mdl-15368547

ABSTRACT

BACKGROUND: Life-threatening pulmonary complications that coincide with cell lysis during early chemotherapy and that mimic systemic inflammatory response syndrome (SIRS) have been reported in patients with acute myeloid leukemia (AML). METHODS: We reviewed the records of patients with de novo AML, excluding M3 and Down syndrome, treated at our institution between 1991 and 2002 to determine the prevalence of severe SIRS with grade 3/4 pulmonary complications and to identify AML subtypes associated with severe SIRS. To examine the role of cell lysis, we compared leukocyte reduction in AML subtypes affected by severe SIRS with that in unaffected subtypes. RESULTS: Of 155 patients, 5 (3 with M4eo and 2 with M5) experienced severe pulmonary complications attributed to tumor lysis, met the criteria for severe SIRS, and showed no clear evidence of infection. Four required pressor support for severe hypotension. Severe SIRS was significantly more common in myelomonocytic or monocytic AML (M4/M4eo/M5) than in other subtypes (P = 0.010) and significantly more common in M4eo than in M4/M5 (P = 0.008). Among 112 cases for which information was available, leukocyte reduction was significantly greater in patients with M4/M4eo/M5 than among others during the first 4 days of chemotherapy (P = 0.015). Leukocyte reduction was significantly more rapid among patients who had severe SIRS than among others (P = 0.008). CONCLUSIONS: Patients with M4/M4eo/M5 AML, especially M4eo, experience life-threatening cardiopulmonary complications of tumor lysis that meet the criteria for severe SIRS. This observation may reflect more rapid cell reduction and the unique biology of this subtype.

Subject(s)

Cell Death ; Leukemia, Monocytic, Acute/complications ; Leukemia, Monocytic, Acute/drug therapy ; Leukemia, Myelomonocytic, Acute/complications ; Leukemia, Myelomonocytic, Acute/drug therapy ; Systemic Inflammatory Response Syndrome/etiology ; Adolescent ; Antineoplastic Combined Chemotherapy Protocols/therapeutic use ; Child ; Child, Preschool ; Female ; Humans ; Male ; Retrospective Studies ; Risk Factors

19.

Improving false discovery rate estimation.

Pounds, Stan; Cheng, Cheng.

Bioinformatics ; 20(11): 1737-45, 2004 Jul 22.

Article in English | MEDLINE | ID: mdl-14988112

ABSTRACT

MOTIVATION: Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR). However, rigorous control of the FDR at a preselected level is often impractical. Consequently, it has been suggested to use the q-value as an estimate of the proportion of false discoveries among a set of significant findings. However, such an interpretation of the q-value may be unwarranted considering that the q-value is based on an unstable estimator of the positive FDR (pFDR). Another method proposes estimating the FDR by modeling p-values as arising from a beta-uniform mixture (BUM) distribution. Unfortunately, the BUM approach is reliable only in settings where the assumed model accurately represents the actual distribution of p-values. METHODS: A method called the spacings LOESS histogram (SPLOSH) is proposed for estimating the conditional FDR (cFDR), the expected proportion of false positives conditioned on having k 'significant' findings. SPLOSH is designed to be more stable than the q-value and applicable in a wider variety of settings than BUM. RESULTS: In a simulation study and data analysis example, SPLOSH exhibits the desired characteristics relative to the q-value and BUM. AVAILABILITY: The Web site www.stjuderesearch.org/statistics/splosh.html has links to freely available S-plus code to implement the proposed procedure.

Subject(s)

Algorithms ; False Positive Reactions ; Gene Expression Profiling/methods ; Models, Statistical ; Oligonucleotide Array Sequence Analysis/methods ; Benchmarking/methods ; Computer Simulation ; Gene Expression Profiling/standards ; Models, Genetic ; Oligonucleotide Array Sequence Analysis/standards ; Quality Control ; Reproducibility of Results ; Sensitivity and Specificity

20.

Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values.

Pounds, Stan; Morris, Stephan W.

Bioinformatics ; 19(10): 1236-42, 2003 Jul 01.

Article in English | MEDLINE | ID: mdl-12835267

ABSTRACT

MOTIVATION: The occurrence of false positives and false negatives in a microarray analysis could be easily estimated if the distribution of p-values were approximated and then expressed as a mixture of null and alternative densities. Essentially any distribution of p-values can be expressed as such a mixture by extracting a uniform density from it. AVAILABILITY: An S-plus function library is available from http://www.stjuderesearch.org/statistics.

Subject(s)

Adaptor Proteins, Signal Transducing ; Algorithms ; Gene Expression Profiling/methods ; Neoplasm Proteins/genetics ; Oligonucleotide Array Sequence Analysis/methods ; Sequence Analysis/methods ; Animals ; B-Cell CLL-Lymphoma 10 Protein ; B-Lymphocytes/metabolism ; False Negative Reactions ; False Positive Reactions ; Mice ; Quality Control ; Reproducibility of Results ; Sensitivity and Specificity ; Sequence Alignment
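The idea in the abstract of record 20, writing a p-value density as a mixture by extracting a uniform (null) component, can be sketched as follows. This sketch assumes a beta-uniform mixture density f(p) = lam + (1 - lam) * a * p^(a - 1) with 0 < a < 1. That parametric form is an assumption: it is not stated in this abstract. All function names and parameter values are hypothetical.

```python
def bum_density(p, a, lam):
    """Assumed beta-uniform mixture density on (0, 1], with 0 < a < 1."""
    return lam + (1.0 - lam) * a * p ** (a - 1.0)


def bum_cdf(p, a, lam):
    """Cumulative distribution function matching bum_density."""
    return lam * p + (1.0 - lam) * p ** a


def uniform_component(a, lam):
    """Largest uniform density that can be extracted from the mixture.

    For 0 < a < 1 the density is decreasing in p, so its minimum is at p = 1.
    That minimum is an upper bound on the proportion of true nulls.
    """
    return bum_density(1.0, a, lam)


def estimated_fdr(tau, a, lam):
    """Estimated share of false positives among p-values <= tau.

    The extracted uniform component contributes mass pi0 * tau below tau;
    the whole distribution contributes mass bum_cdf(tau).
    """
    pi0 = uniform_component(a, lam)
    return pi0 * tau / bum_cdf(tau, a, lam)


# Hypothetical parameter values, for illustration only.
pi0_example = uniform_component(a=0.5, lam=0.2)
fdr_example = estimated_fdr(tau=0.01, a=0.5, lam=0.2)
```

Taking the minimum of the density as the uniform component is the conservative choice: any larger value would leave a negative remainder for the alternative density somewhere on the unit interval.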
