Accurate and efficient estimation of small P-values with the cross-entropy method: applications in genomic data analysis.

Shi, Yang; Wang, Mengqiao; Shi, Weiping; Lee, Ji-Hyun; Kang, Huining; Jiang, Hui

Affiliation

Shi Y; Division of Biostatistics and Data Science, Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, Georgia, USA.
Wang M; Department of Epidemiology and Biostatistics, West China School of Public Health, Sichuan University, Chengdu, Sichuan, China.
Shi W; Biostatistics Shared Resource, University of New Mexico Comprehensive Cancer Center and Department of Internal Medicine, University of New Mexico, Albuquerque, New Mexico, USA.
Lee JH; Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.
Kang H; Department of Epidemiology and Biostatistics, West China School of Public Health, Sichuan University, Chengdu, Sichuan, China.
Jiang H; College of Mathematics, Jilin University, Changchun, Jilin, China.

Bioinformatics; 35(14): 2441-2448, 2019 Jul 15.

Article in English | MEDLINE | ID: mdl-30521030

ABSTRACT

MOTIVATION: Large-scale genomic studies often require accurate estimates of small P-values, both to adjust for multiple hypothesis tests and to rank genomic features by statistical significance. For complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small P-values because they lack accuracy or are computationally restrictive. We propose a general approach for accurately and efficiently estimating small P-values for a broad range of complicated test statistics, based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques.

RESULTS: We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real-world examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small P-values (e.g. 10^-6 to 10^-100). The proposed algorithm can help improve some existing test procedures and support the development of new test procedures in genomic studies.

AVAILABILITY AND IMPLEMENTATION: R programs for implementing the algorithm and reproducing the results are available at: https://github.com/shilab2017/MCMC-CE-codes.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
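To make the cross-entropy idea concrete, the following is a minimal Python sketch of the textbook multilevel cross-entropy method combined with importance sampling, applied to a toy case with a known answer: the tail probability P(Z >= q) for a standard normal Z. It is not the authors' algorithm: it uses plain Monte Carlo sampling rather than Markov chain Monte Carlo, and the proposal family N(mu, 1), the function names and all parameter values are assumptions chosen for illustration only.

```python
import math
import random


def normal_tail_exact(q):
    """Exact P(Z >= q) for Z ~ N(0, 1); used only as a reference value."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def log_weight(x, mu):
    """Log likelihood ratio of N(0, 1) to the proposal N(mu, 1) at x."""
    return -mu * x + 0.5 * mu * mu


def ce_tail_probability(q, n=20000, rho=0.1, max_iter=50, seed=0):
    """Estimate P(Z >= q) with a multilevel cross-entropy scheme (toy sketch)."""
    rng = random.Random(seed)
    mu = 0.0  # start from the null distribution itself
    for _ in range(max_iter):
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n))
        # Intermediate level: the (1 - rho) sample quantile, capped at q.
        level = min(q, xs[int((1.0 - rho) * n)])
        elite = [x for x in xs if x >= level]
        ws = [math.exp(log_weight(x, mu)) for x in elite]
        # Cross-entropy update for a Gaussian mean: weighted mean of the elite.
        mu = sum(w * x for w, x in zip(ws, elite)) / sum(ws)
        if level >= q:
            break
    # Final importance sampling step with the fitted proposal N(mu, 1).
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)
        if x >= q:
            total += math.exp(log_weight(x, mu))
    return total / n


if __name__ == "__main__":
    q = 6.0
    print("CE estimate:", ce_tail_probability(q))
    print("exact value:", normal_tail_exact(q))
```

Naive Monte Carlo would need on the order of billions of draws to observe even one exceedance of q = 6, whereas the tilted proposal places most of its draws in the tail and corrects for this with the likelihood ratio weights.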

Subjects

Data Analysis; Genomics; Algorithms; Entropy; Genome; Markov Chains

Full text: 1 | Database: MEDLINE | Main subject: Genomics / Data Analysis | Study type: Health_economic_evaluation | Language: En | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Publication year: 2019 | Document type: Article | Affiliation country: United States
