Exact Integral Formulas for False Discovery Rate and the Variance of False Discovery Proportion.
Sadygov, Rovshan G; Zhu, Justin X; Deberneh, Henock M.
Affiliation
  • Sadygov RG; Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, 301 University Blvd, Galveston, Texas 77555, United States.
  • Zhu JX; Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, 301 University Blvd, Galveston, Texas 77555, United States.
  • Deberneh HM; Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, 301 University Blvd, Galveston, Texas 77555, United States.
J Proteome Res ; 23(6): 2298-2305, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38809146
ABSTRACT
Multiple hypothesis testing is an integral component of data analysis for large-scale technologies such as proteomics, transcriptomics, and metabolomics, for which the false discovery rate (FDR) and positive FDR (pFDR) have been accepted as error estimation and control measures. The pFDR is the expectation of the false discovery proportion (FDP), which is the ratio of the number of rejected null hypotheses that are in fact true (false discoveries) to the number of all rejected hypotheses. In practice, the expectation of the ratio is approximated by the ratio of expectations; however, the conditions under which the former can be replaced by the latter have not been investigated. This work derives exact integral expressions for the expectation (pFDR) and the variance of the FDP. The widely used approximation (ratio of expectations) is shown to be a particular case of the integral formula for the pFDR, recovered in the limit of a large sample size. A recurrence formula is provided to compute the pFDR for a predefined number of null hypotheses. The variance of the FDP is approximated for a practical application in peptide identification using forward and reversed protein sequences. Simulations demonstrate that the integral expression is more accurate than the approximate formula when the number of hypotheses is small. For large sample sizes, the pFDRs obtained from the integral expression and from the approximation do not differ substantially. Applications to proteomics data sets are included.
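The gap between the expectation of a ratio and the ratio of expectations can be illustrated with a small numerical sketch. The code below is not the integral or recurrence formula from the paper. It is a brute-force enumeration under assumptions made here for illustration only: tests are independent, each of `m0` true nulls is rejected with probability `alpha`, each of `m1` true alternatives is rejected with probability `power`, and the pFDR is taken as the expectation of the FDP conditional on at least one rejection. All function and parameter names are hypothetical.

```python
from math import comb


def binom_pmf(n, p):
    """Return [P(X = k) for k in 0..n] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def exact_pfdr(m0, m1, alpha, power):
    """E[V / (V + S) | V + S > 0] by direct enumeration.

    Assumed toy model (not taken from the paper):
    V ~ Binomial(m0, alpha) counts false discoveries,
    S ~ Binomial(m1, power) counts true discoveries, V and S independent.
    """
    pv = binom_pmf(m0, alpha)
    ps = binom_pmf(m1, power)
    total = 0.0
    for v, p_v in enumerate(pv):
        for s, p_s in enumerate(ps):
            if v + s > 0:  # the FDP is only defined when something is rejected
                total += p_v * p_s * v / (v + s)
    p_any_rejection = 1.0 - pv[0] * ps[0]
    return total / p_any_rejection


def ratio_of_expectations(m0, m1, alpha, power):
    """The common approximation E[V] / (E[V] + E[S])."""
    expected_false = m0 * alpha
    expected_true = m1 * power
    return expected_false / (expected_false + expected_true)


if __name__ == "__main__":
    # Same mix of nulls and alternatives, increasing number of hypotheses.
    for m in (5, 20, 100):
        exact = exact_pfdr(m, m, alpha=0.05, power=0.5)
        approx = ratio_of_expectations(m, m, alpha=0.05, power=0.5)
        print(m, exact, approx, abs(exact - approx))
```

In this toy model the approximation does not depend on the number of hypotheses, while the enumerated value does, and the two move closer together as `m` grows. This matches the qualitative statement in the abstract but says nothing about the accuracy of the paper's own formulas.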

Full text: 1 Database: MEDLINE Main subject: Proteomics Limit: Humans Language: En Publication year: 2024 Document type: Article