Evaluation of the evenness score in next-generation sequencing.
J Hum Genet
; 61(7): 627-32, 2016 Jul.
Article
em En
| MEDLINE
| ID: mdl-27074764
The evenness score (E) in next-generation sequencing (NGS) quantifies the homogeneity in coverage of the NGS targets. Here I clarify the mathematical description of E, which is 1 minus the integral from 0 to 1 over the cumulative distribution function F(x) of the normalized coverage x, where normalization means division by the mean, and derive a computationally more efficient formula; that is, 1 minus the integral from 0 to 1 over the probability density distribution f(x) times 1-x. An analogous formula for empirical coverage data is provided as well as fast R command line scripts. This new formula allows for a general comparison of E with the coefficient of variation (=standard deviation σ of normalized data) which is the conventional measure of the relative width of a distribution. For symmetrical distributions, including the Gaussian, E can be predicted closely as 1-σ(2)/2⩾E⩾1-σ/2 with σ⩽1 owing to normalization and symmetry. In case of the log-normal distribution as a typical representative of positively skewed biological data, the analysis yields E≈exp(-σ*/2) with σ*(2)=ln(σ(2)+1) up to large σ (⩽3), and E≈1-F(exp(-1)) for very large σ (⩾2.5). In the latter kind of rather uneven coverage, E can provide direct information on the fraction of well-covered targets that is not immediately delivered by the normalized σ. Otherwise, E does not appear to have major advantages over σ or over a simple score exp(-σ) based on it. Actually, exp(-σ) exploits a much larger part of its range for the evaluation of realistic NGS outputs.
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Assunto principal:
Sequenciamento de Nucleotídeos em Larga Escala
/
Modelos Teóricos
Tipo de estudo:
Prognostic_studies
/
Risk_factors_studies
Limite:
Humans
Idioma:
En
Ano de publicação:
2016
Tipo de documento:
Article