Evaluation of the evenness score in next-generation sequencing.

Oexle, Konrad

Affiliation

Oexle K; Center for Cardiovascular Genetics and Gene Diagnostics, Foundation for People with Rare Diseases, Schlieren-Zurich, Switzerland.

J Hum Genet ; 61(7): 627-32, 2016 Jul.

Article in English | MEDLINE | ID: mdl-27074764

ABSTRACT

The evenness score (E) in next-generation sequencing (NGS) quantifies the homogeneity of coverage across the NGS targets. Here I clarify the mathematical description of E, which is 1 minus the integral from 0 to 1 of the cumulative distribution function F(x) of the normalized coverage x (normalization meaning division by the mean), and derive a computationally more efficient formula: E is 1 minus the integral from 0 to 1 of the probability density function f(x) times (1 − x). An analogous formula for empirical coverage data is provided, as well as fast R command-line scripts. This new formula allows a general comparison of E with the coefficient of variation (= the standard deviation σ of the normalized data), which is the conventional measure of the relative width of a distribution. For symmetrical distributions, including the Gaussian, E is closely bracketed by 1 − σ²/2 ⩾ E ⩾ 1 − σ/2, with σ ⩽ 1 owing to normalization and symmetry. For the log-normal distribution, a typical representative of positively skewed biological data, the analysis yields E ≈ exp(−σ*/2) with σ*² = ln(σ² + 1) up to large σ (⩽ 3), and E ≈ 1 − F(exp(−1)) for very large σ (⩾ 2.5). In the latter case of rather uneven coverage, E can provide direct information on the fraction of well-covered targets that is not immediately delivered by the normalized σ. Otherwise, E does not appear to have major advantages over σ or over a simple score exp(−σ) based on it. In fact, exp(−σ) uses a much larger part of its range when evaluating realistic NGS outputs.
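The step from the CDF form of E to the density form is an integration by parts. A compact version is sketched below as one possible route, not necessarily the derivation used in the paper. It also shows the empirical analogue that follows for the step-function CDF of n normalized coverages, and a short argument for the symmetric-case bounds under the stated assumption that a symmetric, non-negative normalized coverage with mean 1 is supported on [0, 2].

```latex
% Density form: integrate by parts; [x F(x)]_0^1 = F(1) = \int_0^1 f(x)\,dx
\int_0^1 F(x)\,dx = \bigl[x F(x)\bigr]_0^1 - \int_0^1 x f(x)\,dx
                  = \int_0^1 (1 - x)\, f(x)\,dx
\quad\Longrightarrow\quad
E = 1 - \int_0^1 F(x)\,dx = 1 - \int_0^1 (1 - x)\, f(x)\,dx .

% Empirical analogue for normalized coverages x_i = c_i / \bar{c}, i = 1, \dots, n:
E_n = 1 - \frac{1}{n} \sum_{i:\, x_i < 1} (1 - x_i) .

% Symmetric case (assumption: support in [0, 2], mean 1, hence \sigma \le 1).
% By symmetry, \int_0^1 (1 - x) f(x)\,dx = \tfrac{1}{2}\,\mathbb{E}|X - 1|, so
E = 1 - \tfrac{1}{2}\,\mathbb{E}|X - 1| .
% Jensen: \mathbb{E}|X - 1| \le \sqrt{\mathbb{E}(X - 1)^2} = \sigma
%   \Rightarrow E \ge 1 - \sigma/2 .
% |X - 1| \le 1 on [0, 2]: \mathbb{E}|X - 1| \ge \mathbb{E}(X - 1)^2 = \sigma^2
%   \Rightarrow E \le 1 - \sigma^2/2 .
```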

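The R command-line scripts mentioned in the abstract are not reproduced here. As a stand-in, the following Python sketch (standard library only; every function name is hypothetical) computes the empirical E from per-target coverages with the density-style formula, cross-checks it against a direct integration of the empirical CDF, reports the normalized σ and the exp(−σ) score, and compares the log-normal approximation E ≈ exp(−σ*/2) with the exact value 2Φ(−σ*/2). That exact expression was derived for a mean-1 log-normal for this illustration and is not taken from the paper; using the population rather than the sample standard deviation is likewise an assumption.

```python
import math
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def evenness(coverage):
    """Empirical E via the density-style formula (single pass).

    Coverage is normalized by its mean; only targets below the mean
    contribute: E = 1 - (1/n) * sum over x_i < 1 of (1 - x_i).
    """
    mean_cov = fmean(coverage)
    x = [c / mean_cov for c in coverage]
    return 1.0 - sum(1.0 - xi for xi in x if xi < 1.0) / len(x)


def evenness_cdf_integral(coverage):
    """Same quantity via 1 - integral_0^1 F_n(x) dx, F_n = empirical CDF.

    F_n is a step function, so the integral is summed exactly over the
    intervals between consecutive breakpoints in [0, 1] (needs sorting).
    """
    mean_cov = fmean(coverage)
    x = sorted(c / mean_cov for c in coverage)
    n = len(x)
    pts = [0.0] + [xi for xi in x if xi < 1.0] + [1.0]
    area = 0.0
    for a, b in zip(pts, pts[1:]):
        # On [a, b), F_n equals the fraction of x_i <= a
        area += (bisect_right(x, a) / n) * (b - a)
    return 1.0 - area


def normalized_sd(coverage):
    """Coefficient of variation: population SD of mean-normalized coverage."""
    mean_cov = fmean(coverage)
    return pstdev([c / mean_cov for c in coverage])


def lognormal_evenness(sigma):
    """Exact and approximate E for a mean-1 log-normal with CV sigma.

    With s^2 = ln(sigma^2 + 1): exact E = 2 * Phi(-s/2) (derived for this
    sketch), approximation from the abstract: exp(-s/2).
    """
    s = math.sqrt(math.log(sigma ** 2 + 1.0))
    exact = 2.0 * NormalDist().cdf(-s / 2.0)
    approx = math.exp(-s / 2.0)
    return exact, approx


if __name__ == "__main__":
    cov = [30, 45, 12, 60, 38, 0, 52, 41]  # hypothetical per-target coverages
    print("E (density form):", evenness(cov))
    print("E (CDF integral):", evenness_cdf_integral(cov))
    sd = normalized_sd(cov)
    print("normalized sigma:", sd, "exp(-sigma):", math.exp(-sd))
    for sigma in (0.5, 1.0, 2.0, 3.0):
        exact, approx = lognormal_evenness(sigma)
        print(f"sigma={sigma}: exact E={exact:.4f}, exp(-s/2)={approx:.4f}")
```

The density-style function needs one pass over the targets, whereas the CDF route needs sorting and a sum over intervals; this mirrors the efficiency argument in the abstract, although the paper's own scripts may be organized differently.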
Subjects

High-Throughput Nucleotide Sequencing/standards; Models, Theoretical; Algorithms; High-Throughput Nucleotide Sequencing/methods; Humans; Models, Statistical

Collections: 01-internacional | Database: MEDLINE | Main subject: High-Throughput Nucleotide Sequencing / Models, Theoretical | Study type: Prognostic_studies / Risk_factors_studies | Limit: Humans | Language: English | Publication year: 2016 | Document type: Article