Large-Deviation Properties of Sequence Alignment of Correlated Sequences.
J Comput Biol; 2018 Sep 10.
Article in En | MEDLINE | ID: mdl-30204481
The significance of alignment scores of optimally aligned DNA sequences can be estimated through the score distribution of pairs of random sequences. This requires statistics for the relevant high-scoring tail of the distribution. For local alignments of iid drawn sequences it has already been shown that the often assumed Gumbel distribution does not hold in the distribution tail, but has to be corrected by a Gaussian factor. Real DNA sequences have been observed to show long-range correlations within sequences, which are not correctly modeled by iid random sequences. In this publication the large-deviation method used in previous studies is applied to local and global alignment of such sequences with long-range correlations. We study the distributions over the full range of the support and obtain probabilities as low as [Formula: see text]. We show that a correction to the Gumbel distribution is again necessary, and we study the dependence of the parameters on the correlation strength. For global alignments the Gamma distribution, which was found heuristically to be a good fit in earlier simple sampling studies, is found to be a poor fit.
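As an illustration of the quantities mentioned in the abstract, the following is a minimal sketch, not the authors' method. It assumes a Smith-Waterman local alignment with a hypothetical scoring scheme (match +1, mismatch -1, linear gap cost -2), uncorrelated iid sequences, and simple sampling rather than the large-deviation approach. The function names and the parameters `lam`, `s0` and `lam2` are illustrative assumptions, and the corrected density is left unnormalized.

```python
import math
import random


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal local alignment score (Smith-Waterman, linear gap cost).

    The scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def sample_scores(n_pairs, length, rng, alphabet="ACGT"):
    """Simple sampling: scores of independent pairs of iid random sequences."""
    scores = []
    for _ in range(n_pairs):
        a = rng.choices(alphabet, k=length)
        b = rng.choices(alphabet, k=length)
        scores.append(local_alignment_score(a, b))
    return scores


def log_gumbel_pdf(s, lam, s0):
    """Log of the Gumbel density lam * exp(-z - exp(-z)), z = lam * (s - s0)."""
    z = lam * (s - s0)
    return math.log(lam) - z - math.exp(-z)


def log_corrected_gumbel(s, lam, s0, lam2):
    """Gumbel log-density times a Gaussian factor exp(-lam2 * (s - s0)**2).

    Unnormalized; this parameterization is an assumption for illustration.
    """
    return log_gumbel_pdf(s, lam, s0) - lam2 * (s - s0) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    scores = sample_scores(n_pairs=200, length=40, rng=rng)
    print("mean score:", sum(scores) / len(scores), "max score:", max(scores))
```

With N sampled pairs, simple sampling of this kind only resolves probabilities down to roughly 1/N, which is why the far tail of the score distribution calls for a large-deviation method.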
Full text: 1
Database: MEDLINE
Study type: Prognostic_studies
Language: En
Publication year: 2018
Document type: Article