Multiscale DNA partitioning: statistical evidence for segments.

Futschik, Andreas; Hotz, Thomas; Munk, Axel; Sieling, Hannes

Affiliation

Futschik A; Department of Applied Statistics, JK University Linz, A-4040 Linz, Austria, Institute of Mathematics, Technische Universität Ilmenau, D-98693 Ilmenau, Germany, Institute for Mathematical Stochastics and Felix Bernstein Institute for Mathematical Statistics in Biosciences, Georgia Augusta University
Hotz T; Department of Applied Statistics, JK University Linz, A-4040 Linz, Austria, Institute of Mathematics, Technische Universität Ilmenau, D-98693 Ilmenau, Germany, Institute for Mathematical Stochastics and Felix Bernstein Institute for Mathematical Statistics in Biosciences, Georgia Augusta University
Munk A; Department of Applied Statistics, JK University Linz, A-4040 Linz, Austria, Institute of Mathematics, Technische Universität Ilmenau, D-98693 Ilmenau, Germany, Institute for Mathematical Stochastics and Felix Bernstein Institute for Mathematical Statistics in Biosciences, Georgia Augusta University
Sieling H; Department of Applied Statistics, JK University Linz, A-4040 Linz, Austria, Institute of Mathematics, Technische Universität Ilmenau, D-98693 Ilmenau, Germany, Institute for Mathematical Stochastics and Felix Bernstein Institute for Mathematical Statistics in Biosciences, Georgia Augusta University

Bioinformatics ; 30(16): 2255-62, 2014 Aug 15.

Article in English | MEDLINE | ID: mdl-24753487

ABSTRACT

MOTIVATION:

DNA segmentation, i.e. the partitioning of DNA into compositionally homogeneous segments, is a basic task in bioinformatics. Different algorithms have been proposed for various partitioning criteria such as Guanine/Cytosine (GC) content, local ancestry in population genetics or copy number variation. A critical component of any such method is the choice of an appropriate number of segments. Some methods use model selection criteria and do not provide suitable error control. Other methods, which are based on simulating a statistic under a null model, provide suitable error control only if the correct null model is chosen.

RESULTS:

Here, we focus on partitioning with respect to GC content and propose a new approach that provides statistical error control: as in statistical hypothesis testing, it guarantees with a user-specified probability [Formula see text] that the number of identified segments does not exceed the number of actually present segments. The method is based on a statistical multiscale criterion, rendering this a segmentation method that searches for segments of any length (on all scales) simultaneously. It is also accurate in localizing segments: under benchmark scenarios, our approach leads to a segmentation that is more accurate than the approaches discussed in the comparative review of Elhaik et al. In our real data examples, we find segments that often correspond well to features taken from standard University of California at Santa Cruz (UCSC) genome annotation tracks.

AVAILABILITY AND IMPLEMENTATION:

Our method is implemented in function smuceR of the R-package stepR, available at http://www.stochastik.math.uni-goettingen.de/smuce.
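The multiscale idea described under RESULTS can be illustrated with a minimal Python sketch. This is not the smuceR implementation and is not taken from the paper: the function names (gc_indicators, bernoulli_lr, multiscale_statistic), the Bernoulli model for GC indicators, and the particular scale penalty are assumptions chosen for illustration. The sketch scans every interval of a sequence, measures how strongly the local GC fraction deviates from a hypothesized constant value p0, and subtracts a penalty that is larger on short intervals, so that all scales are examined at once.

```python
import math


def gc_indicators(seq):
    """Map a DNA string to 0/1 indicators: 1 for G or C, 0 for A or T.

    Characters other than A, C, G, T are skipped (an illustrative choice).
    """
    return [1 if base in "GC" else 0 for base in seq.upper() if base in "ACGT"]


def bernoulli_lr(k, m, p0):
    """Log-likelihood-ratio statistic for k successes in m Bernoulli trials,
    comparing the local estimate k/m against the hypothesized mean p0."""
    p_hat = k / m
    stat = 0.0
    if k > 0:
        stat += k * math.log(p_hat / p0)
    if k < m:
        stat += (m - k) * math.log((1.0 - p_hat) / (1.0 - p0))
    return stat


def multiscale_statistic(y, p0):
    """Maximum over all intervals of a scale-penalized local test statistic.

    The penalty sqrt(2 * log(e * n / m)) for an interval of length m is an
    assumed calibration, used here only to show how scales can be balanced.
    Requires 0 < p0 < 1. Runs in O(n^2), so it suits short sequences only.
    """
    n = len(y)
    prefix = [0]
    for value in y:
        prefix.append(prefix[-1] + value)  # prefix sums give interval counts in O(1)
    best = -math.inf
    for i in range(n):
        for j in range(i + 1, n + 1):
            m = j - i
            k = prefix[j] - prefix[i]
            lr = max(bernoulli_lr(k, m, p0), 0.0)  # guard against rounding below zero
            penalty = math.sqrt(2.0 * math.log(math.e * n / m))
            best = max(best, math.sqrt(2.0 * lr) - penalty)
    return best


# A compositionally homogeneous toy sequence versus one with a single change.
flat = gc_indicators("ACGT" * 25)
shift = gc_indicators("A" * 50 + "G" * 50)
flat_stat = multiscale_statistic(flat, 0.5)
shift_stat = multiscale_statistic(shift, 0.5)
```

In this toy setting the statistic stays small for the homogeneous sequence and becomes large once a segment with a different GC fraction is present. A segmentation method with error control would compare such a statistic against a threshold tied to the user-specified probability; how that threshold is chosen in smuceR is not shown here.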

Subjects

Algorithms; DNA/chemistry; Sequence Analysis, DNA/methods; Bacteriophage lambda/genetics; Base Composition; Data Interpretation, Statistical; Genome, Human; Humans

Full text: 1 Database: MEDLINE Main subject: Algorithms / DNA / Sequence Analysis, DNA Type of study: Prognostic_studies Limit: Humans Language: En Publication year: 2014 Document type: Article
