Statistical significance approximation for local similarity analysis of dependent time series data.

Zhang, Fang; Sun, Fengzhu; Luan, Yihui

Affiliation

Zhang F; School of Mathematics, Shandong University, Jinan, Shandong, 250100, China.
Sun F; Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, 90089, CA, USA.
Luan Y; Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China.

BMC Bioinformatics ; 20(1): 53, 2019 Jan 28.

Article in English | MEDLINE | ID: mdl-30691412

ABSTRACT

BACKGROUND:

Local similarity analysis (LSA) of time series data has been extensively used to investigate the dynamics of biological systems in a wide range of environments. Recently, a theoretical method was proposed to approximate the statistical significance of local similarity (LS) scores. However, the method assumes that the time series data are independent and identically distributed, an assumption that can be violated in many problems.
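For readers unfamiliar with LS scores, the following is a minimal sketch of a commonly used dynamic-programming formulation. It is not taken from this abstract. The function name `ls_score`, the `max_delay` parameter, and the plain z-score normalization are assumptions made for illustration; published LSA implementations may normalize the series differently.

```python
import numpy as np


def ls_score(x, y, max_delay=0):
    """Illustrative local similarity score of two equal-length series.

    Assumption: both series are z-score normalized, and the score is the
    largest positive or negative local partial sum of products, divided by n.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    n = len(x)
    # Normalize each series (a constant series would give a zero std).
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()

    # pos[i, j] / neg[i, j]: best positively / negatively associated local
    # alignment ending at x[i-1], y[j-1], restricted to |i - j| <= max_delay.
    pos = np.zeros((n + 1, n + 1))
    neg = np.zeros((n + 1, n + 1))
    for i in range(1, n + 1):
        lo = max(1, i - max_delay)
        hi = min(n, i + max_delay)
        for j in range(lo, hi + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i, j] = max(0.0, pos[i - 1, j - 1] + prod)
            neg[i, j] = max(0.0, neg[i - 1, j - 1] - prod)
    return max(pos.max(), neg.max()) / n
```

Under these assumptions, two identical series give a score of 1, and allowing a larger `max_delay` can only increase the score, because the search covers more alignments.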

RESULTS:

In this paper, we develop a novel approach to accurately approximate the statistical significance of LSA for dependent time series data using a nonparametric kernel estimate of the long-run variance. We also investigate an alternative method for LSA statistical significance approximation that computes the local similarity score of the residuals from a predefined statistical model. We show by simulations that both methods have controllable type I errors for dependent time series, while other approaches to statistical significance can be grossly oversized. We apply both methods to human and marine microbial datasets, where most of the possibly significant associations are captured and false positives are efficiently controlled.
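To make the idea of a kernel estimate of the long-run variance concrete, here is a minimal sketch. The abstract does not state which kernel or bandwidth the authors use, so the Bartlett kernel, the rule-of-thumb default bandwidth, and the function name `long_run_variance` are all assumptions made for illustration.

```python
import numpy as np


def long_run_variance(z, bandwidth=None):
    """Bartlett-kernel estimate of the long-run variance of a 1-D series.

    Assumptions (not from the abstract): Bartlett weights and a
    rule-of-thumb default bandwidth of floor(4 * (n / 100) ** (2 / 9)).
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    if bandwidth is None:
        bandwidth = int(np.floor(4 * (n / 100.0) ** (2.0 / 9.0)))
    bandwidth = min(bandwidth, n - 1)

    d = z - z.mean()
    # Lag-0 autocovariance (the ordinary variance with a 1/n divisor).
    lrv = np.dot(d, d) / n
    for k in range(1, bandwidth + 1):
        # Bartlett weights decay linearly with the lag.
        weight = 1.0 - k / (bandwidth + 1.0)
        gamma_k = np.dot(d[k:], d[:-k]) / n
        lrv += 2.0 * weight * gamma_k
    return lrv
```

With `bandwidth=0` the estimate reduces to the ordinary variance, which is what an i.i.d. assumption would use. Positive autocorrelation inflates the estimate and negative autocorrelation shrinks it, which is why ignoring dependence can distort significance levels.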

CONCLUSIONS:

Our methods provide fast and effective approaches for evaluating statistical significance of dependent time series data with controllable type I error. They can be applied to a variety of time series data to reveal inherent relationships among the different factors.

Subjects

Algorithms; Models, Statistical; Aquatic Organisms/microbiology; Databases as Topic; Female; Humans; Male; Microbiota; Time Factors

Keywords

Data-driven local similarity analysis; Long-run variance; Nonparametric kernel estimate; Statistical significance

Full text: 1 | Database: MEDLINE | Main subject: Algorithms / Models, Statistical | Type of study: Prognostic_studies / Risk_factors_studies | Limits: Female / Humans / Male | Language: En | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year of publication: 2019 | Document type: Article | Country of affiliation: China
