A new algorithm to train hidden Markov models for biological sequences with partial labels.
Li, Jiefu; Lee, Jung-Youn; Liao, Li.
Affiliation
  • Li J; Computer and Information Sciences, University of Delaware, 101 Smith Hall, Newark, DE, 19716, USA.
  • Lee JY; Plant and Soil Sciences, University of Delaware, 15 Innovation Way, Newark, 19716, USA.
  • Liao L; Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Newark, 19716, USA.
BMC Bioinformatics ; 22(1): 162, 2021 Mar 26.
Article in English | MEDLINE | ID: mdl-33771095
ABSTRACT

BACKGROUND:

Hidden Markov models (HMMs) are a powerful tool for analyzing biological sequences in a wide variety of applications, from profiling functional protein families to identifying functional domains. HMMs are usually trained either by maximum likelihood, using counting, when sequences are labelled, or by expectation maximization, such as the Baum-Welch algorithm, when sequences are unlabelled. Increasingly, however, sequences are only partially labelled. In this paper, we design a new training method, based on the Baum-Welch algorithm, for HMMs in biological problems where only partial labelling is available.
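
To make the partial-label setting concrete, the sketch below shows one generic way known labels can enter the expectation step of Baum-Welch: state paths that contradict a labelled position are given zero probability in the forward and backward passes. This is only an illustration under stated assumptions, not the algorithm proposed in the paper, whose details the abstract does not give. The function name, the two-state toy parameters, and the use of None to mark unlabelled positions are all hypothetical.

```python
# Toy parameters for a two-state, two-symbol HMM (illustrative assumptions only).
START = [0.6, 0.4]                   # initial state probabilities
TRANS = [[0.7, 0.3], [0.2, 0.8]]     # TRANS[r][s] = P(next state s | state r)
EMIT = [[0.9, 0.1], [0.3, 0.7]]      # EMIT[s][o]  = P(symbol o | state s)


def constrained_posteriors(obs, labels, start, trans, emit):
    """State posteriors from forward-backward with known labels clamped.

    obs    : list of symbol indices
    labels : list of the same length; a state index where the label is
             known, None where the position is unlabelled
    Returns gamma, where gamma[t][s] = P(state s at t | obs, known labels).
    No scaling is applied, so this is only suitable for short sequences.
    """
    n, T = len(start), len(obs)

    def allowed(t, s):
        # A state is allowed at t if t is unlabelled or s matches the label.
        return labels[t] is None or labels[t] == s

    # Forward pass: disallowed states keep probability zero.
    fwd = [[0.0] * n for _ in range(T)]
    for s in range(n):
        if allowed(0, s):
            fwd[0][s] = start[s] * emit[s][obs[0]]
    for t in range(1, T):
        for s in range(n):
            if allowed(t, s):
                incoming = sum(fwd[t - 1][r] * trans[r][s] for r in range(n))
                fwd[t][s] = incoming * emit[s][obs[t]]

    # Backward pass with the same mask.
    bwd = [[0.0] * n for _ in range(T)]
    for s in range(n):
        if allowed(T - 1, s):
            bwd[T - 1][s] = 1.0
    for t in range(T - 2, -1, -1):
        for s in range(n):
            if allowed(t, s):
                bwd[t][s] = sum(
                    trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r]
                    for r in range(n)
                )

    likelihood = sum(fwd[T - 1])
    if likelihood == 0.0:
        raise ValueError("labels are impossible under the given parameters")
    return [[fwd[t][s] * bwd[t][s] / likelihood for s in range(n)]
            for t in range(T)]


# Example: positions 1 and 3 are labelled, positions 0 and 2 are not.
gamma = constrained_posteriors([0, 1, 1, 0], [None, 1, None, 0],
                               START, TRANS, EMIT)
```

With every entry of labels set to None, the function reduces to the ordinary forward-backward posteriors used for unlabelled training; with every position labelled, the posteriors become indicator values, which corresponds to the counting case.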

RESULTS:

Compared with a previously reported similar method designed for active learning in text mining, our method achieves significant improvements in model training, as demonstrated by higher decoding accuracy when the trained models are tested on both synthetic and real data.
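
As a rough illustration of how decoding accuracy can be measured for a trained HMM, the sketch below runs a standard log-space Viterbi decoder and reports the fraction of positions whose decoded state matches a reference labelling. The abstract does not specify the evaluation protocol, so the per-position accuracy measure, the function names, and the toy parameters are assumptions made for this example.

```python
import math


def viterbi(obs, start, trans, emit):
    """Most probable state path for obs, computed in log space.

    Assumes every probability used is strictly positive (log(0) is undefined).
    """
    n = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[t-1][s] = best predecessor of state s at position t
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s])
                         + math.log(emit[s][o]))
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def decoding_accuracy(predicted, truth):
    """Fraction of positions where the decoded state equals the true state."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Toy two-state model (illustrative values, not taken from the paper).
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.9, 0.1], [0.1, 0.9]]

path = viterbi([0, 0, 1, 1], start, trans, emit)
accuracy = decoding_accuracy(path, [0, 0, 1, 0])
```

Comparing two training methods in this way would mean decoding the same held-out sequences with each trained model and comparing the resulting accuracies.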

CONCLUSIONS:

A novel training method is developed to improve the training of hidden Markov models by utilizing partially labelled data. The method is expected to have an impact on detecting de novo motifs and signals in biological sequence data. In particular, it will be deployed in active learning mode in ongoing research on detecting plasmodesmata targeting signals, and its performance will be assessed with validation from wet-lab experiments.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Proteins Type of study: Health_economic_evaluation / Prognostic_studies Language: En Year of publication: 2021 Document type: Article
