RESUMO
In this paper, we present a normalized statistical metric space for hidden Markov models (HMMs). HMMs are widely used to model real-world systems. Like graph matching, some previous approaches compare HMMs by evaluating the correspondence, or goodness of match, between every pair of states, concentrating on the structure of the models instead of the statistics of the process being observed. To remedy this, we present a new metric space that compares the statistics of HMMs within a given level of statistical significance. Compared with the Kullback-Leibler divergence, which is another widely used approach for measuring model similarity, our approach is a true metric, can always return an appropriate distance value, and provides a confidence measure on the metric value. Experimental results are given for a sample application, which quantify the similarity of HMMs of network traffic in the Tor anonymization system. This application is interesting since it considers models extracted from a system that is intentionally trying to obfuscate its internal workings. In the conclusion, we discuss applications in less-challenging domains, such as data mining.
Assuntos
Algoritmos , Interpretação Estatística de Dados , Mineração de Dados/métodos , Cadeias de Markov , Modelos Estatísticos , Simulação por ComputadorRESUMO
In this paper, we consider how we can detect patterns in data streams that are serial Markovian, where target behaviors are Markovian, but targets may switch from one Markovian behavior to another. We want to reliably and promptly detect behavior changes. Traditional Markov-model-based pattern detection approaches, such as hidden Markov models, use maximum likelihood techniques over the entire data stream to detect behaviors. To detect changes between behaviors, we use statistical pattern matching calculations performed on a sliding window of data samples. If the window size is very small, the system will suffer from excessive false-positive rates. If the window is very large, change-point detection is delayed. This paper finds both necessary and sufficient bounds on the window size. We present two methods of calculating window sizes based on the state and transition structures of the Markov models. Two application examples are presented to verify our results. Our first example problem uses simulations to illustrate the utility of the proposed approaches. The second example uses models extracted from a database of consumer purchases to illustrate their use in a real application.
Assuntos
Inteligência Artificial , Técnicas de Apoio para a Decisão , Cadeias de Markov , Modelos Estatísticos , Reconhecimento Automatizado de Padrão/métodos , Algoritmos , Simulação por ComputadorRESUMO
Markov models are commonly used to analyze real-world problems. Their combination of discrete states and stochastic transitions is suited to applications with deterministic and stochastic components. Hidden Markov models (HMMs) are a class of Markov models commonly used in pattern recognition. Currently, HMMs recognize patterns using a maximum-likelihood approach. One major drawback with this approach is that data observations are mapped to HMMs without considering the number of data samples available. Another problem is that this approach is only useful for choosing between HMMs. It does not provide a criterion for determining whether or not a given HMM adequately matches the data stream. In this paper, we recognize complex behaviors using HMMs and confidence intervals. The certainty of a data match increases with the number of data samples considered. Receiver operating characteristic curves are used to find the optimal threshold for either accepting or rejecting an HMM description. We present one example using a family of HMMs to show the utility of the proposed approach. A second example using models extracted from a database of consumer purchases provides additional evidence that this approach can perform better than existing techniques.