Search | Portal Regional da BVS

1.

Clustering methods: To optimize or to not optimize?

Brusco, Michael; Steinley, Douglas; Watts, Ashley L.

Psychol Methods ; 2024 Sep 12.

Article in English | MEDLINE | ID: mdl-39264649

ABSTRACT

Many clustering problems are associated with a particular objective criterion that is sought to be optimized. There are often several methods that can be used to tackle the optimization problem, and one or more of them might guarantee a globally optimal solution. However, it is quite possible that, relative to one or more suboptimal solutions, a globally optimal solution might be less interpretable from the standpoint of psychological theory or be less in accordance with some known (i.e., true) cluster structure. For example, in simulation experiments, it has sometimes been observed that there is not a perfect correspondence between the optimized clustering criterion and recovery of the underlying known cluster structure. This can lead to the misconception that clustering methods with a tendency to produce suboptimal solutions might, in some instances, be preferable to superior methods that provide globally optimal (or at least better locally optimal) solutions. In this article, we present results from simulation studies in the context of K-median clustering where departure from global optimality was carefully controlled. Although the results showed that suboptimal solutions sometimes produced marginally better recovery for experimental cells where the known cluster structure was less well-defined, capriciously accepting inferior solutions is an unwise practice. However, there are instances in which some sacrifice in the optimization criterion value to meet certain desirable constraints or to improve the value of one or more other relevant criteria is principled. (PsycInfo Database Record (c) 2024 APA, all rights reserved).

2.

Improving the Walktrap Algorithm Using K-Means Clustering.

Brusco, Michael; Steinley, Douglas; Watts, Ashley L.

Multivariate Behav Res ; 59(2): 266-288, 2024.

Article in English | MEDLINE | ID: mdl-38361218

ABSTRACT

The walktrap algorithm is one of the most popular community-detection methods in psychological research. Several simulation studies have shown that it is often effective at determining the correct number of communities and assigning items to their proper community. Nevertheless, it is important to recognize that the walktrap algorithm relies on hierarchical clustering because it was originally developed for networks much larger than those encountered in psychological research. In this paper, we present and demonstrate a computational alternative to the hierarchical algorithm that is conceptually easier to understand. More importantly, we show that better solutions to the sum-of-squares optimization problem that is heuristically tackled by hierarchical clustering in the walktrap algorithm can often be obtained using exact or approximate methods for K-means clustering. Three simulation studies and analyses of empirical networks were completed to assess the impact of better sum-of-squares solutions.

Subjects

Algorithms , Computer Simulation , Cluster Analysis

3.

A comparison of logistic regression methods for Ising model estimation.

Brusco, Michael J; Steinley, Douglas; Watts, Ashley L.

Behav Res Methods ; 55(7): 3566-3584, 2023 Oct.

Article in English | MEDLINE | ID: mdl-36266525

ABSTRACT

The Ising model has received significant attention in network psychometrics during the past decade. A popular estimation procedure is IsingFit, which uses nodewise l1-regularized logistic regression along with the extended Bayesian information criterion to establish the edge weights for the network. In this paper, we report the results of a simulation study comparing IsingFit to two alternative approaches: (1) a nonregularized nodewise stepwise logistic regression method, and (2) a recently proposed global l1-regularized logistic regression method that estimates all edge weights in a single stage, thus circumventing the need for nodewise estimation. MATLAB scripts for the methods are provided as supplemental material. The global l1-regularized logistic regression method generally provided greater accuracy and sensitivity than IsingFit, at the expense of lower specificity and much greater computation time. The stepwise approach showed considerable promise. Relative to the l1-regularized approaches, the stepwise method provided better average specificity for all experimental conditions, as well as comparable accuracy and sensitivity at the largest sample size.

Subjects

Logistic Models , Humans , Bayes Theorem , Computer Simulation

4.

On maximization of the modularity index in network psychometrics.

Brusco, Michael J; Steinley, Douglas; Watts, Ashley L.

Behav Res Methods ; 55(7): 3549-3565, 2023 Oct.

Article in English | MEDLINE | ID: mdl-36258108

ABSTRACT

The modularity index (Q) is an important criterion for many community detection heuristics used in network psychometrics and its subareas (e.g., exploratory graph analysis). Some heuristics seek to directly maximize Q, whereas others, such as the walktrap algorithm, only use the modularity index post hoc to determine the number of communities. Researchers in network psychometrics have typically not employed methods that are guaranteed to find a partition that maximizes Q, perhaps because of the complexity of the underlying mathematical programming problem. In this paper, for networks of the size commonly encountered in network psychometrics, we explore the utility of finding the partition that maximizes Q via formulation and solution of a clique partitioning problem (CPP). A key benefit of the CPP is that the number of communities is naturally determined by its solution and, therefore, need not be prespecified. The results of two simulation studies comparing maximization of Q to two other methods that seek to maximize modularity (fast greedy and Louvain), as well as one popular method that does not (walktrap algorithm), provide interesting insights as to the relative performances of the methods with respect to identification of the correct number of communities and the recovery of underlying community structure.

Subjects

Algorithms , Humans , Psychometrics , Computer Simulation
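
Editorial illustration: the modularity index Q discussed in this abstract can be evaluated for any candidate partition. The sketch below uses the standard definition of modularity for an undirected (optionally weighted) network; it only scores a given partition and does not implement the clique partitioning formulation the authors use to maximize Q. The function name `modularity` and the toy network `two_triangles` are illustrative assumptions, not taken from the article.

```python
from collections import defaultdict


def modularity(adjacency, membership):
    """Modularity Q of a partition of an undirected, possibly weighted network.

    adjacency  -- symmetric square matrix (list of lists) of edge weights
    membership -- membership[i] is the community label of node i
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    two_m = sum(degree)  # twice the total edge weight
    if two_m == 0:
        raise ValueError("network has no edges")
    within = defaultdict(float)      # summed A_ij over pairs in the same community
    degree_sum = defaultdict(float)  # summed node degrees per community
    for i in range(n):
        degree_sum[membership[i]] += degree[i]
        for j in range(n):
            if membership[i] == membership[j]:
                within[membership[i]] += adjacency[i][j]
    # Q = sum over communities of (within-community share) - (expected share)
    return sum(within[c] / two_m - (degree_sum[c] / two_m) ** 2
               for c in degree_sum)


# Hypothetical toy network: two disconnected triangles (nodes 0-2 and 3-5).
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_split = modularity(two_triangles, [0, 0, 0, 1, 1, 1])
q_single = modularity(two_triangles, [0, 0, 0, 0, 0, 0])
```

Placing every node in one community always gives Q = 0, so a positive Q for the two-triangle split indicates more within-community weight than expected from the node degrees alone.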

5.

A modified approach to fitting relative importance networks.

Brusco, Michael; Watts, Ashley L; Steinley, Douglas.

Psychol Methods ; 2022 Jul 04.

Article in English | MEDLINE | ID: mdl-35786981

ABSTRACT

Most researchers have estimated the edge weights for relative importance networks using a well-established measure of general dominance for multiple regression. This approach has several desirable properties including edge weights that represent R² contributions, in-degree centralities that correspond to R² for each item when using other items as predictors, and strong replicability. We endorse the continued use of relative importance networks and believe they have a valuable role in network psychometrics. However, to improve their utility, we introduce a modified approach that uses best-subsets regression as a preceding step to select an appropriate subset of predictors for each item. The benefits of this modification include: (a) computation time savings that can enable larger relative importance networks to be estimated, (b) a principled approach to edge selection that can significantly improve specificity, (c) the provision of a signed network if desired, (d) the potential use of the best-subsets regression approach for estimating Gaussian graphical models, and (e) possible generalization to best-subsets logistic regression for Ising models. We describe, evaluate, and demonstrate the proposed approach and discuss its strengths and limitations. (PsycInfo Database Record (c) 2022 APA, all rights reserved).

6.

A comparison of spectral clustering and the walktrap algorithm for community detection in network psychometrics.

Brusco, Michael; Steinley, Douglas; Watts, Ashley L.

Psychol Methods ; 2022 Jul 07.

Article in English | MEDLINE | ID: mdl-35797161

ABSTRACT

Spectral clustering is a well-known method for clustering the vertices of an undirected network. Although its use in network psychometrics has been limited, spectral clustering has a close relationship to the commonly used walktrap algorithm. In this article, we report results from simulation experiments designed to evaluate the ability of spectral clustering and the walktrap algorithm to recover underlying cluster (or community) structure in networks. The salient findings include: (a) the recovery performance of the walktrap algorithm can be improved by using K-means clustering instead of hierarchical clustering; (b) K-means and K-median clustering led to comparable recovery performance when used to cluster vertices based on the eigenvectors of Laplacian matrices in spectral clustering; (c) spectral clustering using the unnormalized Laplacian matrix generally yielded inferior cluster recovery in comparison to the other methods; (d) when the correct number of clusters was provided for the methods, spectral clustering using the normalized Laplacian matrix led to better recovery than the walktrap algorithm; and (e) when the correct number of clusters was not provided, the walktrap algorithm using the Qw modularity index was better than spectral clustering using the eigengap heuristic at determining the appropriate number of clusters. Overall, both the walktrap algorithm and spectral clustering of the normalized Laplacian matrix are effective for partitioning the vertices of undirected networks, with the latter performing better in most instances. (PsycInfo Database Record (c) 2022 APA, all rights reserved).

7.

Disentangling relationships in symptom networks using matrix permutation methods.

Brusco, Michael J; Steinley, Douglas; Watts, Ashley L.

Psychometrika ; 87(1): 133-155, 2022 Mar.

Article in English | MEDLINE | ID: mdl-34282531

ABSTRACT

Common outputs of software programs for network estimation include association matrices containing the edge weights between pairs of symptoms and a plot of the symptom network. Although such outputs are useful, it is sometimes difficult to ascertain structural relationships among symptoms from these types of output alone. We propose that matrix permutation provides a simple, yet effective, approach for clarifying the order relationships among the symptoms based on the edge weights of the network. For directed symptom networks, we use a permutation criterion that has classic applications in electrical circuit theory and economics. This criterion can be used to place symptoms that strongly predict other symptoms at the beginning of the ordering, and symptoms that are strongly predicted by other symptoms at the end. For undirected symptom networks, we recommend a permutation criterion that is based on location theory in the field of operations research. When using this criterion, symptoms with many strong ties tend to be placed centrally in the ordering, whereas weakly-tied symptoms are placed at the ends. The permutation optimization problems are solved using dynamic programming. We also make use of branch-search algorithms for extracting maximum cardinality subsets of symptoms that have perfect structure with respect to a selected criterion. Software for implementing the dynamic programming algorithms is available in MATLAB and R. Two networks from the literature are used to demonstrate the matrix permutation algorithms.

Subjects

Algorithms , Software , Psychometrics

8.

On Fixed Marginal Distributions and Psychometric Network Models.

Steinley, Douglas; Brusco, Michael J.

Multivariate Behav Res ; 56(2): 329-335, 2021.

Article in English | MEDLINE | ID: mdl-33960861

ABSTRACT

This reply addresses the commentary by Epskamp et al. (in press) on our prior work on using fixed marginals to sample data for hypothesis testing in psychometric network applications. Mathematical results are presented for expected column (e.g., item prevalence) and row (e.g., subject severity) probabilities under three classical sampling schemes in categorical data analysis: (i) fixing the density, (ii) fixing either the row or column marginal, or (iii) fixing both the row and column marginal. It is argued that, while a unidimensional structure may not be the model we want, it is the structure we are confronted with given the binary nature of the data. Interpreting network models in the context of this artifactual structure is necessary, and the preferred solutions are to expand the item sets of disorders and to move away from the use of binary data and their associated constraints.

Subjects

Psychometrics , Probability

9.

A comparison of 71 binary similarity coefficients: The effect of base rates.

Brusco, Michael; Cradit, J Dennis; Steinley, Douglas.

PLoS One ; 16(4): e0247751, 2021.

Article in English | MEDLINE | ID: mdl-33826612

ABSTRACT

There are many psychological applications that require collapsing the information in a two-mode (e.g., respondents-by-attributes) binary matrix into a one-mode (e.g., attributes-by-attributes) similarity matrix. This process requires the selection of a measure of similarity between binary attributes. A vast number of binary similarity coefficients have been proposed in fields such as biology, geology, and ecology. Although previous studies have reported cluster analyses of binary similarity coefficients, there has been little exploration of how cluster memberships are affected by the base rates (percentage of ones) for the binary attributes. We conducted a simulation experiment that compared two-cluster K-median partitions of 71 binary similarity coefficients based on their pairwise correlations obtained under 15 different base-rate configurations. The results reveal that some subsets of coefficients consistently group together regardless of the base rates. However, there are other subsets of coefficients that group together for some base rates, but not for others.

Subjects

Algorithms , Computer Simulation , Theoretical Models

10.

Combinatorial Optimization of Clustering Decisions: An Approach to Refine Psychiatric Diagnoses.

Loeffelman, Jordan E; Steinley, Douglas; Boness, Cassandra L; Trull, Timothy J; Wood, Phillip K; Brusco, Michael J; Sher, Kenneth J.

Multivariate Behav Res ; 56(1): 57-69, 2021.

Article in English | MEDLINE | ID: mdl-32054331

ABSTRACT

Using complete enumeration (e.g., generating all possible subsets of item combinations) to evaluate clustering problems has the benefit of locating globally optimal solutions automatically without the concern of sampling variability. The proposed method is meant to combine clustering variables in such a way as to create groups that are maximally different on a theoretically sound derivation variable(s). After the population of all unique sets is permuted, optimization on some predefined, user-specific function can occur. We apply this technique to optimizing the diagnosis of Alcohol Use Disorder. This is a unique application, from a clustering point of view, in that the decision rule for clustering observations into the "diagnosis" group relies on both the set of items being considered and a predefined threshold on the number of items required to be endorsed for the "diagnosis" to occur. In optimizing diagnostic rules, criteria set sizes can be reduced without a loss of significant information when compared to current and proposed, alternative, diagnostic schemes.

Subjects

Alcoholism , Cluster Analysis , Mental Disorders , Alcoholism/diagnosis , Mental Disorders/diagnosis
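
Editorial illustration: the decision rule described in this abstract (a set of items plus a threshold on the number of endorsed items) lends itself to complete enumeration when the item pool is small. The sketch below is only a guess at the general idea. The objective used here, the absolute difference in the mean of an external variable between the "diagnosis" group and everyone else, is an assumption, as are the function name `best_diagnostic_rule` and the toy data; the abstract does not specify the authors' objective function.

```python
from itertools import combinations
from statistics import mean


def best_diagnostic_rule(endorsements, outcome):
    """Enumerate every (item subset, threshold) rule and keep the best one.

    endorsements -- one row per respondent, binary entries per item
    outcome      -- one external (validation) value per respondent
    Returns (gap, items, threshold), or None if no rule splits the sample.
    """
    n_items = len(endorsements[0])
    best = None
    for size in range(1, n_items + 1):
        for items in combinations(range(n_items), size):
            counts = [sum(row[i] for i in items) for row in endorsements]
            for threshold in range(1, size + 1):
                diagnosed = [y for c, y in zip(counts, outcome) if c >= threshold]
                others = [y for c, y in zip(counts, outcome) if c < threshold]
                if not diagnosed or not others:
                    continue  # rule puts everyone in one group
                # Assumed objective: separation of the groups on the outcome.
                gap = abs(mean(diagnosed) - mean(others))
                if best is None or gap > best[0]:
                    best = (gap, items, threshold)
    return best


# Hypothetical data: only item 0 is related to the outcome.
toy_endorsements = [
    [1, 0, 1], [1, 1, 0], [1, 0, 0],
    [0, 1, 1], [0, 0, 1], [0, 1, 0],
]
toy_outcome = [10, 10, 10, 0, 0, 0]
best_rule = best_diagnostic_rule(toy_endorsements, toy_outcome)
```

Because the number of item subsets doubles with each added item, this brute-force search is only practical for short criteria lists.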

11.

Deterministic blockmodelling of signed and two-mode networks: A tutorial with software and psychological examples.

Brusco, Michael; Doreian, Patrick; Steinley, Douglas.

Br J Math Stat Psychol ; 74(1): 34-63, 2021 Feb.

Article in English | MEDLINE | ID: mdl-31705539

ABSTRACT

Deterministic blockmodelling is a well-established clustering method for both exploratory and confirmatory social network analysis seeking partitions of a set of actors so that actors within each cluster are similar with respect to their patterns of ties to other actors (or, in some cases, other objects when considering two-mode networks). Even though some of the historical foundations for certain types of blockmodelling stem from the psychological literature, applications of deterministic blockmodelling in psychological research are relatively rare. This scarcity is potentially attributable to three factors: a general unfamiliarity with relevant blockmodelling methods and applications; a lack of awareness of the value of partitioning network data for understanding group structures and processes; and the unavailability of such methods on software platforms familiar to most psychological researchers. To tackle the first two items, we provide a tutorial presenting a general framework for blockmodelling and describe two of the most important types of deterministic blockmodelling applications relevant to psychological research: structural balance partitioning and two-mode partitioning based on structural equivalence. To address the third problem, we developed a suite of software programs that are available as both Fortran executable files and compiled Fortran dynamic-link libraries that can be implemented in the R software system. We demonstrate these software programs using networks from the literature.

Subjects

Software , Cluster Analysis

12.

Combining diversity and dispersion criteria for anticlustering: A bicriterion approach.

Brusco, Michael J; Cradit, J Dennis; Steinley, Douglas.

Br J Math Stat Psychol ; 73(3): 375-396, 2020 Nov.

Article in English | MEDLINE | ID: mdl-31512759

ABSTRACT

Most partitioning methods used in psychological research seek to produce homogeneous groups (i.e., groups with low intra-group dissimilarity). However, there are also applications where the goal is to provide heterogeneous groups (i.e., groups with high intra-group dissimilarity). Examples of these anticlustering contexts include construction of stimulus sets, formation of student groups, assignment of employees to project work teams, and assembly of test forms from a bank of items. Unfortunately, most commercial software packages are not equipped to accommodate the objective criteria and constraints that commonly arise for anticlustering problems. Two important objective criteria for anticlustering based on information in a dissimilarity matrix are: a diversity measure based on within-cluster sums of dissimilarities; and a dispersion measure based on the within-cluster minimum dissimilarities. In many instances, it is possible to find a partition that provides a large improvement in one of these two criteria with little (or no) sacrifice in the other criterion. For this reason, it is of significant value to explore the trade-offs that arise between these two criteria. Accordingly, the key contribution of this paper is the formulation of a bicriterion optimization problem for anticlustering based on the diversity and dispersion criteria, along with heuristics to approximate the Pareto efficient set of partitions. A motivating example and computational study are provided within the framework of test assembly.

Subjects

Cluster Analysis , Statistical Models , Psychology/statistics & numerical data , Algorithms , Computer Heuristics , Computer Simulation , Educational Measurement/statistics & numerical data , Humans , Neuropsychological Tests/statistics & numerical data , Psychometrics/statistics & numerical data
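
Editorial illustration: the two anticlustering criteria named in this abstract can be computed directly from a dissimilarity matrix and a partition. The sketch below follows the abstract's wording (diversity as the sum of within-cluster dissimilarities, dispersion as the minimum within-cluster dissimilarity). It does not reproduce the authors' bicriterion heuristics, and the function name and toy data are illustrative assumptions.

```python
from itertools import combinations


def diversity_and_dispersion(dissimilarity, groups):
    """Return (diversity, dispersion) for a partition.

    dissimilarity -- symmetric matrix (list of lists) of pairwise dissimilarities
    groups        -- list of groups, each a list of object indices
    """
    diversity = 0.0
    dispersion = float("inf")
    for members in groups:
        for i, j in combinations(members, 2):
            d = dissimilarity[i][j]
            diversity += d                   # sum of within-group dissimilarities
            dispersion = min(dispersion, d)  # smallest within-group dissimilarity
    return diversity, dispersion


# Hypothetical example: four objects located at 0, 1, 10 and 11 on a line.
positions = [0, 1, 10, 11]
toy_dissimilarity = [[abs(a - b) for b in positions] for a in positions]

homogeneous = diversity_and_dispersion(toy_dissimilarity, [[0, 1], [2, 3]])
anticluster_a = diversity_and_dispersion(toy_dissimilarity, [[0, 2], [1, 3]])
anticluster_b = diversity_and_dispersion(toy_dissimilarity, [[0, 3], [1, 2]])
```

In this toy case the two anticlustering partitions tie on diversity (20.0) but differ on dispersion (10 versus 9), which is the kind of trade-off a bicriterion analysis is meant to expose.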

13.

On Ising models and algorithms for the construction of symptom networks in psychopathological research.

Brusco, Michael J; Steinley, Douglas; Hoffman, Michaela; Davis-Stober, Clintin; Wasserman, Stanley.

Psychol Methods ; 24(6): 735-753, 2019 Dec.

Article in English | MEDLINE | ID: mdl-31589062

ABSTRACT

During the past 5 to 10 years, an estimation method known as eLasso has been used extensively to produce symptom networks (or, more precisely, symptom dependence graphs) from binary data in psychopathological research. The eLasso method is based on a particular type of Ising model that corresponds to binary pairwise Markov random fields, and its popularity is due, in part, to an efficient estimation process that is based on a series of l1-regularized logistic regressions. In this article, we offer an unprecedented critique of the Ising model and eLasso. We provide a careful assessment of the conditions that underlie the Ising model as well as specific limitations associated with the eLasso estimation algorithm. This assessment leads to serious concerns regarding the implementation of eLasso in psychopathological research. Some potential strategies for eliminating or, at least, mitigating these concerns include (a) the use of partitioning or mixture modeling to account for unobserved heterogeneity in the sample of respondents, and (b) the use of co-occurrence measures for symptom similarity to either replace or supplement the covariance/correlation measure associated with eLasso. Two psychopathological data sets are used to highlight the concerns that are raised in the critique. (PsycINFO Database Record (c) 2019 APA, all rights reserved).

Subjects

Algorithms , Behavioral Symptoms , Biomedical Research , Statistical Models , Psychopathology , Biomedical Research/standards , Humans , Psychopathology/standards

14.

Affinity propagation: An exemplar-based tool for clustering in psychological research.

Brusco, Michael J; Steinley, Douglas; Stevens, Jordan; Cradit, J Dennis.

Br J Math Stat Psychol ; 72(1): 155-182, 2019 Feb.

Article in English | MEDLINE | ID: mdl-29633235

ABSTRACT

Affinity propagation is a message-passing-based clustering procedure that has received widespread attention in domains such as biological science, physics, and computer science. However, its implementation in psychology and related areas of social science is comparatively scant. In this paper, we describe the basic principles of affinity propagation, its relationship to other clustering problems, and the types of data for which it can be used for cluster analysis. More importantly, we identify the strengths and weaknesses of affinity propagation as a clustering tool in general and highlight potential opportunities for its use in psychological research. Numerical examples are provided to illustrate the method.

Subjects

Algorithms , Cluster Analysis , Automated Pattern Recognition/methods , Psychology/methods , Computer Simulation , Humans , Research , Research Design

15.

Measuring and testing the agreement of matrices.

Brusco, Michael J; Steinley, Douglas.

Behav Res Methods ; 50(6): 2256-2266, 2018 Dec.

Article in English | MEDLINE | ID: mdl-29218590

ABSTRACT

The problem of comparing the agreement of two n × n matrices has a variety of applications in experimental psychology. A well-known index of agreement is based on the sum of the element-wise products of the matrices. Although less familiar to many researchers, measures of agreement based on within-row and/or within-column gradients can also be useful. We provide a suite of MATLAB programs for computing agreement indices and performing matrix permutation tests of those indices. Programs for computing exact p-values are available for small matrices, whereas resampling programs for approximate p-values are provided for larger matrices.

Subjects

Behavioral Research/statistics & numerical data , Statistical Data Interpretation , Statistical Models , Software , Humans

16.

A note on the expected value of the Rand index.

Steinley, Douglas; Brusco, Michael J.

Br J Math Stat Psychol ; 71(2): 287-299, 2018 May.

Article in English | MEDLINE | ID: mdl-29159803

ABSTRACT

Two expectations of the adjusted Rand index (ARI) are compared. It is shown that the expectation derived by Morey and Agresti (1984, Educational and Psychological Measurement, 44, 33) under the multinomial distribution to approximate the exact expectation from the hypergeometric distribution (Hubert & Arabie, 1985, Journal of Classification, 2, 193) provides a poor approximation, and, in some cases, the difference between the two expectations can increase with the sample size. Proofs concerning the minimum and maximum difference between the two expectations are provided, and it is shown through simulation that the ARI can differ significantly depending on which expectation is used. Furthermore, when compared in a hypothesis testing framework, multinomial approximation overly favours the null hypothesis.

Subjects

Cluster Analysis , Computer Simulation , Psychological Models , Psychometrics/methods , Algorithms , Humans , Statistical Models , Reproducibility of Results , Sample Size , Software
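
Editorial illustration: for readers unfamiliar with the index, the sketch below computes the adjusted Rand index using the Hubert and Arabie (1985) correction, in which the expected value comes from the hypergeometric model with fixed marginals. It does not implement the multinomial expectation of Morey and Agresti that the article compares against. The function name `adjusted_rand_index` and the handling of the degenerate case (returning 1.0 when the maximum equals the expectation) are choices made for this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects")
    # Pairs of objects placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each partition separately (row/column marginals).
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # hypergeometric expectation
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions have a single cluster
    return (sum_cells - expected) / (maximum - expected)


ari_identical = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index equals 1 for identical partitions regardless of label names, is near 0 for chance-level agreement, and can be negative, as in the crossed example.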

17.

Detecting Clusters/Communities in Social Networks.

Hoffman, Michaela; Steinley, Douglas; Gates, Kathleen M; Prinstein, Mitchell J; Brusco, Michael J.

Multivariate Behav Res ; 53(1): 57-73, 2018.

Article in English | MEDLINE | ID: mdl-29220584

ABSTRACT

Cohen's κ, a similarity measure for categorical data, has been applied to problems in the data mining field such as cluster analysis and network link prediction. In this paper, a new application is examined: community detection in networks. A new algorithm is proposed that uses Cohen's κ as a similarity measure for each pair of nodes; the κ values are then clustered to detect the communities. This paper defines and tests this method on a variety of simulated and real networks. The results are compared with those from eight other community detection algorithms. Results show this new algorithm is consistently among the top performers in classifying data points both on simulated and real networks. Additionally, this is one of the broadest comparative simulations for comparing community detection algorithms to date.

Subjects

Algorithms , Computer Communication Networks , Social Support , Cluster Analysis , Humans
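
Editorial illustration: the first step described in this abstract, a Cohen's κ similarity for each pair of nodes, might look like the sketch below. It assumes that each node's row of a binary adjacency matrix is treated as that node's categorical "ratings" of all nodes; the abstract does not state how the authors construct the ratings (for example, how self-ties are handled), and the clustering step is not shown. The function names are illustrative.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa between two equally long sequences of categorical ratings."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("rating sequences must be non-empty and equally long")
    categories = set(ratings_a) | set(ratings_b)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the two sets of marginal proportions.
    expected = sum((ratings_a.count(c) / n) * (ratings_b.count(c) / n)
                   for c in categories)
    if expected == 1.0:
        return 1.0  # both sequences are constant and identical
    return (observed - expected) / (1 - expected)


def kappa_similarity_matrix(adjacency):
    """Pairwise kappa between the rows of a binary adjacency matrix (assumed setup)."""
    n = len(adjacency)
    return [[cohens_kappa(adjacency[i], adjacency[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy network: two disconnected triangles (nodes 0-2 and 3-5).
toy_network = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
kappa_matrix = kappa_similarity_matrix(toy_network)
```

In the toy network, nodes in the same triangle receive a higher κ (0.25) than nodes in different triangles (-0.5), so any clustering of the κ matrix would tend to recover the two groups.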

18.

A method for making inferences in network analysis: Comment on Forbes, Wright, Markon, and Krueger (2017).

Steinley, Douglas; Hoffman, Michaela; Brusco, Michael J; Sher, Kenneth J.

J Abnorm Psychol ; 126(7): 1000-1010, 2017 Oct.

Article in English | MEDLINE | ID: mdl-29106283

ABSTRACT

Forbes, Wright, Markon, and Krueger (2017) make a compelling case for proceeding cautiously with respect to the overinterpretation and dissemination of results using the increasingly popular approach of creating "networks" from co-occurrences of psychopathology symptoms. We commend the authors on their initial investigation and their utilization of cross-validation techniques in an effort to capture the stability of a variety of network estimation methods. Such techniques get at the heart of establishing "reproducibility," an increasing focus of concern in both psychology (e.g., Pashler & Wagenmakers, 2012) and science more generally (e.g., Baker, 2016). However, as we will show, the problem is likely worse (or at least more complicated) than they initially indicated. Specifically, for multivariate binary data, the marginal distributions enforce a large degree of structure on the data. We show that some expected measurements, such as commonly used centrality statistics, can have substantially higher values than what would usually be expected. As such, we propose a nonparametric approach to generate confidence intervals through Monte Carlo simulation. We apply the proposed methodology to the National Comorbidity Survey - Replication, provided by Forbes et al., finding that many of the results are indistinguishable from what would be expected by chance. Further, we discuss the problem of multiple testing and potential issues of applying methods developed for 1-mode networks (e.g., ties within a single set of observations) to 2-mode networks (e.g., ties between 2 distinct sets of entities). When taken together, these issues indicate that the psychometric network models should be employed with extreme caution and interpreted guardedly. (PsycINFO Database Record

Subjects

Psychopathology , Research Design , Comorbidity , Humans , Reproducibility of Results

19.

A simulated annealing heuristic for maximum correlation core/periphery partitioning of binary networks.

Brusco, Michael; Stolze, Hannah J; Hoffman, Michaela; Steinley, Douglas.

PLoS One ; 12(5): e0170448, 2017.

Article in English | MEDLINE | ID: mdl-28486475

ABSTRACT

A popular objective criterion for partitioning a set of actors into core and periphery subsets is the maximization of the correlation between an ideal and observed structure associated with intra-core and intra-periphery ties. The resulting optimization problem has commonly been tackled using heuristic procedures such as relocation algorithms, genetic algorithms, and simulated annealing. In this paper, we present a computationally efficient simulated annealing algorithm for maximum correlation core/periphery partitioning of binary networks. The algorithm is evaluated using simulated networks consisting of up to 2000 actors and spanning a variety of densities for the intra-core, intra-periphery, and inter-core-periphery components of the network. Core/periphery analyses of problem solving, trust, and information sharing networks for the frontline employees and managers of a consumer packaged goods manufacturer are provided to illustrate the use of the model.

Subjects

Algorithms , Computer Simulation , Heuristics

20.

Gaussian model-based partitioning using iterated local search.

Brusco, Michael J; Shireman, Emilie; Steinley, Douglas; Brudvig, Susan; Cradit, J Dennis.

Br J Math Stat Psychol ; 70(1): 1-24, 2017 Feb.

Article in English | MEDLINE | ID: mdl-28130935

ABSTRACT

The emergence of Gaussian model-based partitioning as a viable alternative to K-means clustering fosters a need for discrete optimization methods that can be efficiently implemented using model-based criteria. A variety of alternative partitioning criteria have been proposed for more general data conditions that permit elliptical clusters, different spatial orientations for the clusters, and unequal cluster sizes. Unfortunately, many of these partitioning criteria are computationally demanding, which makes the multiple-restart (multistart) approach commonly used for K-means partitioning less effective as a heuristic solution strategy. As an alternative, we propose an approach based on iterated local search (ILS), which has proved effective in previous combinatorial data analysis contexts. We compared multistart, ILS and hybrid multistart-ILS procedures for minimizing a very general model-based criterion that assumes no restrictions on cluster size or within-group covariance structure. This comparison, which used 23 data sets from the classification literature, revealed that the ILS and hybrid heuristics generally provided better criterion function values than the multistart approach when all three methods were constrained to the same 10-min time limit. In many instances, these differences in criterion function values reflected profound differences in the partitions obtained.

Subjects

Algorithms , Cluster Analysis , Statistical Data Interpretation , Statistical Models , Normal Distribution , Computer Simulation
