Results 1 - 20 of 42
1.
Behav Res Methods ; 55(7): 3566-3584, 2023 10.
Article in English | MEDLINE | ID: mdl-36266525

ABSTRACT

The Ising model has received significant attention in network psychometrics during the past decade. A popular estimation procedure is IsingFit, which uses nodewise l1-regularized logistic regression along with the extended Bayesian information criterion to establish the edge weights for the network. In this paper, we report the results of a simulation study comparing IsingFit to two alternative approaches: (1) a nonregularized nodewise stepwise logistic regression method, and (2) a recently proposed global l1-regularized logistic regression method that estimates all edge weights in a single stage, thus circumventing the need for nodewise estimation. MATLAB scripts for the methods are provided as supplemental material. The global l1-regularized logistic regression method generally provided greater accuracy and sensitivity than IsingFit, at the expense of lower specificity and much greater computation time. The stepwise approach showed considerable promise. Relative to the l1-regularized approaches, the stepwise method provided better average specificity for all experimental conditions, as well as comparable accuracy and sensitivity at the largest sample size.


Subject(s)
Logistic Models , Humans , Bayes Theorem , Computer Simulation
2.
Behav Res Methods ; 55(7): 3549-3565, 2023 10.
Article in English | MEDLINE | ID: mdl-36258108

ABSTRACT

The modularity index (Q) is an important criterion for many community detection heuristics used in network psychometrics and its subareas (e.g., exploratory graph analysis). Some heuristics seek to directly maximize Q, whereas others, such as the walktrap algorithm, only use the modularity index post hoc to determine the number of communities. Researchers in network psychometrics have typically not employed methods that are guaranteed to find a partition that maximizes Q, perhaps because of the complexity of the underlying mathematical programming problem. In this paper, for networks of the size commonly encountered in network psychometrics, we explore the utility of finding the partition that maximizes Q via formulation and solution of a clique partitioning problem (CPP). A key benefit of the CPP is that the number of communities is naturally determined by its solution and, therefore, need not be prespecified. The results of two simulation studies comparing maximization of Q to two other methods that seek to maximize modularity (fast greedy and Louvain), as well as one popular method that does not (walktrap algorithm), provide interesting insights as to the relative performances of the methods with respect to identification of the correct number of communities and the recovery of underlying community structure.


Subject(s)
Algorithms , Humans , Psychometrics , Computer Simulation
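
As an illustration of the quantity discussed in the abstract above, the following is a minimal sketch of the modularity index Q for a given partition, together with an exhaustive search over all partitions of a very small network. The exhaustive search is only a stand-in chosen for illustration; it is not the clique partitioning formulation described in the paper, and the function names and the example network are assumptions.

```python
def modularity(adj, labels):
    """Newman-Girvan modularity Q of a partition of an undirected network.

    adj    : symmetric matrix (list of lists) of edge weights, zero diagonal
    labels : labels[i] is the community label of node i
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def best_partition(adj):
    """Enumerate every set partition (feasible only for tiny networks)
    and return the one with the largest Q. Illustrative stand-in, not CPP."""
    n = len(adj)
    best_q, best_labels = float("-inf"), None

    def extend(labels, next_label):
        nonlocal best_q, best_labels
        if len(labels) == n:
            q = modularity(adj, labels)
            if q > best_q + 1e-12:
                best_q, best_labels = q, list(labels)
            return
        # A node may join any existing community or open a new one.
        for lab in range(next_label + 1):
            labels.append(lab)
            extend(labels, max(next_label, lab + 1))
            labels.pop()

    extend([], 0)
    return best_q, best_labels


# Hypothetical example: two triangles joined by a single edge.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q, labels = best_partition(adj)
```

As in the abstract, the number of communities is not supplied to the search; it falls out of the partition that attains the largest Q.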
3.
Psychometrika ; 87(1): 133-155, 2022 03.
Article in English | MEDLINE | ID: mdl-34282531

ABSTRACT

Common outputs of software programs for network estimation include association matrices containing the edge weights between pairs of symptoms and a plot of the symptom network. Although such outputs are useful, it is sometimes difficult to ascertain structural relationships among symptoms from these types of output alone. We propose that matrix permutation provides a simple, yet effective, approach for clarifying the order relationships among the symptoms based on the edge weights of the network. For directed symptom networks, we use a permutation criterion that has classic applications in electrical circuit theory and economics. This criterion can be used to place symptoms that strongly predict other symptoms at the beginning of the ordering, and symptoms that are strongly predicted by other symptoms at the end. For undirected symptom networks, we recommend a permutation criterion that is based on location theory in the field of operations research. When using this criterion, symptoms with many strong ties tend to be placed centrally in the ordering, whereas weakly-tied symptoms are placed at the ends. The permutation optimization problems are solved using dynamic programming. We also make use of branch-search algorithms for extracting maximum cardinality subsets of symptoms that have perfect structure with respect to a selected criterion. Software for implementing the dynamic programming algorithms is available in MATLAB and R. Two networks from the literature are used to demonstrate the matrix permutation algorithms.


Subject(s)
Algorithms , Software , Psychometrics
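
The abstract above does not spell out the permutation criteria, so the following is only a sketch of the general idea of ordering by dynamic programming over subsets. It assumes, purely for illustration, that the criterion for a directed network is the sum of the entries above the main diagonal of the reordered matrix; the function name and the example matrix are hypothetical, and this is not the authors' MATLAB or R software.

```python
def best_ordering(a):
    """Order the objects so that the sum of above-diagonal entries of the
    reordered matrix is as large as possible (assumed criterion).

    Dynamic programming over subsets: best[mask] holds the best value and
    ordering when the objects in `mask` occupy the first positions.
    """
    n = len(a)
    full = (1 << n) - 1
    best = {0: (0.0, [])}
    for mask in range(1, full + 1):
        top = None
        for i in range(n):
            if mask & (1 << i):
                prev_val, prev_order = best[mask ^ (1 << i)]
                # Object i is placed last among `mask`; every object outside
                # `mask` follows it, so a[i][j] lands above the diagonal.
                gain = sum(a[i][j] for j in range(n) if not mask & (1 << j))
                cand = prev_val + gain
                if top is None or cand > top[0]:
                    top = (cand, prev_order + [i])
        best[mask] = top
    return best[full]


# Hypothetical directed edge weights among three symptoms.
a = [
    [0, 5, 4],
    [1, 0, 3],
    [0, 1, 0],
]
value, order = best_ordering(a)
```

Under this assumed criterion, objects with large outgoing weights tend to be placed early in the ordering. The table grows as 2^n, so the sketch is practical only for small n.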
4.
Multivariate Behav Res ; 56(2): 329-335, 2021.
Article in English | MEDLINE | ID: mdl-33960861

ABSTRACT

This reply addresses the commentary by Epskamp et al. (in press) on our prior work on using fixed marginals when sampling data to test hypotheses in psychometric network applications. Mathematical results are presented for expected column (e.g., item prevalence) and row (e.g., subject severity) probabilities under three classical sampling schemes in categorical data analysis: (i) fixing the density, (ii) fixing either the row or column marginal, or (iii) fixing both the row and column marginal. It is argued that, while a unidimensional structure may not be the model we want, it is the structure we are confronted with given the binary nature of the data. Interpreting network models in the context of this artifactual structure is necessary, with the preferred solutions being to expand the item sets of disorders and to move away from the use of binary data and their associated constraints.


Subject(s)
Psychometrics , Probability
5.
Multivariate Behav Res ; 56(1): 57-69, 2021.
Article in English | MEDLINE | ID: mdl-32054331

ABSTRACT

Using complete enumeration (e.g., generating all possible subsets of item combinations) to evaluate clustering problems has the benefit of locating globally optimal solutions automatically without the concern of sampling variability. The proposed method is meant to combine clustering variables in such a way as to create groups that are maximally different on a theoretically sound derivation variable(s). After the population of all unique sets is permuted, optimization on some predefined, user-specific function can occur. We apply this technique to optimizing the diagnosis of Alcohol Use Disorder. This is a unique application, from a clustering point of view, in that the decision rule for clustering observations into the "diagnosis" group relies on both the set of items being considered and a predefined threshold on the number of items required to be endorsed for the "diagnosis" to occur. In optimizing diagnostic rules, criteria set sizes can be reduced without a loss of significant information when compared to current and proposed, alternative, diagnostic schemes.


Subject(s)
Alcoholism , Cluster Analysis , Mental Disorders , Alcoholism/diagnosis , Mental Disorders/diagnosis
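
The complete-enumeration idea in the abstract above can be sketched as follows. Every rule is a pair (item subset, endorsement threshold), and each rule is scored by how strongly the resulting "diagnosis" groups differ on an external variable. The scoring function (absolute difference in group means), the function names, and the toy data are assumptions made for illustration, not the authors' procedure.

```python
from itertools import combinations


def enumerate_rules(n_items):
    """Yield every (item subset, threshold) diagnostic rule."""
    for size in range(1, n_items + 1):
        for items in combinations(range(n_items), size):
            for threshold in range(1, size + 1):
                yield items, threshold


def diagnose(responses, items, threshold):
    """Diagnosis is True when at least `threshold` of `items` are endorsed."""
    return [sum(row[i] for i in items) >= threshold for row in responses]


def group_separation(outcome, diagnosed):
    """Assumed objective: absolute difference in mean outcome between groups."""
    pos = [o for o, d in zip(outcome, diagnosed) if d]
    neg = [o for o, d in zip(outcome, diagnosed) if not d]
    if not pos or not neg:
        return float("-inf")  # a rule that diagnoses everyone or no one
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def best_rule(responses, outcome):
    """Score every rule and return the first one attaining the maximum."""
    n_items = len(responses[0])
    return max(
        enumerate_rules(n_items),
        key=lambda rule: group_separation(outcome, diagnose(responses, *rule)),
    )


# Hypothetical binary item responses (rows are respondents) and outcome.
responses = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
outcome = [10, 9, 1, 0]
rule = best_rule(responses, outcome)
```

Because every rule is visited, the maximum found is global for the chosen objective; the cost is that the number of rules grows exponentially with the number of items.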
6.
Br J Math Stat Psychol ; 73(3): 375-396, 2020 11.
Article in English | MEDLINE | ID: mdl-31512759

ABSTRACT

Most partitioning methods used in psychological research seek to produce homogeneous groups (i.e., groups with low intra-group dissimilarity). However, there are also applications where the goal is to provide heterogeneous groups (i.e., groups with high intra-group dissimilarity). Examples of these anticlustering contexts include construction of stimulus sets, formation of student groups, assignment of employees to project work teams, and assembly of test forms from a bank of items. Unfortunately, most commercial software packages are not equipped to accommodate the objective criteria and constraints that commonly arise for anticlustering problems. Two important objective criteria for anticlustering based on information in a dissimilarity matrix are: a diversity measure based on within-cluster sums of dissimilarities; and a dispersion measure based on the within-cluster minimum dissimilarities. In many instances, it is possible to find a partition that provides a large improvement in one of these two criteria with little (or no) sacrifice in the other criterion. For this reason, it is of significant value to explore the trade-offs that arise between these two criteria. Accordingly, the key contribution of this paper is the formulation of a bicriterion optimization problem for anticlustering based on the diversity and dispersion criteria, along with heuristics to approximate the Pareto efficient set of partitions. A motivating example and computational study are provided within the framework of test assembly.


Subject(s)
Cluster Analysis , Models, Statistical , Psychology/statistics & numerical data , Algorithms , Computer Heuristics , Computer Simulation , Educational Measurement/statistics & numerical data , Humans , Neuropsychological Tests/statistics & numerical data , Psychometrics/statistics & numerical data
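
The two anticlustering criteria named in the abstract above can be written down directly: diversity as the sum of within-cluster dissimilarities, and dispersion as the minimum within-cluster dissimilarity. The sketch below also filters a list of candidate partitions down to the non-dominated ones. The function names and the example data are assumptions, and the simple filter is not the heuristic proposed in the paper.

```python
from itertools import combinations


def within_pairs(d, labels):
    """Dissimilarities for all pairs of objects that share a cluster."""
    n = len(labels)
    return [d[i][j] for i, j in combinations(range(n), 2) if labels[i] == labels[j]]


def diversity(d, labels):
    """Sum of within-cluster dissimilarities (to be maximized)."""
    return sum(within_pairs(d, labels))


def dispersion(d, labels):
    """Minimum within-cluster dissimilarity (to be maximized)."""
    pairs = within_pairs(d, labels)
    return min(pairs) if pairs else float("inf")


def pareto_front(d, candidates):
    """Keep candidates that no other candidate beats on both criteria."""
    scored = [(diversity(d, c), dispersion(d, c), c) for c in candidates]
    front = []
    for div, disp, c in scored:
        dominated = any(
            d2 >= div and s2 >= disp and (d2 > div or s2 > disp)
            for d2, s2, _ in scored
        )
        if not dominated:
            front.append((div, disp, c))
    return front


# Hypothetical dissimilarities: four objects at positions 0, 1, 4, 5 on a line.
d = [
    [0, 1, 4, 5],
    [1, 0, 3, 4],
    [4, 3, 0, 1],
    [5, 4, 1, 0],
]
candidates = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]
front = pareto_front(d, candidates)
```

In this toy case two partitions tie on diversity but differ on dispersion, which is the kind of trade-off the bicriterion formulation is meant to expose.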
7.
Psychol Methods ; 24(6): 735-753, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31589062

ABSTRACT

During the past 5 to 10 years, an estimation method known as eLasso has been used extensively to produce symptom networks (or, more precisely, symptom dependence graphs) from binary data in psychopathological research. The eLasso method is based on a particular type of Ising model that corresponds to binary pairwise Markov random fields, and its popularity is due, in part, to an efficient estimation process that is based on a series of l1-regularized logistic regressions. In this article, we offer an unprecedented critique of the Ising model and eLasso. We provide a careful assessment of the conditions that underlie the Ising model as well as specific limitations associated with the eLasso estimation algorithm. This assessment leads to serious concerns regarding the implementation of eLasso in psychopathological research. Some potential strategies for eliminating or, at least, mitigating these concerns include (a) the use of partitioning or mixture modeling to account for unobserved heterogeneity in the sample of respondents, and (b) the use of co-occurrence measures for symptom similarity to either replace or supplement the covariance/correlation measure associated with eLasso. Two psychopathological data sets are used to highlight the concerns that are raised in the critique. (PsycINFO Database Record (c) 2019 APA, all rights reserved).


Subject(s)
Algorithms , Behavioral Symptoms , Biomedical Research , Models, Statistical , Psychopathology , Biomedical Research/standards , Humans , Psychopathology/standards
8.
Br J Math Stat Psychol ; 72(1): 155-182, 2019 02.
Article in English | MEDLINE | ID: mdl-29633235

ABSTRACT

Affinity propagation is a message-passing-based clustering procedure that has received widespread attention in domains such as biological science, physics, and computer science. However, its implementation in psychology and related areas of social science is comparatively scant. In this paper, we describe the basic principles of affinity propagation, its relationship to other clustering problems, and the types of data for which it can be used for cluster analysis. More importantly, we identify the strengths and weaknesses of affinity propagation as a clustering tool in general and highlight potential opportunities for its use in psychological research. Numerical examples are provided to illustrate the method.


Subject(s)
Algorithms , Cluster Analysis , Pattern Recognition, Automated/methods , Psychology/methods , Computer Simulation , Humans , Research , Research Design
9.
Behav Res Methods ; 50(6): 2256-2266, 2018 12.
Article in English | MEDLINE | ID: mdl-29218590

ABSTRACT

The problem of comparing the agreement of two n × n matrices has a variety of applications in experimental psychology. A well-known index of agreement is based on the sum of the element-wise products of the matrices. Although less familiar to many researchers, measures of agreement based on within-row and/or within-column gradients can also be useful. We provide a suite of MATLAB programs for computing agreement indices and performing matrix permutation tests of those indices. Programs for computing exact p-values are available for small matrices, whereas resampling programs for approximate p-values are provided for larger matrices.


Subject(s)
Behavioral Research/statistics & numerical data , Data Interpretation, Statistical , Models, Statistical , Software , Humans
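
The best-known index in the abstract above, the sum of element-wise products, and an exact permutation test for small matrices can be sketched as follows. This is a Python illustration under the assumption that rows and columns of the second matrix are permuted simultaneously; it is not a port of the authors' MATLAB programs, and the function names are hypothetical.

```python
from itertools import permutations


def agreement(a, b):
    """Sum of element-wise products of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))


def exact_p_value(a, b):
    """Exact one-sided p-value from all n! simultaneous row/column
    permutations of b (feasible only for small n)."""
    n = len(a)
    observed = agreement(a, b)
    count = total = 0
    for perm in permutations(range(n)):
        permuted = [[b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        total += 1
        if agreement(a, permuted) >= observed:
            count += 1
    return observed, count / total


# Hypothetical symmetric proximity matrix compared with itself.
a = [
    [0, 1, 2],
    [1, 0, 3],
    [2, 3, 0],
]
observed, p = exact_p_value(a, a)
```

For larger matrices the n! enumeration becomes infeasible, which is why the abstract mentions resampling for approximate p-values; drawing random permutations in place of the full enumeration is the usual substitution.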
10.
Br J Math Stat Psychol ; 71(2): 287-299, 2018 05.
Article in English | MEDLINE | ID: mdl-29159803

ABSTRACT

Two expectations of the adjusted Rand index (ARI) are compared. It is shown that the expectation derived by Morey and Agresti (1984, Educational and Psychological Measurement, 44, 33) under the multinomial distribution to approximate the exact expectation from the hypergeometric distribution (Hubert & Arabie, 1985, Journal of Classification, 2, 193) provides a poor approximation, and, in some cases, the difference between the two expectations can increase with the sample size. Proofs concerning the minimum and maximum difference between the two expectations are provided, and it is shown through simulation that the ARI can differ significantly depending on which expectation is used. Furthermore, when compared in a hypothesis testing framework, multinomial approximation overly favours the null hypothesis.


Subject(s)
Cluster Analysis , Computer Simulation , Models, Psychological , Psychometrics/methods , Algorithms , Humans , Models, Statistical , Reproducibility of Results , Sample Size , Software
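
For reference, the adjusted Rand index in its Hubert and Arabie (1985) form, with the expectation taken under the hypergeometric model, can be computed as in the sketch below. The function name and the handling of the degenerate case are assumptions; the multinomial expectation of Morey and Agresti discussed in the abstract above is not implemented here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(u, v):
    """Adjusted Rand index of two partitions given as label sequences.

    ARI = (index - expected) / (max_index - expected), where `expected`
    is the hypergeometric expectation of the pair-agreement index.
    """
    n = len(u)
    pairs_both = sum(comb(c, 2) for c in Counter(zip(u, v)).values())
    pairs_u = sum(comb(c, 2) for c in Counter(u).values())
    pairs_v = sum(comb(c, 2) for c in Counter(v).values())
    expected = pairs_u * pairs_v / comb(n, 2)
    max_index = (pairs_u + pairs_v) / 2
    if max_index == expected:
        return 1.0  # assumed convention for degenerate partitions
    return (pairs_both - expected) / (max_index - expected)


same = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 1])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The `expected` term is the piece that changes when a different null model is assumed, which is the comparison the abstract is concerned with.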
11.
Multivariate Behav Res ; 53(1): 57-73, 2018.
Article in English | MEDLINE | ID: mdl-29220584

ABSTRACT

Cohen's κ, a similarity measure for categorical data, has since been applied to problems in the data mining field such as cluster analysis and network link prediction. In this paper, a new application is examined: community detection in networks. A new algorithm is proposed that uses Cohen's κ as a similarity measure for each pair of nodes; the κ values are then clustered to detect the communities. This paper defines and tests this method on a variety of simulated and real networks. The results are compared with those from eight other community detection algorithms. Results show this new algorithm is consistently among the top performers in classifying data points both on simulated and real networks. Additionally, this is one of the broadest comparative simulations for comparing community detection algorithms to date.


Subject(s)
Algorithms , Computer Communication Networks , Social Support , Cluster Analysis , Humans
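
The first step described in the abstract above, a κ value for each pair of nodes, might look like the sketch below. It assumes that each node is represented by its row of the binary adjacency matrix and that two nodes are compared over all positions of those rows; the abstract does not state these details, and the clustering step that follows is omitted.

```python
def cohens_kappa(x, y):
    """Cohen's kappa for two equally long sequences of categorical values."""
    n = len(x)
    observed = sum(1 for a, b in zip(x, y) if a == b) / n
    categories = set(x) | set(y)
    expected = sum((x.count(c) / n) * (y.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0  # both sequences constant and identical
    return (observed - expected) / (1 - expected)


def kappa_matrix(adj):
    """Pairwise kappa between the adjacency rows of all nodes (assumed usage)."""
    n = len(adj)
    return [[cohens_kappa(adj[i], adj[j]) for j in range(n)] for i in range(n)]


# Hypothetical four-node network: nodes 0 and 1 share the same neighbours.
adj = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
k = kappa_matrix(adj)
```

The resulting matrix can be handed to any clustering routine that accepts similarities; which routine the authors use is not stated in the abstract.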
12.
J Abnorm Psychol ; 126(7): 1000-1010, 2017 10.
Article in English | MEDLINE | ID: mdl-29106283

ABSTRACT

Forbes, Wright, Markon, and Krueger (2017) make a compelling case for proceeding cautiously with respect to the overinterpretation and dissemination of results using the increasingly popular approach of creating "networks" from co-occurrences of psychopathology symptoms. We commend the authors on their initial investigation and their utilization of cross-validation techniques in an effort to capture the stability of a variety of network estimation methods. Such techniques get at the heart of establishing "reproducibility," an increasing focus of concern in both psychology (e.g., Pashler & Wagenmakers, 2012) and science more generally (e.g., Baker, 2016). However, as we will show, the problem is likely worse (or at least more complicated) than they initially indicated. Specifically, for multivariate binary data, the marginal distributions enforce a large degree of structure on the data. We show that some expected measurements, such as commonly used centrality statistics, can have substantially higher values than what would usually be expected. As such, we propose a nonparametric approach to generate confidence intervals through Monte Carlo simulation. We apply the proposed methodology to the National Comorbidity Survey - Replication, provided by Forbes et al., finding that many of the results are indistinguishable from what would be expected by chance. Further, we discuss the problem of multiple testing and potential issues of applying methods developed for 1-mode networks (e.g., ties within a single set of observations) to 2-mode networks (e.g., ties between 2 distinct sets of entities). When taken together, these issues indicate that the psychometric network models should be employed with extreme caution and interpreted guardedly. (PsycINFO Database Record


Subject(s)
Psychopathology , Research Design , Comorbidity , Humans , Reproducibility of Results
13.
Br J Math Stat Psychol ; 70(1): 1-24, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28130935

ABSTRACT

The emergence of Gaussian model-based partitioning as a viable alternative to K-means clustering fosters a need for discrete optimization methods that can be efficiently implemented using model-based criteria. A variety of alternative partitioning criteria have been proposed for more general data conditions that permit elliptical clusters, different spatial orientations for the clusters, and unequal cluster sizes. Unfortunately, many of these partitioning criteria are computationally demanding, which makes the multiple-restart (multistart) approach commonly used for K-means partitioning less effective as a heuristic solution strategy. As an alternative, we propose an approach based on iterated local search (ILS), which has proved effective in previous combinatorial data analysis contexts. We compared multistart, ILS and hybrid multistart-ILS procedures for minimizing a very general model-based criterion that assumes no restrictions on cluster size or within-group covariance structure. This comparison, which used 23 data sets from the classification literature, revealed that the ILS and hybrid heuristics generally provided better criterion function values than the multistart approach when all three methods were constrained to the same 10-min time limit. In many instances, these differences in criterion function values reflected profound differences in the partitions obtained.


Subject(s)
Algorithms , Cluster Analysis , Data Interpretation, Statistical , Models, Statistical , Normal Distribution , Computer Simulation
14.
Psychol Methods ; 22(3): 563-580, 2017 09.
Article in English | MEDLINE | ID: mdl-27607543

ABSTRACT

The problem of partitioning a collection of objects based on their measurements on a set of dichotomous variables is a well-established problem in psychological research, with applications including clinical diagnosis, educational testing, cognitive categorization, and choice analysis. Latent class analysis and K-means clustering are popular methods for partitioning objects based on dichotomous measures in the psychological literature. The K-median clustering method has recently been touted as a potentially useful tool for psychological data and might be preferable to its close neighbor, K-means, when the variable measures are dichotomous. We conducted simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data. Although all 3 methods proved capable of recovering cluster structure, K-median clustering yielded the best average performance, followed closely by latent class analysis. We also report results for the 3 methods within the context of an application to transitive reasoning data, in which it was found that the 3 approaches can exhibit profound differences when applied to real data. (PsycINFO Database Record


Subject(s)
Cluster Analysis , Models, Psychological , Models, Statistical , Algorithms , Factor Analysis, Statistical , Humans , Research Design
15.
Behav Res Methods ; 49(1): 282-293, 2017 02.
Article in English | MEDLINE | ID: mdl-26721666

ABSTRACT

Mixture modeling is a popular technique for identifying unobserved subpopulations (e.g., components) within a data set, with Gaussian (normal) mixture modeling being the form most widely used. Generally, the parameters of these Gaussian mixtures cannot be estimated in closed form, so estimates are typically obtained via an iterative process. The most common estimation procedure is maximum likelihood via the expectation-maximization (EM) algorithm. Like many approaches for identifying subpopulations, finite mixture modeling can suffer from locally optimal solutions, and the final parameter estimates are dependent on the initial starting values of the EM algorithm. Initial values have been shown to significantly impact the quality of the solution, and researchers have proposed several approaches for selecting the set of starting values. Five techniques for obtaining starting values that are implemented in popular software packages are compared. Their performances are assessed in terms of the following four measures: (1) the ability to find the best observed solution, (2) settling on a solution that classifies observations correctly, (3) the number of local solutions found by each technique, and (4) the speed at which the start values are obtained. On the basis of these results, a set of recommendations is provided to the user.


Subject(s)
Finite Element Analysis , Normal Distribution , Algorithms , Models, Theoretical , Probability
16.
Multivariate Behav Res ; 51(4): 466-81, 2016.
Article in English | MEDLINE | ID: mdl-27494191

ABSTRACT

It is common knowledge that mixture models are prone to arrive at locally optimal solutions. Typically, researchers are directed to utilize several random initializations to ensure that the resulting solution is adequate. However, it is unknown what factors contribute to a large number of local optima and whether these coincide with the factors that reduce the accuracy of a mixture model. A real-data illustration and a series of simulations are presented that examine the effect of a variety of data structures on the propensity of local optima and the classification quality of the resulting solution. We show that there is a moderately strong relationship between a solution that has a high proportion of local optima and one that is poorly classified.


Subject(s)
Algorithms , Models, Statistical , Computer Simulation
17.
Br J Math Stat Psychol ; 69(2): 194-213, 2016 May.
Article in English | MEDLINE | ID: mdl-27027582

ABSTRACT

The maximum cardinality subset selection problem requires finding the largest possible subset from a set of objects, such that one or more conditions are satisfied. An important extension of this problem is to extract multiple subsets, where the addition of one more object to a larger subset would always be preferred to increases in the size of one or more smaller subsets. We refer to this as the multiple subset maximum cardinality selection problem (MSMCSP). A recently published branch-and-bound algorithm solves the MSMCSP as a partitioning problem. Unfortunately, the computational requirement associated with the algorithm is often enormous, thus rendering the method infeasible from a practical standpoint. In this paper, we present an alternative approach that successively solves a series of binary integer linear programs to obtain a globally optimal solution to the MSMCSP. Computational comparisons of the methods using published similarity data for 45 food items reveal that the proposed sequential method is computationally far more efficient than the branch-and-bound approach.


Subject(s)
Algorithms , Data Interpretation, Statistical , Likelihood Functions , Models, Statistical , Computer Simulation
18.
Psychol Methods ; 21(2): 261-72, 2016 06.
Article in English | MEDLINE | ID: mdl-26881693

ABSTRACT

For 30 years, the adjusted Rand index has been the preferred method for comparing 2 partitions (e.g., clusterings) of a set of observations. Although the index is widely used, little is known about its variability. Herein, the variance of the adjusted Rand index (Hubert & Arabie, 1985) is provided and its properties are explored. It is shown that a normal approximation is appropriate across a wide range of sample sizes and varying numbers of clusters. Further, it is shown that confidence intervals based on the normal distribution have desirable levels of coverage and accuracy. Finally, the first power analysis evaluating the ability to detect differences between 2 different adjusted Rand indices is provided. (PsycINFO Database Record


Subject(s)
Cluster Analysis , Confidence Intervals , Mathematical Computing , Sample Size
19.
Behav Res Methods ; 48(2): 487-502, 2016 06.
Article in English | MEDLINE | ID: mdl-25899042

ABSTRACT

An asymmetric one-mode data matrix has rows and columns that correspond to the same set of objects. However, the roles of the objects frequently differ for the rows and the columns. For example, in a visual alphabetic confusion matrix from an experimental psychology study, both the rows and columns pertain to letters of the alphabet. Yet the rows correspond to the presented stimulus letter, whereas the columns refer to the letter provided as the response. Other examples abound in psychology, including applications related to interpersonal interactions (friendship, trust, information sharing) in social and developmental psychology, brand switching in consumer psychology, journal citation analysis in any discipline (including quantitative psychology), and free association tasks in any subarea of psychology. When seeking to establish a partition of the objects in such applications, it is overly restrictive to require the partitions of the row and column objects to be identical, or even the numbers of clusters for the row and column objects to be the same. This suggests the need for a biclustering approach that simultaneously establishes separate partitions of the row and column objects. We present and compare several approaches for the biclustering of one-mode matrices using data sets from the empirical literature. A suite of MATLAB m-files for implementing the procedures is provided as a Web supplement with this article.


Subject(s)
Behavioral Research/methods , Cluster Analysis , Humans
20.
Psychometrika ; 80(4): 949-67, 2015 Dec.
Article in English | MEDLINE | ID: mdl-25850618

ABSTRACT

The monotone homogeneity model (MHM, also known as the unidimensional monotone latent variable model) is a nonparametric IRT formulation that provides the underpinning for partitioning a collection of dichotomous items to form scales. Ellis (Psychometrika 79:303-316, 2014, doi: 10.1007/s11336-013-9341-5) has recently derived inequalities that are implied by the MHM, yet require only the bivariate (inter-item) correlations. In this paper, we incorporate these inequalities within a mathematical programming formulation for partitioning a set of dichotomous scale items. The objective criterion of the partitioning model is to produce clusters of maximum cardinality. The formulation is a binary integer linear program that can be solved exactly using commercial mathematical programming software. However, we have also developed a standalone branch-and-bound algorithm that produces globally optimal solutions. Simulation results and a numerical example are provided to demonstrate the proposed method.


Subject(s)
Models, Statistical , Statistics, Nonparametric , Algorithms , Psychometrics/statistics & numerical data