1.
Math Biosci Eng ; 21(3): 3631-3651, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38549299

ABSTRACT

Time series clustering is a common task in many different areas. Algorithms such as K-means and model-based clustering procedures rely on multivariate assumptions about the datasets, such as the use of Euclidean distances or a probabilistic distribution of the observed variables. However, in many cases the observed time series are of unequal length, there is missing data, or the time periods observed for the series are simply not comparable, which does not allow the direct application of these methods. In this framework, dynamic time warping is an advisable and well-known elastic dissimilarity procedure, in particular when the analysis is carried out in terms of the shape of the time series. Given a dissimilarity matrix, K-means clustering can be performed using a particular procedure based on classical multidimensional scaling in full dimension, which can result in a high-dimensional clustering problem for large sample sizes. In this paper, we propose a procedure robust to dimensionality reduction, based on an auxiliary configuration estimated from the squared dynamic time warping dissimilarities, using an alternating least squares procedure. The performance of the model is compared to that obtained using classical multidimensional scaling, as well as to that of model-based clustering using this related auxiliary linear projection. An extensive Monte Carlo procedure is employed to analyze the performance of the proposed method, in which real and simulated datasets are considered. The results obtained indicate that the proposed K-means procedure, in general, slightly improves on the one based on the classical configuration, both being robust in reduced dimensionality, making it advisable for large datasets. In contrast, model-based clustering in the classical projection is greatly affected by high dimensionality, offering worse results than K-means, even in reduced dimension.
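
As a rough illustration of the classical baseline mentioned in this abstract (not the proposed alternating least squares procedure), the following Python sketch computes dynamic time warping dissimilarities between series of unequal length and embeds their squares with classical multidimensional scaling. The function names, the absolute-difference local cost, and the toy series are assumptions made for illustration only.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW dissimilarity between two 1-D series, using |a_i - b_j| as local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def classical_mds(sq_diss, n_components):
    """Classical (Torgerson) scaling of a matrix of squared dissimilarities."""
    n = sq_diss.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_diss @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # DTW is not a metric, so eigenvalues can be negative; clip them to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy series of unequal length (hypothetical data): two shapes, two series each.
series = [
    np.array([0.0, 1.0, 2.0, 1.0, 0.0]),
    np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0]),
    np.array([5.0, 4.0, 5.0, 4.0]),
    np.array([5.0, 5.0, 4.0, 5.0, 4.0, 4.0]),
]
n_series = len(series)
dtw = np.zeros((n_series, n_series))
for i in range(n_series):
    for j in range(i + 1, n_series):
        dtw[i, j] = dtw[j, i] = dtw_distance(series[i], series[j])

# Low-dimensional configuration estimated from the squared DTW dissimilarities.
coords = classical_mds(dtw ** 2, n_components=2)
```

The rows of `coords` could then be passed to any K-means routine. Series with the same shape but different lengths obtain a DTW dissimilarity of zero here, which is the elastic behaviour the abstract refers to.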

2.
Br J Math Stat Psychol ; 77(2): 356-374, 2024 May.
Article in English | MEDLINE | ID: mdl-38213088

ABSTRACT

Clustering and spatial representation methods are often used in combination to analyse preference ratings when a large number of individuals and/or objects are involved. When analysed under an unfolding model, row-conditional linear transformations are usually most appropriate when the goal is to determine clusters of individuals with similar preferences. However, a significant problem with transformations that include both slope and intercept is the occurrence of degenerate solutions. In this paper, we propose a least squares unfolding method that performs clustering of individuals while simultaneously estimating the location of cluster centres and object locations in low-dimensional space. The method is based on minimising the mean squared centred residuals of the preference ratings with respect to the distances between cluster centres and object locations. At the same time, the distances are row-conditionally transformed with optimally estimated slope parameters. It is computationally efficient for large datasets, and does not suffer from the appearance of degenerate solutions. The performance of the method is analysed in an extensive Monte Carlo experiment. It is illustrated for a real data set and the results are compared with those obtained using a two-step clustering and unfolding procedure.


Subjects
Cluster Analysis
3.
Stat Med ; 42(27): 4897-4916, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-37621084

ABSTRACT

Biometrical sciences, and disease diagnosis in particular, are often concerned with the analysis of associations for cross-classified data, for which distance association models give us a graphical interpretation for non-sparse matrices with a low number of categories. In this framework, binary explanatory and response variables are usually present, with analysis based on individual profiles being of great interest. For saturated models, we show that the usual linear relationship for log-linear models is preserved in full dimension for the distance association parameterization. This enables a two-step procedure to facilitate the analysis and the interpretation of associations in terms of unfolding after the overall and main effects are removed. The proposed procedure can deal with cross-classified data for profiles by binary variables, and it is easy to implement using traditional statistical software. For disease diagnosis, the problems of a degenerate solution in the unfolding representation and of determining significant differences between the profile locations are addressed. A hypothesis test of independence based on the odds ratio is considered. Furthermore, a procedure is proposed to determine the causes of the significance of the test, avoiding the problem of error propagation. The equivalence between a test for equality of odds ratio pairs and the test for equality of location for two profiles in the unfolding representation in disease diagnosis is shown. The results have been applied to a real example on the diagnosis of coronary disease, relating the odds ratios to performance parameters of the diagnostic test.
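
To make the link between odds ratios and diagnostic performance concrete, the sketch below applies the standard Wald test on the log odds ratio to a 2x2 table and checks that the odds ratio equals the product of the sensitivity odds and the specificity odds. The Wald test is used here only as a familiar stand-in, since the abstract does not specify the test statistic, and the counts are hypothetical.

```python
import math

def odds_ratio_test(a, b, c, d):
    """Wald test of H0: odds ratio = 1 for the 2x2 table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    # Large-sample standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = math.log(odds_ratio) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, z, p_value

# Hypothetical counts: rows = disease present / absent,
# columns = diagnostic test positive / negative.
tp, fn, fp, tn = 40, 10, 20, 30
odds_ratio, z, p_value = odds_ratio_test(tp, fn, fp, tn)

# The same odds ratio expressed through sensitivity and specificity.
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
diagnostic_or = (sensitivity / (1 - sensitivity)) * (specificity / (1 - specificity))
```

With these counts both `odds_ratio` and `diagnostic_or` evaluate to 6 (up to floating-point rounding), and independence is rejected at the usual 5% level.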


Subjects
Biometry, Software, Humans, Linear Models, Odds Ratio, Biometry/methods
4.
Stat Methods Med Res ; 32(4): 760-772, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36789779

ABSTRACT

Survey calibration is a widely used method to estimate the population mean or total score of a target variable, particularly in medical research. In this procedure, auxiliary information related to the variable of interest is used to recalibrate the estimation weights. However, when the auxiliary information includes qualitative variables, traditional calibration techniques may not be feasible or the optimisation procedure may fail. In this article, we propose the use of linear calibration in conjunction with a multidimensional scaling-based set of continuous, uncorrelated auxiliary variables along with a suitable metric in a distance-based regression framework. The calibration weights are estimated using a projection of the auxiliary information on a low-dimensional Euclidean space. The problem thus becomes one of linear calibration with quantitative variables, avoiding the usual computational problems in the presence of qualitative auxiliary information. The new variables preserve the underlying assumption in linear calibration of a linear relationship between the auxiliary and target variables, and therefore the optimal properties of the linear calibration method remain true. The behaviour of this approach is examined using a Monte Carlo procedure and its value is illustrated by analysing real data sets and by comparing its performance with that of traditional calibration procedures.
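
For readers unfamiliar with linear calibration, the following sketch shows the standard closed-form weight adjustment for quantitative auxiliary variables. It does not reproduce the multidimensional scaling step of the article: the auxiliary columns below are hypothetical stand-ins for whatever continuous variables that step would produce, and the function name and data are illustrative assumptions.

```python
import numpy as np

def linear_calibration_weights(design_weights, aux, population_totals):
    """Calibrated weights w_i = d_i * (1 + x_i' lam) that reproduce known totals."""
    d = np.asarray(design_weights, dtype=float)
    x = np.asarray(aux, dtype=float)
    t = np.asarray(population_totals, dtype=float)
    # Solve (sum_i d_i x_i x_i') lam = t - sum_i d_i x_i for lam.
    moment = x.T @ (d[:, None] * x)
    lam = np.linalg.solve(moment, t - x.T @ d)
    return d * (1.0 + x @ lam)

# Hypothetical sample of 5 units from a population of 100.
design_weights = np.full(5, 20.0)
# Columns: intercept and one continuous auxiliary variable (assumed values).
aux = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 6.0], [1.0, 8.0], [1.0, 10.0]])
population_totals = np.array([100.0, 650.0])  # known population size and total of x
y = np.array([3.0, 5.0, 6.0, 9.0, 11.0])      # target variable observed in the sample

weights = linear_calibration_weights(design_weights, aux, population_totals)
calibrated_total = weights @ y                 # calibration estimate of the total of y
```

By construction, the calibrated weights satisfy `weights @ aux == population_totals` up to rounding, which is the calibration constraint.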


Subjects
Multidimensional Scaling Analysis, Calibration, Monte Carlo Method
5.
Multivariate Behav Res ; 57(4): 679-699, 2022.
Article in English | MEDLINE | ID: mdl-33843387

ABSTRACT

In this paper a simple but effective procedure to avoid degeneracies in ordinal Unfolding for preference rank data based on the Kemeny distance is proposed. Considering Unfolding as a particular MDS procedure with missing within-set proximities, unknown proximities are first estimated using correlations related to the Kemeny distance, and then the complete proximity matrix is analyzed in a standard MDS framework. A simulation study shows that our proposal is able to both recover the order of the preferences and reproduce the position of both rankings and objects in a geometrical space. Several applications on real data sets show that our procedure returns non-degenerate Unfolding solutions.
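
The Kemeny distance on which this procedure is based can be sketched as follows, using the usual Kemeny-Snell score-matrix definition. The correlation-based estimation of the missing within-set proximities described in the abstract is not reproduced; the function names and example rankings are illustrative assumptions.

```python
import numpy as np

def score_matrix(ranks):
    """Entry (i, j) is +1 if object i is preferred to j, -1 if the reverse, 0 if tied."""
    r = np.asarray(ranks)
    # Smaller rank means more preferred, so i beats j when r[j] - r[i] > 0.
    return np.sign(r[None, :] - r[:, None])

def kemeny_distance(ranks_a, ranks_b):
    """Half the sum of absolute differences between the two score matrices."""
    diff = np.abs(score_matrix(ranks_a) - score_matrix(ranks_b))
    return 0.5 * diff.sum()

reversed_pair = kemeny_distance([1, 2, 3], [3, 2, 1])  # all three pairs disagree
one_swap = kemeny_distance([1, 2, 3], [2, 1, 3])       # a single pair disagrees
with_tie = kemeny_distance([1, 1, 2], [1, 2, 3])       # a tie versus a strict preference
```

For complete rankings without ties, this distance is twice the number of discordant pairs; a tie set against a strict preference contributes one unit.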


Subjects
Algorithms, Computer Simulation, Mathematics
6.
Psychometrika ; 86(2): 489-513, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34008128

ABSTRACT

In this article, we analyse the usefulness of multidimensional scaling in relation to performing K-means clustering on a dissimilarity matrix, when the dimensionality of the objects is unknown. In this situation, traditional algorithms cannot be used, and so K-means clustering procedures are being performed directly on the basis of the observed dissimilarity matrix. Furthermore, the application of criteria originally formulated for two-mode data sets to determine the number of clusters depends on their possible reformulation in a one-mode situation. The linear invariance property in K-means clustering for squared dissimilarities, together with the use of multidimensional scaling, is investigated to determine the cluster membership of the observations and to address the problem of selecting the number of clusters in K-means for a dissimilarity matrix. In particular, we analyse the performance of K-means clustering on the full dimensional scaling configuration and on the equivalently partitioned configuration related to a suitable translation of the squared dissimilarities. A Monte Carlo experiment is conducted in which the methodology examined is compared with the results obtained by procedures directly applicable to a dissimilarity matrix.
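
One way to read the invariance property mentioned in this abstract is that adding a constant to all off-diagonal squared dissimilarities shifts the K-means within-cluster criterion by the same amount for every partition into K clusters, so the optimal partition is unchanged. The sketch below checks this numerically; this reading, the function name, and the simulated points are assumptions rather than the article's own formulation.

```python
import numpy as np

def within_dispersion(sq_diss, labels):
    """Sum over clusters of (1 / (2 n_k)) * sum of squared dissimilarities in the cluster."""
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        total += sq_diss[np.ix_(idx, idx)].sum() / (2.0 * idx.size)
    return total

rng = np.random.default_rng(0)
points = rng.normal(size=(8, 3))  # simulated objects, used only to build dissimilarities
sq_diss = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)

# Translate every off-diagonal squared dissimilarity by the same constant.
c = 2.5
shifted = sq_diss + c * (1.0 - np.eye(len(points)))

n_objects, n_clusters = len(points), 2
partitions = ([0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 0, 1, 0, 1, 1, 1])
gaps = [within_dispersion(shifted, p) - within_dispersion(sq_diss, p) for p in partitions]
expected_gap = c * (n_objects - n_clusters) / 2.0
```

Both entries of `gaps` equal `expected_gap`, even though the two partitions have different cluster sizes, so ranking partitions by the criterion gives the same result before and after the translation.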


Subjects
Algorithms, Multidimensional Scaling Analysis, Cluster Analysis, Monte Carlo Method, Psychometrics
7.
Multivariate Behav Res ; 55(3): 329-343, 2020.
Article in English | MEDLINE | ID: mdl-31352798

ABSTRACT

Distance association models constitute a useful tool for the analysis and graphical representation of cross-classified data in which distances between points inversely describe the association between two categorical variables. When the number of cells is large and the data counts result in sparse tables, the combination of clustering and representation reduces the number of parameters to be estimated and facilitates interpretation. In this article, a latent block distance-association model is proposed to apply block clustering to the outcomes of two categorical variables while the cluster centers are represented in a low dimensional space in terms of a distance-association model. This model is particularly useful for contingency tables in which both the rows and the columns are characterized as profiles of sets of response variables. The parameters are estimated under a Poisson sampling scheme using a generalized EM algorithm. The performance of the model is tested in a Monte Carlo experiment, and an empirical data set is analyzed to illustrate the model.


Subjects
Statistical Data Interpretation, Latent Class Analysis, Statistical Models, Algorithms, Humans
8.
Psychometrika ; 82(2): 275-294, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28194550

ABSTRACT

One of the main problems in cluster analysis is that of determining the number of groups in the data. In general, the approach taken depends on the cluster method used. For K-means, some of the most widely employed criteria are formulated in terms of the decomposition of the total point scatter, regarding a two-mode data set of N points in p dimensions, which are optimally arranged into K classes. This paper addresses the formulation of criteria to determine the number of clusters, in the general situation in which the available information for clustering is a one-mode [Formula: see text] dissimilarity matrix describing the objects. In this framework, p and the coordinates of points are usually unknown, and the application of criteria originally formulated for two-mode data sets is dependent on their possible reformulation in the one-mode situation. The decomposition of the variability of the clustered objects is proposed in terms of the corresponding block-shaped partition of the dissimilarity matrix. Within-block and between-block dispersion values for the partitioned dissimilarity matrix are derived, and variance-based criteria are subsequently formulated in order to determine the number of groups in the data. A Monte Carlo experiment was carried out to study the performance of the proposed criteria. For simulated clustered points in p dimensions, greater efficiency in recovering the number of clusters is obtained when the criteria are calculated from the related Euclidean distances instead of the known two-mode data set, in general, for unequal-sized clusters and for low dimensionality situations. For simulated dissimilarity data sets, the proposed criteria always outperform the results obtained when these criteria are calculated from their original formulation, using dissimilarities instead of distances.
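
The block-wise decomposition described in this abstract can be illustrated with the well-known identity that expresses total and within-cluster scatter through pairwise squared distances. The sketch below computes both from a dissimilarity matrix alone and then forms the Calinski-Harabasz ratio as one example of a variance-based criterion; the choice of that index, the function names, and the simulated data are assumptions, since the abstract does not name specific criteria.

```python
import numpy as np

def scatter_from_dissimilarities(sq_diss, labels):
    """Total, within-block and between-block dispersion from squared dissimilarities."""
    labels = np.asarray(labels)
    n = sq_diss.shape[0]
    total = sq_diss.sum() / (2.0 * n)
    within = 0.0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        # Diagonal block of the partitioned matrix for cluster k.
        within += sq_diss[np.ix_(idx, idx)].sum() / (2.0 * idx.size)
    return total, within, total - within

def calinski_harabasz(sq_diss, labels):
    """Ratio of between to within dispersion, each divided by its degrees of freedom."""
    n = sq_diss.shape[0]
    k = np.unique(labels).size
    _, within, between = scatter_from_dissimilarities(sq_diss, labels)
    return (between / (k - 1)) / (within / (n - k))

# Simulated two-mode data, used only to verify the one-mode computation.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 1.0, size=(10, 2)), rng.normal(6.0, 1.0, size=(10, 2))])
labels = np.repeat([0, 1], 10)
sq_diss = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)

total, within, between = scatter_from_dissimilarities(sq_diss, labels)
# The same within-cluster scatter computed from coordinates and centroids.
within_coords = sum(
    ((points[labels == k] - points[labels == k].mean(axis=0)) ** 2).sum() for k in (0, 1)
)
ch_index = calinski_harabasz(sq_diss, labels)
```

For Euclidean distances the one-mode and two-mode computations agree, so `within` matches `within_coords`; for general dissimilarities only the one-mode version is available.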


Subjects
Algorithms, Patient Selection, Psychometrics, Cluster Analysis, Humans