Results 1 - 14 of 14
1.
Environ Res ; 216(Pt 2): 114519, 2023 01 01.
Article in English | MEDLINE | ID: mdl-36252833

ABSTRACT

Soil attributes and their environmental drivers exhibit different patterns in different geographical directions, along with distinct regional characteristics, which may have important effects on the migration and transformation of substances such as organic matter and soil elements, or on the environmental impacts of pollutants. Therefore, regional soil characteristics should be considered in the process of regionalization for environmental management. However, no comprehensive evaluation or systematic classification of the natural soil environment has been established for China. Here, we established an index system for natural soil environmental regionalization (NSER) by combining literature data obtained through bibliometrics with the analytic hierarchy process (AHP). Based on the index system, we collected spatial distribution data for 14 indexes at the national scale. In addition, three clustering algorithms, self-organizing feature mapping (SOFM), fuzzy c-means (FCM) and k-means (KM), were used to classify and define the natural soil environment. We introduced four cluster validity indexes (CVIs) to evaluate the different models: the Davies-Bouldin index (DB), Silhouette index (Sil) and Calinski-Harabasz index (CH) for FCM and KM, and the clustering quality index (CQI) for SOFM. Analysis and comparison of the results showed that when the number of clusters was 12, the FCM clustering algorithm achieved the optimal clustering results (DB = 1.16, Sil = 0.78, CH = 6.77 × 10^6), allowing the natural soil environment of China to be divided into 12 regions with distinct characteristics. Our study provides a set of comprehensive scientific research methods for regionalization research based on spatial data and has important reference value for improving soil environmental management based on local conditions in China.
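
As a rough illustration of how one of the validity indexes named in this abstract scores a partition, the following is a minimal sketch of the Silhouette index. It assumes Euclidean distance and crisp labels; the function name `silhouette_index` and the toy points are hypothetical, and this is not the authors' implementation.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the members of one cluster,
        # excluding the point itself.
        others = [j for j in members if j != i]
        if not others:
            return 0.0
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_dist(i, own)  # cohesion: mean distance inside own cluster
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (assumed): two well-separated pairs of points.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_index(toy_points, [0, 0, 1, 1])
bad = silhouette_index(toy_points, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters, so a candidate number of clusters would typically be preferred when it maximizes this score.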


Subjects
Algorithms , Soil , Cluster Analysis , Geography , China , Fuzzy Logic
2.
Sensors (Basel) ; 23(6)2023 Mar 21.
Article in English | MEDLINE | ID: mdl-36992030

ABSTRACT

Human movement anomalies in indoor spaces commonly involve urgent situations, such as security threats, accidents, and fires. This paper proposes a two-phase framework for detecting indoor human trajectory anomalies based on density-based spatial clustering of applications with noise (DBSCAN). The first phase of the framework groups datasets into clusters. In the second phase, the abnormality of a new trajectory is checked. A new metric called the longest common sub-sequence using indoor walking distance and semantic label (LCSS_IS) is proposed to calculate the similarity between trajectories, extending from the longest common sub-sequence (LCSS). Moreover, a DBSCAN cluster validity index (DCVI) is proposed to improve the trajectory clustering performance. The DCVI is used to choose the epsilon parameter for DBSCAN. The proposed method is evaluated using two real trajectory datasets: MIT Badge and sCREEN. The experimental results show that the proposed method effectively detects human trajectory anomalies in indoor spaces. With the MIT Badge dataset, the proposed method achieves 89.03% in terms of F1-score for hypothesized anomalies and above 93% for all synthesized anomalies. In the sCREEN dataset, the proposed method also achieves impressive results in F1-score on synthesized anomalies: 89.92% for rare location visit anomalies (τ = 0.5) and 93.63% for other anomalies.
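
The similarity measure in this abstract extends the longest common sub-sequence (LCSS). The sketch below shows only the classic LCSS for point trajectories, assuming plain Euclidean distance and a matching tolerance `eps`; it does not implement the indoor walking distance or the semantic labels of LCSS_IS, and all names are hypothetical.

```python
import math


def lcss_length(traj_a, traj_b, eps):
    """Length of the longest common sub-sequence of two point trajectories.

    Two points match when their Euclidean distance is at most eps.
    """
    n, m = len(traj_a), len(traj_b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if math.dist(traj_a[i - 1], traj_b[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(traj_a, traj_b, eps):
    """LCSS length normalized by the shorter trajectory, in [0, 1]."""
    if not traj_a or not traj_b:
        return 0.0
    return lcss_length(traj_a, traj_b, eps) / min(len(traj_a), len(traj_b))


# Toy trajectories (assumed): the third point of route_b is a detour.
route_a = [(0, 0), (1, 0), (2, 0), (3, 0)]
route_b = [(0, 0.1), (1, 0.1), (5, 5), (3, 0.1)]
similarity = lcss_similarity(route_a, route_b, eps=0.5)
```

A density-based clusterer such as DBSCAN would then typically consume `1 - similarity` as the distance between two trajectories.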


Subjects
Algorithms , Humans , Cluster Analysis
3.
Sensors (Basel) ; 23(7)2023 Apr 03.
Article in English | MEDLINE | ID: mdl-37050769

ABSTRACT

Cluster validity indices (CVIs) for evaluating the result of the optimal number of clusters are critical measures in clustering problems. Most CVIs are designed for typical data-type objects called certain data objects. Certain data objects only have a singular value and include no uncertainty, so they are assumed to be information-abundant in the real world. In this study, new CVIs for uncertain data, based on kernel probabilistic distance measures to calculate the distance between two distributions in feature space, are proposed for uncertain clusters with arbitrary shapes, sub-clusters, and noise in objects. By transforming original uncertain data into kernel spaces, the proposed CVI accurately measures the compactness and separability of a cluster for arbitrary cluster shapes and is robust to noise and outliers in a cluster. The proposed CVI was evaluated for diverse types of simulated and real-life uncertain objects, confirming that the proposed validity indexes in feature space outperform the pre-existing ones in the original space.

4.
Entropy (Basel) ; 22(10)2020 Sep 29.
Article in English | MEDLINE | ID: mdl-33286864

ABSTRACT

The article presents methods for both clustering and outlier detection in complex data, such as rule-based knowledge bases. What distinguishes this work from others is, first, the application of clustering algorithms to rules in domain knowledge bases, and second, the use of outlier detection algorithms to detect unusual rules in knowledge bases. The aim of the paper is to analyze four algorithms for outlier detection in rule-based knowledge bases: Local Outlier Factor (LOF), Connectivity-based Outlier Factor (COF), K-MEANS, and SMALLCLUSTERS. The subject of outlier mining is very important nowadays. Outliers among If-Then rules are unusual rules, which are rare in comparison to others and should be explored by the domain expert as soon as possible. In the research, the authors use the outlier detection methods to find a given number of outliers in rules (1%, 5%, 10%), while in small groups, the number of outliers covers no more than 5% of the rule cluster. Subsequently, the authors analyze which of seven quality indices, which they use for all rules and after removing selected outliers, improve the quality of rule clusters. In the experimental stage, the authors use six different knowledge bases. The best results (the quality of the clusters was improved most often) are achieved for two outlier detection algorithms, LOF and COF.

5.
BMC Evol Biol ; 18(1): 48, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29621975

ABSTRACT

BACKGROUND: Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. RESULTS: We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Calinski-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. CONCLUSIONS: The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. 
The main advantage of our method is that it is much faster than the existing tree clustering approaches, while providing similar or better clustering results in most cases. This makes it particularly well suited for the analysis of large genomic and phylogenetic datasets.
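
To make the partitioning step concrete, here is a minimal sketch of k-medoids on a precomputed distance matrix. It assumes that pairwise tree-to-tree distances have already been computed by some tree metric and uses a simple alternating assignment/update scheme; the function name, the fixed initial medoids and the toy matrix are hypothetical, and this is not the authors' algorithm or their adapted validity indices.

```python
def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    dist[i][j] is the distance between objects i and j (for example,
    between two trees). Returns (medoid indices, cluster label per object).
    """
    n = len(dist)
    medoids = list(initial_medoids)

    def assign(current):
        # Each object joins the cluster of its closest medoid.
        return [
            min(range(len(current)), key=lambda c: dist[i][current[c]])
            for i in range(n)
        ]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                updated.append(old)  # keep the old medoid of an empty cluster
                continue
            # The new medoid minimizes the total distance to its cluster.
            updated.append(
                min(members, key=lambda cand: sum(dist[cand][j] for j in members))
            )
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(medoids)


# Toy distance matrix (assumed): six objects on a line, forming two groups.
positions = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
final_medoids, final_labels = k_medoids(toy_dist, initial_medoids=[0, 1])
```

Because medoids are always members of the input set, the approach needs only distances between trees, never an "average tree"; a consensus tree per cluster would be computed afterwards from the members of each cluster.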


Subjects
Algorithms , Genomics/methods , Phylogeny , Archaea/metabolism , Cluster Analysis , Computer Simulation , Gene Transfer, Horizontal/genetics , Ribosomal Proteins/metabolism , Species Specificity
6.
J Neurosci Methods ; 337: 108651, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32109439

ABSTRACT

BACKGROUND: Clustering analysis is employed in brain dynamic functional connectivity (dFC) to cluster the data into a set of dynamic states. These states correspond to different patterns of functional connectivity that iterate through time. Although several cluster validity index (CVI) methods to determine the best clustering partition exist, the appropriateness of these methods for dynamic connectivity analysis has not been determined. NEW METHOD: Currently employed indexes do not provide a crisp answer on what the best number of clusters is. In addition, there is a lack of CVI testing in the context of dFC data. This work tests a comprehensive set of twenty-four cluster validity indexes applied to addiction data and suggests the best ones for clustering dynamic functional connectivity. RESULTS: Out of the twenty-four considered CVIs, Davies-Bouldin and Ray-Turi were the most suitable methods to find the number of clusters in both simulated and real data. The solution for these two CVIs is to find a local minimum critical point, which can be automated using computational algorithms. COMPARISON WITH EXISTING METHODS: Elbow-Criterion, Silhouette and GAP-Statistic methods have been widely used in dFC studies. These methods are included among the tested CVIs where the performances of all twenty-four CVIs are compared. CONCLUSIONS: Davies-Bouldin and Ray-Turi CVIs showed better performance among a group of twenty-four CVIs in determining the number of clusters to use in dFC analysis.
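
The automated search for a local minimum mentioned in the results can be sketched as follows. The code computes the standard Davies-Bouldin index (Euclidean distance, centroid-based scatter) and picks the first cluster count whose score is lower than both neighbours. The helper names and the example score curve are hypothetical, the Ray-Turi index is not covered, and this is not the authors' code.

```python
import math


def centroid(pts):
    """Component-wise mean of a list of equally sized points."""
    return tuple(sum(p[d] for p in pts) / len(pts) for d in range(len(pts[0])))


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower values indicate a better partition.

    Assumes at least two clusters with distinct centroids.
    """
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        raise ValueError("the Davies-Bouldin index needs at least two clusters")
    cents = {lab: centroid(pts) for lab, pts in groups.items()}
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in groups.items()
    }
    total = 0.0
    for i in groups:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in groups
            if j != i
        )
    return total / len(groups)


def first_local_minimum(scores):
    """scores: list of (k, value) sorted by k; returns the first k whose
    value is strictly lower than both neighbours, or None if there is none."""
    for idx in range(1, len(scores) - 1):
        value = scores[idx][1]
        if value < scores[idx - 1][1] and value < scores[idx + 1][1]:
            return scores[idx][0]
    return None


toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
toy_db = davies_bouldin(toy_points, [0, 0, 1, 1])

# Hypothetical index values for k = 2..6, for illustration only.
curve = [(2, 1.5), (3, 0.9), (4, 1.2), (5, 0.8), (6, 1.0)]
chosen_k = first_local_minimum(curve)
```

Taking the first local minimum rather than the global one is a design choice that favours the smallest plausible number of states; whether that matches the study's exact rule is an assumption here.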


Subjects
Brain Mapping , Brain , Algorithms , Brain/diagnostic imaging , Cluster Analysis , Computer Simulation
7.
Genes (Basel) ; 10(8)2019 08 13.
Article in English | MEDLINE | ID: mdl-31412637

ABSTRACT

Rapid advances in single-cell RNA sequencing (scRNA-seq) allow measurement of gene expression at single-cell resolution in complex disease or tissue. While many methods have been developed to detect cell clusters from scRNA-seq data, this task remains a major challenge. We proposed a multi-objective optimization-based fuzzy clustering approach for detecting cell clusters from scRNA-seq data. First, we conducted initial filtering and SCnorm normalization. We considered various case studies by selecting different cluster numbers (cl = 2 to a user-defined number), and applied the fuzzy c-means clustering algorithm to each individually. For each case, we evaluated the scores of four cluster validity index measures: Partition Entropy (PE), Partition Coefficient (PC), Modified Partition Coefficient (MPC), and Fuzzy Silhouette Index (FSI). Next, we set the first measure as a minimization objective (↓) and the remaining three as maximization objectives (↑), and then applied a multi-objective decision-making technique, TOPSIS, to identify the optimal solution. The solution (case study) that had the highest TOPSIS score was selected as the final optimal clustering. Finally, we obtained differentially expressed genes (DEGs) using Limma through the comparison of expression of the samples between each resultant cluster and the remaining clusters. We applied our approach to a scRNA-seq dataset for the rare intestinal cell type in mice [GEO ID: GSE62270, 23,630 features (genes) and 288 cells]. The optimal cluster result (TOPSIS optimal score = 0.858) comprised two clusters, one with 115 cells and the other with 91 cells. The evaluated scores of the four cluster validity indices, FSI, PE, PC, and MPC, for the optimized fuzzy clustering were 0.482, 0.578, 0.607, and 0.215, respectively. The Limma analysis identified 1240 DEGs (cluster 1 vs. cluster 2). The top ten gene markers were Rps21, Slc5a1, Crip1, Rpl15, Rpl3, Rpl27a, Khk, Rps3a1, Aldob and Rps17. In this list, Khk (encoding ketohexokinase) is a novel marker for the rare intestinal cell type. In summary, this method is useful to detect cell clusters from scRNA-seq data.
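
The ranking step described in this abstract can be sketched with the textbook form of TOPSIS. The sketch assumes vector normalization and equal criterion weights, neither of which is stated in the abstract; the function name `topsis` and the index values in the example are made up for illustration and are not the paper's data.

```python
import math


def topsis(matrix, benefit, weights=None):
    """Closeness score in [0, 1] for each alternative (row of matrix).

    benefit[j] is True when criterion j is maximized, False when minimized.
    Assumes no all-zero column and at least two distinct alternatives.
    """
    n_crit = len(matrix[0])
    if weights is None:
        weights = [1.0 / n_crit] * n_crit  # equal weights (an assumption)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_ideal = math.dist(row, ideal)
        d_anti = math.dist(row, anti)
        scores.append(d_anti / (d_ideal + d_anti))
    return scores


# Columns: PE (minimize), PC, MPC, FSI (maximize). Values are invented.
candidates = [
    [0.3, 0.8, 0.6, 0.7],  # e.g. a run with 2 clusters
    [0.6, 0.6, 0.3, 0.4],  # e.g. a run with 3 clusters
    [0.9, 0.4, 0.1, 0.2],  # e.g. a run with 4 clusters
]
scores = topsis(candidates, benefit=[False, True, True, True])
best_case = max(range(len(scores)), key=scores.__getitem__)
```

The alternative with the highest closeness score would be selected, mirroring the "highest TOPSIS score" rule in the abstract.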


Subjects
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Animals , Cluster Analysis , Fuzzy Logic , Gene Expression Profiling/standards , Mice , Ribosomal Protein L3 , Sequence Analysis, RNA/standards , Single-Cell Analysis/standards
8.
J Adv Res ; 16: 15-23, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30899585

ABSTRACT

A Gaussian mixture model (GMM)-based classification technique is employed for a quantitative global assessment of brain tissue changes by using pixel intensities and contrast generated by b-values in diffusion tensor imaging (DTI). A hemisphere approach is also proposed. A GMM identifies the variability in the main brain tissues at a macroscopic scale rather than searching for tumours or affected areas. The asymmetries of the mixture distributions between the hemispheres could be used as a sensitive, faster tool for early diagnosis. The k-means algorithm optimizes the parameters of the mixture distributions and ensures that the global maxima of the likelihood functions are determined. This method has been illustrated using 18 sub-classes of DTI data grouped into six levels of diffusion weighting (b = 0, 250, 500, 750, 1000 and 1250 s/mm²) and three main brain tissues. These tissues belong to three subjects: one healthy, one with multiple haemorrhage areas in the left temporal lobe, and one with ischaemic stroke. The mixing probabilities or weights at the class level are estimated based on the sub-class-level mixing probability estimation. Furthermore, weighted Euclidean distance and multiple correlation analysis are applied to analyse the dissimilarity of mixing probabilities between hemispheres and subjects. The silhouette data evaluate the objective quality of the clustering. By using a GMM in the present study, we establish an important variability in the mixing probability associated with white matter and grey matter between the left and right hemispheres.

9.
PeerJ ; 6: e5416, 2018.
Article in English | MEDLINE | ID: mdl-30310731

ABSTRACT

Model-free methods are widely used for the processing of brain fMRI data collected under natural stimulations, sleep, or rest. Among them is the popular fuzzy c-means algorithm, commonly combined with cluster validity (CV) indices to identify the 'true' number of clusters (components) in an unsupervised way. CV indices may, however, reveal different optimal c-partitions for the same fMRI data, and their effectiveness can be hindered by the high data dimensionality, the limited signal-to-noise ratio, the small proportion of relevant voxels, and the presence of artefacts or outliers. Here, the author investigated the behavior of seven robust CV indices. A new CV index that incorporates both compactness and separation measures is also introduced. Using both artificial and real fMRI data, the findings highlight the importance of looking at the behavior of different compactness and separation measures, defined here as building blocks of CV indices, to obtain a full description of the data structure, in particular when no agreement is found between CV indices. Overall, for fMRI, it makes sense to relax the assumption that only one unique c-partition exists, and to appreciate that different c-partitions (with different optimal numbers of clusters) can be useful explanations of the data, given the hierarchical organization of many brain networks.

10.
Med Phys ; 44(1): 209-220, 2017 Jan.
Article in English | MEDLINE | ID: mdl-28102943

ABSTRACT

PURPOSE: Dynamic [18F]fluoro-ethyl-L-tyrosine positron emission tomography ([18F]FET-PET) is used to identify tumor lesions for radiotherapy treatment planning, to differentiate glioma recurrence from radiation necrosis and to classify glioma grade. To segment different regions in the brain, k-means cluster analysis can be used. The main disadvantage of k-means is that the number of clusters must be pre-defined. In this study, we therefore compared different cluster validity indices for automated and reproducible determination of the optimal number of clusters based on the dynamic PET data. METHODS: The k-means algorithm was applied to dynamic [18F]FET-PET images of 8 patients. The Akaike information criterion (AIC), WB, I, modified Dunn's and Silhouette indices were compared on their ability to determine the optimal number of clusters based on requirements for an adequate cluster validity index. To check the reproducibility of k-means, the coefficients of variation (CVs) of the objective function values (OFVs; sum of squared Euclidean distances within each cluster) were calculated using 100 random centroid initialization replications (RCI100) for 2 to 50 clusters. k-means was performed independently on three neighboring slices containing tumor for each patient to investigate the stability of the optimal number of clusters within them. To check the independence of the validity indices from the number of voxels, cluster analysis was applied after duplication of a slice selected from each patient. CVs of index values were calculated at the optimal number of clusters using RCI100 to investigate the reproducibility of the validity indices. To check whether the indices have a single extremum, visual inspection was performed on the replication with minimum OFV from RCI100. RESULTS: The maximum CV of OFVs was 2.7 × 10^-2 from all patients. The optimal number of clusters given by modified Dunn's and Silhouette indices was 2 or 3, leading to a very poor segmentation. WB and I indices suggested a median of 5 (range 4-6) and 4 (range 3-6) clusters, respectively. For WB, I, modified Dunn's and Silhouette validity indices, the suggested optimal number of clusters was not affected by the number of voxels. The maximum coefficients of variation of WB, I, modified Dunn's, and Silhouette validity indices were 3 × 10^-2, 1, 2 × 10^-1 and 3 × 10^-3, respectively. WB-index showed a single global maximum, whereas the other indices also showed local extrema. CONCLUSION: Of the investigated cluster validity indices, the WB-index is best suited for automated determination of the optimal number of clusters for [18F]FET-PET brain images for the investigated image reconstruction algorithm and the scanner used: it yields meaningful results allowing better differentiation of tissues with a higher number of clusters, it is simple and reproducible, and it has a unique global minimum.
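
The reproducibility check in the methods (coefficient of variation of the k-means objective over repeated random initializations) can be sketched as below. It uses a small pure-Python Lloyd iteration and the population standard deviation; the function names, the seed handling and the toy data are hypothetical, and the study's exact k-means implementation and CV definition are not given in the abstract.

```python
import math
import random
import statistics


def kmeans_ofv(points, k, rng, max_iter=100):
    """Run Lloyd's k-means from a random start and return the objective
    function value: the sum of squared distances to the nearest centroid."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def replicate_ofv(points, k, replications=100, seed=0):
    """Objective function values over repeated random initializations."""
    rng = random.Random(seed)
    return [kmeans_ofv(points, k, rng) for _ in range(replications)]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Toy data (assumed): two compact, well-separated groups of four points.
toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
ofvs = replicate_ofv(toy, k=2, replications=20, seed=0)
cv = coefficient_of_variation(ofvs)
```

A small coefficient of variation suggests that the random starting centroids have little influence on the final objective value, which is the sense of reproducibility used in the abstract.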


Subjects
Brain/diagnostic imaging , Image Processing, Computer-Assisted/methods , Positron-Emission Tomography , Tyrosine/analogs & derivatives , Algorithms , Diagnosis, Differential , Glioma/diagnostic imaging , Humans , Recurrence , Reproducibility of Results
11.
Comput Biol Med ; 89: 31-43, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28783536

ABSTRACT

One of the crucial problems in the field of functional genomics is to identify a set of genes which are responsible for a particular cellular mechanism. The current work explores the use of a multi-objective optimization based genetic clustering technique to classify genes into groups with respect to their functional similarities and biological relevance. Our contribution is two-fold. First, a new quality measure of the goodness of gene clusters, namely the protein-protein interaction confidence score, is developed. This utilizes the confidence scores of the protein-protein interaction networks to measure the similarity between genes of a particular cluster with respect to their biochemical protein products. Second, a multi-objective clustering approach is developed which uses integrated information from the expression values of a microarray dataset and protein-protein interaction confidence scores to select both statistically and biologically relevant genes. For that purpose, two biological cluster validity indices, viz. the biological homogeneity index and the protein-protein interaction confidence score, along with two traditional internal cluster validity indices, viz. the fuzzy partition coefficient and the Pakhira-Bandyopadhyay-Maulik index, are simultaneously optimized during the clustering process. Experimental results on three real-life gene expression datasets show that the addition of a new objective capturing protein-protein interaction information aids in clustering the genes as compared to the existing techniques. The observations are further supported by biological and statistical significance tests.


Subjects
Databases, Nucleic Acid , Databases, Protein , Gene Expression Profiling/methods , Gene Expression Regulation , Microarray Analysis , Models, Biological , Humans
12.
Comput Biol Med ; 54: 61-71, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25212119

ABSTRACT

In this paper, we present a new method to compare and improve algorithms for feature detection in neonatal EEG. The method is based on the algorithm's ability to compute accurate statistics to predict the results of EEG visual analysis. This method is implemented in Java software called EEGDiag, as part of an e-health Web portal dedicated to neonatal EEG. EEGDiag encapsulates a component-based implementation of the detection algorithms, called analyzers. Each analyzer is defined by a list of modules executed sequentially. As the libraries of modules are intended to be enriched by their users, we developed a process to evaluate the performance of new modules and analyzers using a database of expertized and categorized EEGs. The evaluation is based on the Davies-Bouldin index (DBI), which measures the quality of cluster separation, so that it will ease the building of classifiers on risk categories. As a first application, we tested this method on the detection of interburst intervals (IBI) using a database of 394 EEGs acquired on premature newborns. We defined a class of IBI detectors based on a threshold of the standard deviation on contiguous short time windows, inspired by previous work. We then determined which detector and which threshold values are the best with regard to DBI, as well as the robustness of this choice. This method allows us to make counter-intuitive choices, such as removing the 50 Hz filter (power supply) to save time.
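
The class of detectors described here (a threshold on the standard deviation over contiguous short time windows) can be illustrated with a small sketch. The window length, the threshold, the use of non-overlapping windows and the function name are all assumptions made for the example; this is not the EEGDiag implementation.

```python
import statistics


def detect_quiet_intervals(signal, window, threshold):
    """Return (start, end) sample-index pairs, end exclusive, covering runs of
    contiguous non-overlapping windows whose standard deviation is below
    the threshold. Trailing samples that do not fill a window are ignored."""
    intervals = []
    run_start = None
    n_windows = len(signal) // window
    for w in range(n_windows):
        start = w * window
        chunk = signal[start:start + window]
        quiet = statistics.pstdev(chunk) < threshold
        if quiet and run_start is None:
            run_start = start  # a low-activity run begins
        elif not quiet and run_start is not None:
            intervals.append((run_start, start))  # the run ends here
            run_start = None
    if run_start is not None:
        intervals.append((run_start, n_windows * window))
    return intervals


# Toy trace (assumed): a burst, a flat stretch, then another burst.
burst = [0, 10, 0, 10]
flat = [1, 1, 1, 1, 1, 1, 1, 1]
toy_signal = burst + flat + burst
quiet_runs = detect_quiet_intervals(toy_signal, window=4, threshold=1.0)
```

Sweeping `threshold` and `window` and scoring each setting with a separation measure such as the Davies-Bouldin index would correspond, loosely, to the selection procedure outlined in the abstract.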


Subjects
Algorithms , Brain Diseases/diagnosis , Brain Diseases/embryology , Diagnosis, Computer-Assisted/methods , Electroencephalography/methods , Pattern Recognition, Automated/methods , Prenatal Diagnosis/methods , Humans , Reproducibility of Results , Sensitivity and Specificity , Software
13.
Springerplus ; 3: 465, 2014.
Article in English | MEDLINE | ID: mdl-25279282

ABSTRACT

In this paper, we couple the feature selection problem with semi-supervised clustering. Semi-supervised clustering uses information from both unsupervised and supervised learning to overcome the problems of each. In general, however, not all the features present in a data set are important for clustering, so appropriate selection of features from the full feature set is highly relevant from a clustering point of view. We solve the problem of automatic feature selection and semi-supervised clustering using multiobjective optimization. A recently developed simulated annealing based multiobjective optimization technique, archived multiobjective simulated annealing (AMOSA), is used as the underlying optimization technique. Features and cluster centers are encoded in the form of a string. We assume that, for each data set, class label information is known for 10% of the data points. Two internal cluster validity indices reflecting different data properties, an external cluster validity index measuring the similarity between the obtained partitioning and the true labelling for the 10% of data points, and a measure counting the number of features present in a particular string are optimized using the search capability of AMOSA. AMOSA is used to detect the appropriate subset of features, the appropriate number of clusters and the appropriate partitioning for any given data set. The effectiveness of the proposed semi-supervised feature selection technique compared with existing techniques is shown for seven real-life data sets of varying complexity.

14.
J Med Signals Sens ; 3(1): 22-30, 2013 Jan.
Article in English | MEDLINE | ID: mdl-24083134

ABSTRACT

In this study, we considered some competitive learning methods, including hard competitive learning and soft competitive learning with/without fixed network dimensionality, for reliability analysis in microarrays. In order to have a more extensive view, and keeping in mind that competitive learning methods aim at error minimization or entropy maximization (different kinds of function optimization), we decided to investigate the abilities of mixture decomposition schemes. Therefore, this study covers algorithms based on function optimization, with particular emphasis on different competitive learning methods. The goal is to find the most powerful method according to a pre-specified criterion determined with numerical methods and matrix similarity measures. Furthermore, an indication of the intrinsic ability of the dataset to form clusters should be provided before a clustering algorithm is applied. Therefore, we proposed the Hopkins statistic as a method for assessing the intrinsic ability of a dataset to be clustered. The results show the remarkable ability of the Rayleigh mixture model in comparison with other methods in the reliability analysis task.
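
A minimal sketch of the Hopkins statistic mentioned in this abstract follows. It assumes one common convention, H = sum(u) / (sum(u) + sum(w)) with plain Euclidean nearest-neighbour distances and uniform reference points drawn from the bounding box of the data; under that convention, values near 0.5 suggest no cluster structure and values near 1 suggest strong clustering. The function name and toy data are hypothetical, and the variant used in the study is not specified in the abstract.

```python
import math
import random


def hopkins_statistic(points, sample_size, seed=0):
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)).

    w: nearest-neighbour distances of sampled real points to the other data.
    u: nearest-neighbour distances of uniform random points to the data.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dim)]
    highs = [max(p[d] for p in points) for d in range(dim)]

    sampled = rng.sample(range(len(points)), sample_size)
    w = [
        min(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
        for i in sampled
    ]
    uniform = [
        tuple(rng.uniform(lows[d], highs[d]) for d in range(dim))
        for _ in range(sample_size)
    ]
    u = [min(math.dist(x, q) for q in points) for x in uniform]
    return sum(u) / (sum(u) + sum(w))


# Toy data (assumed): two tight clumps placed far apart.
clump = [(i * 0.1, j * 0.1) for i in range(5) for j in range(4)]
toy_data = clump + [(x + 100, y + 100) for x, y in clump]
h_value = hopkins_statistic(toy_data, sample_size=10, seed=0)
```

Because the reference points are random, the statistic varies between runs; fixing the seed, as done here, or averaging over several repetitions would be a reasonable way to stabilize it.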
