Results 1 - 8 of 8
1.
Physica A ; 585, 2022 Jan 01.
Article in English | MEDLINE | ID: mdl-34737487

ABSTRACT

The Automatic Quasi-clique Merger (AQCM) algorithm is a new algorithm adapted from earlier work published under the name QCM (introduced by Ou and Zhang in 2007). The AQCM algorithm performs hierarchical clustering on any data set for which there is an associated similarity measure quantifying the similarity of any two data points i and j. Importantly, the method exhibits two valuable performance properties: 1) the ability to automatically return either a larger or smaller number of clusters depending on the inherent properties of the data rather than on a parameter; and 2) the ability to automatically return a very large number of relatively small clusters when such clusters are reasonably well defined in a data set. In this work we present the general idea of a quasi-clique agglomerative approach, provide the full details of the mathematical steps of the AQCM algorithm, and explain some of the motivation behind the new methodology. The main achievement of the new methodology is that the agglomerative process now unfolds adaptively according to the inherent structure unique to a given data set, without the time-consuming parameter adjustment that drove the previous QCM algorithm. For this reason we call the new algorithm automatic. We demonstrate the algorithm's performance on the task of community detection in a social media network of 22,900 nodes.
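
As a rough illustration of the quasi-clique idea in the abstract above, the sketch below grows one dense group from a pairwise similarity table. It is not the published AQCM procedure: the density and contribution definitions, the fixed `ratio` cutoff, and all function names are assumptions made for this example, and the merging and hierarchy-building steps are omitted.

```python
def sim(weights, u, v):
    """Similarity of u and v; pairs missing from the table count as 0.0."""
    return weights.get((min(u, v), max(u, v)), 0.0)

def density(weights, cluster):
    """Average pairwise similarity inside the cluster (assumed definition)."""
    members = sorted(cluster)
    k = len(members)
    if k < 2:
        return 0.0
    total = sum(sim(weights, u, v)
                for i, u in enumerate(members)
                for v in members[i + 1:])
    return 2.0 * total / (k * (k - 1))

def contribution(weights, v, cluster):
    """Average similarity between an outside vertex v and the cluster members."""
    return sum(sim(weights, v, u) for u in cluster) / len(cluster)

def grow_quasi_clique(weights, ratio=0.8):
    """Grow one dense group, starting from the most similar pair.

    `weights` maps (u, v) tuples with u < v to a similarity value.
    A vertex is added while its contribution stays at or above
    `ratio` times the current density (hypothetical stopping rule).
    """
    vertices = {v for pair in weights for v in pair}
    seed = max(sorted(weights), key=weights.get)
    cluster = set(seed)
    while True:
        outside = sorted(vertices - cluster)
        if not outside:
            break
        best = max(outside, key=lambda v: contribution(weights, v, cluster))
        if contribution(weights, best, cluster) < ratio * density(weights, cluster):
            break
        cluster.add(best)
    return cluster

if __name__ == "__main__":
    table = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.85, ("c", "d"): 0.1}
    print(grow_quasi_clique(table))
```

Because the cutoff here is a fixed parameter, this sketch is closer in spirit to a parameter-driven approach than to the automatic behavior the abstract describes.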

2.
ScientificWorldJournal ; 2014: 492461, 2014.
Article in English | MEDLINE | ID: mdl-24778585

ABSTRACT

This paper suggests a novel clustering method for analyzing National Incident-Based Reporting System (NIBRS) data. The method includes determining the correlation between different crime types, developing a likelihood index for crimes to occur in a jurisdiction, and clustering jurisdictions based on crime type. The method was tested using the 2005 assault data from 121 jurisdictions in Virginia. The analyses of these data show that some different crime types are correlated and that some crime parameters are correlated with different crime types. The analyses also show that certain jurisdictions within Virginia share certain crime patterns. This information assists with constructing a pattern for a specific crime type and can be used to determine whether a jurisdiction may be more likely to see this type of crime occur in its area.


Subject(s)
Cluster Analysis , Crime/statistics & numerical data , Models, Theoretical , Crime/legislation & jurisprudence , Virginia
3.
ScientificWorldJournal ; 2013: 368568, 2013.
Article in English | MEDLINE | ID: mdl-24282380

ABSTRACT

Within graph theory and network analysis, the centrality of a vertex measures its relative importance within a graph. Centrality plays a key role in network analysis and has been widely studied using different methods. Inspired by the idea of vertex centrality, a novel centrality-guided clustering (CGC) algorithm is proposed in this paper. Unlike traditional clustering methods, which usually choose the initial center of a cluster randomly, the CGC algorithm starts from a "LEADER"--a vertex with the highest centrality score--and a new "member" is added to the same cluster as the "LEADER" when some criterion is satisfied. The CGC algorithm also supports overlapping membership. Experiments on three benchmark social network data sets are presented, and the results indicate that the proposed CGC algorithm works well in social network clustering.


Subject(s)
Algorithms , Cluster Analysis , Game Theory , Leadership , Models, Theoretical , Population Dynamics , Social Support , Computer Simulation , Humans
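
The leader-based growth described in the abstract of entry 3 can be sketched as follows. This is only an illustration under stated assumptions: degree is used as the centrality score, and the membership criterion (a vertex joins when at least `threshold` of its neighbors are already in the cluster) is hypothetical, since the abstract does not state the actual criterion. The function name `cgc_sketch` is likewise invented for this example.

```python
def cgc_sketch(adj, threshold=0.5):
    """Leader-first clustering on an undirected graph.

    `adj` maps each vertex to the set of its neighbors.
    Returns a list of clusters (sets of vertices), which may overlap.
    """
    # Assumption: degree centrality stands in for the centrality score.
    centrality = {v: len(nbrs) for v, nbrs in adj.items()}
    unassigned = set(adj)
    clusters = []
    while unassigned:
        # The leader is the most central vertex not yet in any cluster;
        # ties go to the smallest label because the candidates are sorted.
        leader = max(sorted(unassigned), key=lambda v: centrality[v])
        cluster = {leader}
        grew = True
        while grew:
            grew = False
            frontier = set().union(*(adj[v] for v in cluster)) - cluster
            for v in sorted(frontier):
                # Hypothetical criterion: enough of v's neighbors are members.
                if adj[v] and len(adj[v] & cluster) / len(adj[v]) >= threshold:
                    cluster.add(v)
                    grew = True
        clusters.append(cluster)
        unassigned -= cluster
    return clusters

if __name__ == "__main__":
    # Two triangles joined by the single edge 2-3.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
             3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    print(cgc_sketch(graph))
```

The frontier includes vertices that already belong to earlier clusters, so a vertex can be a member of more than one cluster, which mirrors the overlapping membership mentioned in the abstract.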
4.
ScientificWorldJournal ; 2012: 104269, 2012.
Article in English | MEDLINE | ID: mdl-22619571

ABSTRACT

Sequence comparison is a primary technique for the analysis of DNA sequences. In order to make quantitative comparisons, one devises mathematical descriptors that capture the essence of the base composition and distribution of the sequence. Alignment methods and graphical techniques (where each sequence is represented by a curve in high-dimensional Euclidean space) have long been popular. In this contribution we introduce a new nongraphical and nonalignment approach based on the frequencies of the dinucleotide XY in DNA sequences. The most important feature of this method is that it identifies not only adjacent XY pairs but also nonadjacent ones, where X and Y are separated by some number of nucleotides. This methodology preserves information in a DNA sequence that is ignored by other methods. We test our method on the coding regions of exon-1 of β-globin for 11 species, and the utility of this new method is demonstrated.


Subject(s)
Nucleotides/chemistry , Sequence Analysis, DNA , Animals , Base Sequence , Exons , Humans , Molecular Sequence Data , beta-Globins/chemistry , beta-Globins/genetics
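
The adjacent and nonadjacent XY counting described in the abstract of entry 4 can be illustrated with a short sketch. The normalization to relative frequencies, the concatenation into one vector, and the function names are assumptions for this example; the abstract does not specify how the frequencies are combined into a descriptor.

```python
from itertools import product

BASES = "ACGT"

def gapped_dinucleotide_freqs(seq, gap):
    """Relative frequency of each ordered pair XY where Y lies gap + 1
    positions after X (gap = 0 gives ordinary adjacent dinucleotides)."""
    seq = seq.upper()
    counts = {x + y: 0 for x, y in product(BASES, repeat=2)}
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:  # skip ambiguous symbols such as N
            counts[pair] += 1
            total += 1
    return {p: (c / total if total else 0.0) for p, c in counts.items()}

def descriptor(seq, max_gap):
    """Concatenate the 16 pair frequencies for gaps 0..max_gap
    (an assumed way of building one vector per sequence)."""
    vec = []
    for gap in range(max_gap + 1):
        freqs = gapped_dinucleotide_freqs(seq, gap)
        vec.extend(freqs[p] for p in sorted(freqs))
    return vec

if __name__ == "__main__":
    print(gapped_dinucleotide_freqs("ACGT", 1))  # only AG and CT are nonzero
    print(len(descriptor("ACGTACGT", 2)))
```

Two sequences can then be compared by any vector distance between their descriptors, which requires no alignment.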
5.
BMC Bioinformatics ; 13 Suppl 2: S12, 2012 Mar 13.
Article in English | MEDLINE | ID: mdl-22536863

ABSTRACT

BACKGROUND: Using gene co-expression analysis, researchers were able to predict clusters of genes with consistent functions that are relevant to cancer development and prognosis. We applied a weighted gene co-expression network (WGCN) analysis algorithm to glioblastoma multiforme (GBM) data obtained from the TCGA project and predicted a set of gene co-expression networks that are related to GBM prognosis. METHODS: We modified the Quasi-Clique Merger algorithm (QCM algorithm) into the edge-covering Quasi-Clique Merger algorithm (eQCM) for mining weighted sub-networks in the WGCN. Each sub-network is considered a set of features to separate patients into two groups using the K-means algorithm. Survival times of the two groups are compared using the log-rank test and Kaplan-Meier curves. Simulations using random sets of genes are carried out to determine the thresholds for log-rank test p-values for network selection. Sub-networks with p-values less than their corresponding thresholds were further merged into clusters based on overlap ratios (>50%). The functions for each cluster are analyzed using gene ontology enrichment analysis. RESULTS: Using the eQCM algorithm, we identified 8,124 sub-networks in the WGCN, out of which 170 sub-networks show p-values less than their corresponding thresholds. They were then merged into 16 clusters. CONCLUSIONS: We identified 16 gene clusters associated with GBM prognosis using the eQCM algorithm. Our results not only confirmed previous findings, including the importance of cell cycle and immune response in GBM, but also suggested important epigenetic events in GBM development and prognosis.


Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Glioblastoma/genetics , Adult , Aged , Algorithms , Female , Glioblastoma/metabolism , Glioblastoma/mortality , Humans , Kaplan-Meier Estimate , Male , Middle Aged , Prognosis
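
The survival comparison step named in the METHODS of entry 5 can be illustrated with a plain two-group log-rank test. The sketch covers only that step: it assumes the patients have already been split into two groups (the abstract uses K-means for this), and the function name and input layout are choices made for this example rather than anything taken from the paper.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up time for each patient
    events_* : 1 if the death was observed, 0 if the patient was censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Deaths observed exactly at time t, per group.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

if __name__ == "__main__":
    short = [1, 2, 3, 4, 5]
    long_ = [6, 7, 8, 9, 10]
    print(logrank_test(short, [1] * 5, long_, [1] * 5))
```

In the workflow the abstract describes, a p-value like this would be compared against a threshold obtained from simulations on random gene sets, not against a fixed cutoff.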
6.
Evol Bioinform Online ; 7: 149-58, 2011.
Article in English | MEDLINE | ID: mdl-22065497

ABSTRACT

Determination of sequence similarity is one of the major steps in computational phylogenetic studies. Over evolutionary history, not only mutations of individual nucleotides but also subsequent rearrangements have occurred. One of the major tasks of computational biologists has been to develop novel mathematical descriptors for similarity analysis that capture information about these various mutation phenomena simultaneously. In this paper, unlike traditional methods that use, for example, nucleotide frequency or geometric representations as the basis for constructing mathematical descriptors, we construct novel mathematical descriptors based on graph theory. In particular, for each DNA sequence, we set up a weighted directed graph. The adjacency matrix of the directed graph is used to induce a representative vector for the DNA sequence. This new approach measures similarity based on both the ordering and the frequency of nucleotides, so that much more information is involved. As an application, the method is tested on a set of 0.9-kb mtDNA sequences of twelve different primate species. All output phylogenetic trees with various distance estimations have the same topology and are generally consistent with the reported results from earlier studies, which demonstrates the new method's efficiency. We also test the new method on a simulated data set, which shows that it performs better than the traditional global alignment method when subsequent rearrangements happen frequently during evolutionary history.
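
The graph-based descriptor outlined in the abstract of entry 6 can be illustrated with a small sketch. The abstract does not give the edge-weight rule, so the one used here is an assumption: each ordered pair of nucleotides at distance d (up to a hypothetical `max_gap`) adds 1/d to the edge from the first base to the second. The function names are also invented for this example.

```python
import math

BASES = "ACGT"

def adjacency_matrix(seq, max_gap=3):
    """4x4 weighted adjacency matrix of a directed graph on A, C, G, T.

    Assumed weighting: a pair (seq[i], seq[i + d]) with 1 <= d <= max_gap
    adds 1/d to the edge from the first base to the second.
    """
    idx = {b: k for k, b in enumerate(BASES)}
    matrix = [[0.0] * 4 for _ in range(4)]
    seq = seq.upper()
    for i, x in enumerate(seq):
        if x not in idx:
            continue  # skip ambiguous symbols such as N
        for d in range(1, max_gap + 1):
            j = i + d
            if j >= len(seq):
                break
            y = seq[j]
            if y in idx:
                matrix[idx[x]][idx[y]] += 1.0 / d
    return matrix

def representative_vector(seq, max_gap=3):
    """Flatten the matrix row by row and normalize it to sum to 1."""
    flat = [w for row in adjacency_matrix(seq, max_gap) for w in row]
    total = sum(flat)
    return [w / total for w in flat] if total else flat

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

if __name__ == "__main__":
    s1, s2 = "ACGTACGT", "TGCATGCA"
    print(euclidean(representative_vector(s1), representative_vector(s2)))
```

Because the edge weights depend on which base precedes which, the resulting vector reflects nucleotide ordering as well as frequency, and a pairwise distance matrix built this way could feed a standard tree-building method.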

7.
Proc Natl Acad Sci U S A ; 108(21): 8605-10, 2011 May 24.
Article in English | MEDLINE | ID: mdl-21551098

ABSTRACT

In many social networks, there is a high correlation between the similarity of actors and the existence of relationships between them. This paper introduces a model of network evolution where actors are assumed to have a small aversion to being connected to others who are dissimilar to themselves, and yet no actor strictly prefers a segregated network. This model is motivated by Schelling's [Schelling TC (1969) Models of segregation. Am Econ Rev 59:488-493] classic model of residential segregation, and we show that Schelling's results also apply to the structure of networks; namely, segregated networks always emerge regardless of the level of aversion. In addition, we prove analytically that attribute similarity among connected network actors always reaches a stationary distribution, and this distribution is independent of network topology and the level of aversion bias. This research provides a basis for more complex models of social interaction that are driven in part by the underlying attributes of network actors and helps advance our understanding of why dysfunctional social network structures may emerge.


Subject(s)
Models, Theoretical , Prejudice , Social Support , Humans , Interpersonal Relations
8.
Article in English | MEDLINE | ID: mdl-26029744

ABSTRACT

In this paper, we introduce a new clustering method, quasi-clique merger, and its associated data pretreatment programs. This program constructs non-binary hierarchical trees with a much smaller number of clusters in the outputs. Overlapping clusters are also allowed in the outputs. We applied this new method to cluster 60 human cancer cell lines (the NCI-60) using the previously identified proteomic determinants for chemosensitivity to 5-Fluorouracil (5-FU). All colon cancer cell lines were aggregated into a single cluster, indicating that the eight proteomic markers are potential diagnostic markers of colon cancer. The results based on the new clustering method surpassed those based on previous methods on the same datasets.
