Results 1 - 10 of 10
1.
PLoS One ; 18(1): e0280910, 2023.
Article in English | MEDLINE | ID: mdl-36689443

ABSTRACT

This paper presents a network science approach to investigate a health information dataset, the Sexual Acquisition and Transmission of HIV Cooperative Agreement Program (SATHCAP), to uncover hidden relationships that can be used to suggest targeted health interventions. From the data, four key target variables are chosen: HIV status, injecting drug use, homelessness, and insurance status. These target variables are converted to a graph format using four separate graph inference techniques: graphical lasso, Meinshausen-Bühlmann (MB), k-Nearest Neighbors (kNN), and correlation thresholding (CT). The graphs are then clustered using four clustering methods: Louvain, Leiden, NBR-Clust with VAT, and NBR-Clust with integrity. Promising clusters are chosen using internal evaluation measures and are visualized and analyzed to identify marker attributes and key relationships. The kNN and CT inference methods are shown to give useful results when combined with NBR-Clust clustering. Examples of cluster analysis indicate that the methodology produces results that will be relevant to the public health community.


Subject(s)
Algorithms , HIV Infections , Humans , Cluster Analysis , Machine Learning
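
As a rough illustration of correlation thresholding (CT), one of the graph inference techniques named in the abstract above, the following minimal sketch connects two records whenever the absolute Pearson correlation of their attribute vectors reaches a cutoff. The toy records, the 0.95 threshold, the use of the absolute value, and the function names are all assumptions made for illustration; the abstract does not say how the authors configured CT.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


def correlation_threshold_graph(rows, threshold):
    """Return undirected edges (i, j), i < j, whose |correlation| >= threshold."""
    edges = set()
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if abs(pearson(rows[i], rows[j])) >= threshold:
                edges.add((i, j))
    return edges


# Hypothetical toy records: each row is one record's numeric attributes.
records = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 5.0],
]
edges = correlation_threshold_graph(records, threshold=0.95)
print(sorted(edges))
```

Because the sketch thresholds the absolute value, strongly anti-correlated records are also linked; a variant that keeps only positive correlations would drop the `abs` call.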
2.
Front Hum Neurosci ; 16: 960991, 2022.
Article in English | MEDLINE | ID: mdl-36310845

ABSTRACT

Autism Spectrum Disorder (ASD) is extremely heterogeneous clinically and genetically. There is a pressing need for a better understanding of the heterogeneity of ASD based on scientifically rigorous approaches centered on systematic evaluation of the clinical and research utility of both phenotype and genotype markers. This paper presents a holistic PheWAS-inspired method to identify meaningful associations between ASD phenotypes and genotypes. We generate two types of phenotype-phenotype (p-p) graphs: a direct graph that utilizes only phenotype data, and an indirect graph that incorporates genotype as well as phenotype data. We introduce a novel methodology for fusing the direct and indirect p-p networks in which the genotype data is incorporated into the phenotype data in varying degrees. The hypothesis is that the heterogeneity of ASD can be distinguished by clustering the p-p graph. The obtained graphs are clustered using network-oriented clustering techniques, and results are evaluated. The most promising clusterings are subsequently analyzed for biological and domain-based relevance. The clusters obtained delineated different aspects of ASD, including differentiating ASD-specific symptoms, cognitive, adaptive, language and communication functions, and behavioral problems. Some of the important genes associated with the clusters have previously known associations with ASD. We found that clusters based on integrated genetic and phenotype data were more effective at identifying relevant genes than clusters constructed from phenotype information alone. These genes included five with suggestive evidence of ASD association and one known to be a strong candidate.

3.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 1774-1780, 2021 11.
Article in English | MEDLINE | ID: mdl-34891631

ABSTRACT

This paper presents a method for estimating the overall size of a hidden population using results from a respondent-driven sampling (RDS) survey. We use data from the Latino MSM Community Involvement survey (LMSM-CI), an RDS dataset that contains information collected regarding the Latino MSM communities in Chicago and San Francisco. A novel model is developed in which data collected in the LMSM-CI survey serves as a bridge for use of data from other sources. In particular, American Community Survey Same-Sex Householder data along with UCLA's Williams Institute data on LGBT population by county are combined with current living situation data taken from the LMSM-CI dataset. Results obtained from these sources are used as the prior distribution for Successive-Sampling Population Size Estimation (SS-PSE), a method used to create a probability distribution over population sizes. The strength of our model is that it does not rely on estimates of community size taken during an RDS survey, which are prone to inaccuracies and not useful in other contexts. It allows unambiguous, useful data (such as living situation) to be used to estimate population sizes.


Subject(s)
HIV Infections , Hispanic or Latino , Sexual and Gender Minorities , Censuses , HIV Infections/epidemiology , HIV Infections/ethnology , Hispanic or Latino/statistics & numerical data , Homosexuality, Male , Humans , Male , Sexual and Gender Minorities/statistics & numerical data , Surveys and Questionnaires
4.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 1781-1786, 2021 11.
Article in English | MEDLINE | ID: mdl-34891632

ABSTRACT

Respondent-driven sampling (RDS) is a popular method for surveying hidden populations based on friendships and existing social network connections. In such a survey the underlying hidden network remains largely unknown. However, it is useful to estimate its size as well as the relative proportions of surveyed features. The fact that linked network participants are likely to share common features is called homophily, and is an important property in understanding the topology of social networks. In this paper we present a methodology that scales up RDS data to model the underlying hidden population in a way that preserves multiple homophilies among different features. We test our model using 46 features of the population sampled by the SATHCAP RDS survey. Our network generation methodology successfully preserves the homophilic associations in a randomly generated Barabási-Albert network. Having created a realistic model of the expanded SATHCAP network, we test our model by simulating RDS surveys over it, and comparing the resulting sub-networks with SATHCAP. In our generated network, we preserve 85% of homophilies to under 2% error. In our simulated RDS surveys we preserve 85% of homophilies to under 15% error.


Subject(s)
Surveys and Questionnaires , Humans
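
To make the notion of homophily in the abstract above concrete, here is a minimal sketch that scores a feature by the fraction of links whose two endpoints share the same value. This edge-fraction measure, the toy links, and the feature labels are assumptions chosen for illustration; the abstract does not state which homophily statistic the authors preserve.

```python
def edge_homophily(edges, feature):
    """Fraction of edges whose two endpoints have the same feature value."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if feature[u] == feature[v])
    return same / len(edges)


# Hypothetical recruitment links and a hypothetical categorical feature per node.
links = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
group = {"a": "x", "b": "x", "c": "y", "d": "y", "e": "x"}
ratio = edge_homophily(links, group)
print(ratio)
```

Comparing such a ratio between a surveyed sample and a generated network is one simple way to check whether a homophily was preserved, under the assumptions above.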
5.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 2110-2114, 2021 11.
Article in English | MEDLINE | ID: mdl-34891705

ABSTRACT

Children with Autism Spectrum Disorder (ASD) exhibit a wide diversity in type, number, and severity of social deficits as well as communicative and cognitive difficulties. It is a challenge to categorize the phenotypes of a particular ASD patient with their unique genetic variants. There is a need for a better understanding of the connections between genotype information and the phenotypes to sort out the heterogeneity of ASD. In this study, single nucleotide polymorphism (SNP) and phenotype data obtained from a simplex ASD sample are combined using a PheWAS-inspired approach to construct a phenotype-phenotype network. The network is clustered, yielding groups of etiologically related phenotypes. These clusters are analyzed to identify relevant genes associated with each set of phenotypes. The results identified multiple discriminant SNPs associated with varied phenotype clusters such as ASD aberrant behavior (self-injury, compulsiveness and hyperactivity), as well as IQ and language skills. Overall, these SNPs were linked to 22 significant genes. An extensive literature search revealed that eight of these are known to have strong evidence of association with ASD. The others have been linked to related disorders such as mental conditions, cognition, and social functioning. Clinical relevance: This study further informs on connections between certain groups of ASD phenotypes and their unique genetic variants. Such insight regarding the heterogeneity of ASD would support clinicians to advance more tailored interventions and improve outcomes for ASD patients.


Subject(s)
Autism Spectrum Disorder , Autism Spectrum Disorder/genetics , Cognition , Humans , Phenotype , Polymorphism, Single Nucleotide
6.
PLoS One ; 16(8): e0256601, 2021.
Article in English | MEDLINE | ID: mdl-34428228

ABSTRACT

Network science techniques are frequently used to provide meaningful insights into the populations underlying medical and social data. This paper examines SATHCAP, a dataset related to HIV and drug use in three US cities. In particular, we use network measures such as betweenness centrality, closeness centrality, and eigenvector centrality to find central, important nodes in a network derived from SATHCAP data. We evaluate the attributes of these important nodes and create an exceptionality score based on the number of nodes that share a particular attribute. This score, along with the underlying network itself, is used to reveal insight into the attributes of groups that can be effectively targeted to slow the spread of disease. Our research confirms a known connection between homelessness and HIV, as well as drug abuse and HIV, and shows support for the theory that individuals without easy access to transportation are more likely to be central to the spread of HIV in urban, high-risk populations.


Subject(s)
Social Network Analysis , Cities , Databases, Factual , HIV Infections/pathology , HIV Infections/transmission , Ill-Housed Persons , Humans , Substance-Related Disorders/pathology
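
Of the centrality measures listed in the abstract above, closeness centrality is the simplest to sketch: a node scores highly when its shortest-path distances to the other nodes are small. The breadth-first-search version below assumes an unweighted, connected graph; the toy path graph and the function name are hypothetical and are not taken from the paper.

```python
from collections import deque


def closeness_centrality(adjacency):
    """Closeness of each node: (reachable nodes) / (sum of shortest-path distances)."""
    scores = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives hop distances from source
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


# Hypothetical 5-node path graph: 0 - 1 - 2 - 3 - 4
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = closeness_centrality(graph)
most_central = max(scores, key=scores.get)
print(most_central, scores)
```

On a disconnected graph this sketch scores each node only within its own component, so scores from different components are not directly comparable.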
7.
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 5602-5605, 2020 07.
Article in English | MEDLINE | ID: mdl-33019247

ABSTRACT

Feature selection provides a useful method for reducing the size of large data sets while maintaining integrity, thereby improving the accuracy of neural networks and other classifiers. However, running multiple feature selection models and their accompanying classifiers can make interpreting results difficult. To this end, we present a data-driven methodology called Meta-Best that not only returns a single feature set related to a classification target, but also returns an optimal size and ranks the features by importance within the set. This proposed methodology is tested on six distinct targets from the well-known REGARDS dataset: Deceased, Self-Reported Diabetes, Light Alcohol Abuse Risk, Regular NSAID Use, Current Smoker, and Self-Reported Stroke. This methodology is shown to improve the classification rate of neural networks by 0.056 using the ROC Area Under Curve metric compared to a control test with no feature selection.


Subject(s)
Algorithms , Neural Networks, Computer
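
The abstract above reports its improvement in terms of the ROC Area Under Curve metric. As a minimal sketch of how that metric can be computed, the function below uses the pairwise-ranking formulation: the share of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The labels and both score lists are made-up toy values, not results from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores without and with feature selection.
labels = [1, 0, 1, 0, 1, 0]
baseline_scores = [0.6, 0.5, 0.4, 0.7, 0.8, 0.2]
selected_scores = [0.7, 0.4, 0.6, 0.5, 0.9, 0.1]
gain = roc_auc(labels, selected_scores) - roc_auc(labels, baseline_scores)
print(round(gain, 3))
```

The quadratic loop is fine for small examples; a rank-based implementation would be preferable for large data sets.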
8.
IEEE J Biomed Health Inform ; 24(11): 3136-3143, 2020 11.
Article in English | MEDLINE | ID: mdl-32749973

ABSTRACT

Performing network-based analysis on medical and biological data makes a wide variety of machine learning tools available. Clustering, which can be used for classification, presents opportunities for identifying hard-to-reach groups for the development of customized health interventions. Due to a desire to convert abundant DNA gene co-expression data into networks, many graph inference methods have been developed. Likewise, there are many clustering and classification tools. This paper presents a comparison of techniques for graph inference and clustering, using different numbers of features, in order to select the best tuple of graph inference method, clustering method, and number of features according to a particular phenotype. An extensive machine learning based analysis of the REGARDS dataset is conducted, evaluating the CoNet and K-Nearest Neighbors (KNN) network inference methods, along with the Louvain, Leiden and NBR-Clust clustering techniques. Results from analysis involving five internal cluster evaluation indices show that the traditional KNN inference method and NBR-Clust and Louvain clustering produce the most promising clusters with medical phenotype data. It is also shown that visualization can aid in interpreting the clusters, and that the clusters produced can identify meaningful groups indicating customized interventions.


Subject(s)
Algorithms , Gene Expression Profiling , Cluster Analysis , Machine Learning
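
The KNN network inference step evaluated in the abstract above can be sketched as follows: each record becomes a node and is linked to its k nearest records. Euclidean distance, the symmetric (undirected) edge rule, index-based tie-breaking, and the toy points are all assumptions for illustration; the abstract does not specify the distance or the value of k used in the study.

```python
from math import dist


def knn_graph(points, k):
    """Undirected kNN graph: link each point to its k nearest other points."""
    edges = set()
    for i, p in enumerate(points):
        # Sort the other points by distance, breaking ties by index for determinism.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
edges = knn_graph(points, k=1)
print(sorted(edges))
```

With k=1 the two groups stay disconnected, which is the kind of structure a later clustering step could pick up.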
9.
PLoS One ; 14(11): e0225382, 2019.
Article in English | MEDLINE | ID: mdl-31756219

ABSTRACT

Reliable identification of inflammatory biomarkers from metagenomics data is a promising direction for developing non-invasive, cost-effective, and rapid clinical tests for early diagnosis of inflammatory bowel disease (IBD). We present an integrative approach to Network-Based Biomarker Discovery (NBBD) which integrates network analysis methods for prioritizing potential biomarkers and machine learning techniques for assessing the discriminative power of the prioritized biomarkers. Using a large dataset of new-onset pediatric IBD metagenomics biopsy samples, we compare the performance of Random Forest (RF) classifiers trained on features selected using a representative set of traditional feature selection methods against the NBBD framework, configured using five different tools for inferring networks from metagenomics data and nine different methods for prioritizing biomarkers, as well as a hybrid approach combining the best traditional and NBBD-based feature selection. We also examine how the performance of the predictive models for IBD diagnosis varies as a function of the size of the data used for biomarker identification. Our results show that (i) NBBD is competitive with some of the state-of-the-art feature selection methods including Random Forest Feature Importance (RFFI) scores; and (ii) NBBD is especially effective in reliably identifying IBD biomarkers when the number of data samples available for biomarker discovery is small.


Subject(s)
Biomarkers/analysis , Inflammatory Bowel Diseases/microbiology , Metagenomics/methods , Algorithms , Humans , Inflammatory Bowel Diseases/metabolism , Machine Learning , Models, Theoretical
10.
Appl Netw Sci ; 3(1): 38, 2018.
Article in English | MEDLINE | ID: mdl-30839816

ABSTRACT

With the growing ubiquity of data in network form, clustering in the context of a network, represented as a graph, has become increasingly important. Clustering is a very useful exploratory machine learning tool that allows us to make better sense of heterogeneous data by grouping data with similar attributes based on some criteria. This paper investigates the application of a novel graph theoretic clustering method, Node-Based Resilience clustering (NBR-Clust), to address the heterogeneity of Autism Spectrum Disorder (ASD) and identify meaningful subgroups. The hypothesis is that analysis of these subgroups would reveal relevant biomarkers that would provide a better understanding of ASD phenotypic heterogeneity useful for further ASD studies. We address appropriate graph constructions suited for representing the ASD phenotype data. The sample population is drawn from a very large, rigorous dataset: the Simons Simplex Collection (SSC). Analysis of the results, performed using graph quality measures, internal cluster validation measures, and clinical analysis outcomes, demonstrates the potential usefulness of resilience measure clustering for biomedical datasets. We also conduct feature extraction analysis to characterize relevant biomarkers that delineate the resulting subgroups. The optimal results obtained favored predominantly a 5-cluster configuration.
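
The general idea behind resilience-based clustering can be illustrated with graph integrity, a standard resilience measure defined as the minimum over node sets S of |S| plus the size of the largest connected component left after removing S. The sketch below finds such a set by brute force and treats the remaining components as candidate clusters. It is an assumption-laden toy, not the NBR-Clust implementation: the exhaustive search is only feasible for tiny graphs, the example graph is hypothetical, and the abstract does not describe how removed nodes are reassigned to clusters.

```python
from itertools import combinations


def components(nodes, adjacency):
    """Connected components of the subgraph induced by `nodes`."""
    remaining = set(nodes)
    result = []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    comp.add(neighbor)
                    stack.append(neighbor)
        result.append(comp)
    return result


def integrity_attack_set(adjacency):
    """Brute-force the node set S minimizing |S| + (largest component of G - S)."""
    nodes = sorted(adjacency)
    best_score, best_set = len(nodes), ()
    for size in range(len(nodes)):  # always leave at least one node in place
        for removed in combinations(nodes, size):
            kept = [n for n in nodes if n not in removed]
            largest = max(len(c) for c in components(kept, adjacency))
            if size + largest < best_score:
                best_score, best_set = size + largest, removed
    return best_score, set(best_set)


# Hypothetical graph: two triangles (0,1,2) and (4,5,6) joined through node 3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4],
         4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
score, attack = integrity_attack_set(graph)
clusters = components([n for n in graph if n not in attack], graph)
print(score, attack, sorted(sorted(c) for c in clusters))
```

In this toy graph, removing the single bridging node separates the two triangles, which then serve as the clusters.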
