Search | BVS Infectious and Parasitic Diseases

1.

Using clustering of genetic variants in Mendelian randomization to interrogate the causal pathways underlying multimorbidity from a common risk factor.

Liang, Xiaoran; Mounier, Ninon; Apfel, Nicolas; Khalid, Sara; Frayling, Timothy M; Bowden, Jack.

Genet Epidemiol ; 2024 Aug 13.

Article in English | MEDLINE | ID: mdl-39138631

ABSTRACT

Mendelian randomization (MR) is an epidemiological approach that uses genetic variants as instrumental variables to estimate the causal effect of an exposure on a health outcome. This paper investigates an MR scenario in which genetic variants aggregate into clusters that identify heterogeneous causal effects. Such variant clusters are likely to emerge when variants affect the exposure and outcome via distinct biological pathways. In the multi-outcome MR framework, where a shared exposure causally affects several disease outcomes simultaneously, these variant clusters can provide insights into the common disease-causing mechanisms underpinning the co-occurrence of multiple long-term conditions, a phenomenon known as multimorbidity. To identify such variant clusters, we adapt the general method of agglomerative hierarchical clustering to the multi-sample summary-data MR setup, enabling cluster detection based on variant-specific ratio estimates. In particular, we tailor the method to multi-outcome MR to help elucidate the causal pathways through which a common risk factor contributes to multiple morbidities. We show in simulations that our "MR-AHC" method detects clusters with high accuracy, outperforming existing methods. We apply the method to investigate the causal effects of high body fat percentage on type 2 diabetes and osteoarthritis, uncovering interconnected cellular processes underlying this multimorbid disease pair.
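The abstract clusters variants on their variant-specific ratio estimates. As a minimal sketch, assuming the standard Wald ratio with first-order standard errors and that cluster labels have already been assigned, the code below computes each variant's ratio estimate and then pools the variants within each cluster by inverse-variance weighting. The function names and toy summary statistics are hypothetical; this is not the authors' MR-AHC implementation, and the clustering step itself is omitted.

```python
import math
from collections import defaultdict


def ratio_estimates(beta_x, beta_y, se_y):
    """Per-variant Wald ratio estimates with first-order standard errors.

    beta_x: variant-exposure associations, beta_y: variant-outcome associations,
    se_y: standard errors of beta_y (all taken from summary statistics).
    """
    ratios, ses = [], []
    for bx, by, sy in zip(beta_x, beta_y, se_y):
        ratios.append(by / bx)    # ratio estimate of the causal effect for this variant
        ses.append(sy / abs(bx))  # first-order SE: ignores uncertainty in beta_x
    return ratios, ses


def ivw_by_cluster(ratios, ses, labels):
    """Inverse-variance weighted causal estimate (and its SE) within each variant cluster."""
    num, den = defaultdict(float), defaultdict(float)
    for r, s, c in zip(ratios, ses, labels):
        w = 1.0 / s ** 2
        num[c] += w * r
        den[c] += w
    return {c: (num[c] / den[c], math.sqrt(1.0 / den[c])) for c in num}


# Hypothetical summary statistics for four variants in two assumed clusters.
ratios, ses = ratio_estimates([0.1, 0.2, 0.1, 0.2], [0.05, 0.1, -0.02, -0.04], [0.01] * 4)
print(ivw_by_cluster(ratios, ses, [0, 0, 1, 1]))
```

In this toy input the two clusters carry opposite-signed effects (0.5 and -0.2), which a single pooled estimate across all four variants would blur together.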

2.

Metastatic potentials classified with hypoxia-inducible factor 1 downstream genes in pan-cancer cell lines.

Nakamichi, Kazuya; Yamamoto, Yusuke; Semba, Kentaro; Nakayama, Jun.

Genes Cells ; 29(2): 169-177, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38158708

ABSTRACT

Hypoxia-inducible factor 1 (HIF1) is a transcription factor that is stabilized under hypoxic conditions via post-translational modifications. In cancer cells, HIF1 regulates tumor malignancy and metastasis by inducing the transcription of target genes, such as those involved in the Warburg effect and angiogenesis. However, HIF1 downstream genes show varied expression patterns across cancer types. Here, to understand the relationship between HIF1 downstream genes and the metastatic potential of cancer cell lines, we performed hierarchical clustering based on HIF1 downstream gene expression patterns in 1406 cancer cell lines spanning 30 cancer types. In two cancer types, bone and breast cancer, the subtypes classified by HIF1 downstream genes showed significantly different metastatic potentials. Furthermore, different subsets of HIF1 downstream genes were extracted to discriminate each subtype within these cancer types. Subtyping based on HIF1 downstream genes may provide novel insight into tumor malignancy and metastasis in each cancer type.

Subjects

Breast Neoplasms , Hypoxia-Inducible Factor 1 , Humans , Female , Hypoxia-Inducible Factor 1/genetics , Hypoxia-Inducible Factor 1/metabolism , Cell Line , Breast Neoplasms/pathology , Hypoxia-Inducible Factor 1, alpha Subunit/genetics , Hypoxia-Inducible Factor 1, alpha Subunit/metabolism , Cell Line, Tumor , Cell Hypoxia/physiology
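The abstract does not say which dissimilarity or linkage was used. A common choice for expression data, assumed here purely for illustration, is correlation distance (1 - Pearson r) combined with average linkage (UPGMA). The sketch below computes both for hypothetical profiles; the function names and values are illustrative, not taken from the study.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def correlation_distances(profiles):
    """Matrix of 1 - r between every pair of cell lines (rows = cell lines, columns = genes)."""
    n = len(profiles)
    return [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


def average_linkage(dist, group_a, group_b):
    """Mean pairwise distance between two groups of cell lines (the UPGMA merge criterion)."""
    total = sum(dist[i][j] for i in group_a for j in group_b)
    return total / (len(group_a) * len(group_b))


# Hypothetical expression of four HIF1 target genes in three cell lines.
profiles = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
d = correlation_distances(profiles)
print(average_linkage(d, [0, 1], [2]))
```

Because correlation distance ignores scale, the second profile (every value doubled) sits at distance 0 from the first, so cell lines are grouped by the shape of their target-gene pattern rather than by overall expression level. A profile with zero variance makes the correlation undefined and would need to be filtered out first.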

3.

Tree visualizations of protein sequence embedding space enable improved functional clustering of diverse protein superfamilies.

Yeung, Wayland; Zhou, Zhongliang; Mathew, Liju; Gravel, Nathan; Taujale, Rahil; O'Boyle, Brady; Salcedo, Mariah; Venkat, Aarya; Lanzilotta, William; Li, Sheng; Kannan, Natarajan.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36642409

ABSTRACT

Protein language models, trained on millions of biologically observed sequences, generate feature-rich numerical representations of protein sequences. These representations, called sequence embeddings, can be used to infer structural and functional properties even though protein language models are trained on primary sequence alone. While sequence embeddings have been applied to tasks such as structure and function prediction, their application to alignment-free sequence classification has been hindered by a lack of studies that derive, quantify and evaluate relationships between protein sequence embeddings. Here, we develop workflows and visualization methods for classifying protein families using sequence embeddings derived from protein language models. A benchmark of manifold visualization methods reveals that Neighbor Joining (NJ) embedding trees are highly effective at capturing global structure while performing similarly to popular dimensionality reduction techniques such as t-SNE and UMAP at capturing local structure. The statistical significance of hierarchical clusters on a tree is evaluated by resampling embeddings with a variational autoencoder (VAE). We demonstrate our methods by classifying two well-studied enzyme superfamilies, phosphatases and protein kinases. Our embedding-based classifications are consistent with, and extend, previously published sequence alignment-based classifications. We also propose a new hierarchical classification for the S-Adenosyl-L-Methionine (SAM) enzyme superfamily, which has been difficult to classify using traditional alignment-based approaches. Beyond sequence classification, our results suggest that NJ trees are a promising general method for visualizing high-dimensional data sets.

Subjects

Amino Acid Sequence , Proteins , Cluster Analysis , Proteins/chemistry , Sequence Alignment
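Neighbor Joining builds a tree from any matrix of pairwise distances, so it can be applied to distances between sequence embeddings. The sketch below is a plain implementation of the Saitou-Nei algorithm on a small textbook distance matrix. How the study measures distances between embeddings is not specified here, and the VAE resampling step is not shown; the function name and the internal-node labels are hypothetical.

```python
def neighbor_joining(dist, names):
    """Saitou-Nei neighbor joining on a symmetric distance matrix.

    Returns (joins, last_edge): each join is (a, b, branch_a, branch_b, new_node),
    and last_edge is (a, b, length) connecting the final two nodes.
    """
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums over active nodes
        # Q-criterion: join the pair minimising (n - 2) * d(a, b) - r(a) - r(b)
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        branch_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = d[a][b] - branch_a
        u = f"node{len(joins) + 1}"  # hypothetical label for the new internal node
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                # distance from the new node to every remaining node
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
        joins.append((a, b, branch_a, branch_b, u))
    a, b = nodes
    return joins, (a, b, d[a][b])


# Textbook five-taxon example; in practice these could be distances between embeddings.
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
joins, last_edge = neighbor_joining(D, ["a", "b", "c", "d", "e"])
for step in joins:
    print(step)
print(last_edge)
```

On this matrix the first join pairs a and b with branch lengths 2 and 3. Each iteration rescans all pairs, so this version is O(n³) overall; it is meant to show the Q-criterion and branch-length formulas, not to scale to thousands of embeddings.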

4.

Characterizing multimorbidity in ALIVE: comparing single and ensemble clustering methods.

Rudolph, Jacqueline E; Lau, Bryan; Genberg, Becky L; Sun, Jing; Kirk, Gregory D; Mehta, Shruti H.

Am J Epidemiol ; 193(8): 1146-1154, 2024 Aug 05.

Article in English | MEDLINE | ID: mdl-38576181

ABSTRACT

Multimorbidity, defined as having 2 or more chronic conditions, is a growing public health concern, but research in this area is complicated by the fact that multimorbidity is a highly heterogeneous outcome: individuals in a sample may have different numbers and combinations of conditions. Clustering methods, such as unsupervised machine learning algorithms, may allow us to tease out distinct multimorbidity phenotypes. However, many clustering methods exist, and choosing among them is challenging because the true underlying clusters are unknown. Here, we demonstrate the use of 3 individual algorithms (partitioning around medoids, hierarchical clustering, and probabilistic clustering) and a clustering ensemble approach (which pools different clustering approaches) to identify multimorbidity clusters in the AIDS Linked to the Intravenous Experience (ALIVE) cohort study. We show how the clusters can be compared based on cluster quality, interpretability, and predictive ability. In practice, it is critical to compare the clustering results from multiple algorithms and to choose the approach that performs best in the domain(s) that align with plans to use the clusters in future analyses.

Subjects

Algorithms , Multimorbidity , Humans , Cluster Analysis , Female , Male , Middle Aged , Unsupervised Machine Learning , Adult
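The abstract does not specify how the ensemble pools the individual algorithms. One widely used approach, assumed here, is evidence accumulation: count how often each pair of individuals is placed in the same cluster across algorithms, then cluster that co-association matrix. The sketch below links pairs co-clustered by a majority of algorithms and takes connected components, which is equivalent to cutting a single-linkage tree on the distance 1 - co-association; the labels and threshold are hypothetical.

```python
def co_association(labelings):
    """Fraction of clusterings that place each pair of individuals in the same cluster."""
    n, m = len(labelings[0]), len(labelings)
    ca = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    ca[i][j] += 1.0 / m
    return ca


def consensus_clusters(ca, threshold=0.5):
    """Link pairs whose co-association exceeds the threshold; clusters are connected components."""
    n = len(ca)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly co-clustered pairs
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and ca[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical labels from three algorithms (e.g., PAM, hierarchical, probabilistic) for 5 people.
runs = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
print(consensus_clusters(co_association(runs)))
```

Cluster label values are arbitrary per algorithm, which is why the method compares pairs rather than labels. Here individuals 0 and 1 are grouped by all three algorithms, while 2, 3 and 4 are chained together through pairs that two of the three algorithms agree on.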

5.

Toward accurate diagnosis and surveillance of bacterial infections using enhanced strain-level metagenomic next-generation sequencing of infected body fluids.

Ruan, Zhi; Zou, Shengmei; Wang, Zeyu; Zhang, Luhan; Chen, Hangfei; Wu, Yuye; Jia, Huiqiong; Draz, Mohamed S; Feng, Ye.

Brief Bioinform ; 23(2)2022 03 10.

Article in English | MEDLINE | ID: mdl-35108376

ABSTRACT

Metagenomic next-generation sequencing (mNGS) enables comprehensive pathogen detection and has become increasingly popular in clinical diagnosis. Because pathogenic traits differ between strains, mNGS needs strain-level resolution, but the ambiguous concept of 'strain' and the low pathogen loads in most clinical specimens hinder such strain awareness. Here we introduce a metagenomic intra-species typing (MIST) tool (https://github.com/pandafengye/MIST), which hierarchically organizes reference genomes based on average nucleotide identity (ANI) and performs maximum likelihood estimation to infer strain-level compositional abundance. In silico analysis using synthetic datasets showed that MIST accurately predicted strain composition at 99.9% ANI resolution with a sequencing depth of only 0.001×. When applying MIST to 359 culture-positive and 359 culture-negative real-world specimens of infected body fluids, we found that multiple strains were present at considerable frequencies (30.39%-93.22%), which current diagnostic techniques underestimate because of their limited resolution. Several high-risk clones were found to be prevalent across samples, including Acinetobacter baumannii sequence type (ST)208/ST195, Staphylococcus aureus ST22/ST398 and Klebsiella pneumoniae ST11/ST15, indicating potential outbreak events in the clinical settings. Interestingly, contamination by the engineered Escherichia coli strains K-12 and BL21 throughout the mNGS datasets was also identified by MIST rather than by the statistical decontamination approach. Our study systematically characterized infected body fluids at the strain level for the first time. Extending mNGS testing to the strain level can greatly benefit the clinical diagnosis of bacterial infections, including the identification of multi-strain infection, decontamination and infection control surveillance.

Subjects

Bacterial Infections , Body Fluids , Bacterial Infections/diagnosis , High-Throughput Nucleotide Sequencing/methods , Humans , Metagenomics/methods , Nucleotides
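MIST infers strain-level abundance by maximum likelihood. The abstract does not give its likelihood model, so the sketch below assumes a standard mixture formulation in which each read has a known likelihood under each reference strain, and the unknown strain proportions are estimated with the expectation-maximization (EM) algorithm. The function name and toy likelihoods are hypothetical, and the ANI-based hierarchy of reference genomes is not modeled.

```python
def estimate_abundance(likelihoods, n_iter=500, tol=1e-10):
    """EM estimate of strain proportions from per-read likelihoods.

    likelihoods[r][s] = P(read r | strain s). Maximises sum_r log(sum_s pi[s] * L[r][s]).
    """
    k = len(likelihoods[0])
    pi = [1.0 / k] * k  # start from equal proportions
    for _ in range(n_iter):
        counts = [0.0] * k
        for row in likelihoods:
            weights = [p * l for p, l in zip(pi, row)]
            total = sum(weights)
            if total == 0:
                continue  # read has zero likelihood under every strain with nonzero weight
            for s, w in enumerate(weights):
                counts[s] += w / total  # E-step: posterior probability that the read came from strain s
        n_assigned = sum(counts)
        if n_assigned == 0:
            raise ValueError("no read is compatible with any reference strain")
        new_pi = [c / n_assigned for c in counts]  # M-step: expected share of reads per strain
        if max(abs(x - y) for x, y in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


# Hypothetical reads: 3 match only strain A, 1 only strain B, 2 match both equally well.
reads = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] + [[1.0, 1.0]] * 2
print(estimate_abundance(reads))
```

The two ambiguous reads are shared out in proportion to the current abundance estimates, so the estimate converges to 0.75 for strain A and 0.25 for strain B rather than counting the shared reads twice.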

6.

Phenotypes of Polish primary care patients using hierarchical clustering: Exploring the risk of mortality in the LIPIDOGEN2015 study cohort.

Chen, Yang; Gue, Ying; Banach, Maciej; Mikhailidis, Dimitri; Toth, Peter P; Gierlotka, Marek; Osadnik, Tadeusz; Golawski, Marcin; Tomasik, Tomasz; Windak, Adam; Jozwiak, Jacek; Lip, Gregory Y H.

Eur J Clin Invest ; 54(10): e14261, 2024 Oct.

Article in English | MEDLINE | ID: mdl-38850064

ABSTRACT

BACKGROUND: Comorbidities in primary care do not occur in isolation but tend to cluster together, producing various clinically complex phenotypes. This study aimed to distinguish phenotype clusters and identify the risks of all-cause mortality in primary care. METHODS: The baseline cohort of the LIPIDOGEN2015 sub-study involved 1779 patients recruited by 438 primary care physicians. To identify different phenotype clusters, we used hierarchical clustering and investigated differences in clinical characteristics and mortality between clusters. We then used causal mediation analysis to explore potential mediators between different clusters and all-cause mortality. RESULTS: A total of 1756 patients were included (mean age 51.2, SD 13.0; 60.3% female), with a median follow-up of 5.7 years. Three clusters were identified: Cluster 1 (n = 543) was characterised by overweight/obesity (body mass index ≥ 25 kg/m²), older age (≥ 65 years) and more comorbidities; Cluster 2 (n = 459) by non-overweight/obesity, younger age and fewer comorbidities; and Cluster 3 (n = 754) by overweight/obesity, younger age and fewer comorbidities. Adjusted Cox regression showed that, compared with Cluster 2, Cluster 1 had a significantly higher risk of all-cause mortality (HR 3.87, 95% CI: 1.24-15.91), whereas the risk in Cluster 3 did not differ significantly. Causal mediation analyses showed that decreased protein thiol groups mediated the excess all-cause mortality hazard in Cluster 1 compared with Cluster 2, but not between Clusters 1 and 3. CONCLUSION: Older patients with overweight/obesity and more comorbidities had the highest risk of long-term all-cause mortality, whereas among younger patients overweight/obesity did not significantly increase the risk over long-term follow-up, providing a basis for stratified phenotypic risk management.

Subjects

Comorbidity , Phenotype , Primary Health Care , Humans , Male , Female , Middle Aged , Aged , Primary Health Care/statistics & numerical data , Cluster Analysis , Adult , Poland/epidemiology , Obesity/epidemiology , Mortality , Proportional Hazards Models , Overweight/epidemiology , Cohort Studies , Age Factors , Cause of Death , Body Mass Index , Risk Factors , Hypertension/epidemiology
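The abstract does not report the variables, dissimilarity or linkage behind the three clusters. The sketch below assumes continuous variables that are z-scored and then merged with Ward's criterion, which joins the pair of clusters whose union least increases the within-cluster sum of squares. The age and BMI values are hypothetical. Binary comorbidity indicators would instead call for a mixed-type dissimilarity such as Gower's distance.

```python
import math


def standardize(rows):
    """Z-score each column (sample SD) so variables on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1)) for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(r, means, sds)] for r in rows]


def ward(rows, k):
    """Agglomerate with Ward's criterion until k clusters remain; returns sorted index lists."""
    dims = range(len(rows[0]))
    clusters = [[i] for i in range(len(rows))]

    def centroid(c):
        return [sum(rows[i][d] for i in c) / len(c) for d in dims]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares caused by merging clusters a and b
        ca, cb = centroid(a), centroid(b)
        return len(a) * len(b) / (len(a) + len(b)) * sum((x - y) ** 2 for x, y in zip(ca, cb))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical patients: (age in years, BMI in kg/m^2).
patients = [[30, 22], [32, 23], [70, 31], [68, 30], [35, 29], [33, 30]]
print(ward(standardize(patients), 3))
```

With these toy values the three groups resemble the abstract's pattern (young and lean, older with high BMI, young with high BMI), but only because the data were constructed that way.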

7.

Exploring the use of immunomethylomics in the characterization of depressed patients: A proof-of-concept study.

Van Assche, Evelien; Hohoff, Christa; Su Atil, Ecem; Wissing, Sophia M; Serretti, Alessandro; Fabbri, Chiara; Pisanu, Claudia; Squassina, Alessio; Minelli, Alessandra; Baune, Bernhard T.

Brain Behav Immun ; 123: 597-605, 2024 Sep 26.

Article in English | MEDLINE | ID: mdl-39341467

ABSTRACT

Alterations in DNA methylation and inflammation could represent valid biomarkers for the stratification of patients with major depressive disorder (MDD). This study explored the use of DNA methylation-based immunological cell-type profiles in the context of MDD and symptom severity over time. In 119 individuals with MDD, DNA methylation was assessed in whole blood using the Illumina Infinium MethylationEPIC 850K BeadChip. Quality control, data processing and cell-type estimation were conducted using the RnBeads package. Cell-type composition was estimated from epigenome-wide DNA methylation signatures with the Houseman method, considering six cell types (neutrophils, natural killer (NK) cells, B cells, CD4+ T cells, CD8+ T cells and monocytes). Two cytokines (IL-6 and IL-1β) and hsCRP were quantified in serum. We performed a hierarchical cluster analysis on the six estimated cell types and tested for differences between the resulting clusters in the two cytokines and hsCRP, and in depression severity at baseline and after 6 weeks of treatment (celecoxib/placebo + vortioxetine). We performed a second cluster analysis with cell types and cytokines combined. ANCOVA was used to test for differences across clusters, and the Bonferroni correction was applied. After quality control, 113 participants were included. Two clusters were identified: cluster 1 was high in CD4+ T cells and NK cells, and cluster 2 was high in CD8+ T cells and B cells, with similar fractions of neutrophils and monocytes. The clusters were not associated with either of the two cytokines, hsCRP, or baseline depression severity, but cluster 1 showed higher depression severity after 6 weeks, corrected for baseline (p = 0.0060). The second cluster analysis found similar results: cluster 1 was low in CD8+ T cells, B cells and IL-1β, and cluster 2 was low in CD4+ T cells and NK cells. Neutrophils, monocytes, IL-6 and hsCRP did not differ between the clusters. Participants in cluster 1 showed higher depression severity at baseline than those in cluster 2 (p = 0.034), but no difference in depression severity after 6 weeks. DNA methylation-based cell-type profiles may be valuable for the immunological characterization and stratification of patients with MDD. Future models should consider including more cell types and cytokines for better prediction of treatment outcomes.
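The Bonferroni correction mentioned above controls the family-wise error rate by multiplying each raw p-value by the number of tests m, capped at 1 (equivalently, comparing each raw p-value with alpha/m). The abstract does not say how many tests were corrected for, so the three p-values below are hypothetical and the function name is illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni adjustment: multiply each p-value by the number of tests (capped at 1)."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj <= alpha for p_adj in adjusted]


# Hypothetical raw p-values from three cluster comparisons.
adjusted, reject = bonferroni([0.004, 0.02, 0.3])
print(adjusted, reject)
```

A raw p-value of 0.02 would pass an uncorrected 0.05 threshold but not the corrected one (0.02 × 3 = 0.06), which is the conservatism the correction trades for protection against false positives across multiple tests.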

8.

Characterization of Pleural Mesothelioma by Hierarchical Clustering Analyses Using Immune Cells within Tumor Microenvironment.

Inaguma, Shingo; Wang, Chengbo; Ito, Sunao; Ueki, Akane; Lasota, Jerzy; Czapiewski, Piotr; Langfort, Renata; Rys, Janusz; Szpor, Joanna; Waloszczyk, Piotr; Okon, Krzysztof; Biernat, Wojciech; Takiguchi, Shuji; Schrump, David S; Miettinen, Markku; Takahashi, Satoru.

Pathobiology ; 91(5): 313-325, 2024.

Article in English | MEDLINE | ID: mdl-38527431

ABSTRACT

INTRODUCTION: Over the past decade, classifications based on immune cell infiltration have been applied to many tumor types; however, mesotheliomas have been evaluated less frequently. METHODS: In this study, 60 well-characterized pleural mesotheliomas (PMs) were evaluated immunohistochemically for the characteristics of immune cells within the tumor microenvironment (TME) using 10 markers: CD3, CD4, CD8, CD56, CD68, CD163, FOXP3, CD27, PD-1, and TIM-3. For further characterization of PMs, hierarchical clustering analyses using these 10 markers were performed. RESULTS: Among the immune cell markers, CD3 (p < 0.0001), CD4 (p = 0.0016), CD8 (p = 0.00094), CD163+ (p = 0.042), and FOXP3+ (p = 0.025) were significantly associated with an unfavorable clinical outcome. Expression of immune checkpoint receptors on tumor-infiltrating lymphocytes, such as PD-1 (p = 0.050), CD27 (p = 0.014), and TIM-3 (p = 0.0098), was also associated with unfavorable survival. Hierarchical clustering analyses identified three groups with specific characteristics and significant associations with patient survival (p = 0.016): the highest number of immune cells (ICHigh); the lowest number of immune cells, especially CD8+ and CD163+ cells (ICLow); and an intermediate number of immune cells (ICInt). ICHigh tumors showed significantly higher expression of PD-L1 (p = 0.00038). The Cox proportional hazards model identified ICHigh [hazard ratio (HR) = 2.90] and ICInt (HR = 2.97) as potential risk factors compared with ICLow. Tumor expression of CD47 (HR = 2.36), CD70 (HR = 3.04), and PD-L1 (HR = 3.21) was also identified as a potential risk factor for PM patients. CONCLUSION: Our findings indicate that immune checkpoint and/or immune cell-targeting therapies against the CD70-CD27 and/or CD47-SIRPA axes may be applied to PM patients in combination with PD-L1-PD-1-targeting therapies, in accordance with their tumor immune microenvironment characteristics.

Subjects

Biomarkers, Tumor , Lymphocytes, Tumor-Infiltrating , Pleural Neoplasms , Tumor Microenvironment , Humans , Tumor Microenvironment/immunology , Male , Female , Middle Aged , Aged , Cluster Analysis , Pleural Neoplasms/immunology , Pleural Neoplasms/pathology , Lymphocytes, Tumor-Infiltrating/immunology , Mesothelioma/immunology , Mesothelioma/pathology , Adult , Mesothelioma, Malignant/immunology , Mesothelioma, Malignant/pathology , Aged, 80 and over , Prognosis , Immunohistochemistry

9.

Quantifying the Light-Absorption Properties and Molecular Composition of Brown Carbon Aerosol from Sub-Saharan African Biomass Combustion.

Moschos, Vaios; Christensen, Cade; Mouton, Megan; Fiddler, Marc N; Isolabella, Tommaso; Mazzei, Federico; Massabò, Dario; Turpin, Barbara J; Bililign, Solomon; Surratt, Jason D.

Environ Sci Technol ; 58(9): 4268-4280, 2024 Mar 05.

Article in English | MEDLINE | ID: mdl-38393751

ABSTRACT

Sub-Saharan Africa is a hotspot for biomass burning (BB)-derived carbonaceous aerosols, including light-absorbing organic (brown) carbon (BrC). However, the chemically complex nature of BrC in BB aerosols from this region is not fully understood. We generated smoke in a chamber through smoldering combustion of common sub-Saharan African biomass fuels (hardwoods, cow dung, savanna grass, and leaves). We quantified the aethalometer-based, real-time light-absorption properties of BrC-containing, organic-rich BB aerosols, accounting for variations in wavelength, fuel type, relative humidity, and photochemical aging conditions. In filter samples collected from the chamber and from Botswana in winter, we identified 182 BrC species, classified into lignin pyrolysis products, nitroaromatics, coumarins, stilbenes, and flavonoids. Using an extensive set of standards, we determined species-specific mass and emission factors. Our analysis revealed a linear relationship between the combined contribution of BrC species to chamber-measured BB aerosol mass (0.4-14%) and the mass absorption cross-section at 370 nm (0.2-2.2 m² g⁻¹). Hierarchical clustering resolved key molecular-level components from the BrC matrix, with photochemically aged emissions from leaf and cow-dung burning showing BrC fingerprints similar to those found in Botswana aerosols. These quantitative findings could help refine climate model predictions, aid source apportionment, and inform effective air quality management policies for human health and the global climate.

Subjects

Air Pollutants , Air Pollution , Humans , Aged , Carbon , Biomass , Environmental Monitoring , Air Pollution/analysis , Aerosols/analysis , Air Pollutants/analysis , Particulate Matter/analysis
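The linear relationship reported above is the kind of result an ordinary least-squares fit produces. The sketch below shows such a fit, together with the unit bookkeeping for the mass absorption cross-section (MAC), which is the absorption coefficient divided by the mass concentration. The data points are hypothetical values chosen only to fall within the ranges quoted in the abstract; they are not the study's measurements, and the paper's regression method is not stated. Function names are illustrative.

```python
def mass_absorption_cross_section(b_abs_per_Mm, mass_ug_per_m3):
    """MAC in m^2 g^-1 from an absorption coefficient (Mm^-1) and a mass concentration (ug m^-3).

    Mm^-1 / (ug m^-3) = (1e-6 m^-1) / (1e-6 g m^-3) = m^2 g^-1, so no scaling factor is needed.
    """
    return b_abs_per_Mm / mass_ug_per_m3


def ols_fit(x, y):
    """Ordinary least-squares slope, intercept and R^2 for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical chamber runs: BrC mass fraction (%) vs MAC at 370 nm (m^2 g^-1).
brc_percent = [0.4, 3.0, 7.5, 14.0]
mac_370 = [0.2, 0.6, 1.2, 2.2]
print(ols_fit(brc_percent, mac_370))
print(mass_absorption_cross_section(22.0, 10.0))
```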

10.

Characterization of colorectal cancer by hierarchical clustering analyses of five immune cell markers.

Ito, Sunao; Koshino, Akira; Komura, Masayuki; Kato, Shunsuke; Otani, Takahiro; Wang, Chengbo; Ueki, Akane; Takahashi, Hiroki; Ebi, Masahide; Ogasawara, Naotaka; Tsuzuki, Toyonori; Kasai, Kenji; Kasugai, Kunio; Takiguchi, Shuji; Takahashi, Satoru; Inaguma, Shingo.

Pathol Int ; 74(1): 13-25, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38050808

ABSTRACT

The present study analyzed the expression of five independent immunohistochemical markers, CD4, CD8, CD66b, CD68, and CD163, on immune cells within the colorectal cancer (CRC) tumor microenvironment (TME). Hierarchical clustering classified patients into groups with significant associations with clinicopathological features and/or survival. Patients with mismatch repair-proficient (pMMR) CRC were categorized into four groups with different survival (p = 0.0084): CD4Low, CD4High, MΦHigh, and CD8Low. MΦHigh tumors showed significantly higher expression of CD47 (p < 0.0001), a phagocytosis checkpoint molecule, and contained significantly greater numbers of PD-1+ (p < 0.0001), TIM-3+ (p < 0.0001), and SIRPA+ (p < 0.0001) immune cells. Notably, 10% of the patients with pMMR CRC expressed PD-L1 (CD274) on tumor cells and had significantly worse survival (p = 0.00064). The Cox proportional hazards model identified MΦHigh (hazard ratio [HR] = 2.02, p = 0.032), CD8Low (HR = 2.45, p = 0.011), and tumor PD-L1 expression (HR = 2.74, p = 0.0061) as potential risk factors. Immune checkpoint therapies targeting the PD-L1-PD-1 and/or CD47-SIRPA axes might be considered for patients with pMMR CRC according to the characteristics of their tumor cells and tumor immune microenvironment.

Subjects

Colorectal Neoplasms , Humans , Colorectal Neoplasms/pathology , CD47 Antigen , B7-H1 Antigen/metabolism , Programmed Cell Death 1 Receptor/metabolism , Biomarkers, Tumor/analysis , Tumor Microenvironment

11.

Adaptively leverage multiple real-world data sources for treatment effect estimation based on similarity.

Long, Meihua; Song, Jiali; Rong, Zhiwei; Mi, Lan; Song, Yuqin; Hou, Yan.

J Biopharm Stat ; : 1-11, 2024 Apr 01.

Article in English | MEDLINE | ID: mdl-38557411

ABSTRACT

The incorporation of real-world data (RWD) into medical product development and evaluation has grown steadily. However, there is no universally adopted method for deciding how much information to borrow from external data. This paper proposes a study design methodology called Tree-based Monte Carlo (TMC) that dynamically integrates patients from various RWD sources to calculate the treatment effect based on the similarity between the clinical trial and the RWD. First, a propensity score is developed to gauge the resemblance between the clinical trial data and each real-world dataset. Using this similarity metric, we construct a hierarchical clustering tree that delineates varying degrees of similarity between each RWD source and the clinical trial data. Finally, a Gaussian process methodology is applied across this hierarchical clustering framework to synthesize the projected treatment effects of the external group. Simulation results show that our clustering tree successfully identifies similarity. Data sources more similar to the clinical trial receive higher weights in the treatment effect estimation, while less similar sources receive lower weights. Compared with another Bayesian method, the meta-analytic predictive (MAP) prior, our proposed estimator is closer to the true value and has smaller bias.

12.

Clustering plasma concentration-time curves: applications of unsupervised learning in pharmacogenomics.

Lautier, Jackson P; Grosser, Stella; Kim, Jessica; Kim, Hyewon; Kim, Junghi.

J Biopharm Stat ; : 1-19, 2024 Jun 18.

Article in English | MEDLINE | ID: mdl-38888431

ABSTRACT

Pharmaceutical researchers are continually searching for techniques to improve both drug development processes and patient outcomes. An area of recent interest is the potential for machine learning (ML) applications within pharmacology. One such application not yet given close study is the unsupervised clustering of plasma concentration-time curves, hereafter pharmacokinetic (PK) curves. In this paper, we present our findings on how to cluster PK curves by their similarity. Specifically, we find clustering to be effective at identifying similarly shaped PK curves and informative for understanding patterns within each cluster of PK curves. Because PK curves are time series data objects, our approach takes the extensive body of research on clustering time series data as its starting point. We therefore examine many dissimilarity measures between time series data objects to find those most suitable for PK curves. We identify Euclidean distance as generally the most appropriate for clustering PK curves, and we further show that dynamic time warping, Fréchet, and structure-based dissimilarity measures such as correlation may produce unexpected results. As an illustration, we apply these methods in a case study of 250 PK curves used in a previous pharmacogenomic study. The case study finds that unsupervised ML clustering with Euclidean distance, without any subject genetic information, independently reaches the same conclusions as the reference pharmacogenomic results. To our knowledge, this is the first such demonstration. Further, the case study demonstrates how clustering PK curves may generate insights that could be difficult to perceive solely from population-level summary statistics of PK metrics.
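To see why the choice of dissimilarity matters for PK curves, the sketch below compares Euclidean distance with classic dynamic time warping (DTW) on two hypothetical curves that have the same shape but peak one sampling time apart. It assumes both curves are sampled at the same time points and uses absolute difference as the DTW local cost; it is an illustration, not the paper's analysis.

```python
import math


def euclidean(a, b):
    """Point-wise distance; assumes both curves are sampled at the same time points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed alignment moves
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical concentrations at shared sampling times; the second curve peaks one step later.
early_peak = [0, 8, 5, 3, 2, 1]
late_peak = [0, 0, 8, 5, 3, 2]
print(euclidean(early_peak, late_peak), dtw(early_peak, late_peak))
```

DTW aligns the delayed peak and returns 1.0, while the Euclidean distance is √79 ≈ 8.89. Because DTW discounts differences in timing such as Tmax, one plausible reading of the abstract's caution is that DTW can group curves that differ in clinically relevant ways.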

13.

Tracking and Profiling Repeated Users Over Time in Text-Based Counseling: Longitudinal Observational Study With Hierarchical Clustering.

Xu, Yucan; Chan, Christian Shaunlyn; Chan, Evangeline; Chen, Junyou; Cheung, Florence; Xu, Zhongzhi; Liu, Joyce; Yip, Paul Siu Fai.

J Med Internet Res ; 26: e50976, 2024 May 30.

Article in English | MEDLINE | ID: mdl-38815258

ABSTRACT

BACKGROUND: Owing to their accessibility and anonymity, web-based counseling services are expanding at an unprecedented rate. One of the most prominent challenges such services face is repeated users, who represent a small fraction of total users but consume significant resources by continually returning to the system and reiterating the same narrative and issues. A deeper understanding of repeated users, and interventions tailored to them, may help improve service efficiency and effectiveness. Previous studies of repeated users focused mainly on telephone counseling, and their classifications of repeated users tended to be arbitrary and failed to capture the heterogeneity within this group. OBJECTIVE: In this study, we aimed to develop a systematic method to profile repeated users and to understand what drives their use of the service. By doing so, we aimed to provide insights and practical implications that can inform service provision for different types of users and improve service effectiveness. METHODS: We extracted session data from 29,400 users of a free 24/7 web-based counseling service from 2018 to 2021. To systematically investigate the heterogeneity of repeated users, hierarchical clustering was used to classify users based on 3 indicators of service use behavior: the duration of their user journey, use frequency, and use intensity. We then compared the psychological profiles of the identified subgroups, including their suicide risks and primary concerns, to gain insights into the factors driving their patterns of service use. RESULTS: Three clusters of repeated users with clear psychological profiles were detected: episodic, intermittent, and persistent-intensive users. Overall, compared with one-time users, repeated users showed higher suicide risks and more complicated backgrounds, including more severe presenting issues such as suicide or self-harm, bullying, and addictive behaviors. Higher frequency and intensity of service use were also associated with elevated suicide risk levels and a higher proportion of users citing mental disorders as their primary concern. CONCLUSIONS: This study presents a systematic method of identifying and classifying repeated users in web-based counseling services. The proposed bottom-up clustering method identified 3 subgroups of repeated users with distinct service behaviors and psychological profiles. The findings can help frontline personnel deliver more efficient interventions, and the proposed method may also be useful to a wider range of services for improving service provision, resource allocation, and service effectiveness.

Subjects

Counseling , Humans , Longitudinal Studies , Cluster Analysis , Female , Adult , Male , Counseling/methods , Counseling/statistics & numerical data , Middle Aged , Text Messaging/statistics & numerical data , Young Adult

14.

Improving the Walktrap Algorithm Using K-Means Clustering.

Brusco, Michael; Steinley, Douglas; Watts, Ashley L.

Multivariate Behav Res ; 59(2): 266-288, 2024.

Article in English | MEDLINE | ID: mdl-38361218

ABSTRACT

The walktrap algorithm is one of the most popular community-detection methods in psychological research. Several simulation studies have shown that it is often effective at determining the correct number of communities and assigning items to their proper community. Nevertheless, it is important to recognize that the walktrap algorithm relies on hierarchical clustering because it was originally developed for networks much larger than those encountered in psychological research. In this paper, we present and demonstrate a computational alternative to the hierarchical algorithm that is conceptually easier to understand. More importantly, we show that better solutions to the sum-of-squares optimization problem that is heuristically tackled by hierarchical clustering in the walktrap algorithm can often be obtained using exact or approximate methods for K-means clustering. Three simulation studies and analyses of empirical networks were completed to assess the impact of better sum-of-squares solutions.

Subjects

Algorithms , Computer Simulation , Cluster Analysis
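Walktrap represents each node by its t-step random-walk probabilities scaled by node degree, so that the squared Euclidean distance between two nodes' vectors is the walktrap distance, and then merges communities hierarchically. The sketch below assumes those vectors and a fixed number of communities k, and replaces the hierarchical step with Lloyd's K-means run from several random starts, keeping the partition with the lowest within-cluster sum of squares. It is a heuristic illustration of the idea, not the exact or approximate K-means procedures evaluated in the paper; the toy graph, t = 4, the added self-loops (the convention assumed in the original walktrap description) and the function names are all assumptions.

```python
import random


def walktrap_vectors(adj, t=4):
    """Rows of P^t scaled by D^(-1/2); squared distances between rows are walktrap's r_ij^2."""
    n = len(adj)
    a = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]  # add self-loops
    deg = [sum(row) for row in a]
    P = [[a[i][j] / deg[i] for j in range(n)] for i in range(n)]  # random-walk transition matrix
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):  # Pt becomes P^t
        Pt = [[sum(Pt[i][k] * P[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[Pt[i][k] / deg[k] ** 0.5 for k in range(n)] for i in range(n)]


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(points, k, n_starts=20, n_iter=100, seed=0):
    """Lloyd's K-means with random restarts; returns (within-cluster sum of squares, labels)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        centers = [list(p) for p in rng.sample(points, k)]
        for _ in range(n_iter):
            labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
            new_centers = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                new_centers.append([sum(col) / len(members) for col in zip(*members)]
                                   if members else centers[c])
            if new_centers == centers:
                break
            centers = new_centers
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or sse < best[0]:
            best = (sse, labels)
    return best


# Hypothetical network: two 4-node cliques joined by a single edge (3-4).
n = 8
adj = [[0] * n for _ in range(n)]
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                adj[i][j] = 1
adj[3][4] = adj[4][3] = 1
sse, labels = kmeans(walktrap_vectors(adj), k=2)
print(labels)
```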

15.

Efficient Image Retrieval Using Hierarchical K-Means Clustering.

Park, Dayoung; Hwang, Youngbae.

Sensors (Basel) ; 24(8)2024 Apr 09.

Article in English | MEDLINE | ID: mdl-38676020

ABSTRACT

The objective of content-based image retrieval (CBIR) is to locate samples in a database that are similar to a query, based on the content of the images. A contemporary strategy encodes both the query and the database images as compact global descriptors and computes the similarity between these vectors. In this work, we propose an image retrieval method that uses hierarchical K-means clustering to efficiently organize the image descriptors within the database, with the aim of optimizing the subsequent retrieval process. We then compute the similarity between the query descriptor and the descriptor set within the leaf nodes and rank the descriptors accordingly. Three tree search algorithms are presented that trade off search accuracy against speed, allowing substantial speed gains at the cost of slightly reduced retrieval accuracy. The proposed method improves image retrieval speed when applied both to the CLIP-based model UNICOM, designed for category-level retrieval, and to the CNN-based R-GeM model, tailored for particular object retrieval, demonstrating its effectiveness across various domains and backbones. We achieve an 18-fold speedup while preserving over 99% accuracy on the In-Shop dataset, the largest dataset in the experiments.
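A hierarchical K-means index can be sketched as a tree in which each internal node splits its descriptors with K-means and each leaf holds a small set of descriptors. The search below uses the simplest strategy, greedy descent along the nearest centroid followed by exhaustive ranking within one leaf. The paper presents three tree search algorithms whose details are not given here, so this routine, the branching factor, the leaf size, the function names and the 2-D descriptors are all assumptions.

```python
import random


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_tree(items, branching=2, leaf_size=4, rng=None, n_iter=10):
    """Recursively split (image_id, descriptor) pairs with K-means until leaves are small."""
    rng = rng or random.Random(0)
    if len(items) <= leaf_size:
        return {"leaf": items}
    centers = [list(v) for _, v in rng.sample(items, branching)]
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for item in items:
            nearest = min(range(len(centers)), key=lambda j: dist2(item[1], centers[j]))
            groups[nearest].append(item)
        centers = [centroid([v for _, v in g]) if g else centers[j] for j, g in enumerate(groups)]
    if max(len(g) for g in groups) == len(items):  # no split possible (e.g. duplicate descriptors)
        return {"leaf": items}
    return {"children": [(centers[j], build_tree(g, branching, leaf_size, rng, n_iter))
                         for j, g in enumerate(groups) if g]}


def search(tree, query, top_k=1):
    """Greedy descent: follow the nearest centroid at each level, then rank one leaf exhaustively."""
    node = tree
    while "children" in node:
        node = min(node["children"], key=lambda child: dist2(query, child[0]))[1]
    ranked = sorted(node["leaf"], key=lambda item: dist2(query, item[1]))
    return [image_id for image_id, _ in ranked[:top_k]]


# Hypothetical 2-D descriptors (real global descriptors have hundreds of dimensions).
db = [("a1", [0.0, 0.0]), ("a2", [0.1, 0.0]), ("a3", [0.0, 0.1]),
      ("b1", [5.0, 5.0]), ("b2", [5.1, 5.0]), ("b3", [5.0, 5.1])]
tree = build_tree(db, branching=2, leaf_size=3)
print(search(tree, [5.09, 5.0], top_k=2))
```

Greedy descent scores only the descriptors in one leaf, which is where the speedup comes from, but a near neighbor lying just across a split boundary will be missed; exploring several of the closest branches recovers accuracy at some cost in speed. For L2-normalized descriptors, ranking by squared Euclidean distance gives the same order as ranking by cosine similarity.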

16.

Hierarchical Clustering via Single and Complete Linkage Using Fully Homomorphic Encryption.

Sokhonn, Lynin; Park, Yun-Soo; Lee, Mun-Kyu.

Sensors (Basel) ; 24(15)2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39123872

ABSTRACT

Hierarchical clustering is a widely used data analysis technique. Typically, tools for this method operate on data in its original, readable form, raising privacy concerns when a clustering task involving sensitive data that must remain confidential is outsourced to an external server. To address this issue, we developed a method that integrates Cheon-Kim-Kim-Song homomorphic encryption (HE), allowing the clustering process to be performed without revealing the raw data. In hierarchical clustering, the two nearest clusters are repeatedly merged until the desired number of clusters is reached. The proximity of clusters is evaluated using various metrics. In this study, we considered two well-known metrics: single linkage and complete linkage. Applying HE to these methods involves sorting encrypted distances, which is a resource-intensive operation. Therefore, we propose a cooperative approach in which the data owner aids the sorting process and shares a list of data positions with a computation server. Using this list, the server can determine the clustering of the data points. The proposed approach ensures secure hierarchical clustering using single and complete linkage methods without exposing the original data.
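For reference, the plaintext computation that the encrypted protocol must reproduce can be sketched as follows: repeatedly merge the two nearest clusters until the desired number remains. Cluster proximity is measured by single or complete linkage. This sketch assumes a precomputed distance matrix and uses no encryption. It does not implement CKKS or the cooperative sorting scheme described above. The function name `agglomerate` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def agglomerate(dmat: Sequence[Sequence[float]], k: int,
                method: str = "single") -> List[List[int]]:
    """Merge the two nearest clusters until k remain (plaintext, no encryption).

    single linkage:   cluster distance = smallest pairwise point distance
    complete linkage: cluster distance = largest pairwise point distance
    """
    if method not in ("single", "complete"):
        raise ValueError("method must be 'single' or 'complete'")
    link = min if method == "single" else max
    clusters = [[i] for i in range(len(dmat))]
    while len(clusters) > k:
        def cluster_dist(pair):
            a, b = clusters[pair[0]], clusters[pair[1]]
            return link(dmat[i][j] for i in a for j in b)

        i, j = min(combinations(range(len(clusters)), 2), key=cluster_dist)
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


points = [0.0, 4.0, 7.0, 9.0]
dmat = [[abs(a - b) for b in points] for a in points]
print(agglomerate(dmat, 2, "single"))    # [[0], [1, 2, 3]]
print(agglomerate(dmat, 2, "complete"))  # [[0, 1], [2, 3]]
```

On this toy input the two metrics disagree. Single linkage chains 4.0 onto the {7.0, 9.0} group through its nearest member. Complete linkage instead pairs 0.0 with 4.0, because the farthest member of {7.0, 9.0} is 5.0 away from 4.0.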

17.

A "Prediction - Detection - Judgment" framework for sudden water contamination event detection with online monitoring.

Liao, Zhenliang; Zhang, Minhao; Chen, Yun; Zhang, Zhiyu; Wang, Huijuan.

J Environ Manage ; 355: 120496, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38437742

ABSTRACT

Contamination detection technology supports water quality management and protection in surface waters. Sudden contamination events must be detected promptly and distinguished from the dynamic variations that various interference factors cause in online water quality monitoring data. In this study, a "Prediction - Detection - Judgment" framework is proposed and implemented with the methods "Time series increment - Hierarchical clustering - Bayes' theorem model". Time to detection is used as an evaluation index for contamination detection methods, along with the probability of detection and the false alarm rate. The proposed method is tested with publicly available data and further applied at a river monitoring site. The results showed that the method detected the contamination events with a 100% probability of detection, a 17% false alarm rate, and a time to detection close to 4 monitoring intervals. The proposed time-to-detection index evaluates the timeliness of the method, and timely detection ensures that contamination events can be responded to and dealt with promptly. The site application also demonstrates the feasibility and practicability of the proposed framework and its potential for extensive implementation.

Subjects

Judgment; Water Supply; Bayes Theorem; Water Quality; Water Pollution
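The three evaluation indices named in the abstract can be computed from an alarm series and known event windows, as in the sketch below. The exact definitions are assumptions, not taken from the paper. Probability of detection is the fraction of events with at least one alarm inside the event window. False alarm rate is the fraction of non-event intervals that raise an alarm. Time to detection is the mean number of intervals from event start to the first alarm, with an alarm at the start counting as 0. The function name `detection_metrics` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def detection_metrics(alarms: Sequence[bool],
                      events: Sequence[Tuple[int, int]]) -> Tuple[float, float, Optional[float]]:
    """alarms[t] is True if the detector fired at monitoring interval t.

    events holds (start, end) interval indices, both inclusive.
    Returns (probability of detection, false alarm rate, mean time to detection).
    """
    in_event = [False] * len(alarms)
    for start, end in events:
        for t in range(start, end + 1):
            in_event[t] = True

    delays: List[int] = []
    for start, end in events:
        # first alarm inside the event window, if any
        first = next((t for t in range(start, end + 1) if alarms[t]), None)
        if first is not None:
            delays.append(first - start)

    normal = [t for t in range(len(alarms)) if not in_event[t]]
    false_alarms = sum(1 for t in normal if alarms[t])

    pd = len(delays) / len(events) if events else 0.0
    far = false_alarms / len(normal) if normal else 0.0
    ttd = sum(delays) / len(delays) if delays else None
    return pd, far, ttd


alarms = [False] * 20
for t in (2, 8, 14):  # one false alarm at t=2, detections at t=8 and t=14
    alarms[t] = True
print(detection_metrics(alarms, [(5, 9), (14, 16)]))
```

In this toy series both events are detected (PD = 1.0). One of the 12 non-event intervals raises an alarm, and the delays of 3 and 0 intervals average to 1.5.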

18.

Quantifying multi-dimensional services of water ecosystems and breakpoint-based spatial radiation of typical regulating services considering the hierarchical clustering-based classification.

Guan, Xinjian; Xu, Yingjun; Meng, Yu; Xu, Wenjing; Yan, Denghua.

J Environ Manage ; 351: 119852, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38159309

ABSTRACT

This study proposes a research system for water ecosystem services (WES), covering classification, benefit quantification, and spatial radiation effects, with the goal of promoting harmonious coexistence between humans and nature and providing a theoretical foundation for optimizing water resources management. Hierarchical cluster analysis was applied to categorize WES, taking into account four nature constraints: product nature, energy flow relationships, circularity, and human social utility. A multi-dimensional benefit quantification methodology for WES was constructed by combining emergy theory with multidisciplinary methods from ecology, economics, and sociology. Based on the theories of spatial autocorrelation and breaking points, we investigated the spatial radiation effects of typical services in the cyclic regulation category. The proposed methodology has been applied to Luoyang, China. The results show that the Resource Provisioning (RP) and Cultural Addition (CA) services vary greatly over time and drive the overall WES first to increase and then to decrease. The spatial and temporal distribution of water resources is uneven, with WES slightly better in the southern region than in the northern region. Additionally, the spatial radiation effects of typical regulating services are most prominent in S County. These findings support establishing scientific and rational intra-basin or inter-basin water management systems to extend the beneficial impacts of water-rich areas to neighboring regions.

Subjects

Conservation of Natural Resources; Ecosystem; Humans; Spatial Analysis; Ecology; China
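The abstract relies on two standard spatial tools: global Moran's I, which measures spatial autocorrelation, and a breaking-point formula, which locates where the influence of two centers balances. The sketch below shows the textbook forms of both. It assumes a user-supplied spatial weight matrix, and it uses Converse's breaking-point formula with a generic "mass" standing in for a service value. Whether the paper uses these exact variants, and what it uses as mass, is not stated in the abstract. The function names are hypothetical.

```python
import math
from typing import Sequence


def morans_i(x: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all spatial weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in z)
    return (n / s0) * num / den


def breaking_point(d_ab: float, mass_a: float, mass_b: float) -> float:
    """Distance from B to the breaking point between A and B (Converse's formula).

    A larger mass_a pushes the breaking point closer to B.
    """
    return d_ab / (1 + math.sqrt(mass_a / mass_b))


# Four units on a line; neighbours share an edge (binary contiguity weights).
w = [[1 if abs(i - j) == 1 else 0 for j in range(4)] for i in range(4)]
print(morans_i([1, 0, 1, 0], w))   # alternating pattern: negative autocorrelation
print(morans_i([1, 1, 0, 0], w))   # clustered pattern: positive autocorrelation
print(breaking_point(9.0, 4.0, 1.0))
```

A positive I means similar service values cluster in space, and a negative I means they alternate. The breaking point gives a simple geometric estimate of how far a strong region's influence radiates toward a weaker neighbour.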

19.

Multivariate Analysis of Solubility Parameters for Drug-Polymer Miscibility Assessment in Preparing Raloxifene Hydrochloride Amorphous Solid Dispersions.

Moreira, Guilherme G; Taveira, Stephânia F; Martins, Felipe T; Wagner, Karl G; Marreto, Ricardo N.

AAPS PharmSciTech ; 25(5): 127, 2024 Jun 06.

Article in English | MEDLINE | ID: mdl-38844724

ABSTRACT

The success of obtaining solid dispersions for solubility improvement invariably depends on the miscibility of the drug and the polymeric carriers. This study aimed to categorize and select polymeric carriers through multivariate analysis of the solubility parameters of raloxifene hydrochloride (RX-HCl) calculated with the classical group contribution method. The total, partial, and derived parameters of RX-HCl were calculated. The data were compared with those of excipients (N = 36), and a hierarchical clustering analysis was then performed. Solid dispersions of selected polymers at different drug loads were produced by solvent casting and characterized by X-ray diffraction, infrared spectroscopy, and scanning electron microscopy. RX-HCl presented a Hansen solubility parameter (HSP) of 23.52 MPa^1/2. The exploratory analysis of HSP and relative energy difference (RED) yielded a classification into miscible (n = 11), partially miscible (n = 15), and immiscible (n = 10) combinations. Experimental validation followed by principal component regression showed a significant correlation between the crystallinity reduction and the calculated parameters, whereas the spectroscopic evaluation highlighted the contribution of hydrogen bonding to amorphization. The systematic approach showed high discrimination ability, contributing to optimal excipient selection for preparing solid solutions of RX-HCl.

Subjects

Chemistry, Pharmaceutical; Excipients; Polymers; Raloxifene Hydrochloride; Solubility; X-Ray Diffraction; Polymers/chemistry; Excipients/chemistry; Raloxifene Hydrochloride/chemistry; Multivariate Analysis; X-Ray Diffraction/methods; Chemistry, Pharmaceutical/methods; Drug Carriers/chemistry; Drug Compounding/methods; Microscopy, Electron, Scanning/methods; Hydrogen Bonding; Crystallization/methods
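The solubility-parameter quantities in the abstract follow standard Hansen relations. The total parameter is δt = sqrt(δd² + δp² + δh²). For a drug-polymer pair, the distance is Ra = sqrt(4(Δδd)² + (Δδp)² + (Δδh)²), and RED = Ra / R0, where R0 is the interaction radius. The sketch below computes these values. The three-way label uses a commonly cited rule of thumb on the difference in total parameters (below 7 MPa^1/2 likely miscible, above 10 likely immiscible). That rule is an assumption and not necessarily the classification criterion used in the paper. All numeric inputs are hypothetical, and the function names are illustrative.

```python
import math
from typing import Tuple

# (delta_d, delta_p, delta_h): dispersion, polar and hydrogen-bonding parts, MPa^1/2
HSP = Tuple[float, float, float]


def total_hsp(h: HSP) -> float:
    """Total Hansen parameter: sqrt(dd^2 + dp^2 + dh^2)."""
    dd, dp, dh = h
    return math.sqrt(dd * dd + dp * dp + dh * dh)


def red(drug: HSP, polymer: HSP, r0: float) -> float:
    """Relative energy difference Ra / R0 (RED < 1 suggests good affinity)."""
    ra = math.sqrt(4 * (drug[0] - polymer[0]) ** 2
                   + (drug[1] - polymer[1]) ** 2
                   + (drug[2] - polymer[2]) ** 2)
    return ra / r0


def rule_of_thumb(drug: HSP, polymer: HSP) -> str:
    """Assumed thresholds on the difference in total parameters (MPa^1/2)."""
    diff = abs(total_hsp(drug) - total_hsp(polymer))
    if diff < 7.0:
        return "likely miscible"
    if diff > 10.0:
        return "likely immiscible"
    return "intermediate"


drug = (20.0, 5.0, 10.0)      # hypothetical component values, not RX-HCl data
polymer = (19.0, 5.0, 9.0)    # hypothetical polymer
print(round(total_hsp(drug), 2), round(red(drug, polymer, 8.0), 3),
      rule_of_thumb(drug, polymer))
```

The factor of 4 on the dispersion term is part of the standard Ra definition. It is also why RED can rank carriers differently from a simple comparison of total parameters.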

20.

Exploring spatial and seasonal water quality variations in Kelani River, Sri Lanka: a latent variable approach.

Wijayaweera, Nalintha; Gunawardhana, Luminda Niroshana; Kazama, So; Rajapakse, Lalith; Patabendige, Chaminda Samarasuriya; Karunaweera, Himali.

Environ Monit Assess ; 196(11): 1063, 2024 Oct 17.

Article in English | MEDLINE | ID: mdl-39417920

ABSTRACT

Water quality degradation poses a significant challenge globally, especially in developing nations such as Sri Lanka. Extensive monitoring programs designed to address escalating river pollution collect multiple water quality parameters over extended periods and at varied locations. However, the sheer volume of data can be overwhelming, making it difficult to process effectively and interpret accurately with conventional methods. In this study, latent variable (LV) and unsupervised machine learning techniques were used to investigate spatial and seasonal variations in surface water quality for 17 parameters at 17 locations along the Kelani River, Sri Lanka, using monthly water quality measurements from 2016 to 2020. Pearson's correlation matrix identified 10 parameters that significantly affect water quality variations, and factor analysis (FA) generated five LVs accounting for 77% of the total variance in the dataset. The identified LVs revealed multiple sources of river pollution. Hierarchical clustering analysis and self-organizing maps grouped the stations in a closely analogous manner. Stations near industrial zones and the river mouth showed higher water quality variance, often exceeding national guidelines. Correlation testing revealed strong relationships between water quality and hydrometeorological variations in the catchment during the monsoon seasons. Spatial analyses showed increased LV variance in the Lower Kelani River Basin, indicating higher pollutant levels in different seasons. Industrial effluents (LV-2 and LV-4) and domestic and municipal sewage (LV-3 and LV-5) exhibited greater seasonal fluctuations. The results show that the proposed LV approach has the potential to help authorities address water pollution amid the complexity of multiple water quality parameters.

Subjects

Environmental Monitoring; Rivers; Seasons; Water Pollutants, Chemical; Water Quality; Sri Lanka; Environmental Monitoring/methods; Rivers/chemistry; Water Pollutants, Chemical/analysis; Spatial Analysis; Water Pollution, Chemical/statistics & numerical data
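The correlation-screening step before factor analysis can be sketched as follows. The sketch computes Pearson's r between every pair of parameters and keeps a parameter if it correlates with at least one other parameter at |r| ≥ a threshold. Both the screening rule and the 0.5 threshold are hypothetical; the abstract does not state the significance criterion used to select the 10 parameters. The parameter names and values below are toy inputs, not study data.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined; treated as "no correlation" here
    return sxy / math.sqrt(sxx * syy)


def screen(data: Dict[str, Sequence[float]], threshold: float = 0.5) -> List[str]:
    """Keep parameters with |r| >= threshold against at least one other parameter."""
    names = list(data)
    return [a for a in names
            if any(abs(pearson(data[a], data[b])) >= threshold
                   for b in names if b != a)]


toy = {  # hypothetical monthly series at one station
    "BOD": [1.0, 2.0, 3.0, 4.0, 5.0],
    "COD": [2.0, 4.0, 6.0, 8.0, 10.5],
    "pH": [7.1, 6.9, 7.0, 7.2, 6.8],
}
print(screen(toy))  # pH is only weakly correlated with the others and is dropped
```

A real analysis would test each r for statistical significance, and would typically standardize the retained parameters before extracting latent variables with factor analysis.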
