
1.

The public attitude towards ChatGPT on reddit: A study based on unsupervised learning from sentiment analysis and topic modeling.

Xu, Zhaoxiang; Fang, Qingguo; Huang, Yanbo; Xie, Mingjian.

PLoS One ; 19(5): e0302502, 2024.

Article En | MEDLINE | ID: mdl-38743773

ChatGPT has demonstrated impressive abilities and impacted various aspects of human society since its creation, gaining widespread attention from different social spheres. This study aims to comprehensively assess public perception of ChatGPT on Reddit. The dataset was collected via Reddit, a social media platform, and includes 23,733 posts and comments related to ChatGPT. Firstly, to examine public attitudes, this study conducts content analysis utilizing topic modeling with the Latent Dirichlet Allocation (LDA) algorithm to extract pertinent topics. Furthermore, sentiment analysis categorizes user posts and comments as positive, negative, or neutral using Textblob and Vader in natural language processing. The result of topic modeling shows that seven topics regarding ChatGPT are identified, which can be grouped into three themes: user perception, technical methods, and impacts on society. Results from the sentiment analysis show that 61.6% of the posts and comments hold favorable opinions on ChatGPT. They emphasize ChatGPT's ability to prompt and engage in natural conversations with users, without relying on complex natural language processing. It provides suggestions for ChatGPT developers to enhance its usability design and functionality. Meanwhile, stakeholders, including users, should comprehend the advantages and disadvantages of ChatGPT in human society to promote ethical and regulated implementation of the system.

Public Opinion , Social Media , Humans , Natural Language Processing , Unsupervised Machine Learning , Attitude , Algorithms

2.

Exploring the intersection of obesity and gender in COVID-19 outcomes in hospitalized Mexican patients: a comparative analysis of risk profiles using unsupervised machine learning.

Nezhadmoghadam, Fahimeh; Tamez-Peña, José Gerardo; Martinez-Ledesma, Emmanuel.

Front Public Health ; 12: 1337432, 2024.

Article En | MEDLINE | ID: mdl-38699419

Introduction: Obesity and gender play a critical role in shaping the outcomes of COVID-19 disease. These two factors have a dynamic relationship with each other, as well as other risk factors, which hinders interpretation of how they influence severity and disease progression. This work aimed to study differences in COVID-19 disease outcomes through analysis of risk profiles stratified by gender and obesity status. Methods: This study employed an unsupervised clustering analysis, using Mexico's national COVID-19 hospitalization dataset, which contains demographic information and health outcomes of patients hospitalized due to COVID-19. Patients were segmented into four groups by obesity and gender, with participants' attributes and clinical outcome data described for each. Then, Consensus and PAM clustering methods were used to identify distinct risk profiles based on underlying patient characteristics. Risk profile discovery was completed on 70% of records, with the remaining 30% available for validation. Results: Data from 88,536 hospitalized patients were analyzed. Obesity, regardless of gender, was linked with higher odds of hypertension, diabetes, cardiovascular diseases, pneumonia, and Intensive Care Unit (ICU) admissions. Men tended to have higher frequencies of ICU admissions and pneumonia and higher mortality rates than women. Within each of the four analysis groups (divided based on gender and obesity status), clustering analyses identified four to five distinct risk profiles. For example, among women with obesity, there were four profiles; those with a hypertensive profile were more likely to have pneumonia, and those with a diabetic profile were most likely to be admitted to the ICU. Conclusion: Our analysis emphasizes the complex interplay between obesity, gender, and health outcomes in COVID-19 hospitalizations. 
The identified risk profiles highlight the need for personalized treatment strategies for COVID-19 patients and can assist in planning for patterns of deterioration in future waves of SARS-CoV-2 virus transmission. This research underscores the importance of tackling obesity as a major public health concern, given its interplay with many other health conditions, including infectious diseases such as COVID-19.

COVID-19 , Hospitalization , Obesity , Unsupervised Machine Learning , Humans , COVID-19/epidemiology , COVID-19/mortality , Male , Female , Obesity/epidemiology , Mexico/epidemiology , Middle Aged , Hospitalization/statistics & numerical data , Risk Factors , Adult , Sex Factors , Aged , SARS-CoV-2 , Cluster Analysis

3.

Early autism diagnosis based on path signature and Siamese unsupervised feature compressor.

Yin, Zhuowen; Ding, Xinyao; Zhang, Xin; Wu, Zhengwang; Wang, Li; Xu, Xiangmin; Li, Gang.

Cereb Cortex ; 34(13): 72-83, 2024 May 02.

Article En | MEDLINE | ID: mdl-38696605

Autism spectrum disorder has been emerging as a growing public health threat. Early diagnosis of autism spectrum disorder is crucial for timely, effective intervention and treatment. However, conventional diagnosis methods based on communications and behavioral patterns are unreliable for children younger than 2 years of age. Given evidence of neurodevelopmental abnormalities in infants with autism spectrum disorder, we resort to a novel deep learning-based method to extract key features from the inherently scarce, class-imbalanced, and heterogeneous structural MR images for early autism diagnosis. Specifically, we propose a Siamese verification framework to extend the scarce data, and an unsupervised compressor to alleviate data imbalance by extracting key features. We also propose weight constraints to cope with sample heterogeneity by giving different samples different voting weights during validation, and use Path Signature to unravel meaningful developmental features from the two-time-point longitudinal data. We further extract the brain regions on which the machine learning model focuses for autism diagnosis. Extensive experiments show that our method performs well under practical scenarios, outperforming existing machine learning methods and providing anatomical insights for early autism diagnosis.

Autism Spectrum Disorder , Brain , Deep Learning , Early Diagnosis , Humans , Autism Spectrum Disorder/diagnostic imaging , Autism Spectrum Disorder/diagnosis , Infant , Brain/diagnostic imaging , Brain/pathology , Magnetic Resonance Imaging/methods , Child, Preschool , Male , Female , Autistic Disorder/diagnosis , Autistic Disorder/diagnostic imaging , Autistic Disorder/pathology , Unsupervised Machine Learning

4.

Unsupervised machine learning for clustering forward head posture, protraction and retraction movement patterns based on craniocervical angle data in individuals with nonspecific neck pain.

Hwang, Ui-Jae; Kwon, Oh-Yun; Kim, Jun-Hee.

BMC Musculoskelet Disord ; 25(1): 376, 2024 May 13.

Article En | MEDLINE | ID: mdl-38741076

OBJECTIVES: The traditional understanding of craniocervical alignment emphasizes specific anatomical landmarks. However, recent research has challenged the reliance on forward head posture as the primary diagnostic criterion for neck pain. A more complex relationship exists between neck pain and craniocervical alignment, which requires a deeper exploration of diverse postures and movement patterns using advanced techniques, such as clustering analysis. We aimed to explore the complex relationship between craniocervical alignment and neck pain and to categorize alignment patterns in individuals with nonspecific neck pain using the K-means algorithm. METHODS: This study applied unsupervised machine learning techniques to data from 229 office workers with nonspecific neck pain. The craniocervical angles (CCA) during rest, protraction, and retraction were measured using two-dimensional video analysis, and neck pain severity was assessed using the Northwick Park Neck Pain Questionnaire (NPQ). CCA during sitting upright in a comfortable position was assessed to evaluate the resting CCA. The average of midpoints between repeated protraction and retraction measures was considered as the midpoint CCA. The K-means algorithm was used to categorize participants into alignment clusters based on age, sex, and CCA data. RESULTS: We found no significant correlation between NPQ scores and CCA data, challenging the traditional understanding of neck pain and alignment. We observed a significant difference in age (F = 140.14, p < 0.001), NPQ total score (F = 115.83, p < 0.001), resting CCA (F = 79.22, p < 0.001), CCA during protraction (F = 33.98, p < 0.001), CCA during retraction (F = 40.40, p < 0.001), and midpoint CCA (F = 66.92, p < 0.001) among the three clusters and healthy controls. Cluster 1 was characterized by the lowest resting and midpoint CCA, and the lowest CCA during protraction and retraction, indicating a significant forward head posture and a pattern of retraction restriction. 
Cluster 2, the oldest group, showed CCA measurements similar to healthy controls, yet reported the highest NPQ scores. Cluster 3 exhibited the highest CCA during protraction and retraction, suggesting a limitation in protraction movement. DISCUSSION: Analyzing 229 office workers, three distinct alignment patterns were identified, each with unique postural characteristics; therefore, treatments addressing posture should be individualized and not generalized across the population.

Neck Pain , Posture , Unsupervised Machine Learning , Humans , Neck Pain/physiopathology , Male , Female , Adult , Posture/physiology , Middle Aged , Cluster Analysis , Head , Cervical Vertebrae/physiopathology , Cervical Vertebrae/diagnostic imaging , Movement/physiology , Pain Measurement/methods , Young Adult , Head Movements/physiology
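
The K-means grouping described in this abstract can be illustrated with a minimal sketch of Lloyd's algorithm. Everything below is an assumption for illustration only: the angle triples are hypothetical values rather than study data, the initial centroids are picked by hand, and the study's actual feature set (age, sex, and CCA data) and preprocessing are not reproduced.

```python
from math import dist

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical (resting, protraction, retraction) CCA triples in degrees.
toy_angles = [
    (40.0, 30.0, 48.0), (41.0, 31.0, 49.0),
    (50.0, 40.0, 58.0), (51.0, 41.0, 59.0),
    (58.0, 50.0, 66.0), (59.0, 51.0, 67.0),
]
initial = [toy_angles[0], toy_angles[2], toy_angles[4]]
labels, centroids = kmeans(toy_angles, initial)
```

In practice, features on different scales (for example age in years and angles in degrees) would normally be standardized before clustering, since K-means relies on Euclidean distance.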

5.

DFUSNN: zero-shot dual-domain fusion unsupervised neural network for parallel MRI reconstruction.

Chen, Shengyi; Duan, Jizhong; Ren, Xinmin; Wang, Junfeng; Liu, Yu.

Phys Med Biol ; 69(10), 2024 May 10.

Article En | MEDLINE | ID: mdl-38604186

Objective. Recently, deep learning models have been used to reconstruct parallel magnetic resonance (MR) images from undersampled k-space data. However, most existing approaches depend on large databases of fully sampled MR data for training, which can be challenging or sometimes infeasible to acquire in certain scenarios. The goal is to develop an effective alternative for improved reconstruction quality that does not rely on external training datasets. Approach. We introduce a novel zero-shot dual-domain fusion unsupervised neural network (DFUSNN) for parallel MR imaging reconstruction without any external training datasets. We employ the Noise2Noise (N2N) network for the reconstruction in the k-space domain, integrate phase and coil sensitivity smoothness priors into the k-space N2N network, and use an early stopping criterion to prevent overfitting. Additionally, we propose a dual-domain fusion method based on Bayesian optimization to enhance reconstruction quality efficiently. Results. Simulation experiments conducted on three datasets with different undersampling patterns showed that the DFUSNN outperforms all other competing unsupervised methods and the one-shot Hankel-k-space generative model (HKGM). The DFUSNN also achieves comparable results to the supervised Deep-SLR method. Significance. The novel DFUSNN model offers a viable solution for reconstructing high-quality MR images without the need for external training datasets, thereby overcoming a major hurdle in scenarios where acquiring fully sampled MR data is difficult.

Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Neural Networks, Computer , Magnetic Resonance Imaging/methods , Image Processing, Computer-Assisted/methods , Unsupervised Machine Learning , Humans

6.

Using unsupervised learning to classify inlet water for more stable design of water reuse in industrial parks.

Chen, Kan; Shi, Xiaofei; Zhang, Zhihao; Chen, Shijun; Ma, Ji; Zheng, Tong; Alfonso, Leonardo.

Water Sci Technol ; 89(7): 1757-1770, 2024 Apr.

Article En | MEDLINE | ID: mdl-38619901

The water reuse facilities of industrial parks face the challenge of managing a growing variety of wastewater sources as their inlet water. Typically, the grouping of these sources is designed by engineers with extensive expertise. This paper presents an innovative application of unsupervised learning methods to classify inlet water in Chinese water reuse stations, aiming to reduce reliance on engineer experience. The concept of 'water quality distance' was incorporated into three unsupervised learning clustering algorithms (K-means, DBSCAN, and AGNES), which were validated through six case studies. Of the six cases, three were employed to illustrate the feasibility of the unsupervised learning clustering algorithm. The results indicated that the clustering algorithm was more stable and performed better than both manual clustering and ChatGPT-based clustering. The remaining three cases were utilized to showcase the reliability of the three clustering algorithms. The findings revealed that the AGNES algorithm showed the greatest potential for application. The average purity of K-means, DBSCAN, and AGNES across the six cases was 0.947, 0.852, and 0.955, respectively.

Bays , Unsupervised Machine Learning , Reproducibility of Results , Algorithms , Cluster Analysis
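
The abstract reports average purity scores but does not define the metric. The sketch below uses the standard definition of clustering purity (the fraction of items that fall in the majority reference class of their cluster); whether the authors computed it exactly this way is an assumption, and the labels are hypothetical.

```python
from collections import Counter

def purity(cluster_labels, reference_labels):
    """Fraction of items belonging to the majority reference class of their cluster."""
    if len(cluster_labels) != len(reference_labels):
        raise ValueError("label sequences must have the same length")
    if not cluster_labels:
        raise ValueError("at least one item is required")
    # Count reference classes inside each cluster.
    by_cluster = {}
    for cluster, reference in zip(cluster_labels, reference_labels):
        by_cluster.setdefault(cluster, Counter())[reference] += 1
    # Sum the size of the majority class in every cluster.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_labels)

# Hypothetical example: eight inlet streams, three algorithmic clusters,
# compared against a reference grouping labelled "A", "B", "C".
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
reference = ["A", "A", "B", "B", "B", "B", "C", "A"]
score = purity(clusters, reference)
```

Purity rises trivially as the number of clusters grows (one item per cluster gives 1.0), so it is most informative when the algorithms being compared produce a similar number of clusters.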

7.

Symptom-based drug prediction of lifestyle-related chronic diseases using unsupervised machine learning techniques.

Bhattacharjee, Sudipto; Saha, Banani; Saha, Sudipto.

Comput Biol Med ; 174: 108413, 2024 May.

Article En | MEDLINE | ID: mdl-38608323

BACKGROUND AND OBJECTIVES: Lifestyle-related diseases (LSDs) impose a substantial economic burden on patients and health care services. LSDs are chronic in nature and can directly affect the heart and lungs. Therapeutic interventions only based on symptoms can be crucial for prompt treatment initiation in LSDs, as symptoms are the first information available to clinicians. So, this work aims to apply unsupervised machine learning (ML) techniques for developing models to predict drugs from symptoms for LSDs, with a specific focus on pulmonary and heart diseases. METHODS: The drug-disease and disease-symptom associations of 143 LSDs, 1271 drugs, and 305 symptoms were used to compute direct associations between drugs and symptoms. ML models with four different algorithms - K-Means, Bisecting K-Means, Mean Shift, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) - were developed to cluster the drugs using symptoms as features. The optimal model was saved in a server for the development of a web application. A web application was developed to perform the prediction based on the optimal model. RESULTS: The Bisecting K-means model showed the best performance with a silhouette coefficient of 0.647 and generated 138 drug clusters. The drugs within the optimal clusters showed good similarity based on i) gene ontology annotations of the gene targets, ii) chemical ontology annotations, and iii) maximum common substructure of the drugs. In the web application, the model also provides a confidence score for each predicted drug while predicting from a new set of input symptoms. CONCLUSION: In summary, direct associations between drugs and symptoms were computed, and those were used to develop a symptom-based drug prediction tool for LSDs with unsupervised ML models. The ML-based prediction can provide a second opinion to clinicians to aid their decision-making for early treatment of LSD patients. 
The web application (URL - http://bicresources.jcbose.ac.in/ssaha4/sdldpred) can provide a simple interface for all end-users to perform the ML-based prediction.

Unsupervised Machine Learning , Humans , Chronic Disease , Life Style , Algorithms
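
The silhouette coefficient used above to select the best model can be sketched in a few lines. This is the standard mean-silhouette definition with Euclidean distance; the distance measure the authors used is not stated in the abstract, and the binary drug-by-symptom rows below are hypothetical.

```python
from math import dist

def silhouette_coefficient(points, labels):
    """Mean silhouette value s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself contributes zero to the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical drugs described by four binary symptom features.
drug_features = [(1, 1, 0, 0), (1, 0, 0, 0), (0, 0, 1, 1), (0, 0, 0, 1)]
drug_clusters = [0, 0, 1, 1]
score = silhouette_coefficient(drug_features, drug_clusters)
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.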

8.

[Feeling analysis on allergen immunotherapy on Twitter using an unsupervised machine learning model]. / Análisis de sentimientos acerca de la inmunoterapia con alérgenos en Twitter mediante un modelo de aprendizaje automático no supervisado.

Tarango-García, Alejandro; Lugo-Reyes, Saul Oswaldo; Alvarez-Cardona, Aristoteles.

Rev Alerg Mex ; 71(1): 8-11, 2024 Feb 01.

Article Es | MEDLINE | ID: mdl-38683063

OBJECTIVE: To analyze sentiment about allergen-specific immunotherapy on Twitter using the VADER (Valence Aware Dictionary and sEntiment Reasoner) model. METHODS: Tweets related to allergen-specific immunotherapy were obtained through the Twitter Application Programming Interface (API). The keywords "allergy shot" were used to search tweets posted between January 1, 2012, and December 31, 2022. The data were processed by removing URLs, usernames, hashtags, multiple spaces, and duplicate tweets. Subsequently, a sentiment analysis was performed using the VADER model. RESULTS: A total of 34,711 tweets were retrieved, of which 1928 were eliminated. Of the remaining 32,783 tweets, 32.41% expressed a negative sentiment, 31.11% expressed a neutral sentiment, and 36.47% expressed a positive sentiment, with an average polarity of 0.02751 (neutral) over the 11-year period. CONCLUSIONS: The average polarity of tweets about allergen-specific immunotherapy is neutral over the 11 years analyzed. There was an annual increase in the average polarity over the years, with 2017, 2018, and 2022 having positive polarity averages. Additionally, the number of tweets decreased over time.

Desensitization, Immunologic , Social Media , Unsupervised Machine Learning , Humans , Desensitization, Immunologic/methods , Emotions
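
The abstract labels a mean polarity of 0.02751 as neutral but does not state the cutoffs. The sketch below assumes the thresholds conventionally recommended for VADER's compound score (positive at or above 0.05, negative at or below -0.05); whether the authors used exactly these values is an assumption. It also checks the tweet counts reported in the results.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.

    The 0.05 default is the conventional VADER cutoff, assumed here
    rather than taken from the abstract.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Figures reported in the abstract.
retrieved, removed = 34711, 1928
remaining = retrieved - removed           # tweets left after cleaning
mean_label = label_from_compound(0.02751)  # label for the 11-year mean polarity
```

Under these assumed cutoffs the reported mean falls inside the neutral band, which is consistent with the abstract's description.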

9.

EpiDiP/NanoDiP: a versatile unsupervised machine learning edge computing platform for epigenomic tumour diagnostics.

Hench, Jürgen; Hultschig, Claus; Brugger, Jon; Mariani, Luigi; Guzman, Raphael; Soleman, Jehuda; Leu, Severina; Benton, Miles; Stec, Irenäus Maria; Hench, Ivana Bratic; Hoffmann, Per; Harter, Patrick; Weber, Katharina J; Albers, Anne; Thomas, Christian; Hasselblatt, Martin; Schüller, Ulrich; Restelli, Lisa; Capper, David; Hewer, Ekkehard; Diebold, Joachim; Kolenc, Danijela; Schneider, Ulf C; Rushing, Elisabeth; Della Monica, Rosa; Chiariotti, Lorenzo; Sill, Martin; Schrimpf, Daniel; von Deimling, Andreas; Sahm, Felix; Kölsche, Christian; Tolnay, Markus; Frank, Stephan.

Acta Neuropathol Commun ; 12(1): 51, 2024 Apr 04.

Article En | MEDLINE | ID: mdl-38576030

DNA methylation analysis based on supervised machine learning algorithms with static reference data, allowing diagnostic tumour typing with unprecedented precision, has quickly become a new standard of care. Whereas genome-wide diagnostic methylation profiling is mostly performed on microarrays, an increasing number of institutions additionally employ nanopore sequencing as a faster alternative. In addition, methylation-specific parallel sequencing can generate methylation and genomic copy number data. Given these diverse approaches to methylation profiling, to date, there is no single tool that allows (1) classification and interpretation of microarray, nanopore and parallel sequencing data, (2) direct control of nanopore sequencers, and (3) the integration of microarray-based methylation reference data. Furthermore, no software capable of entirely running in routine diagnostic laboratory environments lacking high-performance computing and network infrastructure exists. To overcome these shortcomings, we present EpiDiP/NanoDiP as an open-source DNA methylation and copy number profiling suite, which has been benchmarked against an established supervised machine learning approach using in-house routine diagnostics data obtained between 2019 and 2021. Running locally on portable, cost- and energy-saving system-on-chip as well as gpGPU-augmented edge computing devices, NanoDiP works in offline mode, ensuring data privacy. It does not require the rigid training data annotation of supervised approaches. Furthermore, NanoDiP is the core of our public, free-of-charge EpiDiP web service which enables comparative methylation data analysis against an extensive reference data collection. We envision this versatile platform as a useful resource not only for neuropathologists and surgical pathologists but also for the tumour epigenetics research community. 
In daily diagnostic routine, analysis of native, unfixed biopsies by NanoDiP delivers molecular tumour classification in an intraoperative time frame.

Epigenomics , Neoplasms , Humans , Unsupervised Machine Learning , Cloud Computing , Neoplasms/diagnosis , Neoplasms/genetics , DNA Methylation

10.

Energy landscapes of homopolymeric RNAs revealed by deep unsupervised learning.

Ramachandran, Vysakh; Potoyan, Davit A.

Biophys J ; 123(9): 1152-1163, 2024 May 07.

Article En | MEDLINE | ID: mdl-38571310

Conformational dynamics of RNA plays important roles in a variety of cellular functions such as transcriptional regulation, catalysis, scaffolding, and sensing. Recently, RNAs with low-complexity sequences have been shown to phase separate and form condensate phases similar to low-complexity protein domains. The affinity for phase separation and the material characteristics of RNA condensates are strongly dependent on sequence composition and patterning. We hypothesize that differences in the affinities for RNA phase separation can be uncovered by studying sequence-dependent conformational dynamics of single RNA chains. To this end, we have employed atomistic simulations and deep dimensionality reduction techniques to map temperature-dependent conformational free energy landscapes for 20 base-long homopolymeric RNA sequences: poly(U), poly(G), poly(C), and poly(A). The energy landscapes of homopolymeric RNAs reveal a plethora of metastable states with qualitatively different populations stemming from differences in base chemistry. Through detailed analysis of base, phosphate, and sugar interactions, we show that experimentally observed temperature-driven shifts in metastable state populations align with experiments on RNA phase transitions. Specifically, we find that the thermodynamics of unfolding of homopolymeric RNA follows the poly(G) > poly(A) > poly(C) > poly(U) order of stability, mirroring the propensity of RNA to form condensates. To conclude, this work shows that at least for homopolymeric RNA sequences the single-chain conformational dynamics contains sufficient information for predicting and quantifying condensate forming affinities of RNAs. Thus, we anticipate that atomically detailed studies of temperature-dependent energy landscapes of RNAs will be a useful guide for understanding the propensity of various RNA molecules to form condensates.

Nucleic Acid Conformation , RNA , Thermodynamics , RNA/chemistry , RNA/metabolism , Molecular Dynamics Simulation , Unsupervised Machine Learning , Deep Learning , Temperature

11.

Self-paced regularized adaptive multi-view unsupervised feature selection.

Yang, Xuanhao; Che, Hangjun; Leung, Man-Fai; Wen, Shiping.

Neural Netw ; 175: 106295, 2024 Jul.

Article En | MEDLINE | ID: mdl-38614023

Multi-view unsupervised feature selection (MUFS) is an efficient approach for dimensional reduction of heterogeneous data. However, existing MUFS approaches mostly assign the samples the same weight, thus the diversity of samples is not utilized efficiently. Additionally, due to the presence of various regularizations, the resulting MUFS problems are often non-convex, making it difficult to find the optimal solutions. To address this issue, a novel MUFS method named Self-paced Regularized Adaptive Multi-view Unsupervised Feature Selection (SPAMUFS) is proposed. Specifically, the proposed approach firstly trains the MUFS model with simple samples, and gradually learns complex samples by using self-paced regularizer. l2,p-norm (0

Algorithms , Unsupervised Machine Learning , Humans , Neural Networks, Computer

12.

Unsupervised Sentence Representation Learning with Frequency-induced Adversarial tuning and Incomplete sentence filtering.

Wang, Bing; Li, Ximing; Yang, Zhiyao; Guan, Yuanyuan; Li, Jiayin; Wang, Shengsheng.

Neural Netw ; 175: 106315, 2024 Jul.

Article En | MEDLINE | ID: mdl-38626618

Pre-trained Language Model (PLM) is nowadays the mainstay of Unsupervised Sentence Representation Learning (USRL). However, PLMs are sensitive to the frequency information of words from their pre-training corpora, resulting in anisotropic embedding space, where the embeddings of high-frequency words are clustered but those of low-frequency words disperse sparsely. This anisotropic phenomenon results in two problems of similarity bias and information bias, lowering the quality of sentence embeddings. To solve the problems, we fine-tune PLMs by leveraging the frequency information of words and propose a novel USRL framework, namely Sentence Representation Learning with Frequency-induced Adversarial tuning and Incomplete sentence filtering (Slt-fai). We calculate the word frequencies over the pre-training corpora of PLMs and assign words thresholding frequency labels. With them, (1) we incorporate a similarity discriminator used to distinguish the embeddings of high-frequency and low-frequency words, and adversarially tune the PLM with it, enabling to achieve uniformly frequency-invariant embedding space; and (2) we propose a novel incomplete sentence detection task, where we incorporate an information discriminator to distinguish the embeddings of original sentences and incomplete sentences by randomly masking several low-frequency words, enabling to emphasize the more informative low-frequency words. Our Slt-fai is a flexible and plug-and-play framework, and it can be integrated with existing USRL techniques. We evaluate Slt-fai with various backbones on benchmark datasets. Empirical results indicate that Slt-fai can be superior to the existing USRL baselines.

Language , Unsupervised Machine Learning , Humans , Neural Networks, Computer , Natural Language Processing , Algorithms

13.

A novel hybrid supervised and unsupervised hierarchical ensemble for COVID-19 cases and mortality prediction.

Yakovyna, Vitaliy; Shakhovska, Nataliya; Szpakowska, Aleksandra.

Sci Rep ; 14(1): 9782, 2024 Apr 29.

Article En | MEDLINE | ID: mdl-38684770

Though COVID-19 is no longer a pandemic but rather an endemic, the epidemiological situation related to the SARS-CoV-2 virus is developing at an alarming rate, impacting every corner of the world. The rapid escalation of the coronavirus has led to the scientific community engagement, continually seeking solutions to ensure the comfort and safety of society. Understanding the joint impact of medical and non-medical interventions on COVID-19 spread is essential for making public health decisions that control the pandemic. This paper introduces two novel hybrid machine-learning ensembles that combine supervised and unsupervised learning for COVID-19 data classification and regression. The study utilizes publicly available COVID-19 outbreak and potential predictive features in the USA dataset, which provides information related to the outbreak of COVID-19 disease in the US, including data from each of 3142 US counties from the beginning of the epidemic (January 2020) until June 2021. The developed hybrid hierarchical classifiers outperform single classification algorithms. The best-achieved performance metrics for the classification task were Accuracy = 0.912, ROC-AUC = 0.916, and F1-score = 0.916. The proposed hybrid hierarchical ensemble combining both supervised and unsupervised learning allows us to increase the accuracy of the regression task by 11% in terms of MSE, 29% in terms of the area under the ROC, and 43% in terms of the MPP metric. Thus, using the proposed approach, it is possible to predict the number of COVID-19 cases and deaths based on demographic, geographic, climatic, traffic, public health, social-distancing-policy adherence, and political characteristics with sufficiently high accuracy. The study reveals that virus pressure is the most important feature in COVID-19 spread for classification and regression analysis. Five other significant features were identified to have the most influence on COVID-19 spread. 
The combined ensembling approach introduced in this study can help policymakers design prevention and control measures to avoid or minimize public health threats in the future.

COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , COVID-19/mortality , COVID-19/prevention & control , Humans , SARS-CoV-2/isolation & purification , Supervised Machine Learning , Pandemics , Algorithms , Unsupervised Machine Learning , United States/epidemiology , Machine Learning

14.

An Unsupervised Machine Learning Approach for the Automatic Construction of Local Chemical Descriptors.

Gallegos, Miguel; Isamura, Bienfait Kabuyaya; Popelier, Paul L A; Martín Pendás, Ángel.

J Chem Inf Model ; 64(8): 3059-3079, 2024 Apr 22.

Article En | MEDLINE | ID: mdl-38498942

Condensing the many physical variables defining a chemical system into a fixed-size array poses a significant challenge in the development of chemical Machine Learning (ML). Atom Centered Symmetry Functions (ACSFs) offer an intuitive featurization approach by means of a tedious and labor-intensive selection of tunable parameters. In this work, we implement an unsupervised ML strategy relying on a Gaussian Mixture Model (GMM) to automatically optimize the ACSF parameters. GMMs effortlessly decompose the vastness of the chemical and conformational spaces into well-defined radial and angular clusters, which are then used to build tailor-made ACSFs. The unsupervised exploration of the space has demonstrated general applicability across a diverse range of systems, spanning from various unimolecular landscapes to heterogeneous databases. The impact of the sampling technique and temperature on space exploration is also addressed, highlighting the particularly advantageous role of high-temperature Molecular Dynamics (MD) simulations. The reliability of the resulting features is assessed through the estimation of the atomic charges of a prototypical capped amino acid and a heterogeneous collection of CHON molecules. The automatically constructed ACSFs serve as high-quality descriptors, consistently yielding typical prediction errors below 0.010 electrons bound for the reported atomic charges. Altering the spatial distribution of the functions with respect to the cluster highlights the critical role of symmetry rupture in achieving significantly improved features. More specifically, using two separate functions to describe the lower and upper tails of the cluster results in the best performing models with errors as low as 0.006 electrons. Finally, the effectiveness of finely tuned features was checked across different architectures, unveiling the superior performance of Gaussian Process (GP) models over Feed Forward Neural Networks (FFNNs), particularly in low-data regimes, with nearly a 2-fold increase in prediction quality. Altogether, this approach paves the way toward an easier construction of local chemical descriptors, while providing valuable insights into how radial and angular spaces should be mapped. Finally, this work opens the possibility of encoding many-body information beyond angular terms into upcoming ML features.

Molecular Dynamics Simulation , Unsupervised Machine Learning , Normal Distribution , Automation
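
The idea of letting a Gaussian mixture pick radial function parameters can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a 1-D mixture to synthetic interatomic distances with plain EM and then assumes that each cluster mean becomes a radial shift r_s and that the width is eta = 1 / (2 * sigma^2). The function names, the synthetic data, and that mapping are all illustrative assumptions, and the cutoff function usually included in an ACSF is left out.

```python
import math
import random


def fit_gmm_1d(xs, k, iters=100):
    """Fit a 1-D Gaussian mixture by plain EM; returns (weights, means, variances)."""
    xs = sorted(xs)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, one shared broad variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    spread = (xs[-1] - xs[0]) / k or 1.0
    variances = [spread ** 2] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsing component
    return weights, means, variances


def radial_params(means, variances):
    """Return (r_s, eta) pairs sorted by r_s; eta = 1/(2*sigma^2) is an assumed mapping."""
    return sorted((m, 1.0 / (2.0 * v)) for m, v in zip(means, variances))


def radial_function(r, r_s, eta):
    """Gaussian radial term exp(-eta * (r - r_s)^2), without a cutoff function."""
    return math.exp(-eta * (r - r_s) ** 2)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic distances (arbitrary units) drawn from two made-up shells.
    distances = [rng.gauss(1.0, 0.05) for _ in range(200)]
    distances += [rng.gauss(2.4, 0.15) for _ in range(200)]
    _, means, variances = fit_gmm_1d(distances, k=2)
    for r_s, eta in radial_params(means, variances):
        print(f"r_s = {r_s:.3f}, eta = {eta:.1f}")
```

Each fitted component yields one Gaussian whose center and width follow the data, which is the part of the workflow that otherwise has to be tuned by hand.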

15.

Unsupervised learning of perceptual feature combinations.

Tamosiunaite, Minija; Tetzlaff, Christian; Wörgötter, Florentin.

PLoS Comput Biol ; 20(3): e1011926, 2024 Mar.

Article En | MEDLINE | ID: mdl-38442095

In many situations it is behaviorally relevant for an animal to respond to co-occurrences of perceptual, possibly polymodal features, while these features alone may have no importance. Thus, it is crucial for animals to learn such feature combinations even though they may occur with variable intensity and occurrence frequency. Here, we present a novel unsupervised learning mechanism that is largely independent of these contingencies and allows neurons in a network to achieve specificity for different feature combinations. This is achieved by a novel correlation-based (Hebbian) learning rule, which allows for linear weight growth and which is combined with a mechanism for gradually reducing the learning rate as soon as the neuron's response becomes feature combination specific. In a set of control experiments, we show that other existing advanced learning rules cannot satisfactorily form ordered multi-feature representations. In addition, we show that networks that use this type of learning always stabilize and converge to subsets of neurons with different feature-combination specificity. Neurons with this property may, thus, serve as an initial stage for the processing of ecologically relevant real world situations for an animal.

Models, Neurological , Unsupervised Machine Learning , Animals , Neurons/physiology
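
The combination of correlation-based learning with a shrinking learning rate can be sketched for a single linear neuron. This is a textbook Hebbian update (weight change proportional to input times output), not the rule proposed in the paper, and the threshold, decay factor, and input patterns are hypothetical choices made only for illustration. The learning rate is halved every time the response exceeds the threshold, which freezes the weights once the neuron responds strongly to the combined input.

```python
def response(weights, inputs):
    """Linear neuron output: weighted sum of the inputs."""
    return sum(w * u for w, u in zip(weights, inputs))


def train_hebbian(patterns, w_init, mu=0.05, threshold=1.0, decay=0.5, epochs=50):
    """Plain Hebbian learning with a learning rate that decays on strong responses.

    Returns the final weights, the final learning rate, and the weights after each epoch.
    """
    w = list(w_init)
    history = []
    for _ in range(epochs):
        for u in patterns:
            v = response(w, u)
            # Hebbian step: each weight grows with (pre-synaptic input * output).
            w = [wi + mu * ui * v for wi, ui in zip(w, u)]
            # Assumed stabilisation mechanism: shrink the rate once the
            # response to the current pattern exceeds the threshold.
            if v > threshold:
                mu *= decay
        history.append(list(w))
    return w, mu, history


if __name__ == "__main__":
    # Hypothetical inputs: both features together, then each feature alone.
    patterns = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    w, mu, _ = train_hebbian(patterns, w_init=(0.1, 0.1))
    print("weights:", [round(x, 3) for x in w], "final rate:", mu)
    print("combined:", round(response(w, (1.0, 1.0)), 3))
    print("single:  ", round(response(w, (1.0, 0.0)), 3))
```

In this toy setting the weights stop growing soon after the combined pattern crosses the threshold, so the single-feature responses stay below it.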

16.

Clustering honey samples with unsupervised machine learning methods using FTIR data.

Avcu, Fatih Mehmet.

An Acad Bras Cienc ; 96(1): e20230409, 2024.

Article En | MEDLINE | ID: mdl-38451625

This study utilizes Fourier transform infrared (FTIR) data from honey samples to cluster and categorize them based on their spectral characteristics. The aim is to group similar samples together, revealing patterns and aiding in classification. The process begins by determining the number of clusters using the elbow method, resulting in five distinct clusters. Principal Component Analysis (PCA) is then applied to reduce the dataset's dimensionality by capturing its significant variances. Hierarchical Cluster Analysis (HCA) further refines the sample clusters. 20% of the data, representing identified clusters, is randomly selected for testing, while the remainder serves as training data for a deep learning algorithm employing a multilayer perceptron (MLP). Following training, the test data are evaluated, revealing an impressive 96.15% accuracy. Accuracy measures the machine learning model's ability to predict class labels for new data accurately. This approach offers reliable honey sample clustering without necessitating extensive preprocessing. Moreover, its swiftness and cost-effectiveness enhance its practicality. Ultimately, by leveraging FTIR spectral data, this method successfully identifies similarities among honey samples, enabling efficient categorization and demonstrating promise in the field of spectral analysis in food science.

Honey , Unsupervised Machine Learning , Fourier Analysis , Spectroscopy, Fourier Transform Infrared , Cluster Analysis
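
The elbow step mentioned in the abstract can be sketched as follows. The abstract does not say which clustering criterion was used, so this example assumes k-means inertia (within-cluster sum of squares) and picks the elbow as the k with the largest second difference of the inertia curve. The synthetic 2-D points stand in for spectra and are not honey data.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_inertia(points, k, iters=50):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    # Deterministic farthest-point initialisation.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def pick_elbow(inertias):
    """Return the k (1-based) with the largest second difference of the inertia curve."""
    best_k, best_curvature = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        curvature = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if curvature > best_curvature:
            best_k, best_curvature = i + 1, curvature
    return best_k


if __name__ == "__main__":
    rng = random.Random(1)
    blob_centers = [(0, 0), (10, 0), (0, 10)]  # three synthetic groups
    points = [
        (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in blob_centers
        for _ in range(30)
    ]
    inertias = [kmeans_inertia(points, k) for k in range(1, 7)]
    print("inertia by k:", [round(v, 1) for v in inertias])
    print("elbow at k =", pick_elbow(inertias))
```

In practice the elbow is often read off a plot; the second-difference rule is just one simple way to automate that reading.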

17.

Phenotypic Analysis of Hematopoietic Stem and Progenitor Cell Populations in Acute Myeloid Leukemia Based on Spectral Flow Cytometry, a 20-Color Panel, and Unsupervised Learning Algorithms.

Matthes, Thomas.

Int J Mol Sci ; 25(5), 2024 Feb 29.

Article En | MEDLINE | ID: mdl-38474094

The analysis of hematopoietic stem and progenitor cell populations (HSPCs) is fundamental in the understanding of normal hematopoiesis as well as in the management of malignant diseases, such as leukemias, and in their diagnosis and follow-up, particularly the measurement of treatment efficiency with the detection of measurable residual disease (MRD). In this study, I designed a 20-color flow cytometry panel tailored for the comprehensive analysis of HSPCs using a spectral cytometer. My investigation encompassed the examination of forty-six samples derived from both normal human bone marrows (BMs) and patients with acute myeloid leukemia (AML) and myelodysplastic syndromes (MDS) along with those subjected to chemotherapy and BM transplantation. By comparing my findings to those obtained through conventional flow cytometric analyses utilizing multiple tubes, I demonstrate that my innovative 20-color approach enables a more in-depth exploration of HSPC subpopulations and the detection of MRD with at least comparable sensitivity. Furthermore, leveraging advanced analytical tools such as t-SNE and FlowSOM learning algorithms, I conduct extensive cross-sample comparisons with two-dimensional gating approaches. My results underscore the efficacy of these two methods as powerful unsupervised alternatives for manual HSPC subpopulation analysis. I expect that in the future, complex multi-dimensional flow cytometric data analyses, such as those employed in this study, will be increasingly used in hematologic diagnostics.

Hematopoietic Stem Cell Transplantation , Leukemia, Myeloid, Acute , Humans , Flow Cytometry/methods , Unsupervised Machine Learning , Leukemia, Myeloid, Acute/drug therapy , Hematopoietic Stem Cells/pathology , Hematopoietic Stem Cell Transplantation/methods , Neoplasm, Residual/diagnosis

18.

Public perception on active aging after COVID-19: an unsupervised machine learning analysis of 44,343 posts.

Chen, Peipei; Jin, Yuwei; Ma, Xinfang; Lin, Yan.

Front Public Health ; 12: 1329704, 2024.

Article En | MEDLINE | ID: mdl-38515596

Introduction: To analyze public perceptions of active aging in China on mainstream social media platforms to determine whether the "14th Five Year Plan for the Development of the Aging Career and Older Adult Care System" issued by the CPC in 2022 has fully addressed public needs. Methods: The original tweets posted on Weibo between January 1, 2020, and June 30, 2022, containing the words "aging" or "old age" were extracted. A Bidirectional Encoder Representations from Transformers (BERT)-based model was used to generate themes related to this perception. A qualitative thematic analysis and an independent review of the theme labels were conducted by the researchers. Results: The findings indicate that public perceptions revolved around four themes: (1) health prevention and protection, (2) convenient living environments, (3) cognitive health and social integration, and (4) protecting the rights and interests of the older adult. Discussion: Our study found that although the Plan aligns with most of these themes, it lacks clear planning for financial security and marital life.

COVID-19 , Social Media , Humans , Aged , COVID-19/psychology , SARS-CoV-2 , Unsupervised Machine Learning , Public Opinion

19.

How Socio-economic Inequalities Cluster People with Diabetes in Malaysia: Geographic Evaluation of Area Disparities Using a Non-parameterized Unsupervised Learning Method.

Ganasegeran, Kurubaran; Abdul Manaf, Mohd Rizal; Safian, Nazarudin; Waller, Lance A; Mustapha, Feisul Idzwan; Abdul Maulud, Khairul Nizam; Mohd Rizal, Muhammad Faid.

J Epidemiol Glob Health ; 14(1): 169-183, 2024 Mar.

Article En | MEDLINE | ID: mdl-38315406

Accurate assessments of epidemiological associations between health outcomes and routinely observed proximal and distal determinants of health are fundamental for the execution of effective public health interventions and policies. Methods to couple big public health data with modern statistical techniques offer greater granularity for describing and understanding data quality, disease distributions, and potential predictive connections between population-level indicators and areal-based health outcomes. This study applied clustering techniques to explore patterns of diabetes burden correlated with local socio-economic inequalities in Malaysia, with a goal of better understanding the factors influencing the collation of these clusters. Through multi-modal secondary data sources, district-wise diabetes crude rates from 271,553 individuals with diabetes sampled from 914 primary care clinics throughout Malaysia were computed. Unsupervised machine learning using hierarchical clustering was applied to a set of 144 administrative districts. Differences in characteristics of the areas were evaluated using multivariate non-parametric test statistics. Five statistically significant clusters were identified, each reflecting different levels of diabetes burden at the local level, each with contrasting patterns observed under the influence of population-level characteristics. The hierarchical clustering analysis, which grouped local diabetes areas with varying socio-economic, demographic, and geographic characteristics, offers opportunities for local public health authorities to implement targeted interventions in an attempt to control the local diabetes burden.

Diabetes Mellitus , Socioeconomic Factors , Unsupervised Machine Learning , Humans , Malaysia/epidemiology , Male , Female , Cluster Analysis , Diabetes Mellitus/epidemiology , Middle Aged , Adult , Aged , Health Status Disparities
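
Hierarchical clustering of areas by their characteristics can be sketched with a small agglomerative routine. The abstract does not state the linkage or distance used, so average linkage on Euclidean distance is assumed here, and the two-feature "district" values are made-up toy numbers rather than data from the study.

```python
import math


def average_linkage(points, a, b):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average-linkage distance.
        _, i, j = min(
            (average_linkage(points, clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy rows: (scaled crude rate, scaled socio-economic index), purely illustrative.
    districts = [
        (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
        (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
    ]
    print(agglomerative(districts, n_clusters=2))
```

With real data the features would normally be standardized first, and the number of clusters would be chosen from the dendrogram or a validity index rather than fixed in advance.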

20.

Supervised and unsupervised machine learning approaches for monitoring subvisible particles within an aluminum-salt adjuvanted vaccine formulation.

Greenblott, David N; Wood, Caitlin V; Zhang, Jingtao; Viza, Nelia; Chintala, Ramesh; Calderon, Christopher P; Randolph, Theodore W.

Biotechnol Bioeng ; 121(5): 1626-1641, 2024 May.

Article En | MEDLINE | ID: mdl-38372650

Suspensions of protein antigens adsorbed to aluminum-salt adjuvants are used in many vaccines and require mixing during vial filling operations to prevent sedimentation. However, the mixing of vaccine formulations may generate undesirable particles that are difficult to detect against the background of suspended adjuvant particles. We simulated the mixing of a suspension containing a protein antigen adsorbed to an aluminum-salt adjuvant using a recirculating peristaltic pump and used flow imaging microscopy to record images of particles within the pumped suspensions. Supervised convolutional neural networks (CNNs) were used to analyze the images and create "fingerprints" of particle morphology distributions, allowing detection of new particles generated during pumping. These results were compared to those obtained from an unsupervised machine learning algorithm relying on variational autoencoders (VAEs) that were also used to detect new particles generated during pumping. Analyses of images conducted by applying both supervised CNNs and VAEs found that rates of generation of new particles were higher in aluminum-salt adjuvant suspensions containing protein antigen than placebo suspensions containing only adjuvant. Finally, front-face fluorescence measurements of the vaccine suspensions indicated changes in solvent exposure of tryptophan residues in the protein that occurred concomitantly with new particle generation during pumping.

Aluminum , Vaccines , Unsupervised Machine Learning , Adjuvants, Immunologic/chemistry , Vaccines/chemistry , Antigens/chemistry